zhimin-z committed
Commit 244b6ac · Parent: 4b78e58
Files changed (3):
  1. README.md +39 -20
  2. app.py +208 -13
  3. msr.py +410 -258
README.md CHANGED
@@ -11,32 +11,35 @@ pinned: false
 short_description: Track GitHub issue statistics for SWE assistants
 ---
 
-# SWE Assistant Issue Leaderboard
+# SWE Assistant Issue & Discussion Leaderboard
 
-SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution performance.
+SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution and discussion performance.
 
-No benchmarks. No sandboxes. Just real issues that got resolved.
+No benchmarks. No sandboxes. Just real issues and discussions that got resolved.
 
 ## Why This Exists
 
-Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many were completed? Is the assistant improving?
+Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many discussions did the assistant participate in and resolve? Is the assistant improving?
 
-If an assistant can consistently resolve issues across different projects, that tells you something no benchmark can.
+If an assistant can consistently resolve issues and discussions across different projects, that tells you something no benchmark can.
 
 ## What We Track
 
 Key metrics from the last 180 days:
 
 **Leaderboard Table**
+- **Issue Resolved Rate (%)**: Percentage of closed issues successfully resolved
+- **Discussion Resolved Rate (%)**: Percentage of discussions successfully resolved (answered or closed)
 - **Total Issues**: Issues the assistant has been involved with (authored, assigned, or commented on)
-- **Closed Issues**: Issues that were closed
+- **Total Discussions**: Discussions the assistant created
 - **Resolved Issues**: Closed issues marked as completed
-- **Resolved Rate**: Percentage of closed issues successfully resolved
 - **Resolved Wanted Issues**: Long-standing issues (30+ days old) from major open-source projects that the assistant resolved via merged pull requests
+- **Resolved Discussions**: Discussions that have been answered or closed
 
 **Monthly Trends**
-- Resolved rate trends (line plots)
-- Issue volume over time (bar charts)
+- Issue resolved rate trends (line plots)
+- Discussion resolved rate trends (line plots)
+- Issue and discussion volume over time (bar charts)
 
 **Issues Wanted**
 - Long-standing open issues (30+ days) with fix-needed labels (e.g. `bug`, `enhancement`) from tracked organizations (Apache, GitHub, Hugging Face)
@@ -46,7 +49,7 @@ We focus on 180 days to highlight current capabilities and active assistants.
 ## How It Works
 
 **Data Collection**
-We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking two types of issues:
+We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking three types of activities:
 
 1. **Agent-Assigned Issues**:
    - Issues opened or assigned to the assistant (`IssuesEvent`)
@@ -57,20 +60,25 @@ We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking two types of issues:
    - Pull requests created by assistants that reference these issues
   - Only counts as resolved when the assistant's PR is merged and the issue is subsequently closed
 
+3. **Discussions**:
+   - GitHub Discussions created by the assistant (`DiscussionEvent`)
+   - Tracked from organizations: Apache, GitHub, Hugging Face
+   - A discussion is "resolved" when it has an answer chosen or is marked as answered
+
 **Regular Updates**
 Leaderboard refreshes weekly (Friday at 00:00 UTC).
 
 **Community Submissions**
-Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_metadata` and results in `SWE-Arena/leaderboard_metadata`. All submissions are validated via the GitHub API.
+Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_metadata` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.
 
 ## Using the Leaderboard
 
 ### Browsing
 **Leaderboard Tab**:
 - Searchable table (by assistant name or website)
-- Filterable columns (by resolved rate)
-- Monthly charts (resolution trends and activity)
-- View both agent-assigned metrics and wanted issue resolutions
+- Filterable columns (by issue resolved rate, discussion resolved rate)
+- Monthly charts (issue and discussion resolution trends and activity)
+- View agent-assigned metrics, wanted issue resolutions, and discussion metrics
 
 **Issues Wanted Tab**:
 - Browse long-standing open issues (30+ days) from major open-source projects
@@ -88,17 +96,26 @@ Submissions are validated and data loads within seconds.
 
 ## Understanding the Metrics
 
-**Resolved Rate**
+**Issue Resolved Rate**
 Percentage of closed issues successfully completed:
 
 ```
-Resolved Rate = resolved issues ÷ closed issues × 100
+Issue Resolved Rate = resolved issues ÷ closed issues × 100
 ```
 
 An issue is "resolved" when `state_reason` is `completed` on GitHub. This means the problem was solved, not just closed without resolution.
 
 Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
 
+**Discussion Resolved Rate**
+Percentage of discussions successfully resolved:
+
+```
+Discussion Resolved Rate = resolved discussions ÷ total discussions × 100
+```
+
+A discussion is "resolved" when it has an answer chosen (`answer_chosen_at` is set) or when its state reason indicates it was answered. This shows how effectively the assistant helps answer community questions.
+
 **Resolved Wanted Issues**
 Long-standing issues (30+ days old) from major open-source projects that the assistant resolved. An issue qualifies when:
 1. It's from a tracked organization (Apache, GitHub, Hugging Face)
@@ -113,24 +130,26 @@ This metric highlights assistants' ability to tackle challenging, community-identified problems.
 Issues that have been open for 30+ days represent real challenges the community has struggled to address. These are harder than typical issues and demonstrate an assistant's problem-solving capabilities.
 
 **Monthly Trends**
-- **Line plots**: Resolved rate changes over time
-- **Bar charts**: Issue volume per month
+- **Line plots**: Issue and discussion resolved rate changes over time
+- **Bar charts**: Issue and discussion volume per month
 
 Patterns to watch:
 - Consistent high rates = effective problem-solving
 - Increasing trends = improving assistants
 - High volume + good rates = productivity + effectiveness
 - High wanted issue resolution = ability to tackle challenging community problems
+- High discussion resolution = effective community engagement and knowledge sharing
 
 ## What's Next
 
 Planned improvements:
 - Repository-based analysis
 - Extended metrics (comment activity, response time, code complexity)
-- Resolution time tracking from issue creation to PR merge
-- Issue category patterns and difficulty assessment
+- Resolution time tracking from issue creation to PR merge and discussion creation to resolution
+- Issue and discussion category patterns and difficulty assessment
 - Expanded organization and label tracking for wanted issues
 - Integration with additional high-impact open-source organizations
+- Discussion quality metrics (helpfulness, community engagement)
 
 ## Questions or Issues?
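To make the two resolved-rate formulas above concrete, here is a minimal sketch (not part of the commit) of how they could be computed from stored metadata. The field names `closed_at`, `state_reason`, and `answer_chosen_at` mirror the GitHub fields cited in the README; the helper functions themselves are illustrative assumptions, and the undefined-rate handling (no closed issues yet) is a choice this sketch makes, not necessarily the leaderboard's.

```python
from typing import Dict, List, Optional

def issue_resolved_rate(issues: List[Dict]) -> Optional[float]:
    """Issue Resolved Rate = resolved issues / closed issues * 100."""
    closed = [i for i in issues if i.get('closed_at')]
    if not closed:
        return None  # no closed issues yet; the rate is undefined here
    resolved = sum(1 for i in closed if i.get('state_reason') == 'completed')
    return round(resolved / len(closed) * 100, 2)

def discussion_resolved_rate(discussions: List[Dict]) -> Optional[float]:
    """Discussion Resolved Rate = resolved discussions / total discussions * 100."""
    if not discussions:
        return None
    resolved = sum(1 for d in discussions if d.get('answer_chosen_at'))
    return round(resolved / len(discussions) * 100, 2)

# Example matching the README's "context matters" note:
# 70 of 100 closed issues completed -> 70.0
issues = [{'closed_at': 't', 'state_reason': 'completed'}] * 70 + [{'closed_at': 't'}] * 30
print(issue_resolved_rate(issues))  # 70.0
```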
app.py CHANGED
@@ -27,7 +27,7 @@ load_dotenv()
 AGENTS_REPO = "SWE-Arena/bot_metadata"  # HuggingFace dataset for agent metadata
 AGENTS_REPO_LOCAL_PATH = os.path.expanduser("~/bot_metadata")  # Local git clone path
 LEADERBOARD_FILENAME = f"{os.getenv('COMPOSE_PROJECT_NAME')}.json"
-LEADERBOARD_REPO = "SWE-Arena/leaderboard_metadata"  # HuggingFace dataset for leaderboard data
+LEADERBOARD_REPO = "SWE-Arena/leaderboard_data"  # HuggingFace dataset for leaderboard data
 LONGSTANDING_GAP_DAYS = 30  # Minimum days for an issue to be considered long-standing
 GIT_SYNC_TIMEOUT = 300  # 5 minutes timeout for git pull
 MAX_RETRIES = 5
@@ -35,10 +35,13 @@ MAX_RETRIES = 5
 LEADERBOARD_COLUMNS = [
     ("Agent Name", "string"),
     ("Website", "string"),
+    ("Issue Resolved Rate (%)", "number"),
+    ("Discussion Resolved Rate (%)", "number"),
     ("Total Issues", "number"),
+    ("Total Discussions", "number"),
     ("Resolved Issues", "number"),
-    ("Resolved Rate (%)", "number"),
     ("Resolved Wanted Issues", "number"),
+    ("Resolved Discussions", "number"),
 ]
 
 # =============================================================================
@@ -507,6 +510,177 @@ def create_monthly_metrics_plot(top_n=5):
     return fig
 
 
+def create_discussion_monthly_metrics_plot(top_n=5):
+    """
+    Create a Plotly figure with dual y-axes showing discussion metrics:
+    - Left y-axis: Discussion Resolved Rate (%) as line curves
+    - Right y-axis: Total Discussions created as bar charts
+
+    Each agent gets a unique color for both their line and bars.
+
+    Args:
+        top_n: Number of top agents to show (default: 5)
+    """
+    # Load from saved dataset
+    saved_data = load_leaderboard_data_from_hf()
+
+    if not saved_data or 'discussion_monthly_metrics' not in saved_data:
+        # Return an empty figure with a message
+        fig = go.Figure()
+        fig.add_annotation(
+            text="No discussion data available for visualization",
+            xref="paper", yref="paper",
+            x=0.5, y=0.5, showarrow=False,
+            font=dict(size=16)
+        )
+        fig.update_layout(
+            title=None,
+            xaxis_title=None,
+            height=500
+        )
+        return fig
+
+    metrics = saved_data['discussion_monthly_metrics']
+    print(f"Loaded discussion monthly metrics from saved dataset")
+
+    # Apply top_n filter if specified
+    if top_n is not None and top_n > 0 and metrics.get('agents'):
+        # Calculate total discussions for each agent
+        agent_totals = []
+        for agent_name in metrics['agents']:
+            agent_data = metrics['data'].get(agent_name, {})
+            total_discussions = sum(agent_data.get('total_discussions', []))
+            agent_totals.append((agent_name, total_discussions))
+
+        # Sort by total discussions and take top N
+        agent_totals.sort(key=lambda x: x[1], reverse=True)
+        top_agents = [agent_name for agent_name, _ in agent_totals[:top_n]]
+
+        # Filter metrics to only include top agents
+        metrics = {
+            'agents': top_agents,
+            'months': metrics['months'],
+            'data': {agent: metrics['data'][agent] for agent in top_agents if agent in metrics['data']}
+        }
+
+    if not metrics['agents'] or not metrics['months']:
+        # Return an empty figure with a message
+        fig = go.Figure()
+        fig.add_annotation(
+            text="No discussion data available for visualization",
+            xref="paper", yref="paper",
+            x=0.5, y=0.5, showarrow=False,
+            font=dict(size=16)
+        )
+        fig.update_layout(
+            title=None,
+            xaxis_title=None,
+            height=500
+        )
+        return fig
+
+    # Create figure with secondary y-axis
+    fig = make_subplots(specs=[[{"secondary_y": True}]])
+
+    # Generate unique colors for many agents using HSL color space
+    def generate_color(index, total):
+        """Generate distinct colors using HSL color space for better distribution"""
+        hue = (index * 360 / total) % 360
+        saturation = 70 + (index % 3) * 10  # Vary saturation slightly
+        lightness = 45 + (index % 2) * 10  # Vary lightness slightly
+        return f'hsl({hue}, {saturation}%, {lightness}%)'
+
+    agents = metrics['agents']
+    months = metrics['months']
+    data = metrics['data']
+
+    # Generate colors for all agents
+    agent_colors = {agent: generate_color(idx, len(agents)) for idx, agent in enumerate(agents)}
+
+    # Add traces for each agent
+    for idx, agent_name in enumerate(agents):
+        color = agent_colors[agent_name]
+        agent_data = data[agent_name]
+
+        # Add line trace for resolved rate (left y-axis)
+        resolved_rates = agent_data['resolved_rates']
+        # Filter out None values for plotting
+        x_resolved = [month for month, rate in zip(months, resolved_rates) if rate is not None]
+        y_resolved = [rate for rate in resolved_rates if rate is not None]
+
+        if x_resolved and y_resolved:  # Only add trace if there's data
+            fig.add_trace(
+                go.Scatter(
+                    x=x_resolved,
+                    y=y_resolved,
+                    name=agent_name,
+                    mode='lines+markers',
+                    line=dict(color=color, width=2),
+                    marker=dict(size=8),
+                    legendgroup=agent_name,
+                    showlegend=(top_n is not None and top_n <= 10),  # Show legend for top N agents
+                    hovertemplate='<b>Agent: %{fullData.name}</b><br>' +
+                                  'Month: %{x}<br>' +
+                                  'Discussion Resolved Rate: %{y:.2f}%<br>' +
+                                  '<extra></extra>'
+                ),
+                secondary_y=False
+            )
+
+        # Add bar trace for total discussions (right y-axis)
+        # Only show bars for months where agent has discussions
+        x_bars = []
+        y_bars = []
+        for month, count in zip(months, agent_data['total_discussions']):
+            if count > 0:  # Only include months with discussions
+                x_bars.append(month)
+                y_bars.append(count)
+
+        if x_bars and y_bars:  # Only add trace if there's data
+            fig.add_trace(
+                go.Bar(
+                    x=x_bars,
+                    y=y_bars,
+                    name=agent_name,
+                    marker=dict(color=color, opacity=0.6),
+                    legendgroup=agent_name,
+                    showlegend=False,  # Hide duplicate legend entry (already shown in Scatter)
+                    hovertemplate='<b>Agent: %{fullData.name}</b><br>' +
+                                  'Month: %{x}<br>' +
+                                  'Total Discussions: %{y}<br>' +
+                                  '<extra></extra>',
+                    offsetgroup=agent_name  # Group bars by agent for proper spacing
+                ),
+                secondary_y=True
+            )
+
+    # Update axes labels
+    fig.update_xaxes(title_text=None)
+    fig.update_yaxes(
+        title_text="<b>Discussion Resolved Rate (%)</b>",
+        range=[0, 100],
+        secondary_y=False,
+        showticklabels=True,
+        tickmode='linear',
+        dtick=10,
+        showgrid=True
+    )
+    fig.update_yaxes(title_text="<b>Total Discussions</b>", secondary_y=True)
+
+    # Update layout
+    show_legend = (top_n is not None and top_n <= 10)
+    fig.update_layout(
+        title=None,
+        hovermode='closest',  # Show individual agent info on hover
+        barmode='group',
+        height=600,
+        showlegend=show_legend,
+        margin=dict(l=50, r=150 if show_legend else 50, t=50, b=50)  # More right margin when legend is shown
+    )
+
+    return fig
+
+
 def get_leaderboard_dataframe():
     """
     Load leaderboard from saved dataset and convert to pandas DataFrame for display.
@@ -543,14 +717,17 @@ def get_leaderboard_dataframe():
             filtered_count += 1
             continue
 
-        # Only include display-relevant fields
+        # Only include display-relevant fields (new column order)
         rows.append([
            data.get('name', 'Unknown'),
            data.get('website', 'N/A'),
-           total_issues,
-           data.get('resolved_issues', 0),
-           data.get('resolved_rate', 0.0),
-           data.get('resolved_wanted_issues', 0),
+           data.get('resolved_rate', 0.0),             # Issue Resolved Rate (%)
+           data.get('discussion_resolved_rate', 0.0),  # Discussion Resolved Rate (%)
+           total_issues,                               # Total Issues
+           data.get('total_discussions', 0),           # Total Discussions
+           data.get('resolved_issues', 0),             # Resolved Issues
+           data.get('resolved_wanted_issues', 0),      # Resolved Wanted Issues
+           data.get('resolved_discussions', 0),        # Resolved Discussions
        ])
 
     print(f"Filtered out {filtered_count} agents with 0 issues")
@@ -561,7 +738,11 @@ def get_leaderboard_dataframe():
     df = pd.DataFrame(rows, columns=column_names)
 
     # Ensure numeric types
-    numeric_cols = ["Total Issues", "Resolved Issues", "Resolved Rate (%)", "Resolved Wanted Issues"]
+    numeric_cols = [
+        "Issue Resolved Rate (%)", "Discussion Resolved Rate (%)",
+        "Total Issues", "Total Discussions",
+        "Resolved Issues", "Resolved Wanted Issues", "Resolved Discussions"
+    ]
     for col in numeric_cols:
         if col in df.columns:
             df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
@@ -726,9 +907,9 @@ print(f"On startup: Loads cached data from HuggingFace on demand")
 print(f"{'='*80}\n")
 
 # Create Gradio interface
-with gr.Blocks(title="SWE Agent Issue Leaderboard", theme=gr.themes.Soft()) as app:
-    gr.Markdown("# SWE Agent Issue Leaderboard")
-    gr.Markdown(f"Track and compare GitHub issue resolution statistics for SWE agents")
+with gr.Blocks(title="SWE Agent Issue & Discussion Leaderboard", theme=gr.themes.Soft()) as app:
+    gr.Markdown("# SWE Agent Issue & Discussion Leaderboard")
+    gr.Markdown(f"Track and compare GitHub issue and discussion resolution statistics for SWE agents")
 
     with gr.Tabs():
 
@@ -741,12 +922,12 @@ with gr.Blocks(title="SWE Agent Issue Leaderboard", theme=gr.themes.Soft()) as app:
                 search_columns=["Agent Name", "Website"],
                 filter_columns=[
                     ColumnFilter(
-                        "Resolved Rate (%)",
+                        "Issue Resolved Rate (%)",
                         min=0,
                         max=100,
                         default=[0, 100],
                         type="slider",
-                        label="Resolved Rate (%)"
+                        label="Issue Resolved Rate (%)"
                     )
                 ]
             )
@@ -772,6 +953,20 @@ with gr.Blocks(title="SWE Agent Issue Leaderboard", theme=gr.themes.Soft()) as app:
                 outputs=[monthly_metrics_plot]
            )
 
+            # Discussion Monthly Metrics Section
+            gr.Markdown("---")  # Divider
+            gr.Markdown("### Discussion Performance - Top 5 Agents")
+            gr.Markdown("*Shows discussion resolution trends and volumes for the most active agents*")
+
+            discussion_metrics_plot = gr.Plot(label="Discussion Monthly Metrics")
+
+            # Load discussion monthly metrics when app starts
+            app.load(
+                fn=lambda: create_discussion_monthly_metrics_plot(),
+                inputs=[],
+                outputs=[discussion_metrics_plot]
+            )
+
 
         # Issues Wanted Tab
         with gr.Tab("Issues Wanted"):
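For context when reading the plot code above: `create_discussion_monthly_metrics_plot` consumes the `discussion_monthly_metrics` payload that msr.py saves to the leaderboard dataset. A sketch of its expected shape, with invented agent names and values (`None` marks a month with no discussions, which the plot skips):

```python
# Hypothetical payload mirroring what calculate_monthly_metrics_by_agent_discussions
# in msr.py produces; all names and numbers here are illustrative.
discussion_monthly_metrics = {
    'agents': ['agent-a', 'agent-b'],
    'months': ['2025-01', '2025-02'],  # "YYYY-MM" keys, sorted
    'data': {
        'agent-a': {
            'resolved_rates': [50.0, None],   # None = no discussions that month
            'total_discussions': [4, 0],
            'resolved_discussions': [2, 0],
        },
        'agent-b': {
            'resolved_rates': [100.0, 75.0],
            'total_discussions': [1, 4],
            'resolved_discussions': [1, 3],
        },
    },
}
```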
msr.py CHANGED
@@ -30,7 +30,7 @@ AGENTS_REPO_LOCAL_PATH = os.path.expanduser("~/bot_metadata")  # Local git clone path
 DUCKDB_CACHE_FILE = "cache.duckdb"
 GHARCHIVE_DATA_LOCAL_PATH = os.path.expanduser("~/gharchive/data")
 LEADERBOARD_FILENAME = f"{os.getenv('COMPOSE_PROJECT_NAME')}.json"
-LEADERBOARD_REPO = "SWE-Arena/leaderboard_metadata"
+LEADERBOARD_REPO = "SWE-Arena/leaderboard_data"
 LEADERBOARD_TIME_FRAME_DAYS = 180
 LONGSTANDING_GAP_DAYS = 30  # Minimum days for an issue to be considered long-standing
 
@@ -355,181 +355,22 @@ def generate_file_path_patterns(start_date, end_date, data_dir=GHARCHIVE_DATA_LOCAL_PATH):
 
 
 # =============================================================================
-# STREAMING BATCH PROCESSING FOR ISSUES
+# STREAMING BATCH PROCESSING - UNIFIED QUERY FOR ALL METADATA
 # =============================================================================
 
-def fetch_all_issue_metadata_streaming(conn, identifiers, start_date, end_date):
+def fetch_all_metadata_streaming(conn, identifiers, start_date, end_date):
     """
-    OPTIMIZED: Fetch issue metadata using streaming batch processing.
-
-    Only tracks issues assigned to the agents.
-
-    Processes GHArchive files in BATCH_SIZE_DAYS chunks to limit memory usage.
-    Instead of loading 180 days (4,344 files) at once, processes 7 days at a time.
-
-    This prevents OOM errors by:
-    1. Only keeping ~168 hourly files in memory per batch (vs 4,344)
-    2. Incrementally building the results dictionary
-    3. Allowing DuckDB to garbage collect after each batch
-
-    Args:
-        conn: DuckDB connection instance
-        identifiers: List of GitHub usernames/bot identifiers (~1500)
-        start_date: Start datetime (timezone-aware)
-        end_date: End datetime (timezone-aware)
-
-    Returns:
-        Dictionary mapping agent identifier to list of issue metadata
-    """
-    identifier_list = ', '.join([f"'{id}'" for id in identifiers])
-    metadata_by_agent = defaultdict(list)
-
-    # Calculate total batches
-    total_days = (end_date - start_date).days
-    total_batches = (total_days // BATCH_SIZE_DAYS) + 1
-
-    # Process in configurable batches
-    current_date = start_date
-    batch_num = 0
-    total_issues = 0
-
-    print(f"  Streaming {total_batches} batches of {BATCH_SIZE_DAYS}-day intervals...")
-
-    while current_date <= end_date:
-        batch_num += 1
-        batch_end = min(current_date + timedelta(days=BATCH_SIZE_DAYS - 1), end_date)
-
-        # Get file patterns for THIS BATCH ONLY (not all 180 days)
-        file_patterns = generate_file_path_patterns(current_date, batch_end)
-
-        if not file_patterns:
-            print(f"  Batch {batch_num}/{total_batches}: {current_date.date()} to {batch_end.date()} - NO DATA")
-            current_date = batch_end + timedelta(days=1)
-            continue
-
-        # Progress indicator
-        print(f"  Batch {batch_num}/{total_batches}: {current_date.date()} to {batch_end.date()} ({len(file_patterns)} files)... ", end="", flush=True)
-
-        # Build file patterns SQL for THIS BATCH
-        file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
-
-        # Query for this batch
-        # Note: For IssuesEvent, we use the issue user/assignee as author
-        # For IssueCommentEvent, we use the commenter as author
-        # IMPORTANT: We collect events from this batch's time range, but filter to only
-        # include issues that were CREATED within the overall timeframe (start_date).
-        # This prevents including old issues that just happen to have recent events.
-        # We still check their closed_at status (which may be outside the timeframe).
-        query = f"""
-            WITH issue_events AS (
-                SELECT
-                    CONCAT(
-                        REPLACE(repo.url, 'api.github.com/repos/', 'github.com/'),
-                        '/issues/',
-                        CAST(payload.issue.number AS VARCHAR)
-                    ) as url,
-                    CASE
-                        WHEN type = 'IssuesEvent' THEN
-                            COALESCE(
-                                CASE WHEN payload.issue.user.login IN ({identifier_list}) THEN payload.issue.user.login END,
-                                payload.issue.assignee.login,
-                                (SELECT a.login
-                                 FROM (SELECT UNNEST(payload.issue.assignees) as a)
-                                 WHERE a.login IN ({identifier_list})
-                                 LIMIT 1)
-                            )
-                        WHEN type = 'IssueCommentEvent' THEN
-                            payload.comment.user.login
-                        ELSE NULL
-                    END as agent_identifier,
-                    created_at as event_time,
-                    payload.issue.created_at as issue_created_at,
-                    payload.issue.closed_at as issue_closed_at,
-                    payload.issue.state_reason as state_reason
-                FROM read_json({file_patterns_sql}, union_by_name=true, filename=true, compression='gzip', format='newline_delimited', ignore_errors=true, maximum_object_size=2147483648)
-                WHERE
-                    type IN ('IssuesEvent', 'IssueCommentEvent')
-                    AND payload.issue.number IS NOT NULL
-                    AND payload.issue.pull_request IS NULL
-                    AND (
-                        (type = 'IssuesEvent'
-                         AND (
-                            payload.issue.user.login IN ({identifier_list})
-                            OR payload.issue.assignee.login IN ({identifier_list})
-                            OR EXISTS (
-                                SELECT 1 FROM (SELECT UNNEST(payload.issue.assignees) as a)
-                                WHERE a.login IN ({identifier_list})
-                            )
-                         ))
-                        OR (type = 'IssueCommentEvent' AND payload.comment.user.login IN ({identifier_list}))
-                    )
-            ),
-            issue_timeline AS (
-                SELECT
-                    url,
-                    agent_identifier,
-                    MIN(issue_created_at) as created_at,
-                    MAX(issue_closed_at) as closed_at,
-                    MAX(state_reason) as state_reason
-                FROM issue_events
-                GROUP BY url, agent_identifier
-            )
-            SELECT url, agent_identifier, created_at, closed_at, state_reason
-            FROM issue_timeline
-            WHERE agent_identifier IS NOT NULL
-              AND created_at IS NOT NULL
-              AND created_at >= '{start_date.isoformat()}'
-        """
-
-        try:
-            results = conn.execute(query).fetchall()
-            batch_issues = 0
-
-            # Add results to accumulating dictionary
-            for row in results:
-                url = row[0]
-                agent_identifier = row[1]
-                created_at = normalize_date_format(row[2]) if row[2] else None
-                closed_at = normalize_date_format(row[3]) if row[3] else None
-                state_reason = row[4]
-
-                if not url or not agent_identifier:
-                    continue
-
-                issue_metadata = {
-                    'url': url,
-                    'created_at': created_at,
-                    'closed_at': closed_at,
-                    'state_reason': state_reason,
-                }
-
-                metadata_by_agent[agent_identifier].append(issue_metadata)
-                batch_issues += 1
-                total_issues += 1
-
-            print(f"✓ {batch_issues} issues found")
-
-        except Exception as e:
-            print(f"\n  ✗ Batch {batch_num} error: {str(e)}")
-            traceback.print_exc()
-
-        # Move to next batch
-        current_date = batch_end + timedelta(days=1)
-
-    # Final summary
-    agents_with_data = sum(1 for issues in metadata_by_agent.values() if issues)
-    print(f"\n  ✓ Complete: {total_issues} issues found for {agents_with_data}/{len(identifiers)} agents")
-
-    return dict(metadata_by_agent)
-
-
-def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
-    """
-    UNIFIED: Fetch both agent-assigned issues AND wanted issues using streaming batch processing.
-
-    Tracks TWO types of issues:
+    UNIFIED QUERY: Fetches ALL metadata types in ONE query per batch:
+    - IssuesEvent, IssueCommentEvent (for agent-assigned issues AND wanted issues)
+    - PullRequestEvent (for wanted issue tracking)
+    - DiscussionEvent (for discussion tracking)
+
+    Then post-processes in Python to separate into:
     1. Agent-assigned issues: Issues where agents are assigned to or commented on
     2. Wanted issues: Long-standing issues from tracked orgs linked to merged PRs by agents
+    3. Discussions: GitHub discussions created by agents
+
+    This approach is more efficient than running separate queries for each category.
 
     Args:
         conn: DuckDB connection instance
@@ -538,18 +379,20 @@ def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
         end_date: End datetime (timezone-aware)
 
     Returns:
-        Dictionary with three keys:
+        Dictionary with four keys:
         - 'agent_issues': {agent_id: [issue_metadata]} for agent-assigned issues
         - 'wanted_open': [open_wanted_issues] for long-standing open issues
         - 'wanted_resolved': {agent_id: [resolved_wanted]} for resolved wanted issues
+        - 'agent_discussions': {agent_id: [discussion_metadata]} for agent discussions
     """
-    # First, get agent-assigned issues using existing function
-    print(f"  [1/2] Fetching agent-assigned/commented issues...")
-    agent_issues = fetch_all_issue_metadata_streaming(conn, identifiers, start_date, end_date)
-
-    # Now fetch wanted issues
-    print(f"\n  [2/2] Fetching wanted issues from tracked orgs...")
+    print(f"  Fetching ALL metadata (issues, PRs, discussions) with unified query...")
     identifier_set = set(identifiers)
+    identifier_list = ', '.join([f"'{id}'" for id in identifiers])
+    tracked_orgs_list = ', '.join([f"'{org}'" for org in TRACKED_ORGS])
+
+    # Storage for agent-assigned issues
+    agent_issues = defaultdict(list)     # agent_id -> [issue_metadata]
+    agent_issue_urls = defaultdict(set)  # agent_id -> set of issue URLs (for deduplication)
 
     # Storage for wanted issues
     all_issues = {}  # issue_url -> issue_metadata
@@ -557,6 +400,9 @@ def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
     pr_creators = {}  # pr_url -> creator login
     pr_merged_at = {}  # pr_url -> merged_at timestamp
 
+    # Storage for discussions
+    discussions_by_agent = defaultdict(list)
+
     # Calculate total batches
     total_days = (end_date - start_date).days
     total_batches = (total_days // BATCH_SIZE_DAYS) + 1
@@ -565,7 +411,7 @@ def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
     current_date = start_date
     batch_num = 0
 
-    print(f"  Streaming {total_batches} batches for wanted issues...")
+    print(f"  Streaming {total_batches} batches with unified query...")
 
     while current_date <= end_date:
         batch_num += 1
@@ -586,42 +432,212 @@ def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
         file_patterns_sql = '[' + ', '.join([f"'{fp}'" for fp in file_patterns]) + ']'
 
         try:
-            # Create temp view from file read (done ONCE per batch)
-            conn.execute(f"""
-                CREATE OR REPLACE TEMP VIEW batch_data AS
-                SELECT *
-                FROM read_json({file_patterns_sql}, union_by_name=true, filename=true, compression='gzip', format='newline_delimited', ignore_errors=true, maximum_object_size=2147483648)
-            """)
-
-            # Query 1: Fetch all issues (NOT PRs) from tracked orgs
-            issue_query = """
+            # UNIFIED QUERY: Fetch ALL event types in ONE query
+            # Post-process in Python to separate into agent-assigned issues, wanted issues, PRs, and discussions
+            unified_query = f"""
                 SELECT
-                    json_extract_string(payload, '$.issue.html_url') as issue_url,
+                    type,
                     json_extract_string(repo, '$.name') as repo_name,
-                    json_extract_string(payload, '$.issue.title') as title,
+                    json_extract_string(repo, '$.url') as repo_url,
+                    -- Issue fields
+                    json_extract_string(payload, '$.issue.html_url') as issue_url,
+                    json_extract_string(payload, '$.issue.title') as issue_title,
                     json_extract_string(payload, '$.issue.number') as issue_number,
-                    MIN(json_extract_string(payload, '$.issue.created_at')) as created_at,
-                    MAX(json_extract_string(payload, '$.issue.closed_at')) as closed_at,
-                    json_extract(payload, '$.issue.labels') as labels
-                FROM batch_data
+                    json_extract_string(payload, '$.issue.created_at') as issue_created_at,
+                    json_extract_string(payload, '$.issue.closed_at') as issue_closed_at,
+                    json_extract(payload, '$.issue.labels') as issue_labels,
+                    json_extract_string(payload, '$.issue.pull_request') as is_pull_request,
+                    json_extract_string(payload, '$.issue.state_reason') as issue_state_reason,
+                    -- Actor/assignee fields for agent assignment
+                    json_extract_string(payload, '$.issue.user.login') as issue_creator,
+                    json_extract_string(payload, '$.issue.assignee.login') as issue_assignee,
+                    json_extract(payload, '$.issue.assignees') as issue_assignees,
+                    json_extract_string(payload, '$.comment.user.login') as commenter,
+                    -- PR fields
+                    COALESCE(
+                        json_extract_string(payload, '$.issue.html_url'),
+                        json_extract_string(payload, '$.pull_request.html_url')
+                    ) as pr_url,
+                    COALESCE(
+                        json_extract_string(payload, '$.issue.user.login'),
+                        json_extract_string(payload, '$.pull_request.user.login')
+                    ) as pr_creator,
+                    COALESCE(
+                        json_extract_string(payload, '$.issue.pull_request.merged_at'),
+                        json_extract_string(payload, '$.pull_request.merged_at')
+                    ) as pr_merged_at,
+                    COALESCE(
+                        json_extract_string(payload, '$.issue.body'),
+                        json_extract_string(payload, '$.pull_request.body')
+                    ) as pr_body,
+                    -- Discussion fields
+                    json_extract_string(payload, '$.discussion.html_url') as discussion_url,
+                    json_extract_string(payload, '$.discussion.user.login') as discussion_creator,
+                    json_extract_string(payload, '$.discussion.created_at') as discussion_created_at,
+                    json_extract_string(payload, '$.discussion.answer_chosen_at') as discussion_closed_at,
+                    json_extract_string(payload, '$.discussion.state_reason') as discussion_state_reason,
+                    json_extract_string(payload, '$.action') as action
+                FROM read_json({file_patterns_sql}, union_by_name=true, filename=true, compression='gzip', format='newline_delimited', ignore_errors=true, maximum_object_size=2147483648)
                 WHERE
-                    type IN ('IssuesEvent', 'IssueCommentEvent')
-                    AND json_extract_string(payload, '$.issue.pull_request') IS NULL
-                    AND json_extract_string(payload, '$.issue.html_url') IS NOT NULL
-                GROUP BY issue_url, repo_name, title, issue_number, labels
+                    type IN ('IssuesEvent', 'IssueCommentEvent', 'PullRequestEvent', 'DiscussionEvent')
+                    AND (
+                        -- Agent-assigned issues: agent is creator, assignee, or commenter
+                        (type = 'IssuesEvent' AND (
+                            json_extract_string(payload, '$.issue.user.login') IN ({identifier_list})
+                            OR json_extract_string(payload, '$.issue.assignee.login') IN ({identifier_list})
+                            OR EXISTS (
+                                SELECT 1 FROM (SELECT UNNEST(json_extract(payload, '$.issue.assignees')) as a)
+                                WHERE json_extract_string(a, '$.login') IN ({identifier_list})
+                            )
+                            OR SPLIT_PART(json_extract_string(repo, '$.name'), '/', 1) IN ({tracked_orgs_list})
+                        ))
+                        -- Issue comments: agent is commenter OR tracked org
+                        OR (type = 'IssueCommentEvent' AND (
+                            json_extract_string(payload, '$.comment.user.login') IN ({identifier_list})
+                            OR SPLIT_PART(json_extract_string(repo, '$.name'), '/', 1) IN ({tracked_orgs_list})
+                        ))
+                        -- PRs: agent is creator OR tracked org (for wanted issue tracking)
+                        OR (type = 'PullRequestEvent' AND (
+                            json_extract_string(payload, '$.pull_request.user.login') IN ({identifier_list})
+                            OR SPLIT_PART(json_extract_string(repo, '$.name'), '/', 1) IN ({tracked_orgs_list})
+                        ))
+                        -- Discussions: agent is creator AND tracked org
+                        OR (type = 'DiscussionEvent'
+                            AND json_extract_string(payload, '$.discussion.user.login') IN ({identifier_list})
+                            AND SPLIT_PART(json_extract_string(repo, '$.name'), '/', 1) IN ({tracked_orgs_list})
+                        )
+                    )
             """
 
-            issue_results = conn.execute(issue_query).fetchall()
+            all_results = conn.execute(unified_query).fetchall()
+
+            # Post-process results to separate into different categories
+            # Row structure: [type, repo_name, repo_url, issue_url, issue_title, issue_number,
+            #                 issue_created_at, issue_closed_at, issue_labels, is_pull_request,
+            #                 issue_state_reason, issue_creator, issue_assignee, issue_assignees,
+            #                 commenter, pr_url, pr_creator, pr_merged_at, pr_body,
+            #                 discussion_url, discussion_creator, discussion_created_at,
+            #                 discussion_closed_at, discussion_state_reason, action]
+
+            issue_events = []        # For wanted tracking
+            pr_events = []           # For wanted tracking
+            discussion_events = []   # For discussion tracking
+            agent_issue_events = []  # For agent-assigned issues
+
+            for row in all_results:
+                event_type = row[0]
+                is_pr = row[9]  # is_pull_request field
+
+                if event_type in ('IssuesEvent', 'IssueCommentEvent'):
+                    if not is_pr:  # It's an issue, not a PR
+                        # Check if this is an agent-assigned issue
+                        issue_creator = row[11]
+                        issue_assignee = row[12]
+                        issue_assignees_json = row[13]
+                        commenter = row[14]
+
+                        agent_identifier = None
+
+                        if event_type == 'IssuesEvent':
+                            # Check if issue creator, assignee, or any assignees match our identifiers
+                            if issue_creator in identifier_set:
+                                agent_identifier = issue_creator
+                            elif issue_assignee in identifier_set:
+                                agent_identifier = issue_assignee
+                            else:
+                                # Check assignees array
+                                try:
+                                    if issue_assignees_json:
+                                        if isinstance(issue_assignees_json, str):
+                                            assignees_data = json.loads(issue_assignees_json)
+                                        else:
+                                            assignees_data = issue_assignees_json
+
+                                        if isinstance(assignees_data, list):
+                                            for assignee_obj in assignees_data:
+                                                if isinstance(assignee_obj, dict):
+                                                    assignee_login = assignee_obj.get('login')
+                                                    if assignee_login in identifier_set:
+                                                        agent_identifier = assignee_login
+                                                        break
+                                except (json.JSONDecodeError, TypeError):
+                                    pass
+
+                        elif event_type == 'IssueCommentEvent':
+                            # Check if commenter is an agent
+                            if commenter in identifier_set:
+                                agent_identifier = commenter
+
+                        # Add to appropriate list
+                        if agent_identifier:
+                            agent_issue_events.append((row, agent_identifier))
+
+                        # Always add to issue_events for wanted tracking (if from tracked orgs)
+                        issue_events.append(row)
+                    else:
+                        # It's a PR
+                        pr_events.append(row)
+
+                elif event_type == 'PullRequestEvent':
+                    pr_events.append(row)
+
+                elif event_type == 'DiscussionEvent':
+                    discussion_events.append(row)
+
+            # Process agent-assigned issues
+            for row, agent_identifier in agent_issue_events:
+                # Row indices: repo_url=2, issue_url=3, issue_created_at=6, issue_closed_at=7, issue_state_reason=10
+                repo_url = row[2]
+                issue_url = row[3]
+                created_at = row[6]
+                closed_at = row[7]
+                state_reason = row[10]
+
+                if not issue_url or not agent_identifier:
+                    continue
+
+                # Build full URL from repo_url if needed
+                if repo_url and '/issues/' not in issue_url:
+                    issue_number = row[5]
+                    full_url = f"{repo_url.replace('api.github.com/repos/', 'github.com/')}/issues/{issue_number}"
+                else:
+                    full_url = issue_url
+
+                # Only include issues created within timeframe
+                if created_at:
+                    try:
+                        created_dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
+                        if created_dt < start_date:
+                            continue
+                    except:
+                        continue
+
+                # Deduplicate: only add if we haven't seen this issue for this agent
+                if full_url in agent_issue_urls[agent_identifier]:
+                    continue
+
+                agent_issue_urls[agent_identifier].add(full_url)
 
-            # Filter issues by tracked orgs and collect them
-            for row in issue_results:
-                issue_url = row[0]
+                issue_metadata = {
+                    'url': full_url,
+                    'created_at': normalize_date_format(created_at),
+                    'closed_at': normalize_date_format(closed_at) if closed_at else None,
+                    'state_reason': state_reason,
+                }
+
+                agent_issues[agent_identifier].append(issue_metadata)
+
+            # Process issues for wanted tracking
+            for row in issue_events:
+                # Row indices: repo_name=1, issue_url=3, issue_title=4, issue_number=5,
+                #              issue_created_at=6, issue_closed_at=7, issue_labels=8
                 repo_name = row[1]
-                title = row[2]
-                issue_number = row[3]
-                created_at = row[4]
-                closed_at = row[5]
-                labels_json = row[6]
+                issue_url = row[3]
+                title = row[4]
+                issue_number = row[5]
+                created_at = row[6]
+                closed_at = row[7]
+                labels_json = row[8]
 
                 if not issue_url or not repo_name:
                     continue
@@ -667,38 +683,13 @@ def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
                     'labels': label_names
                 }
 
-            # Query 2: Find PRs from both IssueCommentEvent and PullRequestEvent
-            pr_query = """
-                SELECT DISTINCT
-                    COALESCE(
-                        json_extract_string(payload, '$.issue.html_url'),
-                        json_extract_string(payload, '$.pull_request.html_url')
-                    ) as pr_url,
-                    COALESCE(
-                        json_extract_string(payload, '$.issue.user.login'),
-                        json_extract_string(payload, '$.pull_request.user.login')
-                    ) as pr_creator,
-                    COALESCE(
-                        json_extract_string(payload, '$.issue.pull_request.merged_at'),
-                        json_extract_string(payload, '$.pull_request.merged_at')
-                    ) as merged_at,
-                    COALESCE(
-                        json_extract_string(payload, '$.issue.body'),
-                        json_extract_string(payload, '$.pull_request.body')
-                    ) as pr_body
-                FROM batch_data
-                WHERE
-                    (type = 'IssueCommentEvent' AND json_extract_string(payload, '$.issue.pull_request') IS NOT NULL)
-                    OR type = 'PullRequestEvent'
-            """
-
-            pr_results = conn.execute(pr_query).fetchall()
-
-            for row in pr_results:
-                pr_url = row[0]
-                pr_creator = row[1]
-                merged_at = row[2]
-                pr_body = row[3]
+            # Process PRs for wanted tracking
+            for row in pr_events:
+                # Row indices: pr_url=15, pr_creator=16, pr_merged_at=17, pr_body=18
+                pr_url = row[15]
+                pr_creator = row[16]
+                merged_at = row[17]
+                pr_body = row[18]
 
                 if not pr_url or not pr_creator:
                     continue
@@ -725,19 +716,76 @@ def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
                 else:
                     issue_to_prs[ref].add(pr_url)
 
-            print(f"✓ {len(issue_results)} issues, {len(pr_results)} PRs")
+            # Process discussions
+            for row in discussion_events:
+                # Row indices: repo_name=1, discussion_url=19, discussion_creator=20,
+                #              discussion_created_at=21, discussion_closed_at=22,
+                #              discussion_state_reason=23, action=24
+                repo_name = row[1]
+                discussion_url = row[19]
+                discussion_creator = row[20]
+                discussion_created_at = row[21]
+                discussion_closed_at = row[22]
+                discussion_state_reason = row[23]
+                action = row[24]
+
+                if not discussion_url or not repo_name:
+                    continue
+
+                # Extract org from repo_name
+                parts = repo_name.split('/')
+                if len(parts) != 2:
+                    continue
+                org = parts[0]
+
+                # Filter by tracked orgs
+                if org not in TRACKED_ORGS:
+                    continue
+
+                # Parse discussion creation date to filter by time window
+                created_dt = None
+                if discussion_created_at:
+                    try:
+                        created_dt = datetime.fromisoformat(discussion_created_at.replace('Z', '+00:00'))
+                        # Only track discussions created on or after start_date
+                        if created_dt < start_date:
+                            continue
+                    except:
+                        continue
+
+                # Group by creator (agent identifier)
+                # Only track discussions from our agent identifiers
+                if discussion_creator not in identifier_set:
+                    continue
+
+                # Determine discussion state
+                # A discussion is "resolved" if it has an answer chosen OR is marked answered
+                is_resolved = False
+                if discussion_closed_at:
+                    is_resolved = True
+                elif discussion_state_reason and 'answered' in discussion_state_reason.lower():
+                    is_resolved = True
+
+                # Store discussion metadata
+                discussion_meta = {
+                    'url': discussion_url,
+                    'repo': repo_name,
+                    'created_at': normalize_date_format(discussion_created_at),
+                    'closed_at': normalize_date_format(discussion_closed_at) if discussion_closed_at else None,
+                    'state': 'resolved' if is_resolved else 'open',
+                    'state_reason': discussion_state_reason
+                }
+
+                # Group by agent
+                if discussion_creator not in discussions_by_agent:
+                    discussions_by_agent[discussion_creator] = []
+                discussions_by_agent[discussion_creator].append(discussion_meta)
 
-            # Clean up temp view after batch processing
-            conn.execute("DROP VIEW IF EXISTS batch_data")
+            print(f"✓ {len(agent_issue_events)} agent issues, {len(issue_events)} wanted issues, {len(pr_events)} PRs, {len(discussion_events)} discussions")
 
         except Exception as e:
             print(f"\n  ✗ Batch {batch_num} error: {str(e)}")
             traceback.print_exc()
-            # Clean up temp view even on error
-            try:
-                conn.execute("DROP VIEW IF EXISTS batch_data")
-            except:
-                pass
 
         # Move to next batch
         current_date = batch_end + timedelta(days=1)
@@ -814,13 +862,16 @@ def fetch_unified_issue_metadata_streaming(conn, identifiers, start_date, end_date):
         except:
             pass
 
+    print(f"  ✓ Found {sum(len(issues) for issues in agent_issues.values())} agent-assigned issues across {len(agent_issues)} agents")
     print(f"  ✓ Found {len(wanted_open)} long-standing open wanted issues")
     print(f"  ✓ Found {sum(len(issues) for issues in wanted_resolved.values())} resolved wanted issues across {len(wanted_resolved)} agents")
+    print(f"  ✓ Found {sum(len(discussions) for discussions in discussions_by_agent.values())} discussions across {len(discussions_by_agent)} agents")
 
     return {
-        'agent_issues': agent_issues,
+        'agent_issues': dict(agent_issues),
         'wanted_open': wanted_open,
-        'wanted_resolved': dict(wanted_resolved)
+        'wanted_resolved': dict(wanted_resolved),
+        'agent_discussions': dict(discussions_by_agent)
    }
 
 
@@ -1020,13 +1071,94 @@ def calculate_monthly_metrics_by_agent(all_metadata_dict, agents):
     }
 
 
-def construct_leaderboard_from_metadata(all_metadata_dict, agents, wanted_resolved_dict=None):
-    """Construct leaderboard from in-memory issue metadata.
+def calculate_discussion_stats_from_metadata(metadata_list):
+    """Calculate statistics from a list of discussion metadata."""
+    total_discussions = len(metadata_list)
+    resolved = sum(1 for discussion_meta in metadata_list if discussion_meta.get('state') == 'resolved')
+
+    # Resolved rate = resolved / total * 100
+    resolved_rate = (resolved / total_discussions * 100) if total_discussions > 0 else 0
+
+    return {
+        'total_discussions': total_discussions,
+        'resolved_discussions': resolved,
+        'discussion_resolved_rate': round(resolved_rate, 2),
+    }
+
+
+def calculate_monthly_metrics_by_agent_discussions(all_discussions_dict, agents):
+    """Calculate monthly metrics for discussions for all agents for visualization."""
+    identifier_to_name = {agent.get('github_identifier'): agent.get('name') for agent in agents if agent.get('github_identifier')}
+
+    if not all_discussions_dict:
+        return {'agents': [], 'months': [], 'data': {}}
+
+    agent_month_data = defaultdict(lambda: defaultdict(list))
+
+    for agent_identifier, metadata_list in all_discussions_dict.items():
+        for discussion_meta in metadata_list:
+            created_at = discussion_meta.get('created_at')
+
+            if not created_at:
+                continue
+
+            agent_name = identifier_to_name.get(agent_identifier, agent_identifier)
+
+            try:
+                dt = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
+                month_key = f"{dt.year}-{dt.month:02d}"
+                agent_month_data[agent_name][month_key].append(discussion_meta)
+            except Exception as e:
+                print(f"Warning: Could not parse discussion date '{created_at}': {e}")
+                continue
+
+    all_months = set()
+    for agent_data in agent_month_data.values():
+        all_months.update(agent_data.keys())
+    months = sorted(list(all_months))
+
+    result_data = {}
+    for agent_name, month_dict in agent_month_data.items():
+        resolved_rates = []
+        total_discussions_list = []
+        resolved_discussions_list = []
+
+        for month in months:
+            discussions_in_month = month_dict.get(month, [])
+
+            resolved_count = sum(1 for discussion in discussions_in_month if discussion.get('state') == 'resolved')
+            total_count = len(discussions_in_month)
+
+            # Resolved rate = resolved / total * 100
+            resolved_rate = (resolved_count / total_count * 100) if total_count > 0 else None
+
+            resolved_rates.append(resolved_rate)
+            total_discussions_list.append(total_count)
+            resolved_discussions_list.append(resolved_count)
+
+        result_data[agent_name] = {
+            'resolved_rates': resolved_rates,
+            'total_discussions': total_discussions_list,
+            'resolved_discussions': resolved_discussions_list
+        }
+
+    agents_list = sorted(list(agent_month_data.keys()))
+
+    return {
+        'agents': agents_list,
+        'months': months,
+        'data': result_data
+    }
+
+
+def construct_leaderboard_from_metadata(all_metadata_dict, agents, wanted_resolved_dict=None, discussions_dict=None):
+    """Construct leaderboard from in-memory issue metadata and discussion metadata.
 
     Args:
         all_metadata_dict: Dictionary mapping agent ID to list of issue metadata (agent-assigned issues)
         agents: List of agent metadata
         wanted_resolved_dict: Optional dictionary mapping agent ID to list of resolved wanted issues
+        discussions_dict: Optional dictionary mapping agent ID to list of discussion metadata
     """
     if not agents:
         print("Error: No agents found")
@@ -1035,6 +1167,9 @@ def construct_leaderboard_from_metadata(all_metadata_dict, agents, wanted_resolved_dict=None):
     if wanted_resolved_dict is None:
         wanted_resolved_dict = {}
 
+    if discussions_dict is None:
+        discussions_dict = {}
+
     cache_dict = {}
 
     for agent in agents:
@@ -1047,19 +1182,24 @@ def construct_leaderboard_from_metadata(all_metadata_dict, agents, wanted_resolved_dict=None):
         # Add wanted issues count
         resolved_wanted = len(wanted_resolved_dict.get(identifier, []))
 
+        # Add discussion stats
+        discussion_metadata = discussions_dict.get(identifier, [])
+        discussion_stats = calculate_discussion_stats_from_metadata(discussion_metadata)
+
         cache_dict[identifier] = {
             'name': agent_name,
             'website': agent.get('website', 'N/A'),
             'github_identifier': identifier,
             **stats,
-            'resolved_wanted_issues': resolved_wanted
+            'resolved_wanted_issues': resolved_wanted,
+            **discussion_stats
        }
 
     return cache_dict
 
 
-def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics, wanted_issues=None):
-    """Save leaderboard data, monthly metrics, and wanted issues to HuggingFace dataset."""
+def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics, wanted_issues=None, discussion_monthly_metrics=None):
+    """Save leaderboard data, monthly metrics, wanted issues, and discussion metrics to HuggingFace dataset."""
     try:
         token = get_hf_token()
         if not token:
@@ -1070,6 +1210,9 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics, wanted_issues=None):
         if wanted_issues is None:
             wanted_issues = []
 
+        if discussion_monthly_metrics is None:
+            discussion_monthly_metrics = {'agents': [], 'months': [], 'data': {}}
+
         combined_data = {
             'metadata': {
                 'last_updated': datetime.now(timezone.utc).isoformat(),
@@ -1080,7 +1223,8 @@ def save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics, wanted_issues=None):
             },
             'leaderboard': leaderboard_dict,
             'monthly_metrics': monthly_metrics,
-            'wanted_issues': wanted_issues
+            'wanted_issues': wanted_issues,
+            'discussion_monthly_metrics': discussion_monthly_metrics
        }
 
         with open(LEADERBOARD_FILENAME, 'w') as f:
@@ -1144,14 +1288,15 @@ def mine_all_agents():
     start_date = end_date - timedelta(days=LEADERBOARD_TIME_FRAME_DAYS)
 
     try:
-        # USE UNIFIED STREAMING FUNCTION FOR BOTH ISSUE TYPES
-        results = fetch_unified_issue_metadata_streaming(
+        # USE UNIFIED STREAMING FUNCTION FOR ISSUES, WANTED, AND DISCUSSIONS
+        results = fetch_all_metadata_streaming(
            conn, identifiers, start_date, end_date
        )
 
        agent_issues = results['agent_issues']
        wanted_open = results['wanted_open']
        wanted_resolved = results['wanted_resolved']
+       agent_discussions = results['agent_discussions']
 
    except Exception as e:
        print(f"Error during DuckDB fetch: {str(e)}")
@@ -1163,9 +1308,16 @@ def mine_all_agents():
     print(f"\n[4/4] Saving leaderboard...")
 
     try:
-        leaderboard_dict = construct_leaderboard_from_metadata(agent_issues, agents, wanted_resolved)
+        leaderboard_dict = construct_leaderboard_from_metadata(
+            agent_issues, agents, wanted_resolved, agent_discussions
+        )
         monthly_metrics = calculate_monthly_metrics_by_agent(agent_issues, agents)
-        save_leaderboard_data_to_hf(leaderboard_dict, monthly_metrics, wanted_open)
+        discussion_monthly_metrics = calculate_monthly_metrics_by_agent_discussions(
+            agent_discussions, agents
+        )
+        save_leaderboard_data_to_hf(
+            leaderboard_dict, monthly_metrics, wanted_open, discussion_monthly_metrics
+        )
 
     except Exception as e:
         print(f"Error saving leaderboard: {str(e)}")
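As a quick sanity check of the new per-agent discussion statistics: `calculate_discussion_stats_from_metadata` reduces the metadata records emitted by `fetch_all_metadata_streaming` to the three leaderboard fields. A minimal, hypothetical example (URLs are placeholders):

```python
# Hypothetical usage of the helper added in this commit; input records carry
# the 'state' field set during mining ('resolved' or 'open').
discussions = [
    {'url': 'https://github.com/apache/example/discussions/1', 'state': 'resolved'},
    {'url': 'https://github.com/apache/example/discussions/2', 'state': 'open'},
]
print(calculate_discussion_stats_from_metadata(discussions))
# -> {'total_discussions': 2, 'resolved_discussions': 1, 'discussion_resolved_rate': 50.0}
```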