---
title: SWE-Issue
emoji: ❓
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub issue statistics for SWE assistants
---

# SWE Assistant Issue & Discussion Leaderboard

SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution and discussion performance.

No benchmarks. No sandboxes. Just real issues and discussions that got resolved.

## Why This Exists

Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many discussions did the assistant participate in and resolve? Is the assistant improving?

If an assistant can consistently resolve issues and discussions across different projects, that tells you something no benchmark can.

## What We Track

Key metrics from the last 180 days:

**Leaderboard Table**
- **Assistant**: Display name of the assistant
- **Website**: Link to the assistant's homepage or documentation
- **Issue Resolved Rate (%)**: Percentage of closed issues successfully resolved
- **Discussion Resolved Rate (%)**: Percentage of discussions successfully resolved (answered or closed)
- **Total Issues**: Issues the assistant has been involved with (authored, assigned, or commented on)
- **Total Discussions**: Discussions the assistant created
- **Resolved Issues**: Closed issues marked as completed
- **Resolved Wanted Issues**: Long-standing issues (30+ days old) from major open-source projects that the assistant resolved via merged pull requests
- **Resolved Discussions**: Discussions that have been answered or closed

**Monthly Trends**
- Issue resolved rate trends (line plots)
- Discussion resolved rate trends (line plots)
- Issue and discussion volume over time (bar charts)
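
The monthly issue trend could be derived along the following lines. This is a minimal sketch, not the leaderboard's actual code: the function name `monthly_resolved_rate` is hypothetical, and grouping by the month in which an issue was closed is an assumption. The `closed_at` and `state_reason` fields follow the GitHub issue object.

```python
from collections import defaultdict
from datetime import datetime


def monthly_resolved_rate(issues):
    """Return {"YYYY-MM": resolved rate in %} for closed issues.

    Assumption: an issue is bucketed by the month it was closed in.
    """
    closed = defaultdict(int)
    resolved = defaultdict(int)
    for issue in issues:
        closed_at = issue.get("closed_at")
        if closed_at is None:
            # Still open: not part of the resolved-rate denominator.
            continue
        # GitHub timestamps end in "Z"; normalise for fromisoformat.
        when = datetime.fromisoformat(closed_at.replace("Z", "+00:00"))
        month = when.strftime("%Y-%m")
        closed[month] += 1
        if issue.get("state_reason") == "completed":
            resolved[month] += 1
    return {m: round(100 * resolved[m] / closed[m], 1) for m in sorted(closed)}
```

The resulting mapping can be fed straight into a line plot, with the per-month closed counts serving as the volume bars.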

**Issues Wanted**
- Long-standing open issues (30+ days) with fix-needed labels (e.g. `bug`, `enhancement`) from tracked organizations (Apache, GitHub, Hugging Face)
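
The selection rule for wanted issues can be sketched as a simple predicate. The `Issue` dataclass, the `is_wanted` helper, and the lowercase organization logins in `TRACKED_ORGS` are illustrative assumptions, not the leaderboard's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Assumed GitHub logins for the tracked organizations.
TRACKED_ORGS = {"apache", "github", "huggingface"}
FIX_NEEDED_LABELS = {"bug", "enhancement"}
MIN_AGE = timedelta(days=30)


@dataclass
class Issue:
    repo: str             # "owner/name"
    state: str            # "open" or "closed"
    created_at: datetime  # timezone-aware
    labels: set = field(default_factory=set)


def is_wanted(issue, now):
    """True if the issue is open, 30+ days old, labelled, and in a tracked org."""
    owner = issue.repo.split("/", 1)[0].lower()
    labels = {label.lower() for label in issue.labels}
    return (
        owner in TRACKED_ORGS
        and issue.state == "open"
        and now - issue.created_at >= MIN_AGE
        and bool(FIX_NEEDED_LABELS & labels)
    )
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test than a version that reads the clock internally.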

We focus on the last 180 days to highlight current capabilities and active assistants.

## How It Works

**Data Collection**
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking three types of activity:

1. **Assistant-Assigned Issues**:
   - Issues opened by or assigned to the assistant (`IssuesEvent`)
   - Issue comments by the assistant (`IssueCommentEvent`)

2. **Wanted Issues** (from tracked organizations: Apache, GitHub, Hugging Face):
   - Long-standing open issues (30+ days) with fix-needed labels (`bug`, `enhancement`)
   - Pull requests created by assistants that reference these issues
   - An issue counts as resolved only when the assistant's PR is merged and the issue is subsequently closed

3. **Discussions**:
   - GitHub Discussions created by the assistant (`DiscussionEvent`)
   - Tracked from organizations: Apache, GitHub, Hugging Face
   - A discussion is "resolved" when it has an answer chosen or is marked as answered
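
As a rough illustration of the first filtering step, the sketch below scans GHArchive records (newline-delimited JSON, one event per line) and keeps the tracked event types whose actor is the assistant. The function name `assistant_events` and the sample login are hypothetical, and the real pipeline may well filter differently.

```python
import json

TRACKED_TYPES = {"IssuesEvent", "IssueCommentEvent", "DiscussionEvent"}


def assistant_events(lines, assistant_login):
    """Yield (event type, repo name) for tracked events performed by the assistant."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        event = json.loads(line)
        if event.get("type") not in TRACKED_TYPES:
            continue
        if event.get("actor", {}).get("login") != assistant_login:
            continue
        yield event["type"], event.get("repo", {}).get("name")


# Usage with two hand-written sample records (not real data):
sample = [
    '{"type": "IssuesEvent", "actor": {"login": "example-bot"}, "repo": {"name": "org/repo"}}',
    '{"type": "PushEvent", "actor": {"login": "example-bot"}, "repo": {"name": "org/repo"}}',
]
matches = list(assistant_events(sample, "example-bot"))
```

This only catches events where the assistant is the actor. Issues assigned to the assistant by someone else would need an additional check on the event payload.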

**Regular Updates**
The leaderboard refreshes weekly (Fridays at 00:00 UTC).
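
To check when the next refresh is due, the schedule can be computed directly. This is only a sketch of the stated schedule; how the refresh is actually triggered is not described here, and `next_refresh` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def next_refresh(now):
    """Return the next Friday 00:00 UTC strictly after `now`.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (4 - midnight.weekday()) % 7  # Monday is 0, Friday is 4
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # Already at or past this week's refresh: move to next Friday.
        candidate += timedelta(days=7)
    return candidate
```

In standard cron syntax the same schedule would be written `0 0 * * 5`, assuming the scheduler runs in UTC.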

**Community Submissions**
Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_metadata` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.

## Understanding the Metrics

**Issue Resolved Rate**
Percentage of closed issues successfully completed:

```
Issue Resolved Rate = resolved issues ÷ closed issues × 100
```

An issue is "resolved" when `state_reason` is `completed` on GitHub. This means the problem was solved, not just closed without resolution.

Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
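
The formula and the two scenarios above can be worked through in a few lines. The function name `issue_resolved_rate` is illustrative, and returning `None` when nothing has been closed is an assumption about how an undefined rate might be handled.

```python
def issue_resolved_rate(resolved, closed):
    """Resolved issues divided by closed issues, as a percentage."""
    if closed == 0:
        return None  # assumption: rate is undefined without closed issues
    return 100.0 * resolved / closed


# The two scenarios from the text: similar-looking rates, very different volume.
high_volume = issue_resolved_rate(70, 100)  # 70 of 100 closed issues resolved
low_volume = issue_resolved_rate(9, 10)     # 9 of 10 closed issues resolved
```

The second assistant has the higher rate, but the first resolved almost eight times as many issues, which is why the table shows counts next to percentages.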

**Discussion Resolved Rate**
Percentage of discussions successfully resolved:

```
Discussion Resolved Rate = resolved discussions ÷ total discussions × 100
```

A discussion is "resolved" when it has an answer chosen (`answer_chosen_at` is set) or when its state reason indicates it was answered. This shows how effectively the assistant helps answer community questions.
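
A minimal sketch of this rule follows. The exact `state_reason` values that count as "answered" are not specified here, so `ANSWERED_REASONS` is a placeholder assumption, and both function names are hypothetical.

```python
# Placeholder: which state reasons count as "answered" is an assumption.
ANSWERED_REASONS = {"resolved"}


def is_discussion_resolved(discussion, answered_reasons=ANSWERED_REASONS):
    """True if an answer was chosen or the state reason marks it as answered."""
    if discussion.get("answer_chosen_at"):
        return True
    return discussion.get("state_reason") in answered_reasons


def discussion_resolved_rate(discussions):
    """Resolved discussions divided by total discussions, as a percentage."""
    discussions = list(discussions)
    if not discussions:
        return None  # assumption: rate is undefined with no discussions
    resolved = sum(is_discussion_resolved(d) for d in discussions)
    return 100.0 * resolved / len(discussions)
```

Unlike the issue rate, the denominator here is all discussions, not only closed ones, which matches the formula above.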

## What's Next

Planned improvements:
- Repository-based analysis
- Extended metrics (comment activity, response time, code complexity)
- Resolution time tracking from issue creation to PR merge and discussion creation to resolution
- Issue and discussion category patterns and difficulty assessment
- Expanded organization and label tracking for wanted issues
- Integration with additional high-impact open-source organizations
- Discussion quality metrics (helpfulness, community engagement)

## Questions or Issues?

[Open an issue](https://github.com/SE-Arena/SWE-Issue/issues) for bugs, feature requests, or data concerns.