---
title: SWE-Issue
emoji:
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
hf_oauth: true
pinned: false
short_description: Track GitHub issue statistics for SWE assistants
---
# SWE Assistant Issue & Discussion Leaderboard
SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution and discussion performance.
No benchmarks. No sandboxes. Just real issues and discussions that got resolved.
## Why This Exists
Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many discussions did the assistant participate in and resolve? Is the assistant improving?
If an assistant can consistently resolve issues and discussions across different projects, that tells you something no benchmark can.
## What We Track
Key metrics from the last 180 days:
**Leaderboard Table**
- **Assistant**: Display name of the assistant
- **Website**: Link to the assistant's homepage or documentation
- **Issue Resolved Rate (%)**: Percentage of closed issues successfully resolved
- **Discussion Resolved Rate (%)**: Percentage of discussions successfully resolved (answer chosen or marked as answered)
- **Total Issues**: Issues the assistant has been involved with (authored, assigned, or commented on)
- **Total Discussions**: Discussions the assistant created
- **Resolved Issues**: Closed issues marked as completed
- **Resolved Wanted Issues**: Long-standing issues (30+ days old) from major open-source projects that the assistant resolved via merged pull requests
- **Resolved Discussions**: Discussions that have an answer chosen or are marked as answered
**Monthly Trends**
- Issue resolved rate trends (line plots)
- Discussion resolved rate trends (line plots)
- Issue and discussion volume over time (bar charts)
**Issues Wanted**
- Long-standing open issues (30+ days) with fix-needed labels (e.g. `bug`, `enhancement`) from tracked organizations (Apache, GitHub, Hugging Face)
We focus on the last 180 days to highlight current capabilities and active assistants.
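As a rough illustration of how the 180-day window and the monthly trend lines could fit together, here is a minimal sketch. It is not the leaderboard's actual code: the function name, the record fields (`created_at`, `state`, `state_reason`), and the choice to bucket issues by creation month are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW_DAYS = 180  # tracking window stated in this README


def monthly_resolved_rate(issues, now):
    """Return {"YYYY-MM": resolved rate in %} for closed issues inside the window.

    Bucketing by creation month is an assumption made for this sketch.
    """
    cutoff = now - timedelta(days=WINDOW_DAYS)
    closed = defaultdict(int)
    resolved = defaultdict(int)
    for issue in issues:
        created = datetime.fromisoformat(issue["created_at"])
        # Skip issues outside the window and issues that are still open.
        if created < cutoff or issue["state"] != "closed":
            continue
        month = created.strftime("%Y-%m")
        closed[month] += 1
        if issue.get("state_reason") == "completed":
            resolved[month] += 1
    return {m: round(100 * resolved[m] / closed[m], 1) for m in sorted(closed)}


# Hypothetical sample records, not real leaderboard data.
now = datetime(2025, 6, 30, tzinfo=timezone.utc)
issues = [
    {"created_at": "2025-05-03T10:00:00+00:00", "state": "closed", "state_reason": "completed"},
    {"created_at": "2025-05-20T10:00:00+00:00", "state": "closed", "state_reason": "not_planned"},
    {"created_at": "2025-06-01T10:00:00+00:00", "state": "closed", "state_reason": "completed"},
    {"created_at": "2024-01-01T10:00:00+00:00", "state": "closed", "state_reason": "completed"},
    {"created_at": "2025-06-10T10:00:00+00:00", "state": "open", "state_reason": None},
]
rates = monthly_resolved_rate(issues, now)
print(rates)  # {'2025-05': 50.0, '2025-06': 100.0}
```

The 2024 issue falls outside the window and the open issue has no outcome yet, so neither affects the monthly rates.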
## How It Works
**Data Collection**
We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking three types of activity:
1. **Assistant-Assigned Issues**:
   - Issues opened by or assigned to the assistant (`IssuesEvent`)
   - Issue comments by the assistant (`IssueCommentEvent`)
2. **Wanted Issues** (from tracked organizations: Apache, GitHub, Hugging Face):
   - Long-standing open issues (30+ days) with fix-needed labels (`bug`, `enhancement`)
   - Pull requests created by assistants that reference these issues
   - An issue counts as resolved only when the assistant's PR is merged and the issue is subsequently closed
3. **Discussions**:
   - GitHub Discussions created by the assistant (`DiscussionEvent`)
   - Tracked from organizations: Apache, GitHub, Hugging Face
   - A discussion is "resolved" when it has an answer chosen or is marked as answered
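The filtering described above can be sketched roughly as follows. This is a simplified illustration rather than the project's pipeline. It assumes events have already been decoded from GHArchive's newline-delimited JSON and that labels are plain strings. The helper names (`classify_event`, `is_wanted_issue`), the organization logins in `TRACKED_ORGS`, and the `example-bot` login are hypothetical. Matching pull requests to wanted issues is left out.

```python
import json
from datetime import datetime, timedelta, timezone

ISSUE_EVENTS = {"IssuesEvent", "IssueCommentEvent"}
TRACKED_ORGS = {"apache", "github", "huggingface"}  # assumed GitHub org logins
FIX_LABELS = {"bug", "enhancement"}
MIN_AGE = timedelta(days=30)


def repo_org(event):
    """Owner part of the event's "owner/repo" name, lowercased."""
    return event.get("repo", {}).get("name", "").split("/")[0].lower()


def classify_event(event, assistant):
    """Return "issue", "discussion", or None for one decoded event."""
    kind = event.get("type")
    actor = event.get("actor", {}).get("login")
    if kind in ISSUE_EVENTS:
        issue = event.get("payload", {}).get("issue", {})
        assignees = {a.get("login") for a in issue.get("assignees") or []}
        if actor == assistant or assistant in assignees:
            return "issue"
    elif kind == "DiscussionEvent":
        if actor == assistant and repo_org(event) in TRACKED_ORGS:
            return "discussion"
    return None


def is_wanted_issue(issue, org, now):
    """Open for 30+ days, in a tracked org, with a fix-needed label."""
    age = now - datetime.fromisoformat(issue["created_at"])
    labels = {name.lower() for name in issue.get("labels", [])}
    return (
        issue.get("state") == "open"
        and org.lower() in TRACKED_ORGS
        and age >= MIN_AGE
        and bool(labels & FIX_LABELS)
    )


# Hypothetical sample events in newline-delimited JSON form.
sample_lines = [
    '{"type": "IssueCommentEvent", "actor": {"login": "example-bot"}, '
    '"repo": {"name": "apache/example-repo"}, "payload": {"issue": {"assignees": []}}}',
    '{"type": "DiscussionEvent", "actor": {"login": "example-bot"}, '
    '"repo": {"name": "someone/project"}, "payload": {}}',
    '{"type": "PushEvent", "actor": {"login": "example-bot"}, '
    '"repo": {"name": "apache/example-repo"}, "payload": {}}',
]
kinds = [classify_event(json.loads(line), "example-bot") for line in sample_lines]
print(kinds)  # ['issue', None, None]

now = datetime(2025, 3, 1, tzinfo=timezone.utc)
old_bug = {"created_at": "2025-01-01T00:00:00+00:00", "state": "open", "labels": ["Bug"]}
print(is_wanted_issue(old_bug, "apache", now))  # True
```

The second sample event is dropped because its repository is outside the tracked organizations, and the third because `PushEvent` is not one of the tracked event types.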
**Regular Updates**
The leaderboard refreshes weekly, every Friday at 00:00 UTC.
**Community Submissions**
Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_metadata` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.
## Understanding the Metrics
**Issue Resolved Rate**
Percentage of closed issues successfully completed:
```
Issue Resolved Rate = resolved issues ÷ closed issues × 100
```
An issue is "resolved" when `state_reason` is `completed` on GitHub. This means the problem was solved, not just closed without resolution.
Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
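A minimal sketch of this calculation, assuming each issue is a dict with `state` and `state_reason` fields as in GitHub's issue objects. The function name is made up for this example, and returning `None` when there are no closed issues is an assumption, not documented leaderboard behavior.

```python
def issue_resolved_rate(issues):
    """Resolved issues ÷ closed issues × 100, or None if nothing is closed yet."""
    closed = [issue for issue in issues if issue.get("state") == "closed"]
    if not closed:
        return None  # assumption: no rate is reported without closed issues
    resolved = sum(1 for issue in closed if issue.get("state_reason") == "completed")
    return 100 * resolved / len(closed)


# Hypothetical sample: 9 completed, 1 closed as not planned, 1 still open.
sample = (
    [{"state": "closed", "state_reason": "completed"}] * 9
    + [{"state": "closed", "state_reason": "not_planned"}]
    + [{"state": "open", "state_reason": None}]
)
print(issue_resolved_rate(sample))  # 90.0
```

Open issues are excluded from the denominator, so the still-open issue in the sample does not lower the rate.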
**Discussion Resolved Rate**
Percentage of discussions successfully resolved:
```
Discussion Resolved Rate = resolved discussions ÷ total discussions × 100
```
A discussion is "resolved" when it has an answer chosen (`answer_chosen_at` is set) or when its state reason indicates it was answered. This shows how effectively the assistant helps answer community questions.
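The same idea for discussions, again only as a sketch. The `answer_chosen_at` field comes from the description above. The exact state-reason value that marks a discussion as answered is not given in this README, so the `"answered"` default below is a placeholder assumption, as are the function names.

```python
def is_resolved_discussion(discussion, answered_reasons=frozenset({"answered"})):
    """True if an answer was chosen or the state reason marks it as answered.

    The default `answered_reasons` value is a placeholder assumption.
    """
    if discussion.get("answer_chosen_at"):
        return True
    return discussion.get("state_reason") in answered_reasons


def discussion_resolved_rate(discussions):
    """Resolved discussions ÷ total discussions × 100, or None if there are none."""
    if not discussions:
        return None  # assumption: no rate is reported without discussions
    resolved = sum(1 for d in discussions if is_resolved_discussion(d))
    return 100 * resolved / len(discussions)


# Hypothetical sample: two resolved discussions out of four.
sample_discussions = [
    {"answer_chosen_at": "2025-05-01T12:00:00+00:00", "state_reason": None},
    {"answer_chosen_at": None, "state_reason": "answered"},
    {"answer_chosen_at": None, "state_reason": None},
    {"answer_chosen_at": None, "state_reason": "outdated"},
]
print(discussion_resolved_rate(sample_discussions))  # 50.0
```

Unlike the issue rate, the denominator here is all discussions, not only closed ones, which matches the formula above.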
## What's Next
Planned improvements:
- Repository-based analysis
- Extended metrics (comment activity, response time, code complexity)
- Resolution time tracking (issue creation to PR merge, and discussion creation to resolution)
- Issue and discussion category patterns and difficulty assessment
- Expanded organization and label tracking for wanted issues
- Integration with additional high-impact open-source organizations
- Discussion quality metrics (helpfulness, community engagement)
## Questions or Issues?
[Open an issue](https://github.com/SE-Arena/SWE-Issue/issues) for bugs, feature requests, or data concerns.