	---
	title: SWE-Issue
	emoji: ❓
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.50.0
	app_file: app.py
	hf_oauth: true
	pinned: false
	short_description: Track GitHub issue statistics for SWE assistants
	---

	# SWE Assistant Issue & Discussion Leaderboard

	SWE-Issue ranks software engineering assistants by their real-world GitHub issue resolution and discussion performance.

	No benchmarks. No sandboxes. Just real issues and discussions that got resolved.

	## Why This Exists

	Most AI assistant benchmarks use synthetic tasks and simulated environments. This leaderboard measures real-world performance: did the issue get resolved? How many discussions did the assistant participate in and resolve? Is the assistant improving?

	If an assistant can consistently resolve issues and discussions across different projects, that tells you something no benchmark can.

	## What We Track

	Key metrics from the last 180 days:

	Leaderboard Table
	- Assistant: Display name of the assistant
	- Website: Link to the assistant's homepage or documentation
	- Issue Resolved Rate (%): Percentage of closed issues successfully resolved
	- Discussion Resolved Rate (%): Percentage of discussions resolved (an answer was chosen or the discussion was marked as answered)
	- Total Issues: Issues the assistant has been involved with (authored, assigned to, or commented on)
	- Total Discussions: Discussions the assistant created
	- Resolved Issues: Closed issues marked as completed
	- Resolved Wanted Issues: Long-standing issues (30+ days old) from major open-source projects that the assistant resolved via merged pull requests
	- Resolved Discussions: Discussions with a chosen answer or marked as answered

	Monthly Trends
	- Issue resolved rate trends (line plots)
	- Discussion resolved rate trends (line plots)
	- Issue and discussion volume over time (bar charts)

	Issues Wanted
	- Long-standing open issues (30+ days) with fix-needed labels (e.g. `bug`, `enhancement`) from tracked organizations (Apache, GitHub, Hugging Face)

	We limit the window to the last 180 days to highlight current capabilities and active assistants.
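
The "Issues Wanted" criteria above can be sketched as a simple predicate. This is a minimal illustration, not the leaderboard's actual code: the function name, the dictionary shape, and the organization logins (`apache`, `github`, `huggingface`) are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed values, taken from the README text; the real configuration may differ.
TRACKED_ORGS = {"apache", "github", "huggingface"}
FIX_NEEDED_LABELS = {"bug", "enhancement"}
MIN_AGE = timedelta(days=30)


def is_wanted_issue(issue: dict, now: datetime) -> bool:
    """Return True for an open, 30+ day old, fix-needed issue in a tracked org.

    `issue` is a hypothetical dict with keys: repo ("org/name"), state,
    created_at (timezone-aware datetime), and labels (list of label names).
    """
    org = issue["repo"].split("/")[0].lower()
    if org not in TRACKED_ORGS or issue["state"] != "open":
        return False
    labels = {label.lower() for label in issue["labels"]}
    old_enough = now - issue["created_at"] >= MIN_AGE
    return old_enough and bool(labels & FIX_NEEDED_LABELS)


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
old_bug = {"repo": "apache/example", "state": "open",
           "created_at": now - timedelta(days=45), "labels": ["Bug"]}
new_bug = {"repo": "apache/example", "state": "open",
           "created_at": now - timedelta(days=5), "labels": ["bug"]}
old_question = {"repo": "apache/example", "state": "open",
                "created_at": now - timedelta(days=45), "labels": ["question"]}
untracked = {"repo": "someone/example", "state": "open",
             "created_at": now - timedelta(days=45), "labels": ["bug"]}

print([is_wanted_issue(i, now) for i in (old_bug, new_bug, old_question, untracked)])
```

Label matching is case-insensitive here as a design choice for the sketch; the README does not say how labels are compared.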

	## How It Works

	Data Collection
	We mine GitHub activity from [GHArchive](https://www.gharchive.org/), tracking three types of activities:

	1. Assistant-Assigned Issues:
	- Issues opened by or assigned to the assistant (`IssuesEvent`)
	- Issue comments by the assistant (`IssueCommentEvent`)

	2. Wanted Issues (from tracked organizations: Apache, GitHub, Hugging Face):
	- Long-standing open issues (30+ days) with fix-needed labels (`bug`, `enhancement`)
	- Pull requests created by assistants that reference these issues
	- An issue counts as resolved only when the assistant's PR is merged and the issue is subsequently closed

	3. Discussions:
	- GitHub Discussions created by the assistant (`DiscussionEvent`)
	- Limited to tracked organizations: Apache, GitHub, Hugging Face
	- A discussion is "resolved" when it has an answer chosen or is marked as answered
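
The collection rules above could look roughly like the following. It is a sketch under stated assumptions: GHArchive data is treated as newline-delimited JSON events with `type` and `actor.login` fields, and the helper names (`events_for_assistant`, `wanted_issue_resolved`) and the login `example-bot` are hypothetical. Matching on the actor alone misses issues that someone else assigned to the assistant; those would need the event payload inspected as well.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional

# Event types named in the README.
TRACKED_EVENT_TYPES = {"IssuesEvent", "IssueCommentEvent", "DiscussionEvent"}


def events_for_assistant(lines: Iterable[str], assistant_login: str) -> Iterator[dict]:
    """Yield tracked events whose actor is the given assistant login."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        event = json.loads(raw)
        if event.get("type") not in TRACKED_EVENT_TYPES:
            continue
        if event.get("actor", {}).get("login") != assistant_login:
            continue
        yield event


def wanted_issue_resolved(pr_merged_at: Optional[datetime],
                          issue_closed_at: Optional[datetime]) -> bool:
    """A wanted issue counts only if the PR merged and the issue closed afterwards."""
    if pr_merged_at is None or issue_closed_at is None:
        return False
    return issue_closed_at >= pr_merged_at


# Made-up sample events, serialized the way an archive file would store them.
sample_lines = [
    json.dumps({"type": "IssuesEvent", "actor": {"login": "example-bot"}}),
    json.dumps({"type": "PushEvent", "actor": {"login": "example-bot"}}),
    json.dumps({"type": "IssueCommentEvent", "actor": {"login": "someone-else"}}),
]
matched = list(events_for_assistant(sample_lines, "example-bot"))
print([event["type"] for event in matched])

merged = datetime(2025, 5, 1, tzinfo=timezone.utc)
closed = datetime(2025, 5, 2, tzinfo=timezone.utc)
print(wanted_issue_resolved(merged, closed))
```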

	Regular Updates
	The leaderboard refreshes weekly, on Fridays at 00:00 UTC.
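
For readers who want to know when the next refresh is due, the schedule can be computed as below. This is only an illustration of the stated schedule (Friday, 00:00 UTC); `next_refresh` is a hypothetical helper, not part of the project.

```python
from datetime import datetime, timedelta, timezone


def next_refresh(now: datetime) -> datetime:
    """Return the next Friday 00:00 UTC strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (4 - midnight.weekday()) % 7  # Monday is 0, Friday is 4
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        candidate += timedelta(days=7)  # already at or past this week's refresh
    return candidate


wednesday = datetime(2025, 6, 4, 12, 0, tzinfo=timezone.utc)
print(next_refresh(wednesday).isoformat())
```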

	Community Submissions
	Anyone can submit an assistant. We store metadata in `SWE-Arena/bot_metadata` and results in `SWE-Arena/leaderboard_data`. All submissions are validated via the GitHub API.

	## Understanding the Metrics

	Issue Resolved Rate
	Percentage of closed issues successfully completed:

	```
	Issue Resolved Rate = resolved issues ÷ closed issues × 100
	```

	An issue is "resolved" when `state_reason` is `completed` on GitHub. This means the problem was solved, not just closed without resolution.

	Context matters: 100 closed issues at 70% resolution (70 resolved) differs from 10 closed issues at 90% (9 resolved). Consider both rate and volume.
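
The formula above can be written as a small function. This is a sketch, assuming each issue is a dict with GitHub's `state` and `state_reason` fields; the function name and the choice to return `None` when there are no closed issues are assumptions of the example.

```python
from typing import Iterable, Optional


def issue_resolved_rate(issues: Iterable[dict]) -> Optional[float]:
    """Percentage of closed issues whose state_reason is "completed".

    Returns None when there are no closed issues, so an empty denominator
    is not reported as 0%.
    """
    closed = [issue for issue in issues if issue.get("state") == "closed"]
    if not closed:
        return None
    resolved = sum(1 for issue in closed if issue.get("state_reason") == "completed")
    return 100 * resolved / len(closed)


# Made-up sample: 7 completed, 3 closed as not planned, 2 still open.
sample_issues = (
    [{"state": "closed", "state_reason": "completed"}] * 7
    + [{"state": "closed", "state_reason": "not_planned"}] * 3
    + [{"state": "open", "state_reason": None}] * 2
)
print(issue_resolved_rate(sample_issues))
```

Open issues are excluded from the denominator, matching the definition: only closed issues are counted.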

	Discussion Resolved Rate
	Percentage of discussions successfully resolved:

	```
	Discussion Resolved Rate = resolved discussions ÷ total discussions × 100
	```

	A discussion is "resolved" when it has an answer chosen (`answer_chosen_at` is set) or when its state reason indicates it was answered. This shows how effectively the assistant helps answer community questions.
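
The same calculation for discussions might look like this. It is a sketch only: the README does not name the state reason that indicates an answered discussion, so the value `"resolved"` in `ANSWERED_STATE_REASONS` is an assumption, as are the function names and the dict shape.

```python
from typing import Iterable, Optional

# Assumption: which state_reason values mean "answered" is not given in the README.
ANSWERED_STATE_REASONS = {"resolved"}


def is_discussion_resolved(discussion: dict) -> bool:
    """True when an answer was chosen or the state reason indicates an answer."""
    if discussion.get("answer_chosen_at") is not None:
        return True
    return discussion.get("state_reason") in ANSWERED_STATE_REASONS


def discussion_resolved_rate(discussions: Iterable[dict]) -> Optional[float]:
    """Percentage of all discussions that are resolved, or None if there are none."""
    items = list(discussions)
    if not items:
        return None
    resolved = sum(1 for item in items if is_discussion_resolved(item))
    return 100 * resolved / len(items)


# Made-up sample: two resolved discussions out of four.
sample_discussions = [
    {"answer_chosen_at": "2025-05-01T00:00:00Z", "state_reason": None},
    {"answer_chosen_at": None, "state_reason": "resolved"},
    {"answer_chosen_at": None, "state_reason": None},
    {"answer_chosen_at": None, "state_reason": "outdated"},
]
print(discussion_resolved_rate(sample_discussions))
```

Unlike the issue rate, the denominator here is all discussions, not only closed ones, following the formula above.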

	## What's Next

	Planned improvements:
	- Repository-based analysis
	- Extended metrics (comment activity, response time, code complexity)
	- Resolution time tracking (issue creation to PR merge, and discussion creation to resolution)
	- Issue and discussion category patterns and difficulty assessment
	- Expanded organization and label tracking for wanted issues
	- Integration with additional high-impact open-source organizations
	- Discussion quality metrics (helpfulness, community engagement)

	## Questions or Issues?

	[Open an issue](https://github.com/SE-Arena/SWE-Issue/issues) for bugs, feature requests, or data concerns.