Update README.md
README.md
@@ -11,21 +11,22 @@ We are thrilled to introduce Seed-Coder, a powerful, transparent, and parameter-
 - Transparent: We openly share detailed insights into our model-centric data pipeline, including methods for curating GitHub data, commits data, and code-related web data.
 - Powerful: Seed-Coder achieves state-of-the-art performance among open-source models of comparable size across a diverse range of coding tasks.
 
+<p align="center">
+<img width="100%" src="imgs/seed-coder_intro_performance.jpg">
+</p>
 
 ## Highlight
 
 
-
-- Pretrained on a
-- Excels at
-- Robust performance across
--
+Seed-Coder-8B-Base is an 8-billion-parameter foundation model tailored for code understanding and generation. It is designed to provide developers with a powerful, general-purpose code model capable of handling a wide range of coding tasks. It features:
+- Pretrained on a massively curated corpus, filtered using **LLM-based techniques** to ensure high-quality real-world code, resulting in cleaner and more effective learning signals.
+- Excels at code completion and supports Fill-in-the-Middle (FIM) tasks, enabling it to predict missing code spans given partial contexts.
+- Robust performance across various programming languages, making it ideal for downstream finetuning or direct use in code generation systems.
+- Long-context support up to 32K tokens, enabling it to handle large codebases, multi-file projects, and extended editing tasks.
 
 Seed-Coder-8B-Base serves as the foundation for Seed-Coder-8B-Instruct and Seed-Coder-8B-reasoning.
 
-
-<img width="100%" src="imgs/seed-coder_intro_performance.jpg">
-</p>
+
 
 ## Model Downloads
 | Model Name | Length | Download | Notes |