Synthetic data derived from finepdfs
MultiSynt
Organization Card
MultiSynt is a collaborative initiative between OpenEuroLLM and EuroLLM focused on developing high-quality multilingual synthetic datasets for language model pretraining. By combining expertise from both organizations, MultiSynt aims to advance multilingual synthetic training data that covers diverse European languages and supports more inclusive AI development.
Models (32)
MultiSynt/2B-1TT-tower9b-mixture
MultiSynt/2B-1TT-native-mixture
MultiSynt/nemotron-cc-portuguese-opus • 19
MultiSynt/nemotron-cc-basque-opus • 51
MultiSynt/nemotron-cc-polish-opus • 13
MultiSynt/nemotron-cc-polish-tower9b • 22
MultiSynt/nemotron-cc-french-opus • 15
MultiSynt/nemotron-cc-french-tower9b • 23
MultiSynt/nemotron-cc-norwegian-tower9b • 49
MultiSynt/nemotron-cc-danish-opus • 33
Datasets (6)
MultiSynt/MT-Nemotron-CC • 15.6B • 414
MultiSynt/MT-HPLT2c • 1.76B • 368
MultiSynt/MT-Reasoning • 82M • 102
MultiSynt/MT-Reasoning-Prompts • 399M • 258
MultiSynt/nemotron-cc-spanish-opus-qe • 3.29B • 67
MultiSynt/finepdfs-summaries • 1.57B • 301 • 1