Update README.md
README.md CHANGED
@@ -81,4 +81,18 @@ The pre-training and fine-tuning were conducted on 512 NVIDIA Ampere (64GB) GPUs
 |Multi-layer loss | yes |
 
 ## Licence
-The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).
+The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).
+
+
+# Citation
+```
+@article{gurioli2025modeltrainallhierarchical,
+  title={One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings},
+  author={Andrea Gurioli and Federico Pennino and João Monteiro and Maurizio Gabbrielli},
+  year={2025},
+  eprint={2503.03008},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2503.03008},
+}
+```