PatchDNA, a DNA foundation model based on Meta's BLT (Byte Latent Transformer) tokenization strategy: https://www.biorxiv.org/content/10.1101/2025.11.28.691095v1
MLEB is the largest, most diverse, and most comprehensive benchmark for legal text embedding models. https://huggingface.co/blog/isaacus/introducing-mleb
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring (Paper • 2501.02045 • Published Jan 3)
Bio LLMs train on many genomes, but can we encode differences within a species? TomatoTomato adds pangenome tokens to represent a domestic tomato and a wild tomato in one sequence 🍅 🧬 Model: monsoon-nlp/tomatotomato-gLM2-150M-v0.1
We're kick-starting the process of Transformers v5, with @ArthurZ and @cyrilvallez! v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025. Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!