IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning Paper • 2509.22621 • Published Sep 26 • 8
The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks Paper • 2509.25671 • Published Sep 30 • 6
mmBERT: A Modern Multilingual Encoder with Annealed Language Learning Paper • 2509.06888 • Published Sep 8 • 12
mmBERT: a modern multilingual encoder Collection mmBERT is trained on 3T tokens from over 1800 languages, showing SoTA scores on benchmarks and exceptional low-resource performance • 16 items • Updated Sep 9 • 49
On the Theoretical Limitations of Embedding-Based Retrieval Paper • 2508.21038 • Published Aug 28 • 20
Encoders vs Decoders: the Ettin Suite Collection A collection of SOTA, open-data, paired encoder-only and decoder-only models ranging from 17M params to 1B. See the paper at https://arxiv.org/abs/250 • 32 items • Updated Jul 16 • 25
The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure Paper • 2506.22724 • Published Jun 28 • 10
Certified Mitigation of Worst-Case LLM Copyright Infringement Paper • 2504.16046 • Published Apr 22 • 13
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published Mar 6 • 26
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published Feb 25 • 28
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Paper • 2502.13962 • Published Feb 19 • 28