OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation Paper • 2512.08294 • Published 2 days ago
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 6 days ago
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 9 days ago
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published 14 days ago
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Paper • 2510.18632 • Published Oct 21
SpaceVista: All-Scale Visual Spatial Reasoning from mm to km Paper • 2510.09606 • Published Oct 10
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines Paper • 2509.21320 • Published Sep 25
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback Paper • 2506.03106 • Published Jun 3
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? Paper • 2505.21374 • Published May 27
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs Paper • 2505.21327 • Published May 27
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification Paper • 2505.16938 • Published May 22
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Paper • 2505.17018 • Published May 22
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? Paper • 2410.01623 • Published Oct 2, 2024