Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout Paper • 2511.20649 • Published 12 days ago
Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models Paper • 2506.09229 • Published Jun 10
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning Paper • 2510.02240 • Published Oct 2
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23
VideoLucy: Deep Memory Backtracking for Long Video Understanding Paper • 2510.12422 • Published Oct 14
GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer Paper • 2510.16136 • Published Oct 17
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent Paper • 2506.17612 • Published Jun 21
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Paper • 2510.06308 • Published Oct 7
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections Paper • 2509.24817 • Published Sep 29
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control Paper • 2503.14492 • Published Mar 18
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models Paper • 2506.09042 • Published Jun 10
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation Paper • 2510.04290 • Published Oct 5
DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing Paper • 2510.02253 • Published Oct 2
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling Paper • 2508.03404 • Published Aug 5