Latent Diffusion Model without Variational Autoencoder Paper • 2510.15301 • Published Oct 17 • 48
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Paper • 2510.15742 • Published Oct 17 • 50
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7 • 141
Lynx: Towards High-Fidelity Personalized Video Generation Paper • 2509.15496 • Published Sep 19 • 12
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching Paper • 2506.23552 • Published Jun 30 • 11
Running on Zero • MCP • Featured • 314 • Chain-of-Zoom • Extreme Super-Resolution via Scale Autoregression
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9 • 27
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published Apr 1 • 15 • 2
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation Paper • 2503.09641 • Published Mar 12 • 41
SANA-Sprint Collection • SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation • 6 items • Updated Sep 13 • 43
Running • Featured • 606 • The Tokenizer Playground • Experiment with and compare different tokenizers
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 41
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 72