ICCV2023

community

Recent Activity

juliemdc authored a paper about 1 month ago

T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning

kaanakan

authored a paper 6 days ago

Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout

Paper • 2511.20649 • Published 12 days ago • 43

ldkong

authored a paper about 1 month ago

3EED: Ground Everything Everywhere in 3D

Paper • 2511.01755 • Published Nov 3 • 10

mpark

authored 2 papers about 1 month ago

Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models

Paper • 2506.09229 • Published Jun 10 • 5

ACG: Action Coherence Guidance for Flow-based VLA models

Paper • 2510.22201 • Published Oct 25 • 36

ldkong

authored 3 papers about 1 month ago

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Paper • 2510.02240 • Published Oct 2 • 17

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23 • 55

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Paper • 2510.12422 • Published Oct 14 • 1

BryanW

authored a paper about 1 month ago

From Masks to Worlds: A Hitchhiker's Guide to World Models

Paper • 2510.20668 • Published Oct 23 • 6

sayandsarkar

authored a paper about 2 months ago

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Paper • 2510.16136 • Published Oct 17 • 3

BryanW

authored 2 papers about 2 months ago

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Paper • 2506.17612 • Published Jun 21 • 64

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Paper • 2510.06308 • Published Oct 7 • 53

Yuliang

authored 3 papers about 2 months ago

TTT3R: 3D Reconstruction as Test-Time Training

Paper • 2509.26645 • Published Sep 30 • 14

Human3R: Everyone Everywhere All at Once

Paper • 2510.06219 • Published Oct 7 • 10

UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections

Paper • 2509.24817 • Published Sep 29 • 8

jayw

authored 4 papers 2 months ago

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

Paper • 2503.14492 • Published Mar 18 • 20

Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models

Paper • 2506.09042 • Published Jun 10 • 2

CVPR 2023 Text Guided Video Editing Competition

Paper • 2310.16003 • Published Oct 24, 2023

ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation

Paper • 2510.04290 • Published Oct 5 • 16

Shilin-LU

authored 2 papers 2 months ago

DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing

Paper • 2510.02253 • Published Oct 2 • 14

Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

Paper • 2508.03404 • Published Aug 5 • 4