Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published 30 days ago • 104
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation Paper • 2508.16763 • Published Aug 22 • 2
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning Paper • 2508.09804 • Published Aug 13
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published Oct 14
WebMMU Collection WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation • 2 items • Updated Sep 16 • 2
CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics Paper • 2506.08835 • Published Jun 10