VLM-FO1-Models: a collection of VLM-FO1 models.
- omlab/VLM-FO1_Qwen2.5-VL-3B-v01 • Object Detection • 4B params • updated 14 days ago • 1.98k downloads • 13 likes
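Since the FO1 checkpoint is built on Qwen2.5-VL-3B, it can likely be loaded through the standard Qwen2.5-VL path in transformers. A minimal sketch, assuming that path works for this repo; the model card may prescribe a custom wrapper or a specific detection prompt, so the class names, prompt text, and image path below are illustrative assumptions:

```python
# Minimal loading sketch for the VLM-FO1 checkpoint (assumptions noted above).
# Qwen2_5_VLForConditionalGeneration requires transformers >= 4.49.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "omlab/VLM-FO1_Qwen2.5-VL-3B-v01"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("street.jpg")  # any local test image (hypothetical path)
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Detect every car and give its bounding box."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```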
VLM-R1-models: a collection of VLM-R1 models.
- omlab/Qwen2.5VL-3B-VLM-R1-REC-500steps • Zero-Shot Object Detection • 4B params • updated Apr 14 • 662 downloads • 23 likes
- omlab/VLM-R1-Qwen2.5VL-3B-Math-0305 • Visual Question Answering • 4B params • updated Apr 14 • 107 downloads • 8 likes
- omlab/VLM-R1-Qwen2.5VL-3B-OVD-0321 • Image-Text-to-Text • 4B params • updated Jul 18 • 162 downloads • 24 likes
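The REC checkpoint above is a Qwen2.5-VL-3B fine-tune, so loading mirrors the FO1 sketch; the part that differs is parsing the prediction. VLM-R1-style models are trained to wrap reasoning in <think> tags and the final answer in <answer> tags, typically with the box as a coordinate list. A hedged parsing sketch, with the tag format and the [x1, y1, x2, y2] schema treated as assumptions to verify against the model card:

```python
import json
import re

def parse_rec_box(generated_text: str):
    """Extract a [x1, y1, x2, y2] box from VLM-R1-style output.

    Assumes the model wraps its final prediction in <answer>...</answer>
    and emits the box as a JSON-style list of pixel coordinates; verify
    the exact schema against the model card before relying on this.
    """
    m = re.search(r"<answer>(.*?)</answer>", generated_text, re.DOTALL)
    payload = m.group(1) if m else generated_text
    box = re.search(r"\[\s*\d+(?:\.\d+)?(?:\s*,\s*\d+(?:\.\d+)?){3}\s*\]", payload)
    return json.loads(box.group(0)) if box else None

print(parse_rec_box("<think>...</think><answer>[14, 72, 310, 245]</answer>"))
# -> [14, 72, 310, 245]
```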
Remote Sensing Referring Expression Understanding: datasets for the referring expression understanding (REU) task on remote sensing (RS) imagery.
- omlab/VRSBench-FS • Dataset Viewer available • updated Oct 2 • 16.6k downloads • 93 likes
- omlab/NWPU-FS • Dataset Viewer available • updated Oct 2 • 39 downloads • 29 likes
- omlab/EarthReason-FS • Dataset Viewer available • updated Oct 2 • 3.39k downloads • 55 likes
- omlab/Cross_DIOR-RSVG • Dataset Viewer available • updated Oct 2 • 7.42k downloads • 24 likes
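All four datasets are plain Hugging Face Hub repos, so they can be pulled with the datasets library. A minimal sketch, assuming a default configuration with a train split; the actual split and column names should be checked in the dataset viewer:

```python
# Minimal sketch: pull one of the listed RS-REU datasets from the Hub.
from datasets import load_dataset

# Split name is an assumption; check the dataset viewer for actual splits.
ds = load_dataset("omlab/VRSBench-FS", split="train")
print(ds)     # features and row count
print(ds[0])  # inspect one example's actual column names
```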
Multimodal Research: a collection of papers.
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration • arXiv 2411.16044 • published Nov 25, 2024
- OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding • arXiv 2407.04923 • published Jul 6, 2024
- OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network • arXiv 2209.05946 • published Sep 10, 2022
- VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations • arXiv 2207.00221 • published Jul 1, 2022