multimodal llms

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Paper • 2306.02858 • Published Jun 5, 2023 • 19

jadechoghari/Ferret-UI-Gemma2b
Image-Text-to-Text • 3B • Updated Oct 18, 2024 • 305 • 50