VLMS
PsiPi/liuhaotian_llava-v1.5-13b-GGUF
Image-Text-to-Text • 13B
(unnamed model) • Image-to-Text
(unnamed model) • Image-Text-to-Text • 1B
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Paper • arXiv:2402.06118
LEGO: Language Enhanced Multi-modal Grounding Model
Paper • arXiv:2401.06071
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Paper • arXiv:2403.18814
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models
Paper • arXiv:2403.16999
Salesforce/instructblip-vicuna-7b
Image-Text-to-Text • 8B
Pegasus-v1 Technical Report
Paper • arXiv:2404.14687
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Paper • arXiv:2404.16375
Needle In A Multimodal Haystack
Paper • arXiv:2406.07230