
Collections

Collections including paper arxiv:2411.10440

Multimodal Reasoning

InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 86
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 156
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Jul 21, 2025 • 131

The Evolution of Multimodal Model Architectures

Paper • 2405.17927 • Published May 28, 2024 • 1
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 104
Efficient Architectures for High Resolution Vision-Language Models

Paper • 2501.02584 • Published Jan 5, 2025
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 134

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

Paper • 2501.04686 • Published Jan 8, 2025 • 53
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 23, 2025 • 41
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Jul 21, 2025 • 131
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Paper • 2502.19400 • Published Feb 26, 2025 • 47

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 31
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 45
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 35

Multimodal Reasoning

A collection for Multimodal Reasoning Models and Benchmarks.

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Paper • 2502.16033 • Published Feb 22, 2025 • 18
rippleripple/MMIR

Viewer • Updated Feb 25, 2025 • 534 • 118 • 2
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Jul 21, 2025 • 131
GRIT: Teaching MLLMs to Think with Images

Paper • 2505.15879 • Published May 21, 2025 • 13

Exploring the Potential of Encoder-free Architectures in 3D LMMs

Paper • 2502.09620 • Published Feb 13, 2025 • 26
The Evolution of Multimodal Model Architectures

Paper • 2405.17927 • Published May 28, 2024 • 1
What matters when building vision-language models?

Paper • 2405.02246 • Published May 3, 2024 • 104
Efficient Architectures for High Resolution Vision-Language Models

Paper • 2501.02584 • Published Jan 5, 2025

Reasoning, Thinking, RL and Test-Time Scaling

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47
