video-SALMONN 2 Collection: video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions. • 11 items • Updated Mar 21 • 1
JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments Paper • 2602.18527 • Published Feb 20 • 2