TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Paper
• 2512.02014 • Published
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs