CoVT: Chain-of-Visual-Thought
Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought!
Checkpoint for the paper at https://huggingface.co/papers/2511.19418.
This CoVT checkpoint is aligned with 4 Depth tokens.
These task-specific tokens are integrated into the model’s embedding space to enhance depth-awareness.
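As a rough illustration of what integrating task-specific tokens into an embedding space can look like, the minimal sketch below adds four tokens to a toy vocabulary and grows a toy embedding table to match. It is not the CoVT implementation: the token names (`<depth_0>` … `<depth_3>`), the helper `add_task_tokens`, the embedding width of 8, and the mean-of-existing-rows initialization are all assumptions made for the example.

```python
import random


def add_task_tokens(vocab, embeddings, new_tokens):
    """Append new tokens to a toy vocabulary and grow the embedding table.

    New rows are initialized to the mean of the existing rows. That is a
    common heuristic and an assumption here, not the CoVT training recipe.
    """
    dim = len(embeddings[0])
    mean_row = [
        sum(row[i] for row in embeddings) / len(embeddings) for i in range(dim)
    ]
    # Work on copies so the caller's vocabulary and table are left untouched.
    vocab = dict(vocab)
    embeddings = [list(row) for row in embeddings]
    for token in new_tokens:
        if token in vocab:
            continue  # already present, nothing to add
        vocab[token] = len(embeddings)  # next free row index
        embeddings.append(list(mean_row))
    return vocab, embeddings


random.seed(0)
base_vocab = {"<pad>": 0, "a": 1, "photo": 2}
base_embeddings = [
    [random.gauss(0.0, 0.02) for _ in range(8)] for _ in base_vocab
]

# Hypothetical token names; the released checkpoint may name them differently.
depth_tokens = [f"<depth_{i}>" for i in range(4)]

vocab, embeddings = add_task_tokens(base_vocab, base_embeddings, depth_tokens)
print(len(vocab), len(embeddings))  # → 7 7
```

In a real model the new rows would then be trained so that they carry depth-related information; the sketch only shows the bookkeeping of extending the vocabulary and the embedding table together.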