| --- |
| base_model: Qwen/Qwen2-VL-7B-Instruct |
| language: |
| - en |
| library_name: peft |
| license: mit |
| tags: |
| - LLM |
| - VLM |
| - Embedding |
| - Multimodal |
| pipeline_tag: image-text-to-text |
| --- |
| |
| ```markdown |
| ## Model Details |
| |
| Instruction fine-tuned adapter for ABC: Achieving Better Control of Multimodal Embeddings using VLMs. |
| |
| ### Model Sources |
| |
| This adapter is trained on top of Qwen/Qwen2-VL-7B-Instruct. |
| |
| ### Paper and Website |
| |
| For more information, see the [project website](https://tiger-ai-lab.github.io/ABC/) and the [paper](https://arxiv.org/abs/2503.00329). |
| |
| ## Citation |
| |
| ``` |
| @misc{schneider2025abcachievingbettercontrol, |
| title={ABC: Achieving Better Control of Multimodal Embeddings using VLMs}, |
| author={Benjamin Schneider and Florian Kerschbaum and Wenhu Chen}, |
| year={2025}, |
| eprint={2503.00329}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2503.00329}, |
| } |
| ``` |
| ``` |