Qwen3-VL-Embedding-8B-GGUF

GGUF conversion artifacts for Qwen/Qwen3-VL-Embedding-8B, prepared for local llama.cpp / mtmd usage.

Files

  • Qwen3-VL-Embedding-8B-Q4_K_M.gguf
  • mmproj-Qwen3-VL-Embedding-8B-f16.gguf

Source Model

The original model is Qwen/Qwen3-VL-Embedding-8B. This repository contains only the converted GGUF artifacts for local inference workflows; it does not include the original weights.

Notes

  • The main embedding model is provided as Q4_K_M.
  • The multimodal projector is provided as f16.
  • These files are intended for llama.cpp-based embedding pipelines that support Qwen3-VL multimodal embedding.

Example Usage

With llama-server:

./llama-server \
  --model Qwen3-VL-Embedding-8B-Q4_K_M.gguf \
  --mmproj mmproj-Qwen3-VL-Embedding-8B-f16.gguf \
  --embeddings \
  --pooling last \
  --ctx-size 8192
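
Once the server is running, embeddings can be requested over HTTP. The following is a minimal client sketch, not part of this repository. It assumes the server listens on llama-server's default address (127.0.0.1:8080) and exposes the OpenAI-compatible /v1/embeddings endpoint; the helper names (build_request, parse_embeddings, embed, cosine_similarity) are hypothetical. It covers text inputs only; whether image inputs are accepted over this endpoint depends on your llama.cpp build and is not shown here.

```python
import json
import math
import urllib.request

# Assumption: llama-server was started as shown above and listens on its
# default host and port. Adjust the URL if you passed --host or --port.
SERVER_URL = "http://127.0.0.1:8080/v1/embeddings"


def build_request(texts, url=SERVER_URL):
    """Build a POST request carrying a list of input strings as JSON."""
    payload = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body):
    """Extract embedding vectors from an OpenAI-style response, in input order."""
    items = json.loads(response_body)["data"]
    # Sort by the "index" field when present so vectors line up with the inputs.
    items = sorted(items, key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def embed(texts, url=SERVER_URL):
    """Send texts to the server and return one embedding vector per text."""
    with urllib.request.urlopen(build_request(texts, url)) as response:
        return parse_embeddings(response.read())


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the server running, calling embed(["a photo of a cat", "a picture of a kitten"]) should return two vectors, which can then be compared with cosine_similarity. Only the standard library is used so the sketch runs without extra dependencies.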

With a native llama.cpp embedding integration, load both files (the Q4_K_M model and the f16 mmproj projector) and use the model for text and image embeddings.

Attribution

If you use these GGUF files, please review and follow the original model card, license, and any upstream usage guidance.
