# Qwen3-VL-Embedding-2B-FP8
This is an FP8-quantized version of Qwen/Qwen3-VL-Embedding-2B.
## Quantization Details
| Component | Precision | Notes |
|---|---|---|
| Vision Encoder (ViT) | BF16 | Preserved for accuracy |
| LLM Decoder Layers | FP8 | Quantized for efficiency |
| Embeddings | BF16 | Preserved |
- Scheme: FP8_DYNAMIC
- Weights: FP8_E4M3 (per-channel quantization)
- Activations: Dynamic per-token quantization at runtime
- Tool: llm-compressor
- Calibration: None required (data-free quantization)
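The scheme above can be sketched numerically. The NumPy sketch below illustrates only the scale computation (per-channel for weights, per-token for activations, each scaled into the FP8 E4M3 range); it rounds to an integer grid for simplicity, whereas real E4M3 rounds to a floating-point grid.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

def quantize_per_channel(w):
    """Static per-channel (per output row) weight quantization."""
    scale = np.abs(w).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    q = np.clip(np.round(w / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def quantize_per_token(x):
    """Dynamic per-token activation scales, computed at runtime."""
    scale = np.abs(x).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)  # toy weight matrix
x = rng.standard_normal((3, 8)).astype(np.float32)  # toy activations (3 tokens)

wq, ws = quantize_per_channel(w)
xq, xs = quantize_per_token(x)

# The dequantized matmul closely approximates the full-precision result,
# which is why FP8_DYNAMIC needs no calibration data.
y_ref = x @ w.T
y_fp8 = (xq * xs) @ (wq * ws).T
print(np.max(np.abs(y_ref - y_fp8)))
```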
## Creation
This model was quantized using llm-compressor:

```python
import torch
from transformers import Qwen3VLForConditionalGeneration
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Load model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-Embedding-2B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# FP8 quantization recipe (data-free)
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "lm_head",
        r"re:model\.visual\..*",  # Keep vision encoder in BF16
    ],
)

# Apply quantization
oneshot(model=model, recipe=recipe)

# Save
model.save_pretrained("Qwen3-VL-Embedding-2B-FP8", save_compressed=True)
```
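To illustrate how the `ignore` list in the recipe selects modules: llm-compressor treats entries with a `re:` prefix as regular expressions over submodule names, while plain strings match exactly. A small sketch (the module names below are hypothetical examples, not taken from the actual checkpoint):

```python
import re

# "re:model\.visual\..*" keeps any vision-encoder submodule in BF16;
# "lm_head" is matched literally. Everything else is quantized to FP8.
vision_pattern = re.compile(r"model\.visual\..*")

for name in [
    "model.visual.blocks.0.attn.qkv",               # vision encoder (hypothetical)
    "model.language_model.layers.0.mlp.gate_proj",  # LLM decoder (hypothetical)
    "lm_head",
]:
    ignored = name == "lm_head" or vision_pattern.fullmatch(name) is not None
    print(f"{name}: {'kept in BF16' if ignored else 'quantized to FP8'}")
```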
## Usage
### Requirements

```
transformers>=4.57.0
qwen-vl-utils>=0.0.14
torch==2.8.0
llmcompressor==0.9.0.2
```
### Basic Example
```python
from scripts.qwen3_vl_embedding import Qwen3VLEmbedder

# Define a list of query texts
queries = [
    {"text": "Visible embers scatter across the ground."},  # Fire prompt
    {"text": "Routine scene with no disturbances."},        # Normal prompt
]

# Define a list of documents (images, texts, videos)
documents = [
    {"text": "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust."},
    {"image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
    {"video": "video.mp4"},
]

# Initialize the Qwen3VLEmbedder model
model_name_or_path = "PIA-SPACE-LAB/Qwen3-VL-Embedding-2B-FP8"
model = Qwen3VLEmbedder(model_name_or_path=model_name_or_path, max_frames=8, fps=8)

# Combine queries and documents into a single input list
inputs = queries + documents

# Process the inputs to get embeddings
embeddings = model.process(inputs)

# Compute similarity scores between the 2 query embeddings and the 3 document embeddings
similarity_scores = embeddings[:2] @ embeddings[2:].T

# Print the similarity scores as a nested list
print(similarity_scores.tolist())
```
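The dot products above act as cosine similarities only when the embeddings are L2-normalized, as embedding models in this family typically do at output. A minimal sketch with toy vectors showing the equivalence:

```python
import numpy as np

def cosine_sim(a, b):
    # Normalize rows to unit length; a plain matmul then yields cosine similarity.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

q = np.array([[3.0, 4.0]])                 # toy query embedding
d = np.array([[6.0, 8.0], [4.0, -3.0]])    # toy document embeddings
print(cosine_sim(q, d))  # parallel vector scores 1.0, orthogonal vector scores 0.0
```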
For more usage examples, please visit our GitHub repository.
Base model: Qwen/Qwen3-VL-2B-Instruct