Voxtral Mini Realtime FP8 Runtime Package

This repository packages mistralai/Voxtral-Mini-4B-Realtime-2602 with the exact vLLM serving configuration used for the reported benchmark results. The model is served with vLLM runtime FP8 quantization and an FP8 E4M3 KV cache.

The repository root includes consolidated.safetensors, so the serving config resolves the model locally:

vllm serve --config vllm_config.yaml

Single-command benchmark reproduction after cloning:

bash reproduce.sh

The reproduction script serves this package with vllm_config.yaml, runs the configured FLEURS benchmark slices, records energy, and writes benchmark JSON files under reports/.

Package Contents

Base model: mistralai/Voxtral-Mini-4B-Realtime-2602
Base revision: 2769294da9567371363522aac9bbcfdd19447add
Packaged weights: consolidated.safetensors
Serving config: vllm_config.yaml
Local model path in serving config: .
Runtime quantization: fp8
KV cache dtype: fp8_e4m3
Max model length: 4096
Benchmark policy: --language-hint-mode fleurs_primary --empty-retry-count 2
VAD trimming: disabled

This is a runtime-quantized serving package. The checkpoint contains the unmodified BF16 base weights; FP8 quantization is applied when the model is served, by the pinned vLLM runtime configuration, and is not stored in the checkpoint.

Reported Results

Every value in this table can be traced through reports/claimed_results.json to the committed benchmark reports in reports/.

Language	Samples	Metric	Value	95% CI low	95% CI high	Energy
English (`en_us`)	500	normalized WER	6.1456%	5.4996%	6.7794%	189,442.10 J
French (`fr_fr`)	100	normalized WER	8.4548%	6.7809%	10.2486%	37,882.64 J
Hindi (`hi_in`)	100	normalized WER	25.4309%	22.4806%	28.6336%	44,502.93 J
Japanese (`ja_jp`)	100	no-space CER	7.0919%	5.5534%	8.6900%	73,906.48 J

reports/claimed_results.json lists the source report file for each row in the table.
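For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how a corpus-level word error rate and a whitespace-free character error rate are commonly computed. The normalization shown here (lowercasing and stripping punctuation) is an assumption for illustration only; the benchmark harness in this repository may normalize differently, and all function names are hypothetical.

```python
import re


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def normalize(text):
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def corpus_wer(pairs):
    """Total word edits divided by total reference words."""
    errors = words = 0
    for ref, hyp in pairs:
        ref_words = normalize(ref).split()
        hyp_words = normalize(hyp).split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words


def corpus_cer_no_space(pairs):
    """Total character edits divided by total reference characters,
    with all whitespace removed before comparison."""
    errors = chars = 0
    for ref, hyp in pairs:
        ref_chars = "".join(ref.split())
        hyp_chars = "".join(hyp.split())
        errors += edit_distance(ref_chars, hyp_chars)
        chars += len(ref_chars)
    return errors / chars


wer = corpus_wer([("Hello, world!", "hello word")])
cer = corpus_cer_no_space([("今日は 晴れ", "今日は晴れ")])
print(f"WER: {wer:.2%}, no-space CER: {cer:.2%}")
```

Removing whitespace before scoring is the usual reason to report CER for Japanese, where word boundaries are not marked by spaces and segmentation differences would otherwise count as errors.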

Energy Summary

Runtime-FP8 total energy across the reported slices: 345,734.14 J
BF16 reference total energy under the same benchmark policy: 474,614.96 J
Measured energy reduction: 27.15%

These values are derived in reports/claimed_results.json from the FP8 reports and the BF16 reference reports committed in reports/.
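The reduction figure can be checked by hand from the totals stated above. The snippet below is a small arithmetic sketch, not the repository's own derivation code; the variable and function names are illustrative.

```python
# Per-slice FP8 energy values in joules, copied from the results table.
fp8_slice_energy_j = {
    "en_us": 189_442.10,
    "fr_fr": 37_882.64,
    "hi_in": 44_502.93,
    "ja_jp": 73_906.48,
}

# Totals as stated in the energy summary.
fp8_total_j = 345_734.14
bf16_total_j = 474_614.96


def energy_reduction_pct(optimized_j, reference_j):
    """Relative energy saving of the optimized run versus the reference, in percent."""
    return 100.0 * (1.0 - optimized_j / reference_j)


slice_sum_j = sum(fp8_slice_energy_j.values())
reduction = energy_reduction_pct(fp8_total_j, bf16_total_j)

print(f"sum of table rows: {slice_sum_j:,.2f} J")
print(f"energy reduction:  {reduction:.2f}%")
```

The rounded table rows sum to within 0.01 J of the stated FP8 total, which is consistent with the per-slice values having been rounded to two decimals for display.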

Reproduction

Default full reproduction:

bash reproduce.sh

Expected output: language-specific benchmark JSON files written under reports/.

Quick smoke run:

RUN_SLICES="en_us:1:packaged_smoke_en1" DOWNLOAD_MODEL=0 MODEL_DIR=/path/to/voxtral bash reproduce.sh

Useful environment overrides:

SKIP_INSTALL=1 bash reproduce.sh
INSTALL_VLLM=0 bash reproduce.sh
BASE_PORT=8200 bash reproduce.sh
MODEL_DIR=/path/to/local/voxtral bash reproduce.sh

Before running benchmarks, the script verifies the committed claims:

python scripts/verify_claimed_reports.py --reports-dir reports --claims reports/claimed_results.json

The check fails if any value in reports/claimed_results.json differs from the corresponding committed JSON report.
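As a rough illustration of what a drift check of this kind does, the sketch below compares claimed rows against report data held in memory. It is not the code in scripts/verify_claimed_reports.py: the field names (`source_report`, `metric_value`, `energy_j`), the report file name, and the tolerance are all hypothetical, since the actual JSON schema is not described here.

```python
import math


def find_drift(claims, reports, rel_tol=1e-9):
    """Return human-readable mismatches between claimed rows and source reports.

    claims:  list of dicts, one per table row (hypothetical schema).
    reports: dict mapping a report file name to its parsed contents.
    """
    problems = []
    for row in claims:
        report = reports.get(row["source_report"])
        if report is None:
            problems.append(
                f"{row['language']}: missing report {row['source_report']}"
            )
            continue
        for field in ("metric_value", "energy_j"):
            if not math.isclose(row[field], report[field], rel_tol=rel_tol):
                problems.append(
                    f"{row['language']}: {field} claimed {row[field]} "
                    f"but report has {report[field]}"
                )
    return problems


# Example using the English row from the results table; the file name is made up.
claims = [{
    "language": "en_us",
    "source_report": "example_en_us.json",
    "metric_value": 6.1456,
    "energy_j": 189442.10,
}]
reports = {"example_en_us.json": {"metric_value": 6.1456, "energy_j": 189442.10}}

problems = find_drift(claims, reports)
print("OK" if not problems else "\n".join(problems))
```

A real verifier would typically exit with a non-zero status when the list of problems is non-empty, so that a wrapper script can stop before running benchmarks.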

Logs

Server logs from the reported FP8 runs are included under logs/.

Notes

Energy measurements are hardware- and harness-dependent; the reported values are tied to the committed benchmark reports.
The benchmark uses FLEURS primary language hints: en_us -> en, fr_fr -> fr, hi_in -> hi, and ja_jp -> ja.
Prefix caching is enabled in vllm_config.yaml, but the reported results do not attribute any efficiency gain to prefix-cache reuse.
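The language-hint mapping listed in the notes above follows a simple pattern: each hint is the part of the FLEURS configuration name before the underscore. The sketch below assumes that rule; the benchmark harness may instead use an explicit lookup table, and the function name is hypothetical.

```python
def primary_language_hint(fleurs_code):
    """Derive a language hint from a FLEURS config name, e.g. 'en_us' to 'en'.

    Assumes the hint is the subtag before the first underscore.
    """
    return fleurs_code.split("_", 1)[0]


hints = {code: primary_language_hint(code)
         for code in ("en_us", "fr_fr", "hi_in", "ja_jp")}
print(hints)
```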
