model support per CrispASR — pure C++ inference with GGUF quantisation (no Python needed)

#48
by cstr - opened

We've built a complete C++ runtime for Voxtral-Mini-3B in CrispASR, a multi-backend ASR tool based on ggml. One binary, one GGUF file — no Python, no PyTorch, no pip install.

What works:

  • Full transcription pipeline (mel → Whisper encoder → Mistral 3B LLM decode)
  • Q4_K / Q5_0 / Q8_0 / F16 quantisation (2.5 GB Q4_K vs 6+ GB BF16)
  • Word-level timestamps via CTC forced alignment (-am canary-ctc-aligner.gguf or -am qwen3-forced-aligner.gguf)
  • Temperature sampling + best-of-N decoding (--best-of 5 -tp 0.3)
  • Streaming from mic/stdin (--stream, --mic, --live)
  • Audio Q&A mode (--ask "What language is this?"; Voxtral 3B is a full audio LLM, not just ASR)
  • Speech translation (--translate -tl de)
  • Speaker diarisation, language ID, SRT/VTT/JSON output
  • GPU acceleration via CUDA / Metal / Vulkan (ggml backends)
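
To make the "temperature sampling + best-of-N" item concrete, here is a minimal sketch of the general technique. It is not CrispASR's actual code. The `LogitsFn` callback is a hypothetical stand-in for one decoder step, and ranking candidates by average log-probability under the sampling distribution is an assumption about the selection rule, not something confirmed by the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// Softmax over logits after dividing by the temperature (must be > 0).
// Lower temperatures sharpen the distribution towards the top logit.
std::vector<double> softmax_with_temperature(const std::vector<double>& logits,
                                             double temperature) {
    assert(temperature > 0.0 && !logits.empty());
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        // Subtract the maximum for numerical stability.
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}

// One sampled hypothesis and its length-normalised score.
struct Candidate {
    std::vector<int> tokens;
    double avg_logprob = -std::numeric_limits<double>::infinity();
};

// Hypothetical decoder step: tokens so far -> logits for the next token.
using LogitsFn = std::function<std::vector<double>(const std::vector<int>&)>;

// Sample one sequence token by token until EOS or max_tokens.
Candidate sample_once(const LogitsFn& next_logits, double temperature,
                      int eos_token, int max_tokens, std::mt19937& rng) {
    Candidate c;
    double logprob_sum = 0.0;
    for (int step = 0; step < max_tokens; ++step) {
        const std::vector<double> probs =
            softmax_with_temperature(next_logits(c.tokens), temperature);
        std::discrete_distribution<int> dist(probs.begin(), probs.end());
        const int tok = dist(rng);
        logprob_sum += std::log(probs[static_cast<std::size_t>(tok)]);
        c.tokens.push_back(tok);
        if (tok == eos_token) break;
    }
    if (!c.tokens.empty()) {
        c.avg_logprob = logprob_sum / static_cast<double>(c.tokens.size());
    }
    return c;
}

// Draw n independent samples and keep the one with the highest average log-probability.
Candidate best_of_n(const LogitsFn& next_logits, int n, double temperature,
                    int eos_token, int max_tokens, std::mt19937& rng) {
    Candidate best;
    for (int i = 0; i < n; ++i) {
        Candidate c = sample_once(next_logits, temperature, eos_token, max_tokens, rng);
        if (c.avg_logprob > best.avg_logprob) best = std::move(c);
    }
    return best;
}
```

In this sketch, `--best-of 5 -tp 0.3` would roughly correspond to `best_of_n(decoder_step, 5, 0.3, eos, max_tokens, rng)`. Normalising by length keeps longer candidates from being penalised simply for having more tokens.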

Quick start:

# Build
git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8

# Auto-download model and transcribe
./build/bin/crispasr --backend voxtral -m auto -f audio.wav

# Or use pre-quantised GGUF from HF
./build/bin/crispasr -m voxtral-mini-3b-2507-q4_k.gguf -f audio.wav -osrt
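
For readers unfamiliar with the subtitle format that -osrt refers to, the sketch below shows how timed segments map to standard SubRip (SRT) text. It assumes the output follows the usual SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line). The `Segment` struct and both function names are hypothetical and are not taken from the CrispASR source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical transcript segment with times in milliseconds.
struct Segment {
    std::int64_t start_ms;
    std::int64_t end_ms;
    std::string text;
};

// Format a non-negative millisecond offset as an SRT timestamp: HH:MM:SS,mmm.
std::string srt_timestamp(std::int64_t ms) {
    assert(ms >= 0);
    const long long h = static_cast<long long>(ms / 3600000);
    const long long m = static_cast<long long>((ms / 60000) % 60);
    const long long s = static_cast<long long>((ms / 1000) % 60);
    const long long frac = static_cast<long long>(ms % 1000);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld,%03lld", h, m, s, frac);
    return std::string(buf);
}

// Render segments as SRT cues: 1-based index, time range, text, blank line.
std::string to_srt(const std::vector<Segment>& segments) {
    std::string out;
    for (std::size_t i = 0; i < segments.size(); ++i) {
        out += std::to_string(i + 1) + "\n";
        out += srt_timestamp(segments[i].start_ms) + " --> " +
               srt_timestamp(segments[i].end_ms) + "\n";
        out += segments[i].text + "\n\n";
    }
    return out;
}
```

Note that SRT uses a comma before the milliseconds, whereas WebVTT uses a dot, so a VTT writer would need a different timestamp formatter.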

Pre-quantised GGUFs: cstr/voxtral-mini-3b-2507-GGUF

CrispASR supports 11 ASR backends in the same binary, including Whisper, Parakeet, Canary, Cohere, Granite, Qwen3, wav2vec2, and both Voxtral variants.
