model support per CrispASR — pure C++ inference with GGUF quantisation (no Python needed)
#48
by cstr - opened
We've built a complete C++ runtime for Voxtral-Mini-3B in CrispASR, a multi-backend ASR tool based on ggml. One binary, one GGUF file — no Python, no PyTorch, no pip install.
What works:
- Full transcription pipeline (mel → Whisper encoder → Mistral 3B LLM decode)
- Q4_K / Q5_0 / Q8_0 / F16 quantisation (2.5 GB Q4_K vs 6+ GB BF16)
- Word-level timestamps via CTC forced alignment (-am canary-ctc-aligner.gguf or -am qwen3-forced-aligner.gguf)
- Temperature sampling + best-of-N decoding (--best-of 5 -tp 0.3)
- Streaming from mic/stdin (--stream, --mic, --live)
- Audio Q&A mode (--ask "What language is this?"; Voxtral 3B is a full audio LLM, not just ASR)
- Speech translation (--translate -tl de)
- Speaker diarisation, language ID, SRT/VTT/JSON output
- GPU acceleration via CUDA / Metal / Vulkan (ggml backends)
Quick start:
# Build
git clone https://github.com/CrispStrobe/CrispASR && cd CrispASR
cmake -S . -B build && cmake --build build -j8
# Auto-download model and transcribe
./build/bin/crispasr --backend voxtral -m auto -f audio.wav
# Or use pre-quantised GGUF from HF
./build/bin/crispasr -m voxtral-mini-3b-2507-q4_k.gguf -f audio.wav -osrt
Pre-quantised GGUFs: cstr/voxtral-mini-3b-2507-GGUF
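The flags from the "What works" list can presumably be added to the quick-start invocation. The following is a minimal sketch, not a confirmed recipe: the flag names, binary path, and file names are taken from this post, but how the flags combine is an assumption, and `run` plus the `CRISPASR_RUN` variable are hypothetical helpers added here so the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Sketch only: flag names come from the feature list; how they combine is assumed.
BIN=./build/bin/crispasr                 # binary path from the quick start
MODEL=voxtral-mini-3b-2507-q4_k.gguf     # pre-quantised GGUF from the quick start

# Hypothetical helper: print each command by default, execute it when CRISPASR_RUN=1.
# Note: the printed form drops shell quoting (e.g. around the --ask prompt).
run() {
  if [ "${CRISPASR_RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Word-level timestamps via a CTC aligner model, with SRT output
run "$BIN" -m "$MODEL" -f audio.wav -am canary-ctc-aligner.gguf -osrt

# Best-of-N decoding with temperature sampling
run "$BIN" -m "$MODEL" -f audio.wav --best-of 5 -tp 0.3

# Audio Q&A instead of plain transcription
run "$BIN" -m "$MODEL" -f audio.wav --ask "What language is this?"

# Speech translation with target language de
run "$BIN" -m "$MODEL" -f audio.wav --translate -tl de
```

Running the script as-is only echoes the four command lines; setting `CRISPASR_RUN=1` would execute them, assuming the binary has been built and the model and aligner files are in the working directory.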
CrispASR supports 11 ASR backends in the same binary, including Whisper, Parakeet, Canary, Cohere, Granite, Qwen3, wav2vec2, and both Voxtral variants.