VoiceOwn — Stutter-Aware Intelligence

VoiceOwn is a specialized adaptation of Gemma 4 E2B, engineered to bridge communication gaps for individuals who stutter.

Unlike conventional ASR systems that transcribe every disfluency, this model performs intent-focused speech understanding: it filters out repetitions, prolongations, and blocks to produce the clean language the speaker intended.

✨ Core Capabilities

  • Stutter-Awareness
    Handles repetitions, prolongations, and speech blocks natively.

  • Intent Extraction
    Identifies the speaker’s intended words rather than the literal disfluent output (for illustration, "I w-w-want to go" would ideally be rendered as "I want to go").

  • Multimodal Intelligence
    Uses Gemma 4’s audio encoder to interpret timing, tone, and structure of speech.

📦 Model Weights (GGUF)

File                               Description
voiceown-base-Q4_K_M.gguf          Mobile-optimized (4-bit quantization)
voiceown-base-Q8_0.gguf            Higher-fidelity evaluation build (8-bit quantization)
gemma-4-e2b-it.BF16-mmproj.gguf    Required multimodal projector (needed for audio input)
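
Each run needs one weight file and the projector file. The sketch below only prints a download command for that pair and does not run it. It assumes the files live in the Hub repository naazimsnh02/voiceown-base-gguf (the ID shown on the model page) and that the huggingface-cli tool is installed. The helper names weights_for and download_cmd are hypothetical.

```shell
#!/bin/sh
# Sketch: build the download command for one quantized build plus the projector.
# REPO is an assumption taken from the model page; adjust it if it differs.
REPO="naazimsnh02/voiceown-base-gguf"
MMPROJ="gemma-4-e2b-it.BF16-mmproj.gguf"

# Map a quantization label (Q4_K_M or Q8_0) to its weight file name.
weights_for() {
  case "$1" in
    Q4_K_M|Q8_0) echo "voiceown-base-$1.gguf" ;;
    *) echo "unknown quantization: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the huggingface-cli command for the chosen build.
# $1 = quantization label, $2 = target directory (defaults to the current one).
download_cmd() {
  weights="$(weights_for "$1")" || return 1
  echo "huggingface-cli download $REPO $weights $MMPROJ --local-dir ${2:-.}"
}

download_cmd "${QUANT:-Q4_K_M}"
```

Printing the command rather than running it makes it easy to check the file names first. Pipe the output to sh once it looks right.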

🧪 Training Insights

  • Objective: Intent-accurate output from disfluent speech
  • Dataset: naazimsnh02/voiceown-stutter-asr
  • Samples: 2,850 real-world recordings
  • Epochs: 2
  • Training Loss: 1.1124
  • Hardware: NVIDIA A100-SXM4
  • Training Time: ~60 minutes
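
The figures above imply a rough throughput, worked out in the sketch below. It assumes every recording was seen exactly once per epoch and treats the approximate "~60 minutes" as exactly 60, so the result is only an order-of-magnitude estimate.

```shell
#!/bin/sh
# Figures copied from the list above; minutes is approximate ("~60").
samples=2850
epochs=2
minutes=60

# Total recordings processed across all epochs.
passes=$(( samples * epochs ))
# Average recordings processed per minute of training.
per_minute=$(( passes / minutes ))

echo "$passes sample passes, about $per_minute per minute"
```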

⚙️ Usage

Run with the llama.cpp multimodal CLI (llama-mtmd-cli), passing both the model weights and the projector:

./llama-mtmd-cli \
  -m voiceown-base-Q4_K_M.gguf \
  --mmproj gemma-4-e2b-it.BF16-mmproj.gguf \
  --audio user_clip.wav \
  -p "Capture the speaker's intended words, ignoring any stutters."
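
To process several clips, the same command can be wrapped in a small loop. This is a minimal sketch that reuses only the flags shown above. The environment variables LLAMA_CLI, MODEL, and MMPROJ and the helper transcribe_clip are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: run the documented command over every clip given as an argument.
# LLAMA_CLI, MODEL and MMPROJ are hypothetical overrides, not llama.cpp settings.
LLAMA_CLI="${LLAMA_CLI:-./llama-mtmd-cli}"
MODEL="${MODEL:-voiceown-base-Q4_K_M.gguf}"
MMPROJ="${MMPROJ:-gemma-4-e2b-it.BF16-mmproj.gguf}"
PROMPT="Capture the speaker's intended words, ignoring any stutters."

# Transcribe one clip; warn and return non-zero if the file is missing.
transcribe_clip() {
  if [ ! -f "$1" ]; then
    echo "skipping $1: file not found" >&2
    return 1
  fi
  "$LLAMA_CLI" -m "$MODEL" --mmproj "$MMPROJ" --audio "$1" -p "$PROMPT"
}

# Process the clips one at a time; a failed clip does not stop the loop.
for clip in "$@"; do
  echo "== $clip =="
  transcribe_clip "$clip" || true
done
```

Saved under a name of your choice (for example transcribe_all.sh), it would be invoked as: sh transcribe_all.sh clip1.wav clip2.wav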