Model Card for dk2325/whisper-tiny-finetuned
Whisper Tiny English fine-tuned for improved general English ASR performance.
Model Details
Model Description
This model is a fine-tuned ASR checkpoint based on Whisper Tiny English, trained to improve transcription quality on English speech while remaining lightweight and fast.
- Developed by: DK2325
- Funded by: Self-funded personal project
- Shared by: DK2325
- Model type: Seq2Seq speech-to-text transformer (Whisper)
- Language(s): English
- License: Apache-2.0
- Finetuned from model: openai/whisper-tiny.en
Model Sources
- Repository: https://github.com/DK2325/ASR_Finetuning_openai-whisper-tiny.en
- Paper: https://arxiv.org/abs/2212.04356
- Demo: Not available
Uses
Direct Use
Use this model for automatic speech recognition of English audio such as:
- Read speech
- Lectures
- Voice notes
- General transcription tasks
Downstream Use
Can be integrated into:
- Subtitle generation tools
- ASR APIs
- Search/indexing pipelines for spoken content
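As an illustration of the subtitle use case, the sketch below converts pipeline-style chunk output (the shape returned by the Transformers ASR pipeline when called with `return_timestamps=True`) into SRT subtitle text. The `chunks` list shown is a hand-made example, not real model output.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Convert [{'timestamp': (start, end), 'text': ...}, ...] to SRT text."""
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        blocks.append(
            f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{chunk['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"


# Hypothetical chunk output for illustration:
chunks = [
    {"timestamp": (0.0, 2.5), "text": " Hello and welcome."},
    {"timestamp": (2.5, 5.0), "text": " This is a demo."},
]
print(chunks_to_srt(chunks))
```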
Out-of-Scope Use
Not intended for:
- Non-English transcription
- High-noise multi-speaker audio without preprocessing
- Safety-critical, legal, or medical decision workflows without human review
Bias, Risks, and Limitations
- Performance varies across accents, recording quality, microphone type, and domain.
- Errors may occur on proper nouns, rare words, and technical terms.
- Model outputs should be reviewed by humans in high-stakes scenarios.
Recommendations
Users (both direct and downstream) should be aware of model limitations. Evaluate on your own target dataset before production deployment.
How to Get Started with the Model
Use the model with the Hugging Face Transformers automatic-speech-recognition pipeline:

```python
from transformers import pipeline

# device=-1 runs on CPU; use device=0 for the first CUDA GPU.
asr = pipeline(
    "automatic-speech-recognition",
    model="dk2325/whisper-tiny-finetuned",
    device=-1,
)

# English-only Whisper checkpoints (this model is fine-tuned from
# openai/whisper-tiny.en) do not accept `language`/`task` generation
# arguments, so no generate_kwargs are needed.
result = asr("path/to/audio.wav")
print(result["text"])
```
Training Details
Training Data
Fine-tuned on English speech data prepared through project manifests (LibriSpeech-style pipeline).
Training Procedure
Preprocessing
- Audio processed with Whisper feature extractor
- Text tokenized with Whisper tokenizer/processor
- Seq2Seq training with standard ASR collation
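One detail of standard ASR collation is that label sequences are padded with -100 so the padded positions are ignored by the cross-entropy loss. A minimal pure-Python sketch of that convention (the project presumably uses a Transformers data collator with the Whisper processor; the token IDs below are arbitrary illustrations):

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def pad_labels(label_batch):
    """Pad tokenized label sequences to equal length with IGNORE_INDEX."""
    max_len = max(len(labels) for labels in label_batch)
    return [labels + [IGNORE_INDEX] * (max_len - len(labels)) for labels in label_batch]


batch = pad_labels([[50257, 318, 13], [50257, 13]])
```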
Training Hyperparameters
- Training regime: fp16 mixed precision
- Learning rate: 1e-5
- Optimizer: AdamW
- Gradient accumulation: used for low-VRAM setup
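A configuration sketch of how these hyperparameters might look with `Seq2SeqTrainingArguments`; only fp16, the 1e-5 learning rate, and the use of gradient accumulation are stated in this card, and every value marked "assumed" is illustrative rather than the actual training setup:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-finetuned",  # assumed
    learning_rate=1e-5,
    fp16=True,
    per_device_train_batch_size=4,   # assumed: small batch for a 4 GB GPU
    gradient_accumulation_steps=8,   # assumed: effective batch size of 32
    predict_with_generate=True,      # decode with generate() during eval
)
```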
Speeds, Sizes, Times
Training was performed on a local low-resource setup (consumer GPU in the 4 GB VRAM class). Precise training times and throughput were not systematically recorded.
Evaluation
Testing Data, Factors and Metrics
Testing Data
Project validation split (English ASR setup).
Factors
- Comparison: baseline checkpoint (openai/whisper-tiny.en) vs. fine-tuned model
Metrics
- Word Error Rate (WER)
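WER is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. In practice a library such as `evaluate` or `jiwer` would compute it; a minimal self-contained sketch of the definition:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return float(len(hyp) > 0)
    # d[j] holds the edit distance between the first i reference words
    # and the first j hypothesis words.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, d[j] = d[j], min(
                d[j] + 1,          # deletion
                d[j - 1] + 1,      # insertion
                prev_diag + cost,  # substitution / match
            )
    return d[-1] / len(ref)
```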
Results
- Base (openai/whisper-tiny.en) WER: 0.2806 (28.06%)
- Fine-tuned WER: 0.0586 (5.86%)
Summary
Fine-tuning substantially reduced WER compared to the base Whisper Tiny English checkpoint in project validation.
Model Examination
No formal interpretability study was performed.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator: https://mlco2.github.io/impact#compute
- Hardware Type: Local consumer GPU (4GB VRAM class)
- Hours used: Not precisely tracked
- Cloud Provider: N/A
- Compute Region: N/A
- Carbon Emitted: Not measured
Technical Specifications
Model Architecture and Objective
Whisper Tiny English encoder-decoder transformer fine-tuned for English speech-to-text transcription.
Compute Infrastructure
Local machine training setup.
Hardware
Consumer GPU with 4GB VRAM class constraints.
Software
Python, PyTorch, Hugging Face Transformers, Datasets, Evaluate.
Citation
BibTeX
@misc{dk2325_whisper_tiny_finetuned_2026,
title={Whisper Tiny English Fine-Tuned ASR},
author={DK2325},
year={2026},
howpublished={\url{https://huggingface.co/dk2325/whisper-tiny-finetuned}}
}
APA
DK2325. (2026). Whisper Tiny English Fine-Tuned ASR. Hugging Face. https://huggingface.co/dk2325/whisper-tiny-finetuned
Glossary
- ASR: Automatic Speech Recognition
- WER: Word Error Rate (lower is better)
More Information
This model is part of an end-to-end fine-tuning and deployment project focused on practical ASR improvements under limited hardware constraints.
Model Card Authors
DK2325
Model Card Contact
Hugging Face profile: https://huggingface.co/dk2325