Model Card for dk2325/whisper-tiny-finetuned

Whisper Tiny English fine-tuned for improved general English ASR performance.

Model Details

Model Description

This model is a fine-tuned ASR checkpoint based on Whisper Tiny English, trained to improve transcription quality on English speech while remaining lightweight and fast.

  • Developed by: DK2325
  • Funded by: Self-funded personal project
  • Shared by: DK2325
  • Model type: Seq2Seq speech-to-text transformer (Whisper)
  • Language(s): English
  • License: Apache-2.0
  • Finetuned from model: openai/whisper-tiny.en

Model Sources

  • Repository: https://huggingface.co/dk2325/whisper-tiny-finetuned

Uses

Direct Use

Use this model for automatic speech recognition of English audio such as:

  • Read speech
  • Lectures
  • Voice notes
  • General transcription tasks

Downstream Use

Can be integrated into:

  • Subtitle generation tools
  • ASR APIs
  • Search/indexing pipelines for spoken content
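
As an illustration of the subtitle-generation use case above, here is a minimal, self-contained sketch that formats (start, end, text) segments, such as the chunk timestamps the Transformers pipeline can return when called with return_timestamps=True, into SubRip (SRT) blocks. The function names and segment tuples are illustrative, not part of this model's code.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Convert an iterable of (start, end, text) tuples into SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

The resulting string can be written directly to a `.srt` file next to the source audio.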

Out-of-Scope Use

Not intended for:

  • Non-English transcription
  • High-noise multi-speaker audio without preprocessing
  • Safety-critical, legal, or medical decision workflows without human review

Bias, Risks, and Limitations

  • Performance varies across accents, recording quality, microphone type, and domain.
  • Errors may occur on proper nouns, rare words, and technical terms.
  • Model outputs should be reviewed by humans in high-stakes scenarios.

Recommendations

Users (both direct and downstream) should be aware of model limitations. Evaluate on your own target dataset before production deployment.

How to Get Started with the Model

Use the Hugging Face Transformers automatic-speech-recognition pipeline. The base checkpoint (openai/whisper-tiny.en) is English-only, so no language or task arguments are needed; passing them raises an error for English-only Whisper models.

    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="dk2325/whisper-tiny-finetuned",
        device=-1,  # CPU; set device=0 to use the first CUDA GPU
    )

    # English-only checkpoint: no language/task generate_kwargs required
    result = asr("path/to/audio.wav")

    print(result["text"])
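
The pipeline also accepts raw audio arrays instead of file paths. As a standard-library-only sketch (the helper name and structure are illustrative, not part of this model's code), the function below reads a 16-bit PCM WAV into floats in [-1, 1] plus its sampling rate:

```python
import struct
import wave

def load_wav_mono_f32(path):
    """Read a 16-bit PCM WAV file into a list of floats in [-1, 1]
    and return (samples, sampling_rate), downmixing to mono if needed."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        ch = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
        samples = struct.unpack(f"<{wf.getnframes() * ch}h", frames)
        if ch > 1:
            # Average interleaved channels down to mono.
            samples = [sum(samples[i:i + ch]) / ch
                       for i in range(0, len(samples), ch)]
        return [s / 32768.0 for s in samples], wf.getframerate()
```

The result can then be passed to the pipeline as a dict such as `{"raw": np.asarray(samples, dtype=np.float32), "sampling_rate": rate}`; note that Whisper expects 16 kHz input, so resample first if the file uses a different rate.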

Training Details

Training Data

Fine-tuned on English speech data prepared through project manifests (LibriSpeech-style pipeline).

Training Procedure

Preprocessing

  • Audio processed with Whisper feature extractor
  • Text tokenized with Whisper tokenizer/processor
  • Seq2Seq training with standard ASR collation
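
The label-padding part of the collation above can be sketched in plain Python. Whisper's feature extractor already pads audio to fixed-length 30-second log-mel features, so only the tokenized transcripts vary in length; the helper below is illustrative, not the project's actual collator:

```python
def collate_labels(label_batches, pad_id=-100):
    """Pad tokenized transcripts to a common length with -100 so the
    cross-entropy loss ignores padded positions (standard Seq2Seq ASR
    collation)."""
    max_len = max(len(labels) for labels in label_batches)
    return [labels + [pad_id] * (max_len - len(labels))
            for labels in label_batches]
```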

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Learning rate: 1e-5
  • Optimizer: AdamW
  • Gradient accumulation: used for low-VRAM setup
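
Gradient accumulation can be illustrated with a toy example: gradients from several micro-batches are summed and averaged before a single optimizer step, so a low-VRAM GPU emulates a larger effective batch size. The 1-D least-squares model below is purely illustrative, not the project's training loop:

```python
def sgd_with_accumulation(data, w=0.0, lr=1e-2, accum_steps=4):
    """Toy sketch of gradient accumulation for y = w * x with squared
    error: accumulate gradients over `accum_steps` micro-batches, then
    take one averaged SGD step."""
    grad, seen = 0.0, 0
    for x, y in data:
        grad += 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        seen += 1
        if seen == accum_steps:
            w -= lr * grad / accum_steps  # one step per virtual batch
            grad, seen = 0.0, 0
    return w
```

With per-device batch size 1 and `accum_steps=4`, the effective batch size is 4.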

Speeds, Sizes, Times

Training was performed on a local low-resource setup (a consumer GPU in the 4 GB VRAM class). Wall-clock training time was not systematically profiled.

Evaluation

Testing Data, Factors and Metrics

Testing Data

Project validation split (English ASR setup).

Factors

  • Baseline model vs fine-tuned model

Metrics

  • Word Error Rate (WER)
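
For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal self-contained implementation is sketched below; libraries such as `evaluate` or `jiwer` provide production versions:

```python
def wer(reference, hypothesis):
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the processed ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1] / len(ref)
```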

Results

  • Base WER: 0.2806
  • Fine-tuned WER: 0.0586
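
From the numbers above, the relative error reduction works out as follows:

```python
base_wer, finetuned_wer = 0.2806, 0.0586

absolute_drop = base_wer - finetuned_wer  # WER points removed
relative_drop = absolute_drop / base_wer  # fraction of base errors removed

print(f"absolute: {absolute_drop:.4f}, relative: {relative_drop:.1%}")
```

That is, fine-tuning removed roughly 79% of the baseline's word errors on the project validation split.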

Summary

Fine-tuning substantially reduced WER compared to the base Whisper Tiny English checkpoint in project validation.

Model Examination

No formal interpretability study was performed.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator: https://mlco2.github.io/impact#compute

  • Hardware Type: Local consumer GPU (4GB VRAM class)
  • Hours used: Not precisely tracked
  • Cloud Provider: N/A
  • Compute Region: N/A
  • Carbon Emitted: Not measured
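
If hours used and hardware power draw are tracked later, a back-of-envelope estimate in the spirit of the calculator above is straightforward. All numbers in this sketch are placeholders, not measurements from this project:

```python
def estimate_co2_kg(gpu_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Rough CO2e estimate: energy drawn (kWh, scaled by the facility's
    power usage effectiveness) times grid carbon intensity (kg CO2e/kWh)."""
    energy_kwh = (gpu_watts / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh
```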

Technical Specifications

Model Architecture and Objective

Whisper Tiny English encoder-decoder transformer (~37.8M parameters, F32 safetensors) fine-tuned for English speech-to-text transcription.

Compute Infrastructure

Local machine training setup.

Hardware

Consumer GPU in the 4 GB VRAM class.

Software

Python, PyTorch, Hugging Face Transformers, Datasets, Evaluate.

Citation

BibTeX

@misc{dk2325_whisper_tiny_finetuned_2026,
  title={Whisper Tiny English Fine-Tuned ASR},
  author={DK2325},
  year={2026},
  howpublished={\url{https://huggingface.co/dk2325/whisper-tiny-finetuned}}
}

APA

DK2325. (2026). Whisper Tiny English Fine-Tuned ASR. Hugging Face. https://huggingface.co/dk2325/whisper-tiny-finetuned

Glossary

  • ASR: Automatic Speech Recognition
  • WER: Word Error Rate (lower is better)

More Information

This model is part of an end-to-end fine-tuning and deployment project focused on practical ASR improvements under limited hardware constraints.

Model Card Authors

DK2325

Model Card Contact

Hugging Face profile: https://huggingface.co/dk2325
