Model Card for dk2325/whisper-tiny-finetuned

Whisper Tiny English fine-tuned for improved general English ASR performance.

Model Details

Model Description

This model is a fine-tuned ASR checkpoint based on Whisper Tiny English, trained to improve transcription quality on English speech while remaining lightweight and fast.

  • Developed by: DK2325
  • Funded by: Self-funded personal project
  • Shared by: DK2325
  • Model type: Seq2Seq speech-to-text transformer (Whisper)
  • Language(s): English
  • License: Apache-2.0
  • Finetuned from model: openai/whisper-tiny.en

Model Sources

  • Repository: https://huggingface.co/dk2325/whisper-tiny-finetuned

Uses

Direct Use

Use this model for automatic speech recognition of English audio such as:

  • Read speech
  • Lectures
  • Voice notes
  • General transcription tasks

Downstream Use

Can be integrated into:

  • Subtitle generation tools
  • ASR APIs
  • Search/indexing pipelines for spoken content
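
As an illustration of the subtitle-generation use case above, here is a minimal, self-contained sketch that formats (start, end, text) segments, such as the chunk timestamps the Transformers pipeline can return when called with return_timestamps=True, into SubRip (SRT) blocks. The function names and segment tuples are illustrative, not part of this model's code.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Convert an iterable of (start, end, text) tuples into SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

The resulting string can be written directly to a `.srt` file next to the source audio.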

Out-of-Scope Use

Not intended for:

  • Non-English transcription
  • High-noise multi-speaker audio without preprocessing
  • Safety-critical, legal, or medical decision workflows without human review

Bias, Risks, and Limitations

  • Performance varies across accents, recording quality, microphone type, and domain.
  • Errors may occur on proper nouns, rare words, and technical terms.
  • Model outputs should be reviewed by humans in high-stakes scenarios.

Recommendations

Users (both direct and downstream) should be aware of model limitations. Evaluate on your own target dataset before production deployment.

How to Get Started with the Model

Use the Hugging Face Transformers automatic-speech-recognition pipeline. The base checkpoint (openai/whisper-tiny.en) is English-only, so no language or task arguments are needed; passing them raises an error for English-only Whisper models.

    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="dk2325/whisper-tiny-finetuned",
        device=-1,  # CPU; set device=0 to use the first CUDA GPU
    )

    # English-only checkpoint: no language/task generate_kwargs required
    result = asr("path/to/audio.wav")

    print(result["text"])
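
The pipeline also accepts raw audio arrays instead of file paths. As a standard-library-only sketch (the helper name and structure are illustrative, not part of this model's code), the function below reads a 16-bit PCM WAV into floats in [-1, 1] plus its sampling rate:

```python
import struct
import wave

def load_wav_mono_f32(path):
    """Read a 16-bit PCM WAV file into a list of floats in [-1, 1]
    and return (samples, sampling_rate), downmixing to mono if needed."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        ch = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())
        samples = struct.unpack(f"<{wf.getnframes() * ch}h", frames)
        if ch > 1:
            # Average interleaved channels down to mono.
            samples = [sum(samples[i:i + ch]) / ch
                       for i in range(0, len(samples), ch)]
        return [s / 32768.0 for s in samples], wf.getframerate()
```

The result can then be passed to the pipeline as a dict such as `{"raw": np.asarray(samples, dtype=np.float32), "sampling_rate": rate}`; note that Whisper expects 16 kHz input, so resample first if the file uses a different rate.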

Training Details

Training Data

Fine-tuned on English speech data prepared through project manifests (LibriSpeech-style pipeline).

Training Procedure

Preprocessing

  • Audio processed with Whisper feature extractor
  • Text tokenized with Whisper tokenizer/processor
  • Seq2Seq training with standard ASR collation
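
The label-padding part of the collation above can be sketched in plain Python. Whisper's feature extractor already pads audio to fixed-length 30-second log-mel features, so only the tokenized transcripts vary in length; the helper below is illustrative, not the project's actual collator:

```python
def collate_labels(label_batches, pad_id=-100):
    """Pad tokenized transcripts to a common length with -100 so the
    cross-entropy loss ignores padded positions (standard Seq2Seq ASR
    collation)."""
    max_len = max(len(labels) for labels in label_batches)
    return [labels + [pad_id] * (max_len - len(labels))
            for labels in label_batches]
```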

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Learning rate: 1e-5
  • Optimizer: AdamW
  • Gradient accumulation: used for low-VRAM setup
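
Gradient accumulation can be illustrated with a toy example: gradients from several micro-batches are summed and averaged before a single optimizer step, so a low-VRAM GPU emulates a larger effective batch size. The 1-D least-squares model below is purely illustrative, not the project's training loop:

```python
def sgd_with_accumulation(data, w=0.0, lr=1e-2, accum_steps=4):
    """Toy sketch of gradient accumulation for y = w * x with squared
    error: accumulate gradients over `accum_steps` micro-batches, then
    take one averaged SGD step."""
    grad, seen = 0.0, 0
    for x, y in data:
        grad += 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        seen += 1
        if seen == accum_steps:
            w -= lr * grad / accum_steps  # one step per virtual batch
            grad, seen = 0.0, 0
    return w
```

With per-device batch size 1 and `accum_steps=4`, the effective batch size is 4.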

Speeds, Sizes, Times

Training was performed on a local low-resource setup (a consumer GPU in the 4 GB VRAM class). Wall-clock training time was not systematically profiled.

Evaluation

Testing Data, Factors and Metrics

Testing Data

Project validation split (English ASR setup).

Factors

  • Baseline model vs fine-tuned model

Metrics

  • Word Error Rate (WER)
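
For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal self-contained implementation is sketched below; libraries such as `evaluate` or `jiwer` provide production versions:

```python
def wer(reference, hypothesis):
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the processed ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1] / len(ref)
```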

Results

  • Base WER: 0.2806
  • Fine-tuned WER: 0.0586
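
From the numbers above, the relative error reduction works out as follows:

```python
base_wer, finetuned_wer = 0.2806, 0.0586

absolute_drop = base_wer - finetuned_wer  # WER points removed
relative_drop = absolute_drop / base_wer  # fraction of base errors removed

print(f"absolute: {absolute_drop:.4f}, relative: {relative_drop:.1%}")
```

That is, fine-tuning removed roughly 79% of the baseline's word errors on the project validation split.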

Summary

Fine-tuning substantially reduced WER compared to the base Whisper Tiny English checkpoint in project validation.

Model Examination

No formal interpretability study was performed.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator: https://mlco2.github.io/impact#compute

  • Hardware Type: Local consumer GPU (4GB VRAM class)
  • Hours used: Not precisely tracked
  • Cloud Provider: N/A
  • Compute Region: N/A
  • Carbon Emitted: Not measured
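
If hours used and hardware power draw are tracked later, a back-of-envelope estimate in the spirit of the calculator above is straightforward. All numbers in this sketch are placeholders, not measurements from this project:

```python
def estimate_co2_kg(gpu_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Rough CO2e estimate: energy drawn (kWh, scaled by the facility's
    power usage effectiveness) times grid carbon intensity (kg CO2e/kWh)."""
    energy_kwh = (gpu_watts / 1000.0) * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh
```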

Technical Specifications

Model Architecture and Objective

Whisper Tiny English encoder-decoder transformer (~37.8M parameters, F32 safetensors) fine-tuned for English speech-to-text transcription.

Compute Infrastructure

Local machine training setup.

Hardware

Consumer GPU in the 4 GB VRAM class.

Software

Python, PyTorch, Hugging Face Transformers, Datasets, Evaluate.

Citation

BibTeX

@misc{dk2325_whisper_tiny_finetuned_2026,
  title={Whisper Tiny English Fine-Tuned ASR},
  author={DK2325},
  year={2026},
  howpublished={\url{https://huggingface.co/dk2325/whisper-tiny-finetuned}}
}

APA

DK2325. (2026). Whisper Tiny English Fine-Tuned ASR. Hugging Face. https://huggingface.co/dk2325/whisper-tiny-finetuned

Glossary

  • ASR: Automatic Speech Recognition
  • WER: Word Error Rate (lower is better)

More Information

This model is part of an end-to-end fine-tuning and deployment project focused on practical ASR improvements under limited hardware constraints.

Model Card Authors

DK2325

Model Card Contact

Hugging Face profile: https://huggingface.co/dk2325
