
Audio Course documentation

Check your understanding of the course material



1. What units is the sampling rate measured in?

- dB
- Hz
- bit

2. When streaming a large audio dataset, how soon can you start using it?

- As soon as the full dataset is downloaded.
- As soon as the first 16 examples are downloaded.
- As soon as the first example is downloaded.
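As a rough illustration of what streaming means, the sketch below uses a plain Python generator as a toy stand-in for a remote dataset. The function `fake_remote_examples` and the `downloaded` log are hypothetical names made up for this example and are not part of 🤗 Datasets, where streaming is enabled by passing `streaming=True` to `load_dataset`. The point is only that a lazy iterator produces examples one at a time, on request.

```python
from typing import Dict, Iterator, List


def fake_remote_examples(n_examples: int, log: List[int]) -> Iterator[Dict[str, object]]:
    """Hypothetical stand-in for a remote dataset: yields one example at a time."""
    for i in range(n_examples):
        log.append(i)  # record that example i has been "downloaded"
        yield {"id": i, "audio": [0.0] * 4}  # dummy audio payload


downloaded: List[int] = []
stream = fake_remote_examples(1_000_000, downloaded)

# Nothing has been fetched yet; asking for one example fetches exactly one.
first = next(stream)
print(first["id"], len(downloaded))
```

Only the requested example is ever materialized, which is why the size of the full dataset does not affect how soon iteration can begin.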

3. What is a spectrogram?

- A device used to digitize the audio that is first captured by a microphone, which converts the sound waves into an electrical signal.
- A plot that shows how the amplitude of an audio signal changes over time. It is also known as the *time domain* representation of sound.
- A visual representation of the frequency spectrum of a signal as it varies with time.
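To make the idea of frequency content varying over time concrete, here is a minimal sketch that builds a small magnitude spectrogram with the standard library only. It is an assumption-laden toy: it uses a naive DFT on non-overlapping frames with no window function, and the helper names (`dft_magnitudes`, `spectrogram`, `tone`) and the 8 kHz test signal are invented for this example. In practice a library such as `librosa` would normally be used.

```python
import cmath
import math
from typing import List


def dft_magnitudes(frame: List[float]) -> List[float]:
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal: List[float], frame_size: int, hop: int) -> List[List[float]]:
    """One row of frequency-bin magnitudes per time frame (no windowing)."""
    starts = range(0, len(signal) - frame_size + 1, hop)
    return [dft_magnitudes(signal[s:s + frame_size]) for s in starts]


sampling_rate = 8000  # Hz, assumed for this toy signal
frame_size = 64       # each frequency bin is 8000 / 64 = 125 Hz wide


def tone(freq_hz: float, n_samples: int) -> List[float]:
    return [math.sin(2 * math.pi * freq_hz * t / sampling_rate) for t in range(n_samples)]


# A 1 kHz tone followed by a 2 kHz tone.
signal = tone(1000, 256) + tone(2000, 256)
spec = spectrogram(signal, frame_size, hop=frame_size)

# The strongest bin in each frame, converted back to Hz.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
peak_hz = [b * sampling_rate / frame_size for b in peak_bins]
print(peak_hz)
```

Each row describes one slice of time and each column one frequency band, so the change from 1 kHz to 2 kHz shows up as the peak moving to a different column halfway through.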

4. What is the easiest way to convert raw audio data into the log-mel spectrogram expected by Whisper?

A.

librosa.feature.melspectrogram(audio["array"])

B.

from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
feature_extractor(audio["array"])

C.

dataset.feature(audio["array"], model="whisper")

- A
- B
- C

5. How do you load a dataset from the 🤗 Hub?

A.

from datasets import load_dataset

dataset = load_dataset(DATASET_NAME_ON_HUB)

B.

import librosa

dataset = librosa.load(PATH_TO_DATASET)

C.

from transformers import load_dataset

dataset = load_dataset(DATASET_NAME_ON_HUB)

- A
- B
- C

6. Your custom dataset contains high-quality audio with a 32 kHz sampling rate. You want to train a speech recognition model that expects the audio examples to have a 16 kHz sampling rate. What should you do?

- Use the examples as is; the model will easily generalize to higher quality audio examples.
- Use the Audio module from the 🤗 Datasets library to downsample the examples in the custom dataset.
- Downsample by a factor of 2 by throwing away every other sample.
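The following sketch shows what can go wrong when samples are simply discarded. It assumes a synthetic 12 kHz cosine tone, which lies above the 8 kHz Nyquist limit of 16 kHz audio; the helper `sample_tone` is a name invented for this example. Keeping every other sample, with no low-pass filter applied first, yields values that are numerically indistinguishable from a 4 kHz tone sampled at 16 kHz, an effect known as aliasing. Proper resampling filters out content above the new Nyquist limit before reducing the rate; in 🤗 Datasets this is done by casting the audio column with `Audio(sampling_rate=16_000)`.

```python
import math
from typing import List


def sample_tone(freq_hz: float, rate_hz: float, n_samples: int) -> List[float]:
    """Sample a cosine of the given frequency at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * i / rate_hz) for i in range(n_samples)]


original = sample_tone(12_000, 32_000, 64)  # 12 kHz tone recorded at 32 kHz
naive = original[::2]                       # keep every other sample: now 16 kHz
alias = sample_tone(4_000, 16_000, 32)      # a 4 kHz tone recorded at 16 kHz

# The naively downsampled 12 kHz tone matches the 4 kHz tone sample for sample.
max_diff = max(abs(a - b) for a, b in zip(naive, alias))
print(len(naive), max_diff < 1e-9)
```

The discarded high-frequency content does not disappear; it folds back into the audible band as a different frequency.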

7. How can you convert a spectrogram generated by a machine learning model into a waveform?

- We can use a neural network called a vocoder to reconstruct a waveform from the spectrogram.
- We can use the inverse STFT to convert the generated spectrogram into a waveform.
- You can't convert a spectrogram generated by a machine learning model into a waveform.
