Audio Course documentation
Supplemental reading and resources
This unit introduced the text-to-speech task and covered a lot of ground. Want to learn more? Here you will find additional resources to help you deepen your understanding of the topics covered.
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis: a paper introducing HiFi-GAN for speech synthesis.
- X-Vectors: Robust DNN Embeddings For Speaker Recognition: a paper introducing the x-vector method for speaker embeddings.
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech: a paper introducing FastSpeech 2, another popular text-to-speech model that uses a non-autoregressive TTS method.
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech: a paper introducing MQTTS, an autoregressive TTS system that replaces mel-spectrograms with quantized discrete representations.
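
To get a feel for what a "quantized discrete representation" means before reading the last paper, here is a minimal sketch of vector quantization in plain Python. It is not the MQTTS method itself: the codebook, the frame values, and the helper names (`quantize`, `dequantize`) are made up for illustration, and a real system would learn its codebooks from data rather than hard-code them.

```python
# Toy vector quantization: each continuous feature frame is replaced by the
# index of its nearest codebook entry. All values here are illustrative
# assumptions, not taken from any paper or library.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(frames, codebook):
    """Return, for each frame, the index of the closest codebook vector."""
    codes = []
    for frame in frames:
        distances = [squared_distance(frame, entry) for entry in codebook]
        codes.append(distances.index(min(distances)))
    return codes

def dequantize(codes, codebook):
    """Map discrete codes back to their codebook vectors (lossy)."""
    return [codebook[c] for c in codes]

# A hypothetical 3-entry codebook of 2-dimensional vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]

# Four hypothetical continuous frames (stand-ins for spectrogram-like features).
frames = [[0.1, -0.2], [0.9, 1.2], [1.8, 0.1], [1.1, 0.8]]

codes = quantize(frames, codebook)
print(codes)  # [0, 1, 2, 1]
print(dequantize(codes, codebook))
```

The sequence of integer codes is what a model would predict in place of continuous spectrogram frames; mapping the codes back through the codebook recovers only an approximation of the original frames.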