Deliverable 5.3: Automatic Speech Recognition for Minoritised Languages

Automatic Speech Recognition for Minoritised Languages

By Dr. Arvind Kumar, Max Planck Institute for Psycholinguistics.

Dr. Kumar (MPI) offered a comprehensive deep-dive into Automatic Speech Recognition (ASR) for low-resource European languages, balancing core theoretical frameworks with practical engineering. The session began with the fundamentals of speech processing and a detailed breakdown of Word Error Rate (WER), the standard benchmark metric for evaluating ASR models. It then turned to a critical analysis of the roadblocks specific to this domain: data scarcity, acoustic variability, and the limitations of zero-shot architectures.

To address these challenges, Dr. Kumar gave an overview of state-of-the-art ASR models such as Whisper and wav2vec 2.0, followed by a curated walkthrough of publicly available open-source datasets for European minority languages. The talk also covered essential data augmentation techniques, such as speed perturbation and noise injection, which help prevent overfitting when training data is scarce.

The session culminated in an interactive hands-on lab in which participants used a Google Colab notebook to fine-tune a pre-trained model on a localised low-resource dataset. It concluded with a look toward the future of speech technology, highlighting the roles of cross-lingual transfer learning and community-driven data collection in preserving linguistic diversity.
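
To make the WER metric concrete: WER = (S + D + I) / N, where S, D and I are the minimum numbers of word substitutions, deletions and insertions needed to turn the reference transcript into the model's hypothesis, and N is the number of words in the reference. The session summary does not name a specific tool, so the following is only a minimal pure-Python sketch. It assumes whitespace tokenisation and no text normalisation (casing, punctuation), which real evaluations usually apply first; the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 words
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions are counted against the reference length, WER is not bounded by 1.0: a short reference paired with a long, wrong hypothesis can score above 100%.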
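
The summary names speed perturbation and noise injection without prescribing an implementation. The following is a minimal, dependency-free sketch of both, operating on a mono waveform held as a list of floats. The function names, the naive linear-interpolation resampler (no anti-aliasing filter) and the white-Gaussian noise model are assumptions made for illustration; production pipelines typically rely on an audio library such as sox or torchaudio and often mix in recorded background noise instead of synthetic noise.

```python
import math
import random


def speed_perturb(samples, factor):
    """Resample a waveform by `factor` using linear interpolation.

    factor > 1.0 shortens the signal (faster speech); factor < 1.0 lengthens it.
    As with sox's `speed` effect, tempo and pitch change together.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for k in range(n_out):
        pos = k * factor  # fractional read position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out


def add_noise(samples, snr_db, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (in dB)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    rng = rng or random.Random()
    signal_power = sum(x * x for x in samples) / len(samples)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


if __name__ == "__main__":
    sr = 16_000  # assumed sample rate: one second of a 220 Hz tone
    wave = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
    for f in (0.9, 1.0, 1.1):
        print(f, len(speed_perturb(wave, f)))
    noisy = add_noise(wave, snr_db=10.0, rng=random.Random(0))
```

Speed factors of 0.9, 1.0 and 1.1 are a common choice, roughly tripling the amount of training audio; the SNR range used for noise injection is a tuning decision that depends on the target recording conditions.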

PDF of the presentation given at the Fosterlang workshop in Donostia on 28 May 2026:

ASR_presentation by Arvind Kumar Fosterlang 28 5 20266 D5.3_compressed

PDF of the instructional ASR training materials: 

Instructional Training Materials ASR Capacity Building Arvind Kumar Fosterlang D5.3_compressed