Overview

Project funded by Prasar Bharati.
Project duration: 2021 to 2029

Description

Prasar Bharati is India’s largest broadcasting agency, broadcasting audio and video content via All India Radio and Doordarshan. It holds a vast archive of audio and video content. Machine learning now makes it practical to analyze such content at scale, which can support efficient search, recommendation, accessibility, translation, and more. In this project, we propose to develop and deploy automatic speech recognition (ASR) technologies for Prasar Bharati’s multimedia content.
The ASR system converts speech audio into text for selected Indian languages. In its final form, it will be a conversational large vocabulary continuous speech recognition (LVCSR) system.

Objective

This project aims to develop and deploy robust automatic speech recognition (ASR) systems for Indian languages, supporting speech-to-text conversion for Prasar Bharati’s massive multimedia content, enabling subtitling, accessibility, and improved search functionality.

Key Achievements & Impact

Technological Development:

Designed conversational large vocabulary continuous speech recognition (LVCSR) systems, enabling accurate transcription of audio into text.
Built trustworthy ASR by incorporating uncertainty estimation and confidence calibration into the transcription tool, which speeds up the annotation workflow by almost 10x.
Focused on Indian languages, addressing the challenges of multilingual subtitling and accessibility.
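
To illustrate how word-level confidence can shorten annotation, the sketch below sends only low-confidence words to a human reviewer and checks how well the scores are calibrated using expected calibration error (ECE). This is a minimal sketch under stated assumptions, not the project's implementation: the function names, the 0.8 threshold, and the sample data are hypothetical, and ECE is a standard calibration metric used here purely for illustration.

```python
def words_to_review(words, threshold=0.8):
    """Return the words whose confidence falls below the threshold.

    `words` is a list of (text, confidence) pairs; both the input format
    and the default threshold are hypothetical choices for this sketch.
    """
    return [text for text, confidence in words if confidence < threshold]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Compute ECE: the bin-weighted gap between confidence and accuracy."""
    n = len(confidences)
    if n == 0:
        return 0.0
    # Group predictions into equal-width confidence bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical transcript: only the uncertain word is sent for review.
transcript = [("namaste", 0.95), ("bharat", 0.40), ("samachar", 0.88)]
flagged = words_to_review(transcript)

# Hypothetical scores: 0.9 confidence but only 3 of 4 words correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

When the scores are well calibrated (low ECE), an annotator can skip high-confidence words and correct only the flagged ones, which is one plausible way a confidence-aware tool reduces annotation effort.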

Competitions and Benchmarks:

Secured second place in Track 2 (Bhojpuri) of the MADASR 2023 challenge, which evaluates ASR performance on low-resource languages such as Bengali and Bhojpuri.
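
ASR performance is commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the reference length. The sketch below is a generic illustration of that metric; the function name and example sentences are hypothetical, and it is not claimed to be the challenge's official scoring script.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical example: one of six reference words is missing from the output.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Lower is better, and WER can exceed 1.0 when the output contains many inserted words.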

Research Contributions:

Published a research paper:
- “TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR”, in IEEE/ACM TASLP, 2024, which introduces confidence estimation methods for ASR systems.

Social Relevance:

Supports digitization, accessibility, and public engagement by enabling automated subtitling and transcription of historic and contemporary broadcast content.
Contributes to the preservation and accessibility of Indian language audio-visual content at scale.