Overview

Project funded by Prasar Bharati.
Project duration: Dec 2021 to Dec 2024

Description

Prasar Bharati is India’s largest broadcasting agency, broadcasting audio and video content via All India Radio and Doordarshan. It holds a vast archive of audio and video content. With digitization and recent advances in machine learning, this archive can now be analyzed automatically at scale. Proper content analysis enables efficient search, recommendation, accessibility, translation, and related services. In this project, we propose to develop and deploy automatic speech recognition (ASR) technologies for Prasar Bharati’s multimedia content.

The project involves implementing an ASR system for selected languages that converts speech audio into text. In its final form, it will be a conversational large vocabulary continuous speech recognition (LVCSR) system.

Achievements

Secured second place in Track 2 (Bhojpuri) of the MADASR 2023 challenge on ASR for Bengali and Bhojpuri. Our team competed in several tracks of the challenge, with varying placements.
Paper published: Nagarathna Ravi, Thishyan Raj T, and Vipul Arora, “TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.