Overview

Project funded by Prasar Bharati.
Project duration: Dec 2021 to Dec 2024
Deployment: https://pb.madhavlab.com

Description

In this age of digitization and AI, machine learning technologies have opened up vast opportunities. Proper content analysis supports efficient search, recommendation, accessibility, translation, and related services. In this project, we propose to develop and deploy audio-based content retrieval technologies for Prasar Bharati’s multimedia content.

The overarching goal of this project is audio-based content retrieval. We will develop two kinds of retrieval methods: extracting text labels from audio (audio tagging) and direct audio matching (audio fingerprinting).

Publications

Sagar Dutta and Vipul Arora, “AudioNet: Supervised Deep Hashing for Retrieval of Similar Audio Events”, in IEEE TASLP, 2024.
Kavya Ranjan Saxena and Vipul Arora, “Interactive singing melody extraction based on active adaptation”, in IEEE TASLP, 2024.
Akanksha Singh, Vipul Arora, and Yi-Ping Phoebe Chen, “An efficient TF-IDF based Query by Example Spoken Term Detection”, in IEEE Conference on Artificial Intelligence (CAI), 2024.
Akshay Raina, Sayeedul Islam Sheikh, and Vipul Arora, “Learning Ontology Informed Representations with Constraints for Acoustic Event Detection”, in IEEE ICASSP, 2024.
Anup Singh, Kris Demuynck, and Vipul Arora, “Simultaneously Learning Robust Audio Embeddings and Balanced Hash Codes for Query-by-Example”, in IEEE ICASSP, 2023.
Sumit Kumar, B. Anshuman, Linus Ruettimann, Richard H.R. Hahnloser, and Vipul Arora, “Balanced Deep CCA for Bird Vocalization Detection”, in IEEE ICASSP, 2023.
Adhiraj Banerjee and Vipul Arora, “wav2tok: Deep Sequence Tokenizer for Audio Retrieval”, in ICLR, 2023.
Anup Singh, Kris Demuynck, and Vipul Arora, “Attention-Based Audio Embeddings for Query-by-Example”, in ISMIR, 2022.

Codes

Audio Search with noisy snippets or Audio Fingerprinting