- Project funded by Prasar Bharati.
- Project duration: Dec 2021 to Dec 2024
- Deployment: https://pb.madhavlab.com
Machine learning has made large-scale analysis of digitized media practical. Proper content analysis supports efficient search, recommendation, accessibility, and translation. In this project, we propose to develop and deploy audio-based content retrieval technologies for Prasar Bharati’s multimedia content.
The overarching goal of this project is audio-based content retrieval. We will develop two kinds of retrieval methods, viz., extracting text labels from audio (audio tagging) and direct audio matching (audio fingerprinting).
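To make the idea of direct audio matching concrete, here is a minimal sketch of query-by-example lookup with binary hash codes. It is an illustration only, not the project's deployed system or the method of the papers listed below: it assumes each audio clip has already been mapped to a fixed-length embedding by some model (not shown), and it swaps in plain random-projection hashing for any learned hashing scheme. All function names are hypothetical.

```python
import random

def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(embedding, projections):
    """Pack the sign of each random projection into one integer bit string."""
    bits = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, embedding))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")

def build_index(database, projections):
    """Map each clip name to the hash code of its (assumed precomputed) embedding."""
    return {name: hash_code(vec, projections) for name, vec in database.items()}

def query(index, embedding, projections):
    """Return the name of the indexed clip whose code is closest in Hamming distance."""
    code = hash_code(embedding, projections)
    return min(index, key=lambda name: hamming(index[name], code))

# Toy usage: the embeddings below are made-up numbers, not real audio features.
projections = make_projections(dim=4, n_bits=16, seed=1)
clip = [1.0, -2.0, 0.5, 3.0]
index = build_index({"clip_a": clip, "clip_b": [-x for x in clip]}, projections)
quieter_copy = [0.5 * x for x in clip]  # same direction, lower amplitude
match = query(index, quieter_copy, projections)
```

Because only the sign of each projection is kept, a rescaled copy of an embedding hashes to the same code, which is one reason compact binary codes are a common choice for fast matching. A real system would compare many short segments per recording rather than one vector per clip.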
- Anup Singh, Kris Demuynck, and Vipul Arora, “Simultaneously Learning Robust Audio Embeddings and Balanced Hash Codes for Query-by-Example”, in ICASSP, 2023.
- Sumit Kumar, B. Anshuman, Linus Ruettimann, Richard H.R. Hahnloser, and Vipul Arora, “Balanced Deep CCA for Bird Vocalization Detection”, in ICASSP, 2023.
- Adhiraj Banerjee and Vipul Arora, “wav2tok: Deep Sequence Tokenizer for Audio Retrieval”, in ICLR, 2023.
- Anup Singh, Kris Demuynck, and Vipul Arora, “Attention-Based Audio Embeddings for Query-by-Example”, in ISMIR, 2022.