Here is an exciting opportunity to work on cutting-edge projects in Audio Processing through the DCASE2025 Challenge. This international competition features a variety of tasks in Machine Learning, Audio Processing, and related domains, offering a platform to apply and sharpen your technical skills.
Students interested in participating under the guidance of Prof. Vipul Arora are encouraged to complete the screening task provided below.
The task is straightforward: build a model that classifies non-speech sounds. Train the model on the training+validation data and evaluate it on the test data. The dataset is provided below.
Metrics: Report F1 score, Precision, and Recall as the final evaluation metrics, computed against the test-set labels. The submitted models will be verified against the reported scores, so report them as accurately as possible.
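For reference, a minimal sketch of how these metrics can be computed with scikit-learn. The class names below are placeholders (the actual 7 class labels come from the dataset metadata), and the macro-averaging scheme is an assumption since the task does not specify one:

import sklearn.metrics as metrics

# Placeholder labels; replace with the model's test-set predictions and the
# ground-truth labels from the test metadata. Class names here are hypothetical.
y_true = ["class_a", "class_b", "class_a"]
y_pred = ["class_a", "class_a", "class_a"]

# Macro-averaging over the 7 classes is one reasonable choice; the task does
# not prescribe an averaging scheme.
precision, recall, f1, _ = metrics.precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")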
Dataset: (download link) The dataset consists of 7,014 mono .wav files sampled at 32 kHz, divided into a train set of 6,289 files and a test set of 725 files. Each file was manually annotated with a single ground-truth label and is between 500 milliseconds and 4 seconds long. The audio covers 7 classes of non-speech sounds, and the corresponding labels are provided in the metadata for the train and test splits.
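To make the expected workflow concrete, here is a minimal baseline sketch: average log-mel features per clip fed to a random-forest classifier. The metadata layout (a CSV per split with "filename" and "label" columns) and the directory names are assumptions about the download, not part of the task statement, and a stronger model (e.g., a CNN on full spectrograms) would likely score better:

import numpy as np
import pandas as pd
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

SR = 32000  # files are 32 kHz mono


def extract_features(wav_path):
    """Summarise one clip as a mean log-mel vector.

    Clips range from 0.5 s to 4 s, so averaging over time yields a
    fixed-length feature regardless of duration.
    """
    y, _ = librosa.load(wav_path, sr=SR, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=64)
    return librosa.power_to_db(mel).mean(axis=1)


def load_split(metadata_csv, audio_dir):
    """Read a (hypothetical) metadata CSV and return features and labels."""
    meta = pd.read_csv(metadata_csv)
    X = np.stack([extract_features(f"{audio_dir}/{fn}") for fn in meta["filename"]])
    return X, meta["label"].tolist()


# Train on the train(+validation) split, evaluate on the held-out test split.
X_train, y_train = load_split("train_metadata.csv", "audio/train")
X_test, y_test = load_split("test_metadata.csv", "audio/test")

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)

# Per-class and averaged Precision, Recall, and F1 on the test set.
print(classification_report(y_test, clf.predict(X_test), digits=3))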