For student internship and short-term research (post-BTech) positions.
Present-day LLMs (such as ChatGPT) learn from text. Can we have them learn from audio, without text? We are not talking about just English speech, but multilingual speech; and not just speech, but also music.
To build your basics, read the following paper:
Diffusion models generate high-quality images. Can we adapt diffusion models to generate images in a particular style, given only a small amount of data in that style?
To build your basics, read the following paper:
If you are comfortable with the above papers, send an email to Prof. Arora with the subject line “[UG Project Appl] <Project Name>”. We are open to students from outside IITK as well.