CS PhD student at USC's Institute for Creative Technologies, working with Prof. Mohammad Soleymani at the Intelligent Human Perception Lab. Bronze Medallist from IIT Roorkee (2021).
I build and improve multimodal LLMs (audio, video, and omni), using post-training techniques such as preference optimization to give models a better understanding of emotions and social signals. I also work on video generation of social behaviors.
| Paper | Venue |
|---|---|
| MoD-DPO — Mitigating cross-modal hallucinations in Omni LLMs | CVPR 2026 🏔️ Denver |
| AVERE — Audiovisual emotion reasoning with preference optimization | ICLR 2026 🇧🇷 Rio |
| Face-LLaVA — Facial expression understanding via instruction tuning | WACV 2026 🌵 Tucson |
| DiTaiListener — Controllable listener video generation | ICCV 2025 🌺 Hawai'i |
Multimodal LLMs · Post-training & RLHF · Emotion Understanding · Social AI · Video Generation · Audio/Visual Reasoning
Currently looking for Research/Applied Scientist internships in multimodal LLMs and video generation; feel free to reach out!