Dheeraj Varghese
Building a path toward helpful intelligence.

I’m an incoming PhD researcher at the VIS Lab, supervised by Cees Snoek, working on generalist multimodal foundation models as part of the Horizon Europe ELLIOT project. My research focuses on designing unified architectures that combine modalities into a shared space, aiming for models that generalize well, adapt efficiently, and assist meaningfully across a wide range of tasks.
Previously, I worked on combining discrete diffusion and autoregression for multilingual image generation with Mohammad M. Derakhshani, and explored curriculum learning in vision-language models under the supervision of Yuki Asano.
At my core, I’m an applied engineer with an enthusiasm for recreating intelligence that serves as a tool, making tasks easier for the human user. Sample efficiency in learning, blurring the context window, and unified representation spaces all capture my attention at the moment.
news
Jul 18, 2025 | Two of my works, NeoBabel and TaxonomiGQA, are out! 🎉 |
---|---|
Mar 11, 2025 | Co-organized a hackathon for the First Workshop on Structure & Generalization in Multimodal Language Understanding (SAGE-MLU 2025) |
Mar 21, 2024 | Will be a Teaching Assistant for Natural Language Processing at VU! |
Mar 15, 2024 | Attended the ELLIS Winter School on Foundation Models |