Dheeraj Varghese

Building a path toward helpful intelligence.

I’m an incoming PhD researcher at the VIS Lab, supervised by Cees Snoek, where I work on generalist multimodal foundation models as part of the Horizon Europe ELLIOT project. My research focuses on designing unified architectures that combine modalities into a shared space, aiming for models that generalize well, adapt efficiently, and assist meaningfully across a wide range of tasks.

Previously, I worked on combining discrete diffusion and autoregression for multilingual image generation with Mohammad M. Derakhshani, and explored curriculum learning in vision-language models under the supervision of Yuki Asano.

At my core, I’m an applied engineer, enthusiastic about recreating intelligence that serves as a tool and makes tasks easier for the people using it. Sample efficiency in learning, blurring the context window, and unified representation spaces all capture my attention at the moment.

news

Jul 18, 2025 Two of my works, NeoBabel and TaxonomiGQA, are out! 🎉
Mar 11, 2025 Co-organized a hackathon for the First Workshop on Structure & Generalization in Multimodal Language Understanding (SAGE-MLU 2025)
Mar 21, 2024 Will be a Teaching Assistant for Natural Language Processing at VU!
Mar 15, 2024 Attended the ELLIS Winter School on Foundation Models

selected publications

  1. Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
    Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström, and 3 more authors
    2025
  2. NeoBabel: A Multilingual Open Tower for Visual Generation
    Mohammad Mahdi Derakhshani, Dheeraj Varghese, Marzieh Fadaee, and 1 more author
    arXiv preprint arXiv:2507.06137, 2025