Vatsal Gupta
g.vatsal@alumni.iitg.ac.in

My research centers on making ML systems adaptable, which has led me to explore their robustness and trustworthiness. This interest has since grown to span both NLP and CV, with a focus on multimodal reasoning to enhance models.


MicroFM: Physics-guided Flow Matching for Isotropic Microscopy Reconstruction
We tackle the challenge of poor axial resolution in microscopy by generating realistic training data using physically accurate PSFs and introducing the first 3D flow-matching framework for reconstruction. Our method, MicroFM, leverages an implicit geometry prior to recover high-fidelity, isotropic volumes across diverse microscopy systems.
Xingzu Zhan, Runmin Jiang, Vatsal Gupta, Tanush Swaminathan, Yanwen Wang, Genpei Zhang, Haili Wang, Min Xu
Submitted to CVPR 2026
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
We explore how well modern LLMs and VLMs handle deeper cognitive reasoning using NTSEBench, a dataset built from real NTSE aptitude problems. The benchmark reveals clear gaps in models’ performance on puzzles, analogies, and visual–text reasoning.
Vatsal Gupta*, Pranshu Pandya*, Agney S Talwarr, Tushar Kataria, Dan Roth, Vivek Gupta
Published at NAACL, 2025
Paper on ACL Anthology / Website / slides / poster
Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets
We study how input perturbations impact language models—both pre-trained models and LLMs—to better understand their vulnerabilities and failure modes. Through fine-tuning and CoT-based prompting, we test robustness transfer between perturbations and develop strategies that build multi-perturbation resilience. Our evaluation on Tabular-NLI shows the method preserves accuracy while enhancing robustness.
Vatsal Gupta*, Pranshu Pandya*, Tushar Kataria, Vivek Gupta, Dan Roth
Published at EMNLP, 2024
Paper on ACL Anthology / Website / slides / poster
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
We present FlowVQA, a benchmark that fills the gap in spatial and logical reasoning for visual question answering. It features a curated collection of flowcharts and rich QA pairs covering multiple reasoning skills. Comprehensive model evaluations highlight the benchmark’s value for advancing multimodal reasoning.
Shubhankar Singh*, Purvi Chaurasia*, Vatsal Gupta, Pranshu Pandya, Yerram Varun, Vivek Gupta, Dan Roth
Published at ACL, 2024
Paper on ACL Anthology / Website
ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation
We dive into the exciting intersection of LLMs and hardware verification to rethink how assertions are created. ChIRAAG takes natural-language specs, structures them, and produces accurate SystemVerilog Assertions through guided iteration. It streamlines engineering workflows and helps ease into a traditionally tedious task.
Bhabesh Mali, Karthik Maddala, Vatsal Gupta, Sweeya Reddy, Chandan Karfa, Ramesh Karri
Published at ISVLSI, 2024
Paper
*: Equal Contribution

For a comprehensive and up-to-date list of my publications, please refer to Google Scholar.


Modified template by Jon Barron
Last updated: October 2025