Hello, I'm
AI Researcher — NLP • LLM Interpretability • RAG • Multi-Agent Systems
MS & B.Tech graduate from IISER Bhopal (April 2026), currently at Lexsi Labs as a Research Intern in Mechanistic Interpretability. I build systems that understand why language models believe what they believe.
Recent
I am an MS & B.Tech graduate from IISER Bhopal (April 2026), specializing in Data Science and Engineering. My research sits at the intersection of natural language processing, large language model interpretability, and multi-agent reasoning systems.
At Lexsi Labs, I investigate whether machine unlearning produces genuine circuit disruption or merely suppresses behavioral expression — using mechanistic attribution tools on Gemma Scope SAE features. This work connects deeply to my thesis on how RAG systems handle epistemic conflict between parametric and retrieved knowledge.
I am drawn to research that asks mechanistic questions: not just what models do, but why they do it, and how we can design architectures and training procedures that produce more reliable, trustworthy reasoning.
Ongoing and completed research projects.
Bio Medical Data Science Lab, IISER Bhopal • PI: Dr. Tanmay Basu
Investigating mechanisms to overcome imperfect context retrieval and resolve knowledge conflicts, both parametric and external, in Retrieval-Augmented Generation frameworks. The core contribution is DARE, a dialectical engine that operationalizes formal cross-examination to dynamically assess source credibility based on logical resilience rather than static weighting. The engine subjects candidate sources to structured adversarial challenges and infers reliability from how well each source withstands scrutiny.
Lexsi Labs • Mechanistic Interpretability
Investigating whether standard machine unlearning methods produce genuine circuit disruption or merely suppress behavioral expression of internal knowledge states. Using EAP-IG attribution shifts over Gemma Scope SAE features pre- and post-unlearning on the TOFU forget10 benchmark, this work operationalizes the hypothesis that unlearning — like instruction-tuning — acts as a suppression mechanism rather than true knowledge erasure.
Bio Medical Data Science Lab • ICMR Bhopal Collaboration
Developing an end-to-end deep learning framework for automated PICO (Population, Intervention, Comparison, Outcome) extraction from full-text clinical papers. In collaboration with ICMR Bhopal, curating annotated datasets of full research articles to accelerate evidence synthesis for systematic reviews.
Bio Medical Data Science Lab, IISER Bhopal
Engineered a two-stage summarization pipeline combining a ModernBERT-based Siamese extractive stage (using a scaled adaptive margin triplet loss for candidate ranking) with an abstractive generation stage. Evaluated on CNN/DailyMail.
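The adaptive-margin idea can be illustrated with a minimal, dependency-free sketch. Everything below is an illustrative assumption rather than the pipeline's actual implementation: it assumes cosine similarity over sentence embeddings and a margin scaled by the gap between the candidates' gold relevance scores (the names `pos_score` and `neg_score` are hypothetical).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def adaptive_margin_triplet_loss(anchor, pos, neg, pos_score, neg_score, scale=1.0):
    """Hinge triplet loss whose margin is scaled by the gap between the
    candidates' gold relevance scores: pairs that should be far apart in
    the reference ranking must also be far apart in embedding space."""
    margin = scale * (pos_score - neg_score)          # adaptive, data-dependent margin
    gap = cosine(anchor, pos) - cosine(anchor, neg)   # current embedding-space gap
    return max(0.0, margin - gap)                     # zero loss once gap >= margin

# Toy usage: the positive sentence embedding already sits much closer
# to the document (anchor) than the negative, so the loss is zero.
doc = [1.0, 0.0, 0.0]
good = [0.9, 0.1, 0.0]
bad = [0.0, 1.0, 0.0]
print(adaptive_margin_triplet_loss(doc, good, bad, pos_score=0.8, neg_score=0.2))
```

In the actual pipeline this loss would be computed over ModernBERT sentence embeddings inside a training loop; the sketch only shows the loss geometry.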
School of Public Policy, IIT Delhi • Guide: Dr. Nandana Sengupta
Analyzed 2000+ faculty profiles from IRINS to identify a 12% gender differential in negative marking impact under JEE. Evaluated the socio-economic viability and impact of the 20% supernumerary quota for women at IITs using statistical modeling and causal inference methods.
Student Innovation Grant • IICE / DST, Government of India
Developed an AI-driven fintech platform that won the Student Innovation Grant (Rs. 2 Lakhs) from IICE, funded by DST, Government of India. The platform demonstrated a 68% profit increase in backtesting using ML-driven signal generation and portfolio optimization.
Peer-reviewed publications and workshop papers. Click any paper to expand its abstract.
49th International ACM SIGIR Conference on Research and Development in Information Retrieval
Large language models (LLMs) exhibit a curious failure mode in retrieval-augmented generation: they may internally "disagree" with retrieved context even while outwardly appearing to follow it. We identify a mechanistic law (p < 10⁻⁴²) predicting internal epistemic conflict in 70B-scale models by examining logit-level interactions between parametric knowledge and retrieved context. Our analysis uncovers the Alignment Paradox: instruction-tuning decouples a model's internal epistemic tension from its surface-level textual behavior, causing standard behavioral metrics to systematically miss genuine conflict. We further develop a 25ms "Mechanistic Auditor" that identifies latent sycophancy with 76% F1, achieving a 600× speedup over state-of-the-art probing methods and significantly outperforming model self-reporting.
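The paper's mechanistic law is not reproduced here, but the underlying intuition of comparing parametric and context-conditioned predictions at the logit level can be sketched with a toy example. The two-token vocabulary and the use of KL divergence as the conflict score are assumed stand-ins, not the paper's actual detector.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def epistemic_conflict(parametric_logits, contextual_logits):
    """Toy conflict score: KL divergence between the model's closed-book
    (parametric) and context-conditioned next-token distributions.
    High divergence signals internal disagreement even when the argmax
    (surface behavior) follows the retrieved context."""
    p = softmax(parametric_logits)
    q = softmax(contextual_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Vocabulary of two candidate answers; the retrieved context either
# confirms the model's prior or contradicts it.
agree = epistemic_conflict([4.0, 1.0], [4.1, 0.9])     # context confirms prior
conflict = epistemic_conflict([4.0, 1.0], [0.5, 4.5])  # context contradicts prior
print(agree < conflict)  # prints True
```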
@inproceedings{sadhu2026ragdisagrees,
title = {When {RAG} Disagrees: Detecting Latent Epistemic Conflict
via Logit Interactions},
author = {Sadhu, Saisab and others},
booktitle = {Proceedings of the 49th International {ACM} {SIGIR} Conference
on Research and Development in Information Retrieval},
year = {2026},
publisher = {ACM}
}
48th European Conference on Information Retrieval
We present DARE (Dialectical Adversarial and Evidence-Aware RAG), a framework that resolves factual conflicts in retrieval-augmented generation through a structured cross-examination process. Rather than weighting sources by static relevance scores, DARE subjects candidate passages to adversarial challenges and infers source reliability from logical resilience — how well each source withstands targeted counterfactual pressure. This dynamic credibility assessment mechanism achieves state-of-the-art gains of 77% on FaithEval and 28% on RAMDocs. The framework is model-agnostic and operates entirely at inference time, requiring no additional fine-tuning.
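As a deliberately simplified sketch of the resilience idea, assume each cross-examination challenge is reduced to a boolean predicate over a source; in DARE proper, challenges are structured adversarial questions generated and judged by a model, and the field names below are hypothetical.

```python
def resilience_score(source, challenges):
    """Fraction of adversarial challenges a source withstands. Here a
    'challenge' is just a predicate over the source; in a real system it
    would be a model-generated cross-examination question."""
    survived = sum(1 for challenge in challenges if challenge(source))
    return survived / len(challenges)

def rank_by_resilience(sources, challenges):
    """Order candidate sources by dynamic credibility rather than a
    static relevance weight."""
    return sorted(sources, key=lambda s: resilience_score(s, challenges), reverse=True)

# Toy sources and purely illustrative consistency checks.
sources = [
    {"claim": "X", "dates_consistent": True,  "self_consistent": True},
    {"claim": "Y", "dates_consistent": False, "self_consistent": True},
]
challenges = [
    lambda s: s["dates_consistent"],
    lambda s: s["self_consistent"],
]
best = rank_by_resilience(sources, challenges)[0]
print(best["claim"])  # prints X
```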
@inproceedings{sadhu2026dare,
title = {{DARE}: A Dialectical Framework for Adversarial
and Evidence-Aware {RAG}},
author = {Sadhu, Saisab and others},
booktitle = {Proceedings of the 48th European Conference on
Information Retrieval ({ECIR})},
year = {2026},
publisher = {Springer}
}
Proceedings of the 10th Workshop on FinNLP, EMNLP 2025
Generating high-quality financial analysis from earnings call transcripts requires synthesizing heterogeneous signals (management tone, analyst pushback, forward guidance, and market context) into coherent, persuasive reports. We design a hierarchical multi-agent framework modeling investment committee debates: specialist agents independently analyze distinct aspects of the call, a moderator orchestrates structured argumentation, and a synthesis agent produces the final report. Our system achieved a 68.75% win rate over cooperative baselines, ranking first globally on the official "Win Rate vs. Analyst Report" metric, with generated reports preferred over those of professional human analysts.
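The committee structure can be sketched as a toy control flow, with stub lambdas standing in for the specialist, moderator, and synthesis LLM agents. All names and behaviors here are illustrative assumptions, not the system's actual agents or prompts.

```python
def committee_report(specialists, moderator, synthesizer, transcript):
    """Toy pipeline: each specialist analyzes one aspect of the call,
    the moderator organizes the views into a structured debate, and the
    synthesizer folds the debate into a single report."""
    views = {name: agent(transcript) for name, agent in specialists.items()}
    debate = moderator(views)
    return synthesizer(debate)

# Stub agents; in the real framework each would be an LLM call.
specialists = {
    "tone": lambda t: "management tone: upbeat",
    "guidance": lambda t: "guidance: raised",
}
moderator = lambda views: sorted(views.values())   # deterministic ordering of views
synthesizer = lambda debate: "; ".join(debate)     # fold debate into one report

print(committee_report(specialists, moderator, synthesizer, "transcript text"))
```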
@inproceedings{sadhu2025adversarial,
title = {Structured Adversarial Synthesis: A Multi-Agent Framework
for Generating Persuasive Financial Analysis from
Earnings Call Transcripts},
author = {Sadhu, Saisab and others},
booktitle = {Proceedings of the 10th Workshop on Financial Technology
and Natural Language Processing ({FinNLP}), {EMNLP} 2025},
year = {2025},
publisher = {Association for Computational Linguistics}
}
JustNLP Workshop at IJCAI-AACL 2025
Abstractive summarization of ultra-long legal documents presents unique challenges: legal text is structured by rhetorical roles (facts, arguments, holdings, orders), and naive chunking destroys cross-section coherence. We propose a rhetorically informed chunking pipeline that segments documents along argumentative boundaries before abstractive generation. Through systematic analysis we identify and characterize the "Coherence Gap," a fundamental trade-off between local phrase-level accuracy and global narrative coherence in legal summarization, and propose mitigation strategies through structure-aware segmentation.
@inproceedings{sadhu2025legal,
title = {Structure-Aware Chunking for Abstractive Summarization
of Long Legal Documents},
author = {Sadhu, Saisab and others},
booktitle = {Proceedings of the {JustNLP} Workshop at {IJCAI-AACL} 2025},
year = {2025}
}
AAAI 2026 EGSAI Community Activity
AI tutoring systems are prone to confident errors, sycophantic validation, and pedagogically unsound explanations. We introduce Hierarchical Pedagogical Oversight (HPO), a multi-agent adversarial framework for reliable AI tutoring. HPO employs a hierarchy of specialized agents (a tutor, a challenger, and an overseer) in which the challenger actively probes for errors and the overseer arbitrates. An 8B-parameter model structured via HPO outperforms GPT-4o by 3.3% Macro F1 on MRBench, demonstrating that adversarial architectural design can overcome raw model scale for pedagogical reliability.
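The tutor/challenger/overseer control flow can be sketched in a few lines. The stub agents below are illustrative stand-ins, not the HPO agents themselves; in the framework each role is an LLM with its own prompt.

```python
def hierarchical_oversight(tutor, challenger, overseer, question):
    """Toy control flow for a tutor/challenger/overseer hierarchy: the
    challenger probes the tutor's draft answer for errors, and the
    overseer arbitrates between the draft and the objection."""
    draft = tutor(question)
    objection = challenger(question, draft)
    return overseer(question, draft, objection)

# Stub agents: the tutor only knows one fact, the challenger objects to
# anything that is not the known-correct answer, the overseer defers to
# the draft unless an objection was raised.
tutor = lambda q: "7" if q == "3 + 4?" else "unsure"
challenger = lambda q, a: None if a == "7" else "check arithmetic"
overseer = lambda q, draft, obj: draft if obj is None else "revise: " + obj

print(hierarchical_oversight(tutor, challenger, overseer, "3 + 4?"))  # prints 7
```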
@inproceedings{sadhu2026hpo,
title = {Hierarchical Pedagogical Oversight: A Multi-Agent Adversarial
Framework for Reliable {AI} Tutoring},
author = {Sadhu, Saisab and others},
booktitle = {{AAAI} 2026 {EGSAI} Community Activity},
year = {2026}
}
Lexsi Labs • Mumbai, India
Bio Medical Data Science Lab, IISER Bhopal • PI: Dr. Tanmay Basu
MIQ Digital • Bengaluru, India
School of Public Policy, IIT Delhi • Guide: Dr. Nandana Sengupta
Indian Institute of Science Education and Research Bhopal
Integrated five-year program combining undergraduate engineering and master's-level research in data science.
Ranked #1 globally on the official "Win Rate vs. Analyst Report" metric. System-generated financial reports were preferred over those of professional human analysts.
Selected to present at AAAI 2026 EGSAI Community Activity — one of 51 works chosen from global submissions.
Awarded by IICE (funded by DST, Government of India) to develop an AI fintech platform; demonstrated a 68% profit increase in backtesting.
Awarded full registration waiver and travel support for poster presentation at the Collaborative for Academic Research Excellence Conference, IIT Guwahati.
I'm always happy to discuss research ideas, collaborations, or ongoing work. Feel free to reach out.