Research
My research centres on Large Language Models. More specifically, I am interested in building next-generation LLMs that can continually and scalably learn from interactions. To this end, my work includes:
- Training dynamics of SFT and RL: We were the first to empirically show that RL causes sparse parameter updates to a base LLM. For example, when training DeepSeek R1 Zero from the DeepSeek V3 base model, 86% of the weights were never updated (see the first sketch after this list). Tweet · Paper · Code
- Verification in Natural Language: We showed that LLMs can recover the structure of deductive reasoning by identifying the correct premises. With a simple intervention of (1) first finding the correct set of premises and (2) verifying each step under those premises, we improve verification significantly (see the second sketch after this list). Tweet · Paper · Code
- Agentic Reasoning: Our web-agent-based, multi-agent information aggregation framework significantly improved multi-hop QA performance. Tweet · Paper · Code
- Social Reasoning: In our survey, we were the first to call out the severe over-simplification of the concept of "culture" in LLM-and-culture research, and we provided recommendations on how the field should address it. Tweet · Paper. In another work, we showed that cultural bias in LLMs is not systematic but rather random. Tweet · Paper
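
For readers curious what the sparsity measurement looks like in practice, here is a minimal sketch: load a base checkpoint and its RL-finetuned counterpart, compare them parameter by parameter, and count the weights that never changed. The checkpoint names and the exact-equality test are illustrative assumptions, not the paper's precise setup.

```python
# Minimal sketch: what fraction of an RL-finetuned model's weights are
# identical to the base model's? Checkpoint names are placeholders, and
# exact equality is an illustrative choice, not the paper's exact setup.
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("org/base-model")       # hypothetical checkpoint
tuned = AutoModelForCausalLM.from_pretrained("org/rl-tuned-model")  # hypothetical checkpoint

tuned_params = dict(tuned.named_parameters())
unchanged = total = 0
for name, p_base in base.named_parameters():
    p_tuned = tuned_params[name]
    unchanged += (p_base == p_tuned).sum().item()  # elementwise: weight never updated
    total += p_base.numel()

print(f"{100 * unchanged / total:.1f}% of weights never updated")
```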
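Similarly, a sketch of the two-step verification intervention, under stated assumptions: `llm` is a hypothetical text-in/text-out helper standing in for any LLM API, and the prompts and answer parsing are paraphrases rather than the paper's exact protocol.

```python
# Sketch of premise-augmented verification: (1) link each step to the
# premises it depends on, then (2) verify the step against only those
# premises. `llm` is a hypothetical text-in/text-out helper; the prompts
# and parsing are illustrative, not the paper's exact protocol.
from typing import Callable, List

def verify_chain(question: str, steps: List[str], llm: Callable[[str], str]) -> List[bool]:
    verdicts = []
    for i, step in enumerate(steps):
        context = [question] + steps[:i]  # candidate premises: question + earlier steps
        # Step 1: ask which earlier statements this step relies on.
        reply = llm(
            "Which numbered statements does the next step rely on?\n"
            + "\n".join(f"[{j}] {c}" for j, c in enumerate(context))
            + f"\nNext step: {step}\nAnswer with the numbers only."
        )
        chosen = {tok.strip(",.") for tok in reply.split()}
        premises = [c for j, c in enumerate(context) if str(j) in chosen]
        # Step 2: verify the step using only its identified premises.
        verdict = llm(
            "Premises:\n" + "\n".join(premises)
            + f"\nStep: {step}\nIs this step correct given only these premises? Answer yes or no."
        )
        verdicts.append(verdict.strip().lower().startswith("yes"))
    return verdicts
```

Calling `verify_chain(question, steps, llm)` returns one verdict per step, so the first `False` localises the earliest faulty step in the chain.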
News
- September 2025: Received the NeurIPS Scholar Award! See you in San Diego.
- September 2025: Our RL subnetwork paper was accepted at NeurIPS 2025!
- May 2025: Starting my summer internship with the AIIL group at Microsoft Research. I will be working on how LLMs behave in an ambiguous, underspecified world.
- May 2025: We released new research showing that RL naturally causes sparse updates to a base model.
- May 2025: PARC was accepted at ICML 2025!
- Feb 2025: New preprint on chain-of-thought verification. We showed that LLMs can verify chains of thought better via simple structure induction.
Publications
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tür, Hao Peng
NeurIPS 2025
RL causes sparse updates to a base model.
Premise-Augmented Reasoning Chains Improve Error Identification in Math Reasoning with LLMs
Sagnik Mukherjee*, Abhinav Chinta*, Takyoung Kim, Tarun Anoop Sharma, Dilek Hakkani-Tür
ICML 2025
Structure induction improves error identification.
Towards Measuring and Modeling "Culture" in LLMs: A Survey
Muhammad Farid Adilazuarda*, Sagnik Mukherjee*, Pradhyumna Lavania, Siddhant Singh, Ashutosh Dwivedi, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, Monojit Choudhury
EMNLP 2024 (main)
We propose a taxonomy of how the community has been studying culture so far.
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
Sagnik Mukherjee*, Muhammad Farid Adilazuarda*, Sunayana Sitaram, Kalika Bali, Alham Fikri Aji, Monojit Choudhury
EMNLP 2024 (main)
Studies the (in)efficacy of sociodemographic prompting.