Research
My research centres on Large Language Models. More specifically, I am interested in building next-generation LLMs that can continually and scalably learn from interactions. To this end, my work includes:
- Training dynamics of SFT and RL: We were the first to empirically show that RL causes sparse parameter updates to a base LLM. For example, when training DeepSeek R1 Zero from the DeepSeek V3 base model, 86% of the weights were never updated (see the first sketch after this list). Tweet · Paper · Code
- Verification in Natural Language: We showed that LLMs can recover the structure of deductive reasoning by identifying the correct premises. With a simple intervention of (1) first finding the correct set of premises and (2) verifying each step under those premises, we improve verification significantly (see the second sketch after this list). Tweet · Paper · Code
- Agentic Reasoning: Our web-agent-based, multi-agent information aggregation framework significantly improved multi-hop QA performance. Tweet · Paper · Code
- Social Reasoning: In our survey, we were the first to call out the severe over-simplification of the concept of "culture" in LLM-and-culture research, and we provided recommendations on how the field should address it. Tweet · Paper. In another work, we showed that cultural bias in LLMs is not systematic but rather random. Tweet · Paper
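
For readers curious what the sparsity measurement looks like in practice, here is a minimal sketch: load a base checkpoint and its RL-finetuned counterpart, compare them parameter by parameter, and count the weights that never changed. The checkpoint names and the exact-equality test are illustrative assumptions, not the paper's precise setup.

```python
# Minimal sketch: what fraction of an RL-finetuned model's weights are
# identical to the base model's? Checkpoint names are placeholders, and
# exact equality is an illustrative choice, not the paper's exact setup.
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("org/base-model")       # hypothetical checkpoint
tuned = AutoModelForCausalLM.from_pretrained("org/rl-tuned-model")  # hypothetical checkpoint

tuned_params = dict(tuned.named_parameters())
unchanged = total = 0
for name, p_base in base.named_parameters():
    p_tuned = tuned_params[name]
    unchanged += (p_base == p_tuned).sum().item()  # elementwise: weight never updated
    total += p_base.numel()

print(f"{100 * unchanged / total:.1f}% of weights never updated")
```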
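Similarly, a sketch of the two-step verification intervention, under stated assumptions: `llm` is a hypothetical text-in/text-out helper standing in for any LLM API, and the prompts and answer parsing are paraphrases rather than the paper's exact protocol.

```python
# Sketch of premise-augmented verification: (1) link each step to the
# premises it depends on, then (2) verify the step against only those
# premises. `llm` is a hypothetical text-in/text-out helper; the prompts
# and parsing are illustrative, not the paper's exact protocol.
from typing import Callable, List

def verify_chain(question: str, steps: List[str], llm: Callable[[str], str]) -> List[bool]:
    verdicts = []
    for i, step in enumerate(steps):
        context = [question] + steps[:i]  # candidate premises: question + earlier steps
        # Step 1: ask which earlier statements this step relies on.
        reply = llm(
            "Which numbered statements does the next step rely on?\n"
            + "\n".join(f"[{j}] {c}" for j, c in enumerate(context))
            + f"\nNext step: {step}\nAnswer with the numbers only."
        )
        chosen = {tok.strip(",.") for tok in reply.split()}
        premises = [c for j, c in enumerate(context) if str(j) in chosen]
        # Step 2: verify the step using only its identified premises.
        verdict = llm(
            "Premises:\n" + "\n".join(premises)
            + f"\nStep: {step}\nIs this step correct given only these premises? Answer yes or no."
        )
        verdicts.append(verdict.strip().lower().startswith("yes"))
    return verdicts
```

Calling `verify_chain(question, steps, llm)` returns one verdict per step, so the first `False` localises the earliest faulty step in the chain.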
News
- September 2025: Received the NeurIPS Scholar Award! See you in San Diego.
- September 2025: Our RL subnetwork paper was accepted at NeurIPS 2025!
- May 2025: Starting my summer internship with the AIIL group at Microsoft Research. I will be working on how LLMs behave in an ambiguous, underspecified world.
- May 2025: We released new research showing that RL naturally causes sparse updates to a base model.
- May 2025: PARC was accepted at ICML 2025!
- Feb 2025: New preprint on chain-of-thought verification. We showed that LLMs can verify chains of thought better via simple structure induction.
Publications
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tür, Hao Peng
NeurIPS 2025
RL causes sparse updates to a base model.
Premise-Augmented Reasoning Chains Improve Error Identification in Math Reasoning with LLMs
Sagnik Mukherjee*, Abhinav Chinta*, Takyoung Kim, Tarun Anoop Sharma, Dilek Hakkani-Tür
ICML 2025
Structure induction improves error identification.
Towards Measuring and Modeling "Culture" in LLMs: A Survey
Muhammad Farid Adilazuarda*, Sagnik Mukherjee*, Pradhyumna Lavania, Siddhant Singh, Ashutosh Dwivedi, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, Monojit Choudhury
EMNLP 2024 (main)
We propose a taxonomy of how the community has been studying culture so far.
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
Sagnik Mukherjee*, Muhammad Farid Adilazuarda*, Sunayana Sitaram, Kalika Bali, Alham Fikri Aji, Monojit Choudhury
EMNLP 2024 (main)
Studies the (in)efficacy of sociodemographic prompting.