Microsoft Research India — Research Intern
Reinforcement learning for LLMs with attention to long-horizon training dynamics and stability. Agentic verifier modules for robust multi-step reasoning evaluation.
researcher · pre-training & post-training of language models
currently at Microsoft Research India · previously Lossfunk & Athena Agents (RL)
I work on the science and craft of training language models — what makes them learn, what makes them stable over long horizons, and how to tell whether they’re actually getting better.
At Microsoft Research India I’m focused on reinforcement learning for LLMs and agentic verifier modules for multi-step reasoning evaluation. Before that, I worked on probabilistic forecasting at Lossfunk, model merging and RL post-training at Athena, and a fine-tuning and inference pipeline at Kotoba Research.
First-author paper on LLM forecasting behavior, accepted at the AIR-FM Workshop, AAAI 2026. Built calibration, Brier score, ECE, and pass@k evaluation pipelines for probabilistic forecasting. arXiv:2511.18394
Built Aryabhatta 1.0, a domain-adapted LLM for JEE Main math — 90.2% accuracy, +35.5 pp over baseline. Model merging (SLERP, TIES) blended with GRPO / REINFORCE post-training.
Computer vision pipelines for segmentation and keypoint estimation; chunking for document extraction; a containerized deployment that cut per-page processing time from 2 min to 3 s.
Fine-tuning and inference pipeline; QAT and LoRA benchmarking across standardized evaluations.
A minimal GPT training playground in pure PyTorch implementing composable 4D parallelism — Data, Tensor, Pipeline, Expert — with manual gradient averaging, Megatron-style sharded linears, GPipe scheduling, and MoE routing.
From-scratch implementations of language and vision papers (GPT-2, Llama, SigLIP, speculative decoding), built as a step-by-step pipeline from tokenizers through vision models to fine-tuning.