Our Thoughtology paper investigating the reasoning chains of thought of Large Reasoning Models like DeepSeek-R1 has been published in TMLR!
Our paper proposing SafeArena, a benchmark for evaluating the safety of autonomous web agents, has been accepted at ICML 2025!
Our paper on AI safety investigating the transferability of adversarial triggers in LLMs has been accepted to TACL!
I'm a visiting graduate student at the Simons Institute at UC Berkeley as part of their special year on LLMs and Transformers.
Our paper proposing the CHASE method to automatically generate challenging synthetic data for evaluating LLMs is out!
Presented my AI2 internship work on evaluating code generation in LLMs at NAACL 2024 in Mexico City!
How to Get Your LLM to Generate Challenging Problems for Evaluation
, Siva Reddy, Dzmitry Bahdanau
Preprint
pdf
code
abstract
DeepSeek-R1 Thoughtology: Let’s think about LLM reasoning
Sara Vera Marjanović*, , Vaibhav Adlakha, Milad Aghajohari, Parishad BehnamGhader, Mehar Bhatia, Aditi Khandelwal, Austin Kraft, Benno Krojer, Xing Han Lù, Nicholas Meade, Dongchan Shin, Amirhossein Kazemnejad, Gaurav Kamath, Marius Mosbach, Karolina Stańczak, Siva Reddy
TMLR'26
pdf
code
abstract
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù, Amirhossein Kazemnejad, Nicholas Meade, , Dongchan Shin, Alejandra Zambrano, Karolina Stańczak, Peter Shaw, Christopher Pal, Siva Reddy
CoLM'25
pdf
code
abstract
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, , Esin Durmus, Spandana Gella, Karolina Stańczak, Siva Reddy
ICML'25
pdf
code
abstract
Universal Adversarial Triggers Are Not Universal
Nicholas Meade, , Siva Reddy
TACL'25
pdf
code
abstract
Evaluating In-Context Learning of Libraries for Code Generation
, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi
NAACL'24
pdf
code
abstract
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
Satwik Bhattamishra, , Phil Blunsom, Varun Kanade
ICLR'24 [Oral]
pdf
code
abstract
MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau
EMNLP'23 [Oral]
pdf
code
abstract
Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
Satwik Bhattamishra, , Varun Kanade, Phil Blunsom
ACL'23
pdf
code
abstract
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks
Ankur Sikarwar, , Navin Goyal
EMNLP'22 [Oral]
pdf
code
abstract
Revisiting the Compositional Generalization Abilities of Neural Sequence Models
, Satwik Bhattamishra, Phil Blunsom, Navin Goyal
ACL'22
pdf
code
abstract
Are NLP Models really able to Solve Simple Math Word Problems?
, Satwik Bhattamishra, Navin Goyal
NAACL'21
pdf
code
abstract
article
On the Computational Power of Transformers and its Implications in Sequence Modeling
Satwik Bhattamishra, , Navin Goyal
CoNLL'20
pdf
code
abstract
VehicleChain: Blockchain-based Vehicular Data Transmission Scheme for Smart City
, Naigam Shah, Trupil Limbasiya, Debasis Das
IEEE SMC'19
pdf