Selected Publications
Forecasting Downstream Performance of LLMs With Proxy Metrics
Arkil Patel, Siva Reddy, Marius Mosbach, Dzmitry Bahdanau
Toward Open Weight Models Without Risks: Separating Public and Private Capabilities in LLMs
Arkil Patel*, Charbel El Feghali*, Nicholas Meade, Spandana Gella, Verna Dankers, Siva Reddy
How to Get Your LLM to Generate Challenging Problems for Evaluation
Arkil Patel, Siva Reddy, Dzmitry Bahdanau
Understanding Scaling Laws With Token-Level Analysis
Arkil Patel, Marius Mosbach, Siva Reddy, Dzmitry Bahdanau
DeepSeek-R1 Thoughtology: Let's think about LLM reasoning
Arkil Patel*, Sara Vera Marjanović*, Vaibhav Adlakha, Milad Aghajohari, Parishad BehnamGhader, Mehar Bhatia, Aditi Khandelwal, Austin Kraft, Benno Krojer, Xing Han Lù, Nicholas Meade, Dongchan Shin, Amirhossein Kazemnejad, Gaurav Kamath, Marius Mosbach, Karolina Stańczak, Siva Reddy
Investigating Adversarial Trigger Transfer in Large Language Models
Nicholas Meade, Arkil Patel, Siva Reddy
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur, Nicholas Meade, Xing Han Lù, Alejandra Zambrano, Arkil Patel, Esin Durmus, Spandana Gella, Karolina Stańczak, Siva Reddy
Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade
MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
Arkil Patel, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau
Revisiting the Compositional Generalization Abilities of Neural Sequence Models
Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal
Are NLP Models really able to Solve Simple Math Word Problems?
Arkil Patel, Satwik Bhattamishra, Navin Goyal