I am interested in model training and reinforcement learning. Currently, I am a member of technical staff on the pre-training team at Anthropic.
Before this, I spent some time working on reasoning at Mistral.
Before that, I was on the Llama research team, where I spearheaded the prototype and algorithmic recipes for online RL and, as part of a small team, scaled the training to Llama 3.3-4. I also worked on post-training for reasoning.
And before that, I spent a few years at DeepMind London, where I was a core contributor to Gemini v1-1.5 post-training, with a focus on tool use and agents. I also researched various aspects of deep RL algorithms and systems.
Previously, I interned twice at DeepMind Paris, hosted by Rémi Munos. I obtained my PhD at Columbia University in New York City.
