I am interested in model training and reinforcement learning. Currently, I am a member of technical staff on the pre-training team at Anthropic.
Before this, I spent some time working on reasoning at Mistral.
Before that, I was on the Llama research team. I spearheaded the prototype and algorithmic recipes for online RL and, as part of a small team, scaled that training for Llama 3.3 and Llama 4. I also worked on post-training for reasoning.
And before that, I spent a few years at DeepMind London and DeepMind Paris with Rémi Munos, researching various aspects of deep RL algorithms and systems. I was also a core contributor to Gemini v1-1.5 post-training, with a focus on tool use and agents.
I worked on RL during my PhD at Columbia University. A long time ago, I studied physics.
