I am interested in model training and reinforcement learning. Currently, I am a member of technical staff on the pre-training team at Anthropic.
Before this, I spent some time researching reasoning at Mistral. Before that, I spearheaded the prototype and algorithmic recipes for online RL on the Llama research team, and, as part of a small team, scaled that training for Llama 3.3-4.
And before that, I spent a few years at DeepMind London and DeepMind Paris with Rémi Munos, where I researched various aspects of deep RL algorithms and systems. I was also a core contributor to Gemini v1-1.5 post-training, with a focus on tool use and agents.
I began working on RL during my PhD at Columbia University. Before that, I studied physics.
