Yunhao (Robin) Tang

I am interested in model training and reinforcement learning. Currently, I am a member of technical staff on the pre-training team at Anthropic.

Before this, I spent some time working on reasoning at Mistral.

Before that, I was on the Llama research team. I spearheaded the prototype and algorithmic recipes for online RL and, as part of a small team, scaled the training to Llama 3.3-4. I also worked on post-training for reasoning.

And before that, I spent a few years at DeepMind London, where I was a core contributor to Gemini v1-1.5 post-training with a focus on tool use and agents. I also researched various aspects of deep RL algorithms and systems.

Previously, I was a two-time intern at DeepMind Paris hosted by Rémi Munos. I obtained my PhD at Columbia University in New York City.
