Jan Leike leads new research project at Anthropic, steps away from alignment

Jan Leike is starting a new research project at Anthropic and stepping back from leading the alignment team, which he has handed over to Ethan Perez and Spencer Price. Leike expressed excitement about the new project, noting that while alignment is crucial, it is only one of the broader challenges associated with advancing artificial general intelligence (AGI). Anthropic continues to work on several facets of AGI development, including Claude’s constitution and automated alignment research, in which AI agents propose and test ideas.

Anthropic: Anthropic is an AI safety and research company working to build reliable, interpretable, and steerable AI systems, such as the Claude models, which feature Constitutional AI. The company recently published research on automated alignment researchers that accelerate progress on scalable oversight challenges. It hosts Jan Leike’s new research project, which focuses on broader aspects of safe AGI development.
Jan Leike: Jan Leike is an AI alignment researcher at Anthropic who previously co-led the Superalignment team at OpenAI and worked at DeepMind. He recently led Anthropic’s alignment science team and contributed to advances such as automated alignment research. He has announced that he is stepping away from running alignment to lead a new project emphasizing the multiple factors needed for AGI to benefit humanity.

Alignment Outlook: Progress so far includes Claude’s constitution and automated alignment research, but supervising superhuman models remains an unsolved challenge.
Automated Alignment: Anthropic developed AI agents that propose ideas, run experiments, and outperform humans on scalable oversight research problems.
Leadership Transition: Jan Leike has handed leadership of Anthropic’s alignment team to Ethan Perez and Spencer Price so that he can focus on his new project.