The team recently shared a detailed recap with the AI alignment community, outlining their latest efforts and motivations.
The goal is to help others build on their work and understand how the different projects fit together.
Team Overview:
The team, Google DeepMind's main technical group focused on existential risk from AI systems, has now evolved into the AGI Safety & Alignment team.
This team is divided into subteams:
AGI Alignment: Includes mechanistic interpretability and scalable oversight.
Frontier Safety: Focuses on the Frontier Safety Framework and dangerous capability evaluations.
The team has grown significantly: by 39% last year and 37% so far this year.
Leadership:
The leadership team consists of Anca Dragan, Rohin Shah, Allan Dafoe, and Dave Orr, with Shane Legg serving as executive sponsor.
The team sits within the larger AI Safety and Alignment organization, which is led by Anca Dragan.
Related Teams:
Gemini Safety: Focuses on safety training for current Gemini models.
Voices of All in Alignment: Focuses on alignment techniques for value and viewpoint pluralism.