MATS empowers researchers to advance AI safety

The ML Alignment & Theory Scholars (MATS) Program is an independent research and educational seminar program that connects talented scholars with top mentors in the fields of AI alignment, interpretability, and governance. For 10 weeks, MATS scholars will conduct research while also attending talks, workshops, and networking events with other members of the Berkeley alignment research community.

The Summer 2024 Program will run Jun 17-Aug 23, 2024, and the Winter 2024-25 Program will run Jan 6-Mar 14, 2025.

General applications for the Summer 2024 Program have closed.

Applications for the Winter 2024-25 Program remain open and will close on Aug 1, 2024.

Our Mission

The MATS program aims to find and train talented individuals for what we see as the world’s most urgent and talent-constrained problem: reducing risks from unaligned artificial intelligence (AI). We believe that ambitious researchers from a variety of backgrounds can meaningfully contribute to the field of alignment research, and we aim to provide the training, logistics, and community necessary to help them transition into the field. We also connect scholars with funding to ensure their financial security. Please see our theory of change for more details.

Program Details

  • MATS is an independent research program and educational seminar that connects talented scholars with top mentors in the fields of AI alignment, interpretability, and governance. Read more about the program timeline and content in our program overview.

    The Summer 2024 Cohort will run Jun 17-Aug 23 in Berkeley, California, and feature seminar talks from leading AI safety researchers, workshops on research strategy, and networking events with the Bay Area AI safety community. Applications close on Mar 24. [EDIT: extended to April 7.]

    The Winter 2024-25 Cohort will run Jan 6-Mar 14.

    The MATS program is an initiative supported by the Berkeley Existential Risk Initiative. We have historically received funding from Open Philanthropy, the Survival and Flourishing Fund, DALHAP Investments, and several donors via Manifund. We are accepting donations to support additional research scholars.

  • Since its inception in late 2021, the MATS program has supported 213 scholars and 47 mentors. After completion of the program, MATS alumni have:

  • Our ideal applicant has:

    • An understanding of the AI alignment research landscape equivalent to completing the AI Safety Fundamentals Alignment Course;

    • Previous experience with technical research (e.g., ML, CS, mathematics, physics, or neuroscience) or governance research, ideally at a postgraduate level;

    • Strong motivation to pursue a career in AI alignment research, particularly to reduce global catastrophic risk, prevent human disempowerment, and enable sentient flourishing.

    Even if you do not entirely meet these criteria, we encourage you to apply! Several past scholars applied without expecting to be accepted, and were.

    We are currently unable to accept applicants who will be under the age of 18 on June 17, 2024.

  • MATS now accepts rolling applications for future cohorts.

    If you submit an application for Summer 2024 by March 10, your application will be in the first batch considered for admission. We will consider all applications submitted before March 24. [EDIT: Extended to April 7.]

    In late March, applicants who make it past the first stage will receive emails allowing them to indicate which mentors’ streams they wish to apply to. They will then receive the corresponding mentor selection problems: questions and tasks chosen by mentors to evaluate applicants’ aptitude as researchers. Note: some mentors ask applicants to spend as long as ~10 hours on their selection problems.

    We expect to send out all admissions decisions by mid-May.

Program Tracks from Winter 2023-24

  • Agent Foundations

    Vanessa Kosoy (MIRI)

    Some systems in the world seem to behave like “agents”: they make consistent decisions, and sometimes display complex goal-seeking behavior. Can we develop a robust mathematical description of such systems and build provably aligned AI agents?

  • Aligning Language Models

    Ethan Perez (Anthropic)

    Current ML models that predict human language are surprisingly powerful and might scale into transformative AI. What novel alignment failures will future models exhibit, how can we develop demonstrations of those failures, and how can we mitigate them?

  • Concept-Based Interpretability

    Stephen Casper (MIT AAG), Erik Jenner (UC Berkeley CHAI), Jessica Rumbelow (Leap Labs)

    Identifying high-level concepts in ML models might be critical to predicting and restricting dangerous or otherwise unwanted behavior. Can we identify structures corresponding to “goals” or dangerous capabilities within a model and surgically alter them?

  • Cooperative AI

    Jesse Clifton (Center on Long-Term Risk), Caspar Oesterheld (CMU FOCAL)

    The world may soon contain many advanced AI systems frequently interacting with humans and with each other. Can we create a solid game-theoretic foundation for reasoning about these interactions to prevent catastrophic conflict and incentivize cooperation?

  • Deceptive Alignment

    Evan Hubinger (Anthropic)

    Powerful AI systems may be instrumentally motivated to secretly manipulate their training process. What ML training processes and architectures might lead to this deceptive behavior, and how can it be detected or averted?

  • Developmental Interpretability

    Jesse Hoogland (Timaeus), Daniel Murfet (Timaeus, University of Melbourne)

    Singular learning theory (SLT) offers a principled scientific approach to detecting phase transitions during ML training. Can we develop methods to identify, understand, and ultimately prevent the formation of dangerous capabilities and harmful values?

  • Evaluating Dangerous Capabilities

    Owain Evans (University of Oxford), Francis Rhys Ward (Imperial College London)

    Many stories of AI accident and misuse involve potentially dangerous capabilities, such as sophisticated deception and situational awareness, that have not yet been demonstrated in AI. Can we evaluate such capabilities in existing AI systems to form a foundation for policy and further technical work?

  • Mechanistic Interpretability

    Neel Nanda (Google DeepMind), Alex Turner (Google DeepMind), Lee Sharkey (Apollo Research), Adrià Garriga Alonso (FAR AI)

    Rigorously understanding how ML models function may allow us to identify and train against misalignment. Can we reverse engineer neural nets from their weights, similar to how one might reverse engineer a compiled binary program?

  • Provable AI Safety

    David “davidad” Dalrymple (ARIA)

    If we could encode a detailed world model and coarse human preferences in a formal language, it could be possible to formally verify that an AI-generated agent won’t take actions leading to catastrophe. Can we use frontier AI to help create such a detailed multi-scale world model, and/or can we already synthesize agents and proof certificates for small-scale demonstrations?

  • Safety Cases for AI

    Buck Shlegeris (Redwood Research)

    When a power company wants to build a nuclear power plant, they’re obligated to provide a safety case: an argument backed by evidence that their power plant is acceptably safe. What’s the shortest path towards AI developers being able to make reliable safety cases for their training and deployment, and how can we start iterating on techniques to fit into these safety cases now?

  • Scalable Oversight

    Asa Cooper Stickland, Julian Michael, Shi Feng, David Rein (NYU ARG)

    Human overseers alone might not be able to supervise superhuman AI in domains that we don’t understand. Can we design systems that scalably evaluate AI and incentivize AI truthfulness?

  • Understanding AI Hacking

    Jeffrey Ladish (Palisade Research)

    Current and near-term language models have the potential to greatly empower hackers and fundamentally change cybersecurity. How effectively can current models assist bad actors, and how soon might models be capable of hacking unaided?