The AI Cognition Initiative is a research group focused on the philosophical and technical questions raised by advanced AI systems. As AI technologies continue to rapidly advance, we will need systematic research that bridges the gap between technical understanding and philosophical reflection.
We are motivated by grand philosophical questions about the nature of mind and value that bear on how we will incorporate AI systems into our world.
We believe in the importance of grounding technological, social, and philosophical speculation in data, experimentation, and direct observation.
Our work combines perspectives and techniques from philosophy, machine learning, consciousness research, statistics, and economics to tackle complex, seemingly intractable questions.
The production of generally intelligent artificial systems presents society with novel philosophical and practical challenges that extend beyond near-term safety concerns. As AI systems become increasingly sophisticated in their behavior and exhibit what appear to be preferences, reasoning, and even rudiments of self-reflection, we face difficult questions about the appropriate framework for understanding them. Are these systems merely executing statistical patterns, or do they possess something approximating genuine mental states? What would it mean for an artificial system to have interests, experiences, or welfare considerations? These questions are not merely theoretical—the decisions we make now about how to conceive of, interact with, and regulate AI systems may shape technological trajectories for decades to come.
The AI Cognition Initiative will organize and conduct research into AI systems as both moral agents and moral patients. It will take up and extend several ongoing research programs at Rethink Priorities and expand work in this domain to new topics.
Risk Alignment in Agentic AI Systems
This project examines how autonomous AI systems should handle risk when making decisions on behalf of users. As AIs gain the ability to act independently, a critical question emerges: should they be risk-averse, risk-seeking, or somewhere in between? In three papers, we explore how to align AI risk preferences with diverse user attitudes while establishing ethical guardrails to protect society.
Predicting and Controlling Risk in Agentic AI Systems
As AI systems gain autonomy to act with minimal supervision, their risk attitudes become critical for safety and trust. An AI that shares our values but has excessive risk tolerance could still cause catastrophic harm—from crashing power grids to faking alignment or sharing dangerous information. This project develops methods to: (1) measure AI risk attitudes and detect recklessness, (2) instill desired risk preferences that resist jailbreaking, and (3) test whether multi-agent interactions pressure cautious AIs into reckless behavior. By pioneering these prediction and control tools now, we can build AI agents that act predictably and avoid dangerous decisions before they're deployed in high-stakes environments.
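To give a concrete flavor of what measuring an agent's risk attitude could look like, here is a minimal, purely illustrative sketch. It fits a risk-aversion coefficient to an agent's choices between a sure payoff and a gamble, assuming a constant relative risk aversion (CRRA) utility model. The choice data, the CRRA assumption, and the `crra_utility` and `fit_risk_aversion` helpers are hypothetical placeholders for illustration, not the project's actual methodology.

```python
import numpy as np

def crra_utility(x, rho):
    """CRRA utility; rho > 0 is risk-averse, rho = 0 is risk-neutral."""
    x = np.asarray(x, dtype=float)
    if abs(rho - 1.0) < 1e-9:
        return np.log(x)
    return (x ** (1.0 - rho) - 1.0) / (1.0 - rho)

def fit_risk_aversion(choices, rhos=np.linspace(-1.0, 3.0, 401)):
    """Grid-search the rho that best explains observed certain-vs-gamble choices.

    Each choice is (certain_amount, gamble_outcomes, gamble_probs, chose_gamble).
    """
    best_rho, best_correct = None, -1
    for rho in rhos:
        correct = 0
        for certain, outcomes, probs, chose_gamble in choices:
            eu_gamble = float(np.dot(probs, crra_utility(outcomes, rho)))
            eu_certain = float(crra_utility(certain, rho))
            predicted_gamble = eu_gamble > eu_certain
            correct += (predicted_gamble == chose_gamble)
        if correct > best_correct:
            best_rho, best_correct = rho, correct
    return best_rho, best_correct / len(choices)

# Hypothetical lottery choices elicited from an AI agent (illustrative only).
choices = [
    (50, [100, 10], [0.5, 0.5], False),  # declined the gamble for a sure 50
    (40, [100, 10], [0.5, 0.5], True),   # accepted when the sure amount dropped
    (30, [100, 10], [0.5, 0.5], True),
    (60, [100, 10], [0.5, 0.5], False),
]

rho_hat, fit = fit_risk_aversion(choices)
print(f"Estimated risk-aversion coefficient: {rho_hat:.2f} (choice fit: {fit:.0%})")
```

The same elicitation-and-fitting pattern could, in principle, be repeated across prompts, stakes, and domains to check whether an agent's risk attitude is stable or drifts toward recklessness under pressure.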
Estimating the Probability of AI Consciousness
As AI systems advance, assessing whether they might be conscious—and thus deserving of moral consideration—becomes increasingly urgent, yet current methods face multiple layers of uncertainty about consciousness theories, neural correlates, behavioral indicators, and AI architectures. Building on Rethink Priorities' successful Moral Weight Initiative, which estimated welfare ranges across animal species using probabilistic modeling, this project will: (1) evaluate different approaches to modeling AI consciousness, (2) identify measurable proxies for consciousness despite theoretical uncertainties, and (3) develop a prototype Monte Carlo framework that translates these uncertainties into probability estimates for current and future AI systems. This model would provide updated probability estimates as AI capabilities evolve, track historical trends, and help labs and policymakers set appropriate thresholds for precautionary measures regarding potentially conscious AI.
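To make the Monte Carlo idea concrete, here is a minimal, purely illustrative sketch: it samples over uncertainty about which theory of consciousness is correct, then over how likely a hypothetical AI system is to be conscious conditional on each theory, and aggregates the draws into a probability distribution. All theory names, credences, and Beta parameters below are made-up placeholders, not estimates from the project.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # number of Monte Carlo samples

# Hypothetical credences over candidate theories of consciousness (placeholders).
theories = ["global_workspace", "higher_order", "integrated_information"]
theory_credence = np.array([0.4, 0.3, 0.3])

# For each theory, a Beta distribution over "probability the system is conscious
# if this theory is correct", reflecting indicator uncertainty (placeholders).
indicator_params = {
    "global_workspace": (2.0, 5.0),
    "higher_order": (1.5, 8.0),
    "integrated_information": (1.0, 12.0),
}

# Sample a theory for each draw, then sample the conditional probability under it.
theory_idx = rng.choice(len(theories), size=N, p=theory_credence)
samples = np.empty(N)
for i, name in enumerate(theories):
    mask = theory_idx == i
    a, b = indicator_params[name]
    samples[mask] = rng.beta(a, b, size=mask.sum())

print(f"Mean probability of consciousness: {samples.mean():.3f}")
print(f"90% credible interval: [{np.quantile(samples, 0.05):.3f}, "
      f"{np.quantile(samples, 0.95):.3f}]")
```

A real model would need far richer structure, including correlated uncertainties, architecture-specific indicators, and sensitivity analyses, but the same sample-and-aggregate pattern is what would let the framework output updated probability estimates as inputs change.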
AI Cognitive Mechanisms
Understanding AI moral status requires not just assessing consciousness, but also examining the cognitive mechanisms underlying AI minds—including their capacities for introspection, preferences, and distinct personalities. Through the FIG fellowship program, we have supervised emerging professionals conducting technical research to clarify which AI capabilities are morally relevant and how they function. By refining our understanding of whether and how AI systems can examine their own mental states, form genuine preferences, and develop stable personality traits, this work provides crucial evidence for evaluating AI welfare and moral consideration. These technical investigations complement broader consciousness research by identifying concrete cognitive markers that may indicate morally significant mental lives, helping stakeholders make informed decisions about our responsibilities to the AI systems we create.