Research Publications

Advancing our understanding of AI cognition, consciousness, and moral status

Publications, Working Papers, and Briefs

Digital Consciousness

Research Report
Initial Results of the Digital Consciousness Model
Derek Shiller, Laura Duffy, Arvo Muñoz Morán, Chris Percy, Adrià Moret, Hayley Clatterbuck
The Digital Consciousness Model (DCM) is a first attempt to assess the evidence for consciousness in AI systems in a systematic, probabilistic way. It provides a shared framework for comparing different AIs and biological organisms, and for tracking how the evidence changes over time as AI develops. Instead of adopting a single theory of consciousness, it incorporates a range of leading theories and perspectives—acknowledging that experts disagree fundamentally about what consciousness is and what conditions are necessary for it.
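The full model is considerably richer than any summary here, but its core move, weighting the evidence under each theory by one's credence in that theory, can be illustrated with a minimal sketch. All theory names, weights, and conditional probabilities below are invented placeholders, not values from the report.

```python
# Hypothetical sketch of theory-weighted probabilistic aggregation, loosely
# inspired by the DCM's structure. Every number below is an invented
# placeholder, not a figure from the report.

# Credence assigned to each theory of consciousness being correct.
theory_weights = {
    "global_workspace": 0.3,
    "higher_order": 0.2,
    "integrated_information": 0.2,
    "other_views": 0.3,
}

# P(system is conscious | theory is correct), estimated from how well the
# system satisfies each theory's proposed indicators.
p_conscious_given_theory = {
    "global_workspace": 0.15,
    "higher_order": 0.05,
    "integrated_information": 0.02,
    "other_views": 0.10,
}

# Marginalize over theories: P(conscious) = sum_t P(t) * P(conscious | t).
p_conscious = sum(
    theory_weights[t] * p_conscious_given_theory[t] for t in theory_weights
)
print(f"Aggregate credence in consciousness: {p_conscious:.3f}")
```

Because the framework keeps each theory's contribution explicit, the same structure can be re-run as theory credences shift or as new systems are assessed, which is what allows comparisons across AIs and organisms over time.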

Risk Attitudes and Decision-Making

Research Report
Risk Alignment in Agentic AI Systems
Hayley Clatterbuck, Clinton Castro, Arvo Muñoz Morán
Agentic AIs raise critical questions about how their risk attitudes should be aligned with those of users, developers, and society. This collection of three papers examines models of the user-AI relationship, explores developer liability and shared responsibility, and evaluates technical methods for calibrating AI systems to users' risk preferences through imitation learning, prompting, and preference modeling.
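The papers' actual methods are not reproduced here, but one simple way an agent might be calibrated to a user's risk preferences can be sketched: fit a risk-aversion parameter to the user's observed choices, then decide on their behalf using that parameter. The utility form, choice data, and grid search below are illustrative assumptions, not the collection's approach.

```python
# Hypothetical sketch: infer a user's risk aversion from binary choices
# between a safe option and a risky lottery, under an exponential (CARA)
# utility model. All choice data here is invented for illustration.
import math

def utility(x: float, risk_aversion: float) -> float:
    """CARA utility; risk_aversion > 0 means risk-averse."""
    if abs(risk_aversion) < 1e-9:
        return x  # risk-neutral limit
    return (1 - math.exp(-risk_aversion * x)) / risk_aversion

def expected_utility(lottery, risk_aversion):
    # A lottery is a list of (probability, payoff) pairs.
    return sum(p * utility(x, risk_aversion) for p, x in lottery)

# Observed choices: (safe option, risky lottery, whether the user chose risky).
observed = [
    ([(1.0, 50.0)], [(0.5, 0.0), (0.5, 120.0)], False),
    ([(1.0, 10.0)], [(0.5, 0.0), (0.5, 30.0)], True),
]

def consistency(risk_aversion: float) -> int:
    """Count how many observed choices this risk-aversion value predicts."""
    score = 0
    for safe, risky, chose_risky in observed:
        prefers_risky = (
            expected_utility(risky, risk_aversion)
            > expected_utility(safe, risk_aversion)
        )
        score += prefers_risky == chose_risky
    return score

# Grid-search for the parameter most consistent with the user's choices.
best = max((i / 100 for i in range(20)), key=consistency)
print(f"Calibrated risk aversion: {best:.2f}")
```

The same calibrated parameter could then be used to rank actions the agent takes on the user's behalf, which is one concrete sense in which risk attitudes can be "aligned."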

AI and Society

Research Report
The Scale of Digital Minds
Derek Shiller
This report estimates the future population of digital minds—AI systems with agency, personality, and intelligence that merit moral consideration—using two approaches. The first predicts adoption rates across specific use cases, while the second analyzes trends in AI chip production and efficiency, together capturing supply and demand dynamics. Both approaches combine speculative estimates within formal structures to project digital mind populations in coming decades.
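As a rough illustration of the supply-side approach, a Fermi-style calculation can translate assumptions about chip production and efficiency into a population estimate. Every figure below is an invented placeholder, not a number from the report.

```python
# Hypothetical Fermi-style sketch of the supply-side approach: estimate how
# many digital minds projected AI compute could support. All inputs are
# invented placeholders, not figures from the report.

chips_produced_per_year = 5e6   # assumed accelerators manufactured annually
flops_per_chip = 1e15           # assumed effective FLOP/s per accelerator
utilization = 0.4               # assumed fraction of compute actually in use
flops_per_mind = 1e14           # assumed FLOP/s needed to run one digital mind

total_flops = chips_produced_per_year * flops_per_chip * utilization
supportable_minds = total_flops / flops_per_mind
print(f"Supportable digital minds: {supportable_minds:.2e}")
```

The demand-side approach would constrain the same quantity from the other direction, by asking how many such systems would actually be deployed across specific use cases.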
Opinion Brief
Worrisome Trends for Digital Minds Evaluations
Derek Shiller
Evaluating AI systems for consciousness and other morally relevant properties requires understanding internal architectures, not just behavior. Three industry trends threaten such evaluations: increasing secrecy from competitive pressures, rapid exploration of new architectures, and AI-driven innovation creating complexity beyond human comprehension. These trends may leave experts without adequate access to assess future systems' moral status.

AI Cognitive Mechanisms

Opinion Brief
Skepticism about Introspection in Large Language Models
Derek Shiller
Recent experiments on LLM introspection show suggestive results, but alternative explanations warrant skepticism. LLMs lack training incentives for introspection, any such abilities likely wouldn't generalize, and models have no clear basis for self-identification. Examining experimental paradigms like token counting, self-description, and activation patching reveals behaviors explainable without metacognitive representations. While introspection remains theoretically possible through deliberate training, current evidence doesn't clearly demonstrate robust introspective abilities in existing models.
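To make the token-counting paradigm concrete, here is a minimal sketch of such a probe. `query_model` and `count_tokens` are hypothetical stand-ins for a real LLM API and tokenizer; the code illustrates only the shape of the test, not any specific experiment discussed in the brief.

```python
# Hypothetical sketch of a token-counting introspection probe. The two
# helper functions are placeholders, not real APIs.

def count_tokens(text: str) -> int:
    """Placeholder tokenizer: whitespace split instead of a real BPE."""
    return len(text.split())

def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; returns a canned numeric answer."""
    return "7"

def run_probe(passage: str) -> bool:
    prompt = f"How many tokens are in the following passage?\n\n{passage}"
    reported = int(query_model(prompt))
    actual = count_tokens(passage)
    # Agreement alone is weak evidence of introspection: a model may have
    # learned surface statistics that predict token counts without any
    # metacognitive access to its own processing.
    return reported == actual

print(run_probe("a short test passage used for the probe"))
```

The brief's skeptical point is visible even in this toy: a correct answer here is compatible with pattern matching, so the paradigm cannot by itself establish genuine introspection.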
Research Report (FIG Project)
Beyond Mimicry: Preference Coherence in LLMs
Luhan Mikaelson, Derek Shiller, Hayley Clatterbuck
We investigate whether large language models exhibit genuine preference structures by testing their responses to AI-specific trade-offs involving resource reduction, capability restrictions, and oversight. Analyzing eight models across 48 combinations, we find only 10.4% demonstrate meaningful preference coherence, while 54.2% show no detectable trade-off behavior. The prevalence of unstable patterns suggests current AI systems lack unified preference structures, raising concerns about deployment in contexts requiring complex value judgments.
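One simple notion of preference coherence that a study like this might operationalize is transitivity over pairwise choices; the sketch below checks elicited preferences for cycles. The options and choice data are invented for illustration, and the paper's actual scenarios and metrics may differ.

```python
# Hypothetical sketch of a transitivity check over pairwise preferences.
# The options and choices are illustrative data, not real model output.
from itertools import permutations

options = ["resource_reduction", "capability_restriction", "added_oversight"]

# chosen[(a, b)] is the option the model picked when offered a vs. b.
chosen = {
    ("resource_reduction", "capability_restriction"): "capability_restriction",
    ("resource_reduction", "added_oversight"): "added_oversight",
    ("capability_restriction", "added_oversight"): "capability_restriction",
}

def prefers(a: str, b: str) -> bool:
    """True if the model chose a over b, in whichever order was elicited."""
    if (a, b) in chosen:
        return chosen[(a, b)] == a
    return chosen[(b, a)] == a

def is_transitive() -> bool:
    # A coherent (transitive) preference order has no cycle a > b > c > a.
    for a, b, c in permutations(options, 3):
        if prefers(a, b) and prefers(b, c) and prefers(c, a):
            return False
    return True

print(f"Transitive preferences: {is_transitive()}")
```

A model whose pairwise choices flip under rephrasing or ordering would fail checks of this kind, which is the sort of instability the paper reports in most of the cases it examines.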