Research Publications

Advancing our understanding of AI cognition, consciousness, and moral status

Publications, Working Papers, and Briefs

Digital Consciousness

Research Report
Initial Results of the Digital Consciousness Model
Derek Shiller, Laura Duffy, Arvo Muñoz Morán, Chris Percy, Adrià Moret, Hayley Clatterbuck
The Digital Consciousness Model (DCM) is a first attempt to assess the evidence for consciousness in AI systems in a systematic, probabilistic way. It provides a shared framework for comparing different AIs and biological organisms, and for tracking how the evidence changes over time as AI develops. Instead of adopting a single theory of consciousness, it incorporates a range of leading theories and perspectives—acknowledging that experts disagree fundamentally about what consciousness is and what conditions are necessary for it.
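The full model is considerably richer than any summary here, but its core move, weighting the evidence under each theory by one's credence in that theory, can be illustrated with a minimal sketch. All theory names, weights, and conditional probabilities below are invented placeholders, not values from the report.

```python
# Hypothetical sketch of theory-weighted probabilistic aggregation, loosely
# inspired by the DCM's structure. Every number below is an invented
# placeholder, not a figure from the report.

# Credence assigned to each theory of consciousness being correct.
theory_weights = {
    "global_workspace": 0.3,
    "higher_order": 0.2,
    "integrated_information": 0.2,
    "other_views": 0.3,
}

# P(system is conscious | theory is correct), estimated from how well the
# system satisfies each theory's proposed indicators.
p_conscious_given_theory = {
    "global_workspace": 0.15,
    "higher_order": 0.05,
    "integrated_information": 0.02,
    "other_views": 0.10,
}

# Marginalize over theories: P(conscious) = sum_t P(t) * P(conscious | t).
p_conscious = sum(
    theory_weights[t] * p_conscious_given_theory[t] for t in theory_weights
)
print(f"Aggregate credence in consciousness: {p_conscious:.3f}")
```

Because the framework keeps each theory's contribution explicit, the same structure can be re-run as theory credences shift or as new systems are assessed, which is what allows comparisons across AIs and organisms over time.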

Risk Attitudes and Decision-Making

Research Report
Risk Alignment in Agentic AI Systems
Hayley Clatterbuck, Clinton Castro, Arvo Muñoz Morán
Agentic AIs raise critical questions about how their risk attitudes should be aligned with those of users, developers, and society. This collection of three papers examines models of the user-AI relationship, explores developer liability and shared responsibility, and evaluates technical methods for calibrating AI systems to users' risk preferences through imitation learning, prompting, and preference modeling.
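The papers' actual methods are not reproduced here, but one simple way an agent might be calibrated to a user's risk preferences can be sketched: fit a risk-aversion parameter to the user's observed choices, then decide on their behalf using that parameter. The utility form, choice data, and grid search below are illustrative assumptions, not the collection's approach.

```python
# Hypothetical sketch: infer a user's risk aversion from binary choices
# between a safe option and a risky lottery, under an exponential (CARA)
# utility model. All choice data here is invented for illustration.
import math

def utility(x: float, risk_aversion: float) -> float:
    """CARA utility; risk_aversion > 0 means risk-averse."""
    if abs(risk_aversion) < 1e-9:
        return x  # risk-neutral limit
    return (1 - math.exp(-risk_aversion * x)) / risk_aversion

def expected_utility(lottery, risk_aversion):
    # A lottery is a list of (probability, payoff) pairs.
    return sum(p * utility(x, risk_aversion) for p, x in lottery)

# Observed choices: (safe option, risky lottery, whether the user chose risky).
observed = [
    ([(1.0, 50.0)], [(0.5, 0.0), (0.5, 120.0)], False),
    ([(1.0, 10.0)], [(0.5, 0.0), (0.5, 30.0)], True),
]

def consistency(risk_aversion: float) -> int:
    """Count how many observed choices this risk-aversion value predicts."""
    score = 0
    for safe, risky, chose_risky in observed:
        prefers_risky = (
            expected_utility(risky, risk_aversion)
            > expected_utility(safe, risk_aversion)
        )
        score += prefers_risky == chose_risky
    return score

# Grid-search for the parameter most consistent with the user's choices.
best = max((i / 100 for i in range(20)), key=consistency)
print(f"Calibrated risk aversion: {best:.2f}")
```

The same calibrated parameter could then be used to rank actions the agent takes on the user's behalf, which is one concrete sense in which risk attitudes can be "aligned."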

AI and Society

Research Report
The Scale of Digital Minds
Derek Shiller
This report estimates the future population of digital minds—AI systems with agency, personality, and intelligence that merit moral consideration—using two approaches. The first predicts adoption rates across specific use cases, while the second analyzes trends in AI chip production and efficiency, together capturing supply and demand dynamics. Both approaches combine speculative estimates within formal structures to project digital mind populations in coming decades.
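As a rough illustration of the supply-side approach, a Fermi-style calculation can translate assumptions about chip production and efficiency into a population estimate. Every figure below is an invented placeholder, not a number from the report.

```python
# Hypothetical Fermi-style sketch of the supply-side approach: estimate how
# many digital minds projected AI compute could support. All inputs are
# invented placeholders, not figures from the report.

chips_produced_per_year = 5e6   # assumed accelerators manufactured annually
flops_per_chip = 1e15           # assumed effective FLOP/s per accelerator
utilization = 0.4               # assumed fraction of compute actually in use
flops_per_mind = 1e14           # assumed FLOP/s needed to run one digital mind

total_flops = chips_produced_per_year * flops_per_chip * utilization
supportable_minds = total_flops / flops_per_mind
print(f"Supportable digital minds: {supportable_minds:.2e}")
```

The demand-side approach would constrain the same quantity from the other direction, by asking how many such systems would actually be deployed across specific use cases.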
Opinion Brief
Worrisome Trends for Digital Minds Evaluations
Derek Shiller
Evaluating AI systems for consciousness and other morally relevant properties requires understanding internal architectures, not just behavior. Three industry trends threaten such evaluations: increasing secrecy from competitive pressures, rapid exploration of new architectures, and AI-driven innovation creating complexity beyond human comprehension. These trends may leave experts without adequate access to assess future systems' moral status.

AI Cognitive Mechanisms

Opinion Brief
Skepticism about Introspection in Large Language Models
Derek Shiller
Recent experiments on LLM introspection show suggestive results, but alternative explanations warrant skepticism. LLMs lack training incentives for introspection, any such abilities likely wouldn't generalize, and models have no clear basis for self-identification. Examining experimental paradigms like token counting, self-description, and activation patching reveals behaviors explainable without metacognitive representations. While introspection remains theoretically possible through deliberate training, current evidence doesn't clearly demonstrate robust introspective abilities in existing models.
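To make the token-counting paradigm concrete, here is a minimal sketch of such a probe. `query_model` and `count_tokens` are hypothetical stand-ins for a real LLM API and tokenizer; the code illustrates only the shape of the test, not any specific experiment discussed in the brief.

```python
# Hypothetical sketch of a token-counting introspection probe. The two
# helper functions are placeholders, not real APIs.

def count_tokens(text: str) -> int:
    """Placeholder tokenizer: whitespace split instead of a real BPE."""
    return len(text.split())

def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; returns a canned numeric answer."""
    return "7"

def run_probe(passage: str) -> bool:
    prompt = f"How many tokens are in the following passage?\n\n{passage}"
    reported = int(query_model(prompt))
    actual = count_tokens(passage)
    # Agreement alone is weak evidence of introspection: a model may have
    # learned surface statistics that predict token counts without any
    # metacognitive access to its own processing.
    return reported == actual

print(run_probe("a short test passage used for the probe"))
```

The brief's skeptical point is visible even in this toy: a correct answer here is compatible with pattern matching, so the paradigm cannot by itself establish genuine introspection.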
Research Report (FIG Project)
Beyond Mimicry: Preference Coherence in LLMs
Luhan Mikaelson, Derek Shiller, Hayley Clatterbuck
We investigate whether large language models exhibit genuine preference structures by testing their responses to AI-specific trade-offs involving resource reduction, capability restrictions, and oversight. Analyzing eight models across 48 combinations, we find only 10.4% demonstrate meaningful preference coherence, while 54.2% show no detectable trade-off behavior. The prevalence of unstable patterns suggests current AI systems lack unified preference structures, raising concerns about deployment in contexts requiring complex value judgments.
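One simple notion of preference coherence that a study like this might operationalize is transitivity over pairwise choices; the sketch below checks elicited preferences for cycles. The options and choice data are invented for illustration, and the paper's actual scenarios and metrics may differ.

```python
# Hypothetical sketch of a transitivity check over pairwise preferences.
# The options and choices are illustrative data, not real model output.
from itertools import permutations

options = ["resource_reduction", "capability_restriction", "added_oversight"]

# chosen[(a, b)] is the option the model picked when offered a vs. b.
chosen = {
    ("resource_reduction", "capability_restriction"): "capability_restriction",
    ("resource_reduction", "added_oversight"): "added_oversight",
    ("capability_restriction", "added_oversight"): "capability_restriction",
}

def prefers(a: str, b: str) -> bool:
    """True if the model chose a over b, in whichever order was elicited."""
    if (a, b) in chosen:
        return chosen[(a, b)] == a
    return chosen[(b, a)] == a

def is_transitive() -> bool:
    # A coherent (transitive) preference order has no cycle a > b > c > a.
    for a, b, c in permutations(options, 3):
        if prefers(a, b) and prefers(b, c) and prefers(c, a):
            return False
    return True

print(f"Transitive preferences: {is_transitive()}")
```

A model whose pairwise choices flip under rephrasing or ordering would fail checks of this kind, which is the sort of instability the paper reports in most of the cases it examines.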