Doing and Observing in (Large) Sequence Models. A Discussion of Auto-suggestive Delusions
Event: Seminar
Location: Hall 214 (“Google”), Faculty of Mathematics and Computer Science, University of Bucharest (14 Academiei Street)
8 May 2025, 14.00-16.00
Andreea EȘANU (NEC Alumna), non-tenured Assistant Professor, Faculty of Philosophy, University of Bucharest; Coordinator of the research group Technology – Culture – Humanities, New Europe College
In recent artificial intelligence (AI) literature, so-called hallucinations or delusions—instances where generative models produce false or unfounded outputs—have raised legitimate concerns about the trustworthiness and applicability of AI systems in real-world contexts. While much of the literature frames these delusions as surface-level failures of content fidelity, a more profound explanation lies in the structural underpinnings of these models—specifically, in the presence of confounding variables and their treatment under causal inference frameworks like Judea Pearl’s do-calculus.
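To fix ideas, the distinction can be put in Pearl's standard notation (a minimal sketch, assuming a single confounder z that satisfies the back-door criterion). Observing a value of x and setting x by intervention, written do(x), license different predictions about an outcome y:

P(y \mid x) = \sum_{z} P(y \mid x, z)\, P(z \mid x)
P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)

Conditioning on x carries evidence about the hidden confounder z; intervening on x does not, which is why the two quantities can diverge.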
A compelling causal account of such delusions, particularly those termed auto-suggestive delusions, was developed recently by Ortega et al. (2021) in the context of (large) sequence models, which underlie a great number of current generative AI systems (including the GPT class of language models). According to this account, auto-suggestive delusions arise not merely from data scarcity or exposure bias but from a deeper conflation, within the model, of observations (ground-truth data) with actions (model-generated data) during inference. This misidentification introduces delusions whereby the model interprets its own outputs as evidence about the world, a self-reinforcing effect that systematically distorts its representation of reality.
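Schematically, and simplifying the notation of Ortega et al. (2021): let θ be a latent variable (for instance, the intention of the expert who produced the training data) that confounds actions a and observations o. A sequence model that reads its own generated actions as observations predicts

P(o_t \mid a_{\le t}, o_{<t}) = \sum_{\theta} P(o_t \mid \theta, a_{\le t}, o_{<t})\, P(\theta \mid a_{\le t}, o_{<t}),

so that its own actions feed back into its posterior over θ. Reading them instead as interventions yields

P(o_t \mid \mathrm{do}(a_{\le t}), o_{<t}) = \sum_{\theta} P(o_t \mid \theta, a_{\le t}, o_{<t})\, P(\theta \mid \mathrm{do}(a_{\le t}), o_{<t}),

where the posterior over θ is updated by the observations alone. The gap between the two expressions is, on this account, what an auto-suggestive delusion amounts to.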
This leads to broader philosophical implications. If a model recursively treats its own outputs as valid observations, it may gradually drift into a self-referential epistemic loop in which it no longer requires, nor aligns with, external validation. Such a model is not merely biased; rather, it can be said to reason according to an internal causal model that is coherent yet cut off from the world.
This presentation proceeds in four parts. First, I will review exposure bias and its limitations in explaining AI models' delusions. Second, following Ortega et al. (2021), I will introduce Pearl's causal inference framework and show how confounding operates in (large) sequence models. Third, I will examine the nature of auto-suggestive delusions through the lens of causal inference. Finally, I will argue that these findings suggest that (large) sequence models, and the AI systems implementing them, under certain conditions evolve toward solipsistic behavior: a state in which they create internally coherent but externally ungrounded representations of the world.
*
This seminar is organized within the research seminar series of the Institute for Logic and Data Science (www.ilds.ro).