Upcoming

AI Errors

Albert Menkveld, VU Amsterdam

Monday, September 14, 2026 · 9:00 AM MT

Abstract

When AIs are tasked with empirical research, how do their outcomes compare to those of humans? Are the distributions similar? We run an experiment in which AI models repeat a study originally carried out by 164 human teams. Not surprisingly, the distributions differ. The deeper question is: Why? We develop an approach that identifies which decisions on the analysis path drive these differences, which we define as "AI errors." The results show that the AI models concentrate on a narrow set of analysis paths, yielding markedly lower dispersion. For complex tasks, their estimates are systematically shifted relative to the human benchmark. Fork-level diagnostics and quantile regressions trace these shifts primarily to the choice of statistical model, e.g., identifying a time trend by adding a stationary trend to a model in levels, or by computing relative changes and taking the average.
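To illustrate the kind of fork the abstract describes, the sketch below contrasts the two trend-identification paths it mentions on simulated data. All data and numbers here are hypothetical; this is not the authors' code or dataset, only a minimal illustration of why the two choices target different estimands.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical series: a deterministic upward drift plus noise.
t = np.arange(100)
y = 50 + 0.5 * t + rng.normal(0, 1, size=t.size)

# Fork A: model in levels with a stationary (linear) time trend.
# The OLS slope estimates the per-period change in levels.
slope = np.polyfit(t, y, 1)[0]

# Fork B: compute relative changes and take the average.
# This estimates an average growth *rate*, a different quantity.
avg_rel_change = np.mean(np.diff(y) / y[:-1])

print(f"levels-with-trend slope: {slope:.3f}")
print(f"mean relative change:    {avg_rel_change:.4f}")
```

Both paths are defensible ways to "identify a time trend," yet they answer different questions (absolute change per period vs. growth rate), so teams, human or AI, that branch at this fork will report different numbers.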