The map is not the territory: AI and synthetic data


I recently re-read John Kay’s 2011 essay “The Map is Not the Territory: An Essay on the State of Economics”. It is nearly 15 years old and was written by one of the UK’s leading economists. Kay argues that, in the run-up to the 2008 financial crisis, economists mistook their models for reality. His point about economists building “complete artificial worlds” that can be “put on a computer and run” feels relevant to the current excitement around AI-generated consumer insights based on synthetic data.

The MRS Advanced Data Analytics group points to some benefits: better privacy protection, the ability to expand small sample sizes, and strong correlations with real-world datasets (see links in comments). Some practitioners report good results too: a recent study by Stanford and others used qualitative interviews and the US General Social Survey to create synthetic responses with decent accuracy.

But Kay’s insights about economic models are worth bearing in mind. The models used in the run-up to 2008 failed not because they were mathematically wrong, but because economists confused model outputs with reality itself. They built systems that were “akin to Tolkien’s Middle Earth, or a computer game like Grand Theft Auto”: complete worlds where “everything about them is either known, or can be made up”. Synthetic data has a similar issue. When LLMs generate consumer responses, we are not accessing real consumer thinking; we are exploring patterns in the training data. The responses might correlate with human surveys, but there is an important difference: real consumers build meaning on the spot, influenced by context and emotion, whereas synthetic responses are reproduced from patterns the model has already learned.

Kay’s thinking suggests that we should treat synthetic responses, like economic models, as “potentially illuminating abstractions”, not as a replacement for reality. This means using synthetic data primarily as a supplement to real data and human judgment, rather than as a substitute for them.

The path forward probably isn’t about rejecting synthetic data outright, but about remembering Kay’s lesson that “the map is not the territory”.
