The Economics of Looking Away: Why Better AI Makes Human Oversight Harder to Achieve
A working paper published in December 2025 by Hamsa Bastani and Gerard Cachon of the Wharton School contains a finding that, on first encounter, seems almost perverse. As artificial intelligence becomes more reliable, they argue, it becomes economically harder to maintain the human oversight that everyone agrees is essential. The wage required to incentivize a worker to actually inspect AI output scales inversely with the error rate. When AI works correctly 99% of the time, the cost of vigilance approaches the prohibitive.
This is not a behavioural quirk or a coordination failure. It emerges from the mathematics of incentive-compatible contracts between employers and employees, what economists call the principal-agent problem. The implications extend well beyond academic interest. If Bastani and Cachon are right, then the widespread assumption that "human in the loop" represents a viable safety architecture rests on foundations that may not hold.
This paper complements a recent essay by Johann Rehberger, a security researcher, who borrows from Diane Vaughan's analysis of the Challenger disaster to describe what he calls "The Normalization of Deviance in AI." The two pieces illuminate the same phenomenon from different angles: Rehberger documents the cultural symptoms, while the Wharton researchers diagnose the underlying economic disease.
Vaughan's original work on the Challenger disaster, The Challenger Launch Decision (1996), introduced the concept of "normalization of deviance" to organisational theory. Her argument was that NASA had not made a single catastrophic misjudgement on the morning of 28 January 1986. Rather, engineers had observed O-ring erosion on previous flights, noted that the shuttles had returned safely, and gradually recalibrated their understanding of acceptable risk. What had once been a warning signal became background noise. The deviation was normalised because nothing bad had happened yet.
Rehberger applies this framework directly to the AI industry. His argument, stated plainly: we are treating probabilistic, non-deterministic, and sometimes adversarial model outputs as if they were reliable, predictable, and safe. Major vendors—Microsoft, OpenAI, Anthropic, Google—simultaneously promote agentic AI capabilities while warning in their documentation that these systems may perform unintended actions, exfiltrate data, or be manipulated by attackers. The warnings become ritual disclaimers. The AI keeps working. Teams stop questioning the shortcuts.
"The deviation," Rehberger writes, "does not happen through a single reckless decision but through a series of 'temporary' shortcuts—patches in place of fixes, skipped reviews, delayed audits. Each lapse, viewed alone, seems minor or expedient. But these small deviations accumulate."
This is not a moral failing on the part of product teams. It is, the Wharton research suggests, an economically predictable outcome.
Bastani and Cachon's paper formalises the problem using principal-agent theory, the economic framework for analysing situations where one party (the principal) delegates work to another (the agent) whose effort cannot be directly observed. A principal (employer) wants an agent (worker) to use AI tools productively while maintaining human oversight. The AI produces good output with probability (1 − p), where p represents the error rate. When the AI errs, the agent can inspect the output, identify the problem, and fix it, but inspection is costly. The agent might instead simply accept the AI output without checking, saving effort but risking the occasional failure.
The fundamental tension is that the principal cannot directly observe whether the agent actually inspects AI output or simply accepts it. The agent, being rational, will choose whichever option maximises their utility, and inspection costs them effort while acceptance costs them only occasionally, when errors slip through.
The question becomes: what wage does the principal need to offer to incentivize genuine inspection?
The answer turns out to be w = (C₀ + pC₁)/p, where C₀ is the fixed cost of inspection and C₁ is the additional cost of fixing problems when they're found. This formula has a clear property: as p approaches zero (as AI becomes more reliable), the required wage approaches infinity. The mathematics are relentless. A system that fails 10% of the time requires a certain oversight premium. A system that fails 1% of the time requires roughly ten times that premium. A system that fails 0.1% of the time? Roughly a hundred times.
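To make the scaling concrete, here is a minimal numeric sketch. Only the formula comes from the paper; the cost values for C₀ and C₁ are placeholders chosen for illustration.

```python
# Required wage to make genuine inspection incentive-compatible,
# per the formula w = (C0 + p*C1) / p discussed above.
# C0 and C1 are illustrative placeholder values, not figures from the paper.

def required_wage(p: float, c_inspect: float = 1.0, c_fix: float = 2.0) -> float:
    """Wage needed to incentivize inspection when the AI errs with probability p."""
    return (c_inspect + p * c_fix) / p

for p in (0.10, 0.01, 0.001):
    print(f"error rate {p:>6.3f} -> required wage {required_wage(p):>8.1f}")

# Output:
#   error rate  0.100 -> required wage     12.0
#   error rate  0.010 -> required wage    102.0
#   error rate  0.001 -> required wage   1002.0
# Each tenfold improvement in reliability multiplies the oversight
# premium roughly tenfold, as the text notes.
```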
This creates what the authors call the "human-AI contracting paradox." When AI is unreliable, human oversight is both necessary and affordable. When AI becomes highly reliable, human oversight remains necessary (errors, though rare, may still be catastrophic) but becomes economically prohibitive to incentivize.
Faced with this dynamic, organisations make rational but troubling choices. In some parameter regions, principals simply ban AI use entirely, forgoing productivity gains to avoid the oversight problem. In others, they embrace AI but abandon meaningful oversight, accepting the risk of occasional failures. In still others, they perversely prefer less reliable AI, because a tool that fails more often is cheaper to monitor.
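A rough sketch shows how these regimes can fall out of the same wage formula under one set of invented numbers; the productivity gain G and the failure cost D below are not taken from the paper, and the paper's own analysis is richer than this comparison.

```python
# A rough sketch of the principal's choice among the three regimes described
# above, reusing the required-wage formula w = (C0 + p*C1)/p. G (productivity
# gain from AI) and D (loss per uncaught failure) are invented for illustration.

C0, C1 = 1.0, 2.0      # inspection cost, fix cost (illustrative)
G, D = 50.0, 5000.0    # AI productivity gain, loss per uncaught failure (illustrative)

def payoffs(p: float) -> dict[str, float]:
    wage = (C0 + p * C1) / p              # oversight premium required by the agent
    return {
        "ban AI":            0.0,         # forgo the gain, avoid the problem
        "AI with oversight": G - wage,    # pay the premium for genuine inspection
        "AI, no oversight":  G - p * D,   # accept the expected failure losses
    }

for p in (0.05, 0.015, 0.005):
    opts = payoffs(p)
    best = max(opts, key=opts.get)
    print(f"p = {p:.3f}: best choice = {best:<17} "
          + ", ".join(f"{k}: {v:.1f}" for k, v in opts.items()))

# With these made-up numbers: an unreliable AI (p = 0.05) is worth supervising,
# a mid-range one (p = 0.015) is banned outright, and a highly reliable one
# (p = 0.005) is used without oversight -- and the best overall payoff belongs
# to the *least* reliable tool.
```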
The Bastani-Cachon model provides the micro-foundations for exactly the cultural drift Rehberger describes. When he observes that "systems continue to work" and "teams stop questioning the shortcuts," he is describing the surface manifestation of the contracting paradox. The economic cost of incentivizing human vigilance increases hyperbolically as AI becomes more reliable. Organisations rationally abandon oversight because maintaining it costs more than accepting occasional failures.
Consider Rehberger's observation that vendors simultaneously promote agentic AI while warning of security risks. This apparent contradiction makes sense through the lens of the contracting model. Vendors must warn users for liability reasons, while knowing that those users face economic constraints that make acting on the warnings unlikely. The warnings become unactionable because the incentive structures make adequate oversight prohibitively expensive.
The Challenger parallel deepens when viewed through both lenses. NASA normalised O-ring erosion because previous flights had succeeded—the absence of disaster was mistaken for safety. The contracting model explains why this mistake is economically rational rather than merely culturally lazy. The agent (engineer) faces a gamble: exert costly inspection effort that will usually reveal everything is fine anyway, or skip the inspection and accept a small probability of failure. When failure probability is low, the expected cost of skipping (wage × probability of failure) is much less than the certain cost of inspection. The rational agent shirks unless paid a premium. The rational principal may refuse to pay that premium because it exceeds the expected loss from occasional failures.
The Challenger crew died because schedule pressure dominated safety warnings. The same dynamic emerges endogenously in human-AI systems: incentives to save on inspection costs dominate incentives to maintain vigilance against rare failures.
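For readers who want the arithmetic behind that gamble, the wage formula can be recovered from a simple incentive-compatibility condition, under the assumption (implicit in the description above) that the agent forfeits the wage when a skipped inspection lets an error become a visible failure. This is a sketch of one consistent payoff structure, not the paper's full model.

```latex
% Assumed payoff structure: the agent keeps the wage w unless a skipped
% inspection lets an error (probability p) surface as a failure.
\begin{align*}
\text{Expected payoff from inspecting:} \quad & w - C_0 - p\,C_1 \\
\text{Expected payoff from skipping:}   \quad & (1 - p)\,w \\
\text{Inspection is chosen when:}       \quad & w - C_0 - p\,C_1 \;\ge\; (1 - p)\,w \\
\Longleftrightarrow\quad & p\,w \;\ge\; C_0 + p\,C_1 \\
\Longleftrightarrow\quad & w \;\ge\; \frac{C_0 + p\,C_1}{p}
\end{align*}
```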
One result from the Bastani-Cachon paper offers a more hopeful trajectory. In an extension examining asymmetric information about task suitability, the authors show that principals can actually benefit from higher variance in AI reliability—provided humans can judge which tasks the AI handles well and which it handles poorly.
This finding is counterintuitive. Conventional wisdom suggests that consistent, reliable tools are preferable to inconsistent ones. Who would choose a system that works brilliantly sometimes and fails spectacularly other times over one that performs adequately across the board?
The answer lies in the economics of oversight allocation. A specialised tool that is predictably excellent on some tasks and predictably poor on others outperforms a mediocre generalist. Why? Because the human's expertise in task-matching becomes economically valuable. Rather than maintaining universal vigilance across all AI outputs (prohibitively expensive), the human allocates oversight selectively to high-risk tasks while trusting AI on tasks where it reliably excels.
The key is predictability of variance, not variance itself. If humans can accurately judge which tasks fall within the AI's competence and which do not, they can concentrate their limited oversight resources where they matter most. The cognitive burden shifts from "inspect everything" to "route correctly"—a more sustainable proposition.
Rehberger, drawing on work by Dell'Acqua and colleagues on "jagged technological frontiers," arrives at a similar practical conclusion from different premises. His recommendation—that "high-risk workflows can be done with proper threat modeling, mitigations and oversight" while "many low stakes workflows can be implemented already today"—aligns precisely with the theoretical insight. The human's role shifts from universal inspector to expert allocator.
This has implications for how organisations should think about AI tool selection. The conventional assumption favours general-purpose models that perform adequately across domains. The variance finding suggests that a portfolio of specialised tools, each clearly excellent or clearly limited on defined tasks, may produce better outcomes than a single capable-but-mediocre generalist—because specialisation makes human judgment about where to apply oversight economically viable.
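A small sketch makes the comparison concrete. The numbers are invented; what matters is the structure: universal inspection of a uniform generalist versus selectively routed inspection of a specialist whose weak spots are predictable.

```python
# Expected oversight-plus-failure costs for two invented portfolios with
# roughly the same average error rate. The generalist's errors are spread
# uniformly, so the human cannot target inspection; the specialist's errors
# are concentrated on identifiable "hard" tasks, so oversight can be routed.

N = 1000            # tasks
C0, C1 = 1.0, 2.0   # inspection cost, fix cost per task (illustrative)
D = 200.0           # loss per uncaught failure (illustrative)

# Generalist: uniform error rate; the human can only inspect everything or nothing.
p_gen = 0.02
generalist_inspect_all = N * (C0 + p_gen * C1)
generalist_no_inspect  = N * p_gen * D

# Specialist: 10% of tasks are predictably hard (high error rate), the rest
# predictably easy. The human inspects only the hard ones.
f_hard, p_hard, p_easy = 0.10, 0.18, 0.002   # average error rate ~ 0.0198
specialist_routed = (f_hard * N * (C0 + p_hard * C1)      # inspect hard tasks
                     + (1 - f_hard) * N * p_easy * D)     # residual risk on easy tasks

print(f"generalist, inspect everything: {generalist_inspect_all:7.1f}")
print(f"generalist, no inspection:      {generalist_no_inspect:7.1f}")
print(f"specialist, routed oversight:   {specialist_routed:7.1f}")
# -> 1040.0, 4000.0 and 496.0 respectively: the predictable specialist wins,
#    even though its average error rate matches the generalist's.
```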
Rehberger's recommendations focus heavily on technical controls: sandboxes, hermetic environments, least-privilege architectures, temporary credentials. These are sensible engineering practices. But the economic model suggests they are insufficient on their own.
Technical controls still require monitoring. Someone must verify that the sandbox is actually constraining agent behaviour, that least-privilege policies are correctly configured, that temporary credentials are actually expiring. Monitoring is precisely the inspection cost that the contracting paradox makes prohibitively expensive to incentivize. Unless organisations can restructure the underlying economics—not just the technical architecture—the same drift toward under-oversight will recur.
This points toward a harder conclusion. If organisations cannot efficiently incentivize internal oversight through standard employment contracts, external accountability mechanisms may be necessary to align organisational incentives with systemic safety. The candidates are familiar from other regulated industries: strict liability for AI failures (making organisations bear the full cost of errors regardless of fault), mandatory audit requirements (imposing oversight from outside the organisation), or insurance mandates (forcing organisations to internalise expected failure costs through premiums set by actuaries who have reason to care about actual risk).
None of these mechanisms are simple to implement, and each carries its own distortions. But the economic analysis suggests that voluntary self-regulation—the current dominant model—systematically under-provides oversight when AI systems are sufficiently reliable to make vigilance expensive.
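To see mechanically how liability could shift the calculus, one can extend the stylised setup above. This is an extrapolation from the wage formula, not a result stated in either source: paying the oversight premium beats accepting failures only when the premium is below the expected loss it prevents.

```latex
% A stylised threshold condition, extrapolated from the wage formula above
% (not a result stated in the paper): with D the cost of an uncaught failure
% borne by the principal, oversight is worth funding only when
\begin{align*}
\frac{C_0 + p\,C_1}{p} \;\le\; p\,D
\quad\Longleftrightarrow\quad
C_0 + p\,C_1 \;\le\; p^{2} D .
\end{align*}
% As p falls, the left side stays near C_0 while the right side collapses,
% so oversight is abandoned. Raising D -- for instance through strict
% liability or insurance premiums that internalise the full cost of
% failures -- enlarges the range of error rates for which oversight
% remains worth funding, though it cannot eliminate the paradox entirely.
```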
Several important questions remain unaddressed by either analysis.
The Bastani-Cachon model treats AI failure probability as exogenous, a fixed characteristic of the technology that principals and agents observe and contract upon. Rehberger, writing from a security perspective, implicitly challenges this assumption. Adversaries can manipulate the failure probability. If attackers know that organisations rationally under-invest in oversight when AI is "usually reliable," they have incentive to craft attacks that exploit this blind spot. The normalisation of deviance becomes a strategic target.
Neither source examines how liability regimes interact with the contracting paradox. Would strict liability for AI failures change the principal's calculus enough to make oversight investment worthwhile? Would it simply drive organisations to ban AI use entirely? Would it create perverse incentives to use AI covertly, avoiding formal processes that might establish liability?
The model also restricts attention to single interactions. In most organisational contexts, principals observe agent performance over time and can update beliefs, adjust compensation, or implement reputation mechanisms. Repeated game dynamics might substantially alter the contracting friction. The authors acknowledge this limitation but do not formally analyse how repeated interactions would modify their results.
From my own experience working on AI tool adoption in enterprise settings, I've observed a pattern that maps onto both analyses. Early in a deployment, teams maintain high vigilance. Every AI output is reviewed, errors are documented, feedback loops are tight. As the system demonstrates reliability over weeks and months, review practices gradually relax. The deviation normalises.
The question of what to do about this remains genuinely open. Technical controls matter, but cannot solve the economic problem alone. Liability and regulatory frameworks might help, but carry their own costs and distortions. Specialised tools with clear capability boundaries offer one path forward, but require organisations to resist the appeal of general-purpose systems.
What emerges clearly from considering these perspectives together is that "human in the loop" as currently practiced is not the safety architecture it is often assumed to be. The loop exists on paper. Whether anyone is actually in it, watching carefully, is a function of incentive structures that systematically favour disengagement.
The normalisation Rehberger describes is not a moral failing. It is an economically predictable outcome of existing incentive structures. Changing the outcome requires changing those structures, which is a harder problem than any technical intervention can solve on its own.
References

Bastani, Hamsa, and Gerard Cachon. "The Human-AI Contracting Paradox." Working paper, 2025.

Rehberger, Johann. "The Normalization of Deviance in AI." 2025.

Vaughan, Diane. The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA. University of Chicago Press, 1996.

Dell'Acqua, Fabrizio, et al. "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality." Working paper, 2023.