AI and human capability
Four papers published in the past month or so study different aspects of how AI is changing knowledge work. Each reaches its own conclusions. But they share a common concern, one that cuts against the straightforward productivity story that tends to dominate this conversation. In a nutshell: the productivity gains are real. Across studies and settings, AI assistance improves performance on measurable tasks. But each of these papers looks at what sits alongside or beneath those gains, and the picture is more complicated than the headline numbers suggest.
The standard account of AI and knowledge work runs roughly as follows. AI lifts all boats, but novices benefit most. A junior consultant, a new programmer, a beginning customer service agent: each gains disproportionately because AI compensates for what they lack. Senior experts benefit too, but their starting point was already higher. The technology narrows the gap.
This account is not wrong. But it is incomplete in ways that matter for how organisations hire, train, and develop people and for how individuals think about building skills in a context where AI is now a routine part of the work.
Completing the Task Without Learning the Skill
Judy Shen and Alex Tamkin, working through Anthropic's Fellows Program, ran a randomised experiment to study whether AI assistance helps developers learn a new asynchronous Python library. This is representative of a common kind of on-the-job learning: you encounter an unfamiliar tool, you work with it, and you come away knowing how to use it. Shen and Tamkin wanted to know whether AI changes that process, and whether it does so in ways that don't show up in the final output.
Participants who used AI assistance to complete the coding tasks scored 17% lower on subsequent skill evaluations — roughly two grade points, with a Cohen's d of 0.738. The AI helped them finish the task. It did not help them understand what they were doing. When they worked independently afterwards, the gap was clear.
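For readers less familiar with the statistic, Cohen's d is the standardised difference between two group means; the group labels below are generic rather than the paper's notation:

$$ d = \frac{\bar{x}_{\text{control}} - \bar{x}_{\text{AI}}}{s_{\text{pooled}}} \approx 0.74 $$

In words: the AI-assisted group's evaluation scores sat roughly three-quarters of a pooled standard deviation below the unassisted group's, a medium-to-large effect by the usual benchmarks (0.5 is conventionally read as medium, 0.8 as large).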
The mechanism matters. Shen and Tamkin watched screen recordings of every participant and identified six patterns of AI interaction. Three preserved learning: asking the AI for conceptual explanations, posing questions rather than requesting code, and using AI iteratively as a debugging aid. Three undermined it, chief among them full delegation, in which participants asked the AI to write the code and used the output without further engagement.
The practical implication is direct. AI-written code in high-stakes applications is typically reviewed by humans before deployment. That review is only possible when engineers can read the code and recognise errors. If those engineers never developed the underlying skills, the review provides less assurance than it appears to.
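To make that concrete, consider the flavour of code involved. The study's library isn't named here, so this sketch uses Python's standard asyncio module instead; the point is that a developer can paste and run something like this without being able to answer the questions a reviewer has to answer.

```python
# Illustrative sketch only: the study's actual library isn't named above,
# so this uses the standard asyncio module. Running this code is easy;
# reviewing it requires knowing what each construct actually guarantees.
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for an I/O-bound call such as a network request.
    await asyncio.sleep(delay)
    return f"{name}: done"

async def main() -> None:
    # gather() runs both coroutines concurrently and returns results in
    # argument order. A reviewer needs to know, for instance, that if one
    # coroutine raises, the exception propagates here by default while the
    # other coroutine keeps running.
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))
    print(results)

asyncio.run(main())
```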
The AI Wall
A Harvard Business Review piece summarises a working paper by Luca Vendraminelli and colleagues at Stanford and Harvard, drawing on an experiment at IG Group, a UK fintech firm. Seventy-eight employees were asked to conceptualise and write articles with and without generative AI assistance. They spanned three levels of expertise distance: expert writers, marketing specialists adjacent to the writing domain, and technology specialists distant from it.
For conceptualisation, AI helped considerably. With assistance, all three groups performed similarly, and all three outperformed unassisted experts. For this lower-complexity task, the standard account held.
Writing produced different results. The marketing specialists, who understood what good content looked like even without having written it professionally, used AI to close most of the gap with expert writers — scoring 3.92 against the writers' 3.96. The technology specialists, with no marketing or writing background, saw no meaningful difference: their scores with and without AI were 3.38 and 3.42 respectively.
Vendraminelli and colleagues call this "the AI wall": the point at which expertise distance becomes too large for AI to bridge. The technology specialists could not evaluate AI-generated language. They lacked, as the researchers put it, "the intuition and knowledge needed to make good decisions about what language to keep and what to discard." Many simply copied and pasted.
Olga Pirog, who led AI transformation at IG Group, drew a clear conclusion from the experiment: "By optimising for efficiency today, companies are eroding the training ground for tomorrow. You cannot develop taste or judgment without doing the work." This pushes back against one reading of the productivity data — that junior hiring can be reduced because AI covers what novices used to learn through experience.
Confidence Without Calibration
Shaw and Nave at Wharton propose "Tri-System Theory," extending Kahneman's dual-process model — System 1 (fast, intuitive) and System 2 (slow, deliberate) — with a third system: artificial cognition operating outside the biological mind. System 3 can supplement or replace internal processing.
Across three pre-registered experiments with 1,372 participants and nearly 10,000 trials, they used an adapted Cognitive Reflection Test to measure what happens when people consult AI on reasoning questions. CRT-style items pair an intuitive answer that is wrong with a correct answer that requires overriding the intuition, which makes them a natural probe for whether people are reasoning or deferring. The AI's accuracy was randomised through hidden seed prompts, so participants could not know from the interaction itself whether the AI was reliable.
Participants consulted AI on more than half of trials. When the AI was accurate, performance rose 25 percentage points relative to baseline. When it was faulty, performance fell 15 points. Alongside the performance effects, engaging with AI increased self-reported confidence — even when the AI had led participants to wrong answers.
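A back-of-envelope calculation, extrapolated from these two numbers rather than taken from the paper, shows how the asymmetry cashes out. If the AI a participant consults is accurate with probability p, and the per-trial effects apply uniformly, the expected change in performance is

$$ \Delta = 25p - 15(1 - p) = 40p - 15 \ \text{percentage points}, $$

which is positive only when p > 15/40, roughly 0.38. Below that accuracy, consultation is a net harm on these assumptions, and the confidence finding suggests it would not feel like one.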
The moderators are relevant: participants with higher trust in AI, lower need for cognition, and lower fluid intelligence showed the greatest cognitive surrender. Shaw and Nave's framing is that the benefits of AI accrue more to those equipped to evaluate its outputs, while the costs fall more heavily on those less equipped to notice when it errs.
The Agent Amplifies What You Bring
Imas, Lee, and Misra at Chicago Booth and Michigan Ross studied not AI assistance but AI delegation: what happens when humans hand decisions over to AI agents acting on their behalf.
Their experimental setup had participants write prompts for buyer- and seller-side AI agents, which then conducted real, incentivised negotiations. Because all agents used the same underlying model, and the task's objective was identical for every participant — maximise surplus — outcome dispersion should have been low. The prediction under a standard model is near-homogeneity.
What they found was substantial dispersion. Seventy-three percent of the variation in outcomes was explained by individual fixed effects — by who wrote the instructions, not by stochastic variation in the model. Personality traits shaped agent behaviour. Gender had explanatory power. The AI agents reflected, and in some cases amplified, the characteristics of the people behind them.
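For readers who want that statistic made concrete, here is a minimal sketch, not the authors' code, of the kind of decomposition behind a claim like "individual fixed effects explain 73% of the variation". The column names and simulated data are assumptions for illustration only:

```python
# Minimal sketch (not the authors' code) of a one-way fixed-effects
# variance decomposition: how much of the dispersion in negotiation
# outcomes is attributable to *who* wrote the agent's prompt.
# Column names ("participant", "surplus") are illustrative assumptions.
import numpy as np
import pandas as pd

def participant_r_squared(df: pd.DataFrame) -> float:
    """Share of outcome variance explained by participant fixed effects.

    Equivalent to the R-squared from regressing the outcome on a dummy
    per participant: between-participant sum of squares over total.
    """
    grand_mean = df["surplus"].mean()
    group_means = df.groupby("participant")["surplus"].transform("mean")
    ss_between = ((group_means - grand_mean) ** 2).sum()
    ss_total = ((df["surplus"] - grand_mean) ** 2).sum()
    return ss_between / ss_total

# Simulated data in which prompt-writer identity drives outcomes.
rng = np.random.default_rng(0)
participants = np.repeat(np.arange(50), 10)      # 50 writers, 10 runs each
skill = rng.normal(0.0, 1.0, 50)[participants]   # stable per-writer effect
noise = rng.normal(0.0, 0.6, participants.size)  # run-to-run model noise
df = pd.DataFrame({"participant": participants, "surplus": skill + noise})
print(f"Variance explained by fixed effects: {participant_r_squared(df):.2f}")
```

With these simulated settings the per-writer effect dominates the run-to-run noise, so the printed share lands in the same region as the paper's figure; the decomposition, not the number, is the point.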
Imas and colleagues introduce "machine fluency" — the ability to instruct an AI agent to align with one's objectives. This fluency is unequally distributed. More experienced and analytically capable participants wrote prompts that produced better outcomes. Machine fluency maps onto, rather than offsets, existing advantages.
What the Papers Share
Put these four papers alongside each other and a pattern emerges, though none of them sets out to make this argument.
AI improves performance on measurable near-term tasks. But this improvement depends on capabilities that AI use can, in certain modes, undermine. Shen and Tamkin show that AI can prevent the skill formation required to supervise AI-generated work. Shaw and Nave show that AI can erode critical evaluation while increasing confidence in one's answers. Vendraminelli and colleagues show that AI cannot substitute for the domain knowledge required to improve on what it produces. Imas and colleagues show that AI delegation rewards those most capable of clear specification — a skill that develops through the kind of deliberate thinking that full AI delegation can bypass.
The short-term gains are visible in task completion metrics. The longer-term costs accumulate in judgment, skill, and the ability to evaluate outputs critically — things that are harder to measure and easier to defer.
What This Suggests
None of this argues against using AI. The productivity gains are real. The question is how we use it, and what we maintain in the process.
Shaw and Nave's confidence finding is worth sitting with. If AI increases confidence even when it degrades accuracy, the people most at risk are not those who distrust AI too much but those who trust it without much scrutiny. The corrective is less about scepticism as a general posture and more about staying genuinely engaged — using AI as a scaffold for thinking you are doing, rather than a substitute for thinking you have stopped doing.
Shen and Tamkin's three learning-preserving interaction patterns offer a practical steer. Asking for explanations rather than outputs. Posing conceptual questions. Using AI iteratively as a debugging aid rather than a code generator. These patterns preserve the cognitive engagement that, according to their evidence, is what makes skills stick.
And Imas and colleagues point toward something broader. If machine fluency — the ability to specify clearly what you want from an AI system — is becoming a significant determinant of outcomes, then developing it deliberately matters, at an individual and organisational level.
Where the Tension Remains
These papers do not resolve a central tension so much as clarify it.
AI accelerates novices toward competent outputs, but Vendraminelli's AI wall shows that acceleration has limits set by underlying expertise. AI gives access to expert-level results, but Shen and Tamkin show that getting those results without engagement can prevent the novice from becoming the expert. AI boosts measured performance, but Shaw and Nave show it can simultaneously reduce calibration while increasing confidence. AI agents amplify human intent, but Imas and colleagues show that amplification favours those already capable of clear specification.
The gains are real and the costs are real, and they operate on different timescales. The gains show up in this quarter's task completion data. The costs show up later, in the quality of judgment available to supervise, evaluate, and improve on what AI produces.
References
Shen, J.H. & Tamkin, A. (2026). "How AI Impacts Skill Formation." arXiv:2601.20245v2 [cs.CY]. https://arxiv.org/abs/2601.20245
Vendraminelli, L., DosSantos DiSorbo, M., Hildebrandt, A., McFowland III, E., Karunakaran, A. & Bojinov, I. (2025). "The GenAI Wall Effect: Examining the Limits to Horizontal Expertise Transfer Between Occupational Insiders and Outsiders." Harvard Business School Working Paper No. 26-011. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5462694
Shaw, S.D. & Nave, G. (2026). "Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender." Working paper, The Wharton School, University of Pennsylvania. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
Imas, A., Lee, K. & Misra, S. (2025). "Agentic Interactions." Working paper, University of Chicago Booth School of Business / University of Michigan Ross School of Business. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5875162