Is AI killing critical thinking?
A 2025 survey of 319 knowledge workers by Carnegie Mellon University and Microsoft Research revealed a troubling correlation. Researchers asked participants to share real examples of using AI tools at work and to report on their critical thinking during each task. The pattern that emerged was counterintuitive: the more confidence people had in AI's ability to do a task, the less critical thinking they applied to its output. The statistical relationship was strong and significant. Meanwhile, those with higher confidence in their own abilities invested more effort in evaluating what AI produced, even though it felt harder. The workers who trusted AI most were, quite literally, thinking the least. The very features that make AI tools helpful (their fluency, their speed, their apparent competence) may be systematically undermining the human capacity to evaluate their output.
Advait Sarkar, a researcher at Microsoft and the University of Cambridge and a co-author of the Carnegie Mellon/Microsoft study, has argued: "The most important challenge for generative AI is not hallucinations but critical thinking." The distinction matters. Hallucinations are errors in the AI's output. The critical thinking problem is a change in what humans do, or don't do, with that output.
Sarkar's framework, developed with colleagues at Microsoft Research, describes this as a shift "from material production to critical integration." When AI handles the work of generating content (drafting, summarising, coding, analysing), the human role shifts toward evaluating, steering, and verifying what the AI has produced. This sounds manageable. The difficulty is that the same efficiency gains that make AI attractive also reduce the opportunities to practise the judgment required to evaluate it.
This pattern has a name in the automation literature. In 1983, the cognitive psychologist Lisanne Bainbridge identified what she called the "ironies of automation." By mechanising routine tasks and leaving only exception-handling to humans, automation deprives users of the regular practice that keeps their skills sharp. When an exception does arise, when the AI produces something subtly wrong, the human whose skills have atrophied is least prepared to catch it. The irony is structural: the more reliable the system, the more catastrophic the consequences when it fails, and the less capable the human is of noticing.
The Carnegie Mellon and Microsoft researchers didn't just ask whether people were thinking critically. They examined how knowledge workers described their own practices, mapping these descriptions against Bloom's taxonomy of cognitive activities: recall, comprehension, application, analysis, synthesis, and evaluation.
The qualitative findings are instructive. When workers did engage critically with AI output, their activities clustered into three categories. First, verification: cross-referencing AI claims against external sources, checking generated code against documentation, consulting their own domain expertise. Second, integration: selecting which parts of the output to use, adjusting tone and style, combining AI-generated content with other materials. Third, stewardship: translating intentions into effective prompts, iterating when outputs missed the mark, maintaining accountability for the final work product.
These activities require effort. They feel harder than simply accepting what AI produces. And in 41% of the examples participants shared, they didn't happen at all.
The reasons workers gave for not thinking critically are revealing. Many said the task seemed "trivial" or "secondary"—not worth the mental investment of careful evaluation. Others described trust: prior positive experiences had created a mental model in which AI was simply assumed to be competent for certain kinds of work. As one participant explained, "With straightforward factual information, ChatGPT usually gives good answers." The assumption became a default.
The researchers call this an "awareness barrier." Critical thinking requires recognising that it's needed. When AI output is fluent and plausible, the cue to question it may never arrive.
The statistical model the researchers built tells a more precise story. Controlling for task type, user demographics, and overall tendency toward reflection, two confidence measures stood out.
Confidence in AI doing the task showed a strong negative correlation with perceived enaction of critical thinking. The more confident the worker was in AI's capability, the less they reported engaging critically. This held across task types—creative work, information retrieval, advice-seeking—and persisted even when researchers controlled for whether the worker trusted AI in general.
Confidence in one's own ability showed the opposite pattern. Workers who were more confident in their own skills reported more critical engagement with AI output, particularly during evaluation and application stages. Crucially, they also perceived these activities as requiring more effort, yet they did them anyway.
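To make "controlling for" concrete, here is a minimal sketch, assuming survey data laid out with one row per shared task example, of the kind of regression such a design supports. The column names, the CSV file, and the ordinary-least-squares choice are illustrative assumptions, not the researchers' actual analysis.

```python
# Illustrative sketch only: hypothetical column names and file, not the
# Carnegie Mellon / Microsoft Research analysis itself.
import pandas as pd
import statsmodels.formula.api as smf

# One row per shared task example: reported critical thinking plus the
# two confidence measures and the control variables described above.
df = pd.read_csv("survey_examples.csv")

model = smf.ols(
    "critical_thinking ~ confidence_in_ai + confidence_in_self"
    " + C(task_type) + age + C(occupation) + reflection_tendency",
    data=df,
).fit()

print(model.summary())
# The pattern reported in the study corresponds to a negative coefficient on
# confidence_in_ai and a positive one on confidence_in_self, with both terms
# remaining notable even after the controls are included.
```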
The workers best positioned to catch AI errors, those with domain expertise and self-confidence, are investing the cognitive effort to do so. The workers most likely to miss errors, those who lack expertise or defer to the machine, are the least likely to check. The gap between skilled and unskilled use of AI may be widening, not because of access to better tools, but because of differences in the human disposition to question them.
Recent MIT research points in the same direction. Students who used AI to help structure their essays showed reduced executive brain activity compared to unaided students. The effect carried over: when they later wrote essays without AI assistance, the reduction persisted. The tool designed to help them think had changed how their brains worked—even when the tool wasn't present.
What would it mean to design AI systems that strengthen critical thinking rather than erode it?
The Microsoft Research team has proposed one approach: "co-audit" tools that help users verify AI-generated content. The term is carefully chosen. These are not systems that audit themselves, an approach that failed spectacularly when a New York lawyer asked ChatGPT to verify its own fabricated case citations. They are systems designed to support the human in doing the checking.
Co-audit tools might highlight claims that require verification, surface disagreement between multiple AI responses to the same prompt, or visualise the relationship between AI output and source material. The key principle is that verifying AI output demands cognitive effort, and tool design should distribute that effort appropriately rather than eliminate it entirely.
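As a concrete illustration of the disagreement-surfacing idea, here is a minimal sketch, not any actual co-audit product, of a helper that samples several responses to the same prompt and flags the output for human review when they diverge. The ask callable is a stand-in for whatever model client is in use.

```python
# Sketch of one co-audit pattern: ask the same question several times and
# surface disagreement for the human, rather than silently accepting a single
# fluent answer. Hypothetical code, not the Microsoft Research tooling.
from collections import Counter
from typing import Callable

def co_audit(prompt: str, ask: Callable[[str], str], samples: int = 3) -> dict:
    """Collect several responses to the same prompt and report how much they agree."""
    responses = [ask(prompt).strip() for _ in range(samples)]
    counts = Counter(responses)
    candidate, frequency = counts.most_common(1)[0]
    return {
        "responses": responses,
        "agreement": frequency / samples,          # 1.0 means every sample matched
        "needs_human_review": frequency < samples,  # any disagreement flags the output
        "candidate_answer": candidate,
    }

# Usage: result = co_audit("What year was the cited case decided?", ask=my_model_call)
# A low agreement score is a cue to check sources, not a verdict on correctness.
```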
Sarkar has proposed a more provocative idea: AI as "provocateur." Rather than simply assisting users, AI systems could be designed to challenge them—generating critiques, highlighting weaknesses, surfacing alternative perspectives. The goal would be to make passivity harder. In a prototype system for spreadsheet analysis, his team experimented with "provocations": short textual critiques that accompany AI suggestions, pointing out limitations, biases, and risks. The intention is to interrupt the default of acceptance.
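In the same spirit, a toy version of the provocation pattern might look like the following, where every suggestion is delivered together with a machine-generated critique of itself. This is a sketch under the same assumed ask callable, not Sarkar's spreadsheet prototype.

```python
# Toy sketch of the "provocation" pattern: pair each suggestion with a critique
# so that acceptance is never the path of least resistance. Hypothetical code.
from typing import Callable

CRITIQUE_PROMPT = (
    "List the main limitations, biases, and risks of relying on the following "
    "analysis. Be specific and brief:\n\n{suggestion}"
)

def with_provocation(task: str, ask: Callable[[str], str]) -> dict:
    suggestion = ask(task)
    provocation = ask(CRITIQUE_PROMPT.format(suggestion=suggestion))
    # The pair is returned together, so the suggestion is never shown without its critique.
    return {"suggestion": suggestion, "provocation": provocation}
```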
David Weinberger, writing in the MIT Press Reader, suggests a pedagogical version of this approach. AI's capacity for dialogue, for back-and-forth exchange that drills down, challenges assumptions, and expands in unexpected directions, could be used to teach students how to think through conversation. The transcript of a student's dialogue with AI could become an artefact for evaluation, assessed not for the quality of the AI's answers but for the quality of the student's questioning. This reframes AI from answer machine to thinking partner, but only if the human remains actively engaged.
These proposals share a common tension. They ask users to invest more cognitive effort in tasks that AI has made feel effortless. This is not a trivial ask.
The Carnegie Mellon and Microsoft researchers found that workers frequently cited lack of time as a reason for not thinking critically. "In sales, I must reach a certain quota daily or risk losing my job," one participant explained. "Ergo, I use AI to save time and don't have much room to ponder over the result." The economic logic of AI adoption—faster, cheaper, more—pushes in the opposite direction from careful evaluation.
There is also the question of what critical thinking is for. The researchers found that workers were most likely to engage critically when they perceived potential negative outcomes: career consequences, social conflict, reputational damage. When stakes were low, or seemed low, critical effort dropped. The problem is that stakes are often invisible until something goes wrong.
The broader pattern suggests a mismatch between what AI tools promise and what thoughtful use of AI requires. The promise is reduced effort. The requirement is sustained judgment. These are not impossible to reconcile, but they are in tension—and the design defaults of most current systems resolve that tension in favour of ease.
The confidence paradox from the Carnegie Mellon and Microsoft study points toward something important. The workers who trusted their own abilities thought more carefully about AI output. The workers who trusted AI thought less.
This suggests that the capacity for critical engagement may not be primarily a skill problem. It may be a disposition problem—a question of whether users approach AI with the expectation that their judgment matters. The researchers describe this as "self-confidence," but it might equally be called intellectual self-respect: the conviction that your evaluation of AI output is worth the effort it requires.
Building that disposition is not something AI tools can do alone. It requires cultures—organisational, educational, professional—that value the human capacity for judgment rather than treating it as a bottleneck to be automated away. It requires training that emphasises not just how to use AI but when to question it. And it requires the honesty to acknowledge that making work easier is not the same as making it better.
The workers in the survey who thought carefully about AI output didn't do so because it was easy. They did so because they believed it mattered. That belief is not a feature that can be added to a product. It is a human disposition that must be cultivated—and that current AI design may be quietly eroding.
The most helpful AI, in this light, might not be the AI that reduces cognitive effort. It might be the AI that makes the effort feel worthwhile.