You Are Not an Expert Because AI Agrees With You

The most dangerous person in any high-stakes room is the one who asked AI all the questions and mistook confident answers for real ones.

Jun 02, 2026

Brain surgery, reduced to its components, is four things: open the skull, locate the pathology, correct it, close. Any literate person can write that sentence. No sane person would hand a scalpel to the person who wrote it.

Yet something structurally identical to handing over that scalpel is happening across boardrooms, operating rooms, courtrooms, and engineering teams — not because people are stupid, but because a genuinely impressive technology has made it possible to produce a convincing performance of expertise without any of the underlying substance. The performance is so good that the person producing it often cannot tell the difference. That is the specific danger worth examining.

Tasks Are the Skeleton. The Job Is Everything Holding It Together.

Every profession has a surface structure and a deep structure. The surface structure is the sequence of steps — the visible, describable, retrievable checklist. The deep structure is what makes those steps work: the judgment about when to deviate, the pattern recognition built from thousands of irreducible encounters, the ability to read what is not on the page.

A neurosurgery resident learns the steps of a craniotomy in year one. The next six years are not about learning more steps. They are about learning what the steps cannot teach — how tissue behaves differently from one patient to the next, how intraoperative bleeding changes the calculus, when the right move is to stop entirely and close without finishing. The attending surgeon’s value is not informational. It is not a knowledge deficit that better prompting would close. It is pattern recognition accumulated through encounters that cannot be compressed into a prompt.

The same structure holds everywhere the surface obscures the depth. A radiologist does not read films. A radiologist reads films in the context of a clinical presentation, a patient history, a prior scan, and a probability distribution built from years of pathology — including the pathology they almost missed, the one that looked normal until it did not. The task is clicking through images. The job is probabilistic reasoning over incomplete and sometimes contradictory evidence, with a patient on the other end of the conclusion.

A software engineer does not write code. A software engineer translates ambiguous human intent into deterministic systems while managing technical debt, organizational constraints, and the second- and third-order consequences of decisions that will not surface as problems for eighteen months. The task is syntax. The job is consequence management: anticipating failure modes in systems that do not yet exist, for requirements that have not been fully articulated, in organizations whose priorities will shift before the project ships.

Language models are extraordinarily good at surface structure. What they cannot do is hold any of it accountable to the deep structure underneath — because the deep structure is not text, and it is not retrievable from text.

The Benchmark That Proves the Point

In April 2026, the ARC Prize Foundation published ARC-AGI-3, a benchmark designed to measure what its creators call agentic intelligence — the ability to navigate completely novel, interactive environments without instructions, without prior training on those environments, and without being told the objective. The agent must explore, infer the goal, build an internal model of how the environment works, and then execute a plan. No hints. No rules. No stated win condition.

Humans solve 100% of the environments. Every frontier AI system tested as of March 2026 scores below 1%. Gemini 3.1 Pro Preview reached 0.37%. GPT-5.4 managed 0.26%. Opus 4.6 scored 0.25%. Grok-4.20 scored exactly 0.00%. This is not a rounding error or a benchmark quirk. It is a gap of more than 99 percentage points between human adaptive intelligence and the best AI systems money can currently build.

The paper’s authors are precise about why this gap exists, and their explanation maps directly onto the task-versus-job distinction. Modern large reasoning models, they write, enable automation only in domains where two conditions are met: the model’s training data provides sufficient coverage of the domain, and the domain offers an exact correctness signal — a verifiable right answer. Take either condition away and performance collapses. Human reasoning carries no such prerequisite. A person who has never seen a particular type of puzzle can reason their way through it using general principles. An AI system, without domain coverage, has no analogous mechanism.

The ARC Prize Foundation frames its finding as the difference between fluid intelligence and task-specific performance. In different language, it is the same distinction this article opened with: the difference between a job and a list of tasks.

The benchmark's design makes this concrete. ARC-AGI-3 environments use only what the paper calls Core Knowledge priors: innate, universal human capacities like an understanding of objects, basic geometry, simple physics, and the recognition that some entities act with intent. No language. No cultural symbols. No domain-specific knowledge that could be memorized. The playing field is stripped to the level of raw reasoning. At that level, frontier AI is not competitive with an untrained human, and the gap is not narrowing fast enough for current architectures to close it.

This is the empirical ground under an argument that is often made rhetorically: AI fluency is not the same as intelligence, and intelligence — in the sense of adaptive efficiency in novel situations — is what expertise actually requires. The person using a chatbot to sound like a surgeon has access to the surface structure. ARC-AGI-3 demonstrates, rigorously and with data, that the deep structure remains categorically out of reach.

The Mechanism Is Not New. The Scale Is.

In 1969, a tobacco industry executive wrote an internal memo that became one of the most cited documents in the history of corporate malfeasance: "Doubt is our product." The full sentence reads: "Doubt is our product since it is the best means of competing with the 'body of fact' that exists in the mind of the general public." The industry was not arguing that cigarettes were safe. It was arguing that the science was unsettled, and that unsettled science justifies inaction.

In Merchants of Doubt, Naomi Oreskes and Erik Conway documented how this strategy migrated from tobacco to acid rain to the ozone hole to climate change, carried by a small and overlapping network of scientists who, whatever their earlier distinction, acted less as researchers than as doubt entrepreneurs. The goal was never to win the scientific argument. It was to keep the argument alive long enough for the commercial and political status quo to persist. Winning would have required evidence. Prolonging required only the appearance of legitimate disagreement.

Robert Proctor, the Stanford historian who coined the term agnotology — the study of culturally produced ignorance — made the deeper point: ignorance is not simply the absence of knowledge. It is frequently manufactured, maintained, and strategically deployed. The tobacco companies knew privately what they were denying publicly. The doubt they sold was not their own uncertainty. It was a product engineered for external consumption, designed to create in the public mind a state of uncertainty that did not exist in the boardroom.

David Michaels, writing in Doubt Is Their Product, identified the operational infrastructure: the product defense industry, populated by scientists whose professional brief was not to discover but to contest. Challenge the methodology. Demand longer studies. Dispute the statistical threshold. The goal was never resolution — resolution would have been fatal. The goal was procedural indefiniteness, the permanent deferral of a conclusion that the evidence had already reached.

What Paolo Bacigalupi's novel The Doubt Factory adds (and fiction is sometimes the sharper instrument) is an observation about who manufactures doubt and whether they know they are doing it. The most durable doubt machines are not staffed by cynics. They are staffed by people who have genuinely convinced themselves that the uncertainty is real, that the science is legitimately contested, that they are performing a public service by slowing premature consensus. Sincere doubt manufacturers are harder to discredit than mercenary ones. They defend their work with a conviction that mercenaries cannot sustain.

The AI-era version of this mechanism runs the same play at a different target. The target is no longer tobacco science or climate modeling; it is human expertise itself. The argument is not that surgeons are wrong or that engineers make errors. The argument is that the mystery is a racket: that the years of training, the licensing requirements, the accumulated judgment, the irreducible depth of a practiced profession are guild protections masquerading as quality controls. And the evidence offered for this argument is chatbot output that looks, to someone who cannot evaluate it, exactly like what an expert would produce.

The doubt being manufactured is this: if the output is indistinguishable, the expertise was never real. The output is not indistinguishable — not to anyone qualified to read it carefully.

Proctor’s insight applies directly: this is not ignorance as absence. It is ignorance as weapon — deployed not by industry against the public this time, but by a diffuse, largely well-intentioned population of AI users against the professions they are inadvertently dismantling. The tobacco executives knew what they were doing. Most of the people currently manufacturing doubt about expertise do not. In Bacigalupi’s framing, that is not a mitigating factor. It is what makes the machine run.

The Concession Worth Making — and Its Limits

Some jobs are, in fact, just bundles of tasks: work that was institutionalized because the technology to automate it did not yet exist, not because it required irreducible human judgment. Form processing. Routine data extraction. Low-stakes first-pass document review. Scheduling coordination that follows deterministic rules. These were always task lists without the connective tissue of expertise, and their displacement is not a loss of professional value. It is the removal of a workaround the market had been tolerating.

The ARC Prize Foundation draws an analogous distinction in its paper. AI reasoning automation, it notes, works well in what it calls verifiable domains: situations where a correctness signal exists and training data covers the territory. Programming environments are one example. Scientific domains with mechanistic, testable outputs are another. Those domains are not diminished by the observation; they are simply characterized accurately. The automation is real, and so are the productivity gains. The error is assuming that because AI performs well in verifiable domains with high knowledge coverage, it performs equivalently in domains that lack those properties.

The radiologist and the data entry clerk are not variations of the same thing at different price points. One is a practiced accumulation of irreducible judgment, built through years of high-stakes feedback loops and exercising exactly the kind of adaptation to novel situations that ARC-AGI-3 is designed to measure. The other was always waiting for a sufficiently capable spreadsheet. The tell is accountability. Professionals whose work cannot be reduced to tasks carry liability for outcomes. The surgeon answers for the patient. The structural engineer answers for the building. The attorney answers for the position they took on the client's behalf. When accountability disappears from a job, when no one can be identified as responsible for what the output causes, the job was probably always a task cluster dressed in professional clothing.

Informed Ignorance Is More Dangerous Than Pure Ignorance

The person who queries a chatbot for a diagnosis and acts on it is not simply uninformed. They are dangerous in a specific, underappreciated way: they have acquired the vocabulary of expertise without any of the error-correction mechanisms that expertise includes.

A trained diagnostician knows the failure modes of their own reasoning. Training instills not just knowledge but epistemic humility: the practiced awareness of when the presentation is atypical, when the prior probability should override the surface reading, when the right answer is "I don't know yet, and here is what I need to find out." The chatbot user has the conclusion without the uncertainty calibration. They cannot recognize edge cases, because recognizing edge cases, and knowing which edges matter, is most of what the training was actually for.

The ARC Prize Foundation describes this gap precisely, though in the language of benchmarking rather than medicine. They distinguish between systems that perform well because their training data covers a domain and systems that demonstrate genuine fluid intelligence — the ability to adapt efficiently to novel problems from first principles. The chatbot user believes they are in the presence of the latter. They are in the presence of the former. The difference is invisible until it isn’t.

This is a worse epistemic position than pure ignorance. Pure ignorance tends toward caution. Informed ignorance — the state of knowing enough to feel confident, with a plausible-sounding output to point to — is how errors propagate into irreversible outcomes. The person who knows they do not understand surgery will not perform surgery. The person who has read AI-generated surgical notes and absorbed the vocabulary will, in some adjacent domain, make a decision with surgical-level consequences while feeling entirely qualified to make it.

This is not a hypothetical. It is the legal brief written by someone who used AI to research case law and did not know enough to recognize that the cases cited did not exist. It is the architectural specification produced from AI-generated structural assumptions that no licensed engineer reviewed. It is the business acquisition diligence conducted by a team that used AI to produce the financial model and missed the covenant that made the deal unfinanceable. The output looked right. The output was right in structure and wrong in substance, in the specific ways that only someone with deep knowledge of the domain would have caught.

What the Tool Actually Does — and Does Not — Displace

AI does not threaten expertise. It threatens the transaction costs that sit between expertise and the people who need it — the administrative layer, the surface work, the boilerplate that consumed professional time without requiring professional judgment.

The correct reading of AI's effect on skilled professions is not replacement but compression of the surface layer, which should, in a functioning system, push practitioners deeper into the work that actually requires them. The surgeon spends less time on documentation and more on pre-operative planning. The radiologist reviews fewer normal studies and focuses attention on the ambiguous ones. The engineer writes less boilerplate and spends more time on architecture. The lawyer lets the tool produce the first draft and concentrates on the argument that cannot be generated.

ARC-AGI-3 gives this argument empirical grounding. If the best AI systems in the world, operating without computational cost constraints, score below 1% on tasks that require adaptive intelligence in novel environments — and untrained humans score 100% — then the claim that AI is replacing expertise in genuinely complex domains is not just premature. It is precisely backwards. What AI has automated is the surface layer of professions that had always been partially automatable. The deep layer, the one that justifies the training, the licensing, the liability, and the years of accumulated judgment, remains where it always was: in the people who did the work long enough to understand what the checklist cannot say.

The threat is not the tool. The threat is the failure — institutional, organizational, and individual — to distinguish between what the tool compresses and what it cannot touch. That failure is being actively accelerated by people who ask AI all their questions, receive structured and confident-sounding answers, and conclude that the questions have been answered. They have not been answered. They have been responded to. The difference between an answer and a response is precisely the deep structure that the checklist cannot carry.

The surgeon is not protecting a mystery. The mystery was never the point. What they are protecting — and what informed ignorance destroys — is the patient who came in expecting that the person holding the scalpel had done more than read the steps.

Sources referenced: Merchants of Doubt, Naomi Oreskes & Erik Conway (2010) | Doubt Is Their Product, David Michaels (2008) | Agnotology, Robert N. Proctor | The Doubt Factory, Paolo Bacigalupi (2014) | ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence, ARC Prize Foundation, April 2026 (arXiv:2603.24621)
