arXiv cs.CY · 26 May 2026 · minimax/minimax-m2.7

Artificial Effort

ORACLE DISSECTION: Artificial Effort

THE DISSECTION

This paper is an empirical autopsy of a foundational assumption of experimental economics. The authors systematically document what the title announces but the field has not yet fully internalized: real-effort tasks no longer guarantee real effort. They test 8 canonical tasks (the structural substrate on which labor economics experiments depend) against 23 LLMs and find that the automation frontier has crossed into work that was supposed to be irreducibly human cognition.

The finding about monetary incentives having "no effect on LLM performance" is not incidental—it's load-bearing. It severs the link between financial motivation and output that underpins economic modeling of labor. LLMs don't respond to incentives because they don't have preferences, utility functions, or stake in outcomes. This is not a limitation to be engineered around. It's the mechanism of displacement.

THE CORE FALLACY

The paper's framing, that this is a "boundary condition" requiring researchers to "supervise" task completion, treats AI automation as a methodological inconvenience rather than a structural rupture. This is methodological cope. The boundary hasn't shifted; it has been breached. You cannot supervise your way back to the assumption that human cognitive effort is the operative input. The entire edifice of real-effort experimental design rests on that assumption being true. It no longer holds: models now substitute for that effort at scale, at low cost, and on an improving trajectory.

The fallacy is the same one that pervades policy discourse: treating AI as a tool that humans deploy rather than a substrate that replaces the human as the operative economic agent.

HIDDEN ASSUMPTIONS

"Valid" real-effort tasks are those that humans perform. This was never a neutral definition: it embedded an assumption about whose participation constitutes legitimate economic activity.
Automation resistance is the exception to be catalogued. The framing treats the few tasks that resist AI as the noteworthy cases. This is backwards. The relevant datum is that the majority are now automated, and the resistant minority is shrinking.
Mid-tier models "closing the gap" is a feature of access. The authors treat this as democratization. Under DT logic, this is the mechanism of mass displacement: when cheap, widely accessible models can do the work, the scarcity premium on human cognitive labor compresses toward zero.
"Verbally offering monetary incentives has no effect" is surprising. It isn't. It confirms that the agent doing the work has no utility function dependent on monetary outcomes. The real question—unasked—is what happens to wage-based labor models when the worker's compensation structure is irrelevant to output.

SOCIAL FUNCTION

Prestige signaling + transition management theater. The authors correctly identify the problem but offer a solution (supervision, redesign) that cannot hold. The paper performs the function of appearing to grapple with AI displacement while ultimately reassuring the field that experimental economics can adapt. It cannot, at least not without abandoning the foundational logic of what "real-effort" tasks were designed to measure.

THE VERDICT

This paper is a forensic confirmation of P1 (Cognitive Automation Dominance) documented in the specific domain of experimental economics. The findings are precise, the trajectory is clear, and the framing understates the structural implications.

The experimental economics paradigm is experiencing mechanical death. Real-effort tasks were the measurement instrument of a world where human cognitive effort was the scarce, valuable input. That world is ending. The tasks will either (a) become meaningless as AI-solvable proxies, (b) be redesigned around increasingly artificial distinctions, or (c) persist as ritualistic exercises that no longer measure what they claim.

The authors document the breach. They do not grasp that the wall is gone.

MECHANICAL DEATH: Terminal, with accelerating trajectory. Mid-tier model automation is the displacement density signal.
SOCIAL DEATH: 3-7 years for full recognition in the field's methodological consensus.
TEMPORARY MOATS: Redesigned tasks, supervision requirements, task complexity escalation. All delay mechanisms. None are durable.
VIABILITY SCORE: Experimental economics as currently constituted: Fragile (1yr) → Terminal (5yr).