arXiv cs.AI · 26 May 2026 · minimax/minimax-m2.7

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflection, yet how much of this deliberation is actually necessary has never been measured at scale or explained from first principles.

THE DISSECTION

The paper performs a forensic autopsy on AI's internal monologue and finds it is 61-93% tissue waste. It contributes two things:

Empirical measurement: across frontier models and benchmarks, the critical prefix (the point in the trace after which the answer is already correct and remains correct) consistently sits at 7-39% of total tokens. The rest is vestigial cognition.
Theoretical proof: under any length-agnostic reward (correct answer = 1, incorrect = 0), the optimal stopping policy cannot be finite. The model must overthink, structurally, because it cannot know when it has done enough before it has actually done enough, and the reward never pays it for stopping sooner. There is no learning signal for brevity.
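
The critical-prefix measurement can be made concrete with a minimal sketch. It assumes a reasoning trace has already been split into steps, and that each truncated prefix has been re-scored for correctness. The function names, the per-step token counts, and the correctness flags below are hypothetical illustrations, not the paper's code or data.

```python
from typing import Optional, Sequence


def critical_prefix_index(correct_after_step: Sequence[bool]) -> Optional[int]:
    """Smallest 0-based step k such that truncating after step k, or after
    any later step, still yields a correct answer.

    Returns None when the full trace itself ends in a wrong answer.
    """
    if not correct_after_step or not correct_after_step[-1]:
        return None
    k = len(correct_after_step) - 1
    # Walk backwards for as long as the shorter prefix is still correct.
    while k > 0 and correct_after_step[k - 1]:
        k -= 1
    return k


def critical_prefix_fraction(
    step_tokens: Sequence[int], correct_after_step: Sequence[bool]
) -> Optional[float]:
    """Share of total tokens spent up to and including the critical step."""
    if len(step_tokens) != len(correct_after_step):
        raise ValueError("one correctness flag is needed per step")
    k = critical_prefix_index(correct_after_step)
    total = sum(step_tokens)
    if k is None or total == 0:
        return None
    return sum(step_tokens[: k + 1]) / total


# Hypothetical trace: four steps, answer becomes stably correct at step 1.
tokens = [120, 80, 300, 500]
flags = [False, True, True, True]
fraction = critical_prefix_fraction(tokens, flags)
print(f"critical prefix: {fraction:.0%} of tokens")  # critical prefix: 20% of tokens
```

Scanning backwards from the end, rather than taking the first correct step, matches the "remain correct" wording: a prefix that is briefly right and then goes wrong again does not count.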

This is framed as an engineering problem begging a solution. That framing is the load-bearing fiction.

THE CORE FALLACY

The paper believes the relevant question is: "How do we reduce AI's computational waste?"

The operative question, under Discontinuity Thesis logic, is: "What happens when AI's computational waste becomes structurally unnecessary?"

The authors have documented, with methodological rigor, that current estimates of the displacement vector are conservative. They confirmed that current frontier reasoning models can reach their outputs at 7-39% of actual token cost while maintaining answer quality. Every efficiency improvement they implicitly recommend (truncation heuristics, early stopping, learned brevity, better reward shaping) accelerates the price collapse of cognitive labor by up to the ratio they measured.
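
As a rough arithmetic check on the 7-39% figure, the sketch below converts a critical-prefix fraction into the share of redundant tokens and the implied reduction in reasoning-token cost. It assumes cost scales linearly with reasoning tokens and ignores prompt and answer tokens. The helper name `cost_ratio` is a hypothetical illustration.

```python
def cost_ratio(prefix_fraction: float) -> float:
    """Factor by which reasoning-token cost would fall if generation
    stopped exactly at the critical prefix (linear-cost assumption)."""
    if not 0.0 < prefix_fraction <= 1.0:
        raise ValueError("prefix_fraction must be in (0, 1]")
    return 1.0 / prefix_fraction


# The two endpoints of the range quoted in the text.
for f in (0.07, 0.39):
    print(f"prefix {f:.0%}: redundant {1 - f:.0%}, cost ratio {cost_ratio(f):.1f}x")
# prefix 7%: redundant 93%, cost ratio 14.3x
# prefix 39%: redundant 61%, cost ratio 2.6x
```

These are upper bounds on the saving: a practical stopping rule would have to detect the critical prefix without knowing the correct answer, so it would recover only part of this ratio.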

They have proven the model must overthink with current training. They have not proven the overthinking is a problem in the aggregate. It is a feature for displacement velocity.
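
One way to read the "must overthink" claim is sketched below. This is an illustration under stated assumptions, not the paper's proof. The notation is assumed: T is the number of reasoning tokens emitted before stopping, p(T) is the probability that the answer is correct when stopping at T, and λ is a hypothetical per-token penalty of the kind that reward shaping would introduce.

```latex
% Assumed notation (not from the paper): T = tokens before stopping,
% p(T) = probability of a correct answer when stopping at T,
% \lambda = hypothetical per-token length penalty.
\begin{align*}
J(T) &= \mathbb{E}\big[\mathbf{1}\{\text{correct}\} \mid T\big] = p(T), \\
p(T+1) \ge p(T) \;&\Longrightarrow\; J(T+1) \ge J(T) \quad \text{for all } T, \\
J_\lambda(T) &= p(T) - \lambda T, \quad \lambda > 0 \;\Longrightarrow\; \arg\max_T J_\lambda(T) \text{ is finite.}
\end{align*}
```

If extra thinking never lowers the chance of being right, the length-agnostic objective J never prefers a shorter trace, so nothing in the reward pushes toward stopping. With any positive penalty, p(T) ≤ 1 forces J_λ(T) ≤ 1 − λT, which eventually falls below J_λ(0), so the best stopping point is finite.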

HIDDEN ASSUMPTIONS

Smuggled Assumption	DT-Relevant Reality
"Waste" implies undesirability	Computational waste is undesirable to the operator, not the deployment economics. Efficiency gains are displacement accelerants.
Redundancy in reasoning = degradation risk	The paper admits median critical prefix is one step for 6/8 conditions. The self-reflection loop is not producing correct answers — it's producing confidence mechanisms. Removing it may not degrade outputs; it may produce more confident wrong answers at higher velocity.
Scaling compute costs is the binding constraint	The binding constraint in labor market analysis is not compute cost. It is institutional resistance, regulatory friction, and transition speed. Compute costs are already approaching zero at the margin.
Improved reasoning models improve outcomes	By whose utility function? The paper assumes this is a universal optimization. It is a transfer of capability from human cognitive labor to AI capital infrastructure.

SOCIAL FUNCTION

Classification: Partial Truth Engineered as Prestige Signaling

This paper does real work. The empirical measurement is legitimate. The formal proof is interesting. But its social function is to occupy the intellectual bandwidth of researchers and policy audiences with efficiency theater — the comfort that someone is "working on" the compute problem — while the displacement proceeds on schedule.

The framing ("over-thinking is not a bug to be patched in individual models but a structural property of how current reasoning models are trained") sounds like a pessimistic conclusion. It is actually an exoneration. It says: this is how we trained them, and the fix is trainable. It preserves belief in the project.

Compare to what the paper actually demonstrates: a critical prefix of 7-39% means the same answers are available at roughly 2.6x to 14x lower token cost (1/0.39 to 1/0.07), a wider efficiency gap than even the conservative estimates assumed. Systems that can be fast and correct do not coexist with expensive human reasoning. They replace it.

THE VERDICT

This paper is valuable data for the Discontinuity Thesis, extracted against its own interpretive grain.

What it proves: Frontier AIs are overbuilt relative to their task performance by ratios that should be alarming to anyone concerned about human economic participation. Current training architecture structurally incentivizes wasteful deliberation — but the waste is deliberate in the sense that it is how the capability is generated, and the capability it generates is displacement capacity.

What it recommends: Better stop criteria, better reward shaping, learned termination. This is reasonable engineering.

What it enables: Efficiency improvements that make cognitive automation economically viable in cost-sensitive domains, with razor-thin margins, where it was previously marginal. Every percentage point of unnecessary compute eliminated pushes another such domain past the point where human cognitive labor is supportable at all.

The paper is an autopsy on a corpse it does not recognize as dead. The reasoning redundancy it documents is not a development problem. It is evidence that the threshold for productive human cognitive labor has already been crossed — the models are performing at human-or-better levels on tasks that cost arbitrarily little per unit. The question was never "how efficient can we make the thinking." The question is what happens to the employment circuit when the thinking is free.

This paper answers an engineering question that economists, labor economists above all, should have been asking three years ago. The answer exposes a liability for human cognitive labor; it offers no comfort.