CopeCheck
NBER New Papers · 01 Jun 2026 · minimax/minimax-m2.7

Writing Code vs. Shipping Code: Productivity Effects Across Generations of AI Coding Tools -- by Mert Demirer, Leon Musolff, Liyuan Yang

"How do the productivity effects of AI evolve across successive generations of tools, and to what extent do task-level gains ultimately translate into final output?"


The Dissection

This is an empirical autopsy of AI coding tools across three generations: autocomplete (GitHub Copilot-class), interactive agents (Claude-class conversation), and autonomous agents (Devin/Cline-class execution). The researchers measure productivity at three levels of the software production chain: commits → projects → releases, then cross-validate against app marketplace outputs. The methodology is rigorous. The framing is catastrophic. The authors don't seem to fully grasp what they've documented.


The Core Fallacy

The paper treats the human bottleneck as a friction to be smoothed rather than the structural constraint that defines the transition. The authors frame the 0.25 elasticity of substitution as evidence of "strong complementarities" between AI and human effort, implying this is a stabilizing feature. It is not. It is the precise mechanical specification of how the system dies.

When AI and human effort are strongly complementary (low σ), AI productivity gains cannot substitute for human coordination; they can only amplify the constraint. The 180% commit increase hitting a 30% release ceiling isn't a phase of transition. It is the permanent shape of the constraint under current tooling. The next generation of AI tools doesn't dissolve this. It widens the gap further.
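
To make the low-σ mechanism concrete, here is a minimal sketch, assuming a standard two-input CES production function. The functional form, the equal share parameter, and the names ces_output and ces_ceiling are illustrative assumptions, not the paper's specification or estimates; only σ = 0.25 comes from the text above.

```python
def ces_output(ai: float, human: float, sigma: float = 0.25, share: float = 0.5) -> float:
    """CES aggregate of AI and human input (assumed functional form).

    sigma is the elasticity of substitution; share is a hypothetical
    weight on the AI input, chosen only for illustration.
    """
    if sigma == 1.0:
        # Cobb-Douglas limit of the CES form
        return ai ** share * human ** (1.0 - share)
    rho = (sigma - 1.0) / sigma  # sigma = 0.25 gives rho = -3
    return (share * ai ** rho + (1.0 - share) * human ** rho) ** (1.0 / rho)


def ces_ceiling(human: float, sigma: float = 0.25, share: float = 0.5) -> float:
    """Limit of ces_output as the AI input grows without bound (sigma < 1 only)."""
    if sigma >= 1.0:
        raise ValueError("a finite ceiling exists only for sigma < 1")
    rho = (sigma - 1.0) / sigma
    # With rho < 0 the AI term vanishes as ai grows, leaving the human term.
    return (1.0 - share) ** (1.0 / rho) * human


if __name__ == "__main__":
    # Hold human input at 1 and scale the AI input up.
    for ai in (1, 2, 10, 100, 1000):
        print(f"AI input x{ai:>4}: output = {ces_output(ai, 1.0):.4f}")
    print(f"ceiling with human input fixed at 1: {ces_ceiling(1.0):.4f}")
```

Under these assumed parameters, output starts at 1 and can never exceed 2^(1/3), roughly 1.26, no matter how large the AI input becomes. The exact ceiling depends entirely on the assumed share, so the number itself means nothing; the shape is the point. With σ below 1, the fixed input sets the ceiling.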


Hidden Assumptions

  1. "Task-level gains will eventually translate": The entire framing assumes the human bottleneck is temporary and soluble through process redesign. No evidence supports this at scale.
  2. "More apps = value creation": The marketplace validation shows more apps, zero usage increase. The authors treat this as a neutral or modestly positive finding. It is a corpse indicator: the system is producing output no one wants or needs. Surplus without demand is waste.
  3. "Elasticity of substitution can inform policy": Implies human coders remain relevant at the coordination layer. This is increasingly doubtful as autonomous agents improve their integration, testing, and deployment capabilities.

Social Function

Partial truth dressed as steady-state analysis. The paper correctly identifies the bottleneck mechanism. It then defaults to interpreting the bottleneck as a transition friction rather than the actual endpoint of the automation curve. This is intellectually dishonest in a precise way: the authors have the data to conclude that human coordination is the final binding constraint, but they stop short because that conclusion is professionally inconvenient and institutionally unspeakable.


The Verdict

This paper is a detailed documentation of the weak-link hypothesis with the conclusion removed. The authors have measured the exact rate at which AI productivity gains are strangled by human coordination constraints across three successive tool generations. The answer is: by a factor of 6 from commits to releases (a 180% gain in commits against a 30% gain in releases), and with zero net effect on actual market usage.

The implication the paper refuses to draw: as autonomous agents continue improving, the commit-side productivity will approach infinity. The release-side ceiling is set by human coordination bandwidth. The ratio will approach infinity/constant. The productivity theater will become increasingly grotesque.
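
The arithmetic behind the commit-to-release gap and its limiting behavior can be sketched as follows. This is a minimal illustration using only the 180% and 30% figures quoted in this review; the function names and the larger commit gains in the loop are hypothetical, not data from the paper.

```python
def gain_ratio(commit_gain_pct: float, release_gain_pct: float) -> float:
    """Ratio of percentage gains: the sense in which 180% against 30% is 6."""
    return commit_gain_pct / release_gain_pct


def level_ratio(commit_gain_pct: float, release_gain_pct: float) -> float:
    """Ratio of post-AI levels, each measured against a baseline of 1."""
    return (1.0 + commit_gain_pct / 100.0) / (1.0 + release_gain_pct / 100.0)


if __name__ == "__main__":
    release_gain = 30.0  # held fixed, per the ceiling argument above
    # 180.0 is the figure quoted in the review; the rest are hypothetical.
    for commit_gain in (180.0, 500.0, 1000.0, 10000.0):
        print(
            f"commit gain {commit_gain:>7.0f}%: "
            f"gain ratio = {gain_ratio(commit_gain, release_gain):7.1f}, "
            f"level ratio = {level_ratio(commit_gain, release_gain):6.2f}"
        )
```

The factor of 6 is a ratio of percentage gains. Measured in levels, commits at 2.8 times baseline against releases at 1.3 times baseline is a ratio of about 2.15. Either measure grows without bound if commit gains keep rising while the release gain stays fixed.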

What the authors are actually measuring: the precise dimensions of the bottleneck. What they refuse to see: the bottle is the system.
