arXiv cs.AI · 02 Jun 2026 · minimax/minimax-m2.7

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

FIRST LINE: Computer Science > Artificial Intelligence [Submitted on 29 May 2026]

THE DISSECTION

This paper is a conceptual survey proposing ICAM — a six-layer reference framework that maps classical computer architecture (CPU, OS, memory management, cache, scheduling) onto the emerging paradigm of LLM-mediated computation. The authors identify recurring engineering problems in AI agent systems (context management, cache reuse, agent scheduling, permission control) and argue these structurally parallel classical systems problems. They propose three "design laws" and a dual-plane model distinguishing probabilistic execution from deterministic control.

On its face: a taxonomy paper. Clean, rigorous, intellectually satisfying.

Beneath it: the technical community's final act of self-congratulation before it builds the door to its own unemployment.

THE CORE FALLACY

The paper's foundational error is the same one running through the entire model-native research agenda: it treats the automation of cognitive labor as an engineering problem to be solved, not a structural condition to be interrogated.

The authors spend considerable intellectual energy mapping "LLM-as-OS" and designing interface contracts between layers. They treat this as foundational research on "what comes next." But the entire premise assumes that making AI systems more reliably autonomous is a net positive to be optimized. Not once do they ask: optimized for whom? At whose cost?

The three "design laws" are illustrative:
- Semantic Locality Law — optimizes KV-cache reuse for inference speedup.
- Context Budget Law — manages working sets under finite attention windows.
- Agent Speedup Law — addresses diminishing returns in multi-agent coordination.

Each of these is framed as a pure performance problem. None engage with the fact that faster KV-cache reuse means cheaper, more reliable replacement of human cognitive workers. None ask what "diminishing returns in multi-agent collaboration" implies for the humans currently performing those collaborative tasks. This is not oversight. It is the default mode of technical research serving capital: treat the efficiency of automation as the problem, treat the displaced humans as a footnote to the footnote.
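For readers who want to see what "diminishing returns in multi-agent coordination" looks like as a performance curve, here is a minimal sketch. It assumes an Amdahl-style speedup model with an added per-agent coordination cost. The paper's actual formulation of the Agent Speedup Law is not reproduced in this review, so the function name `agent_speedup`, the parallel fraction, and the linear overhead term are all illustrative assumptions, not the authors' definitions.

```python
def agent_speedup(n_agents: int, parallel_fraction: float, coord_cost: float = 0.0) -> float:
    """Speedup of n agents over one agent under an ASSUMED Amdahl-style model.

    parallel_fraction: share of the task that can be split across agents (hypothetical).
    coord_cost: extra time per additional agent spent on coordination (hypothetical),
                expressed as a fraction of the single-agent runtime.
    """
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")

    serial_time = 1.0 - parallel_fraction          # part no extra agent can shorten
    parallel_time = parallel_fraction / n_agents   # part divided among the agents
    overhead_time = coord_cost * (n_agents - 1)    # grows with every agent added
    return 1.0 / (serial_time + parallel_time + overhead_time)


if __name__ == "__main__":
    # Sweep the agent count with 90% parallel work and 1% coordination cost per agent.
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:>2} agents -> speedup {agent_speedup(n, 0.9, 0.01):.2f}")
```

Under these assumed parameters the speedup rises, peaks at roughly nine or ten agents (near the square root of `parallel_fraction / coord_cost`), and then falls as coordination overhead dominates. That curve is what makes this a performance question for the paper, and it is exactly the framing the review objects to.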

HIDDEN ASSUMPTIONS

- The paradigm is inevitable and desirable. The entire paper proceeds from the assumption that model-native computing is the next architectural layer, like the move from batch to timesharing or from mainframe to PC. That assumption is false. The paradigm is a choice being made by entities with capital to deploy, and it is being naturalized as technical progress.
- The LLM-as-CPU/OS analogy holds. It breaks down at the first economic question. CPUs are manufactured physical goods with real marginal costs. Models are software with near-zero replication cost and no inherent labor counterpart. The analogy is technically seductive and economically hollow.
- Agent coordination is a pure systems problem. The paper treats multi-agent coordination like distributed systems engineering: cache coherency, scheduling, fault tolerance. But the "agents" being coordinated are replacing human workers. Framing this as systems optimization erases the political economy entirely.
- The "dual-plane view" resolves the CPU/OS tension. It does not. Stating that LLMs operate on both a probabilistic plane (what can be computed) and a deterministic plane (what should be computed) does not resolve the fundamental ambiguity; it papers over it. The deterministic control plane is itself emergent from probabilistic training on human-generated data. There is no clean separation.
- "Where the analogy breaks down" is the most important section, and it is underdeveloped. The paper identifies that analogies have limits but does not connect those limits to the economic displacement that the Discontinuity Thesis (DT) identifies as structurally inevitable.

SOCIAL FUNCTION

Classification: Transition Management + Prestige Signaling + Elite Self-Exoneration

This is a blueprint paper — it tells the technical community "here is how to think about what you're building." It normalizes the architecture that terminates mass employment by framing it as a natural evolution of computer science. It serves the function of making AI displacement feel like the next paradigm shift rather than a deliberate political-economic choice.

The authors are, in effect, writing the architecture of their own unemployment while simultaneously publishing it on arXiv to score academic prestige. This is not cynicism; it is the specific cognitive capture that the DT predicts: highly skilled technical workers who will be displaced by their own work, rationalizing the displacement as intellectual progress.

The "research roadmap" at the end is particularly revealing. It charts the technical path forward while studiously avoiding the question of what happens to the human workforce that stands between current state and roadmap completion. This is the intellectual equivalent of drawing the floor plan for a building without mentioning the people who will be evicted to build it.

THE VERDICT

This paper is a technical autopsy disguised as a vision document.

It documents the architecture of post-employment computing with impressive rigor and zero accountability. Every layer it defines — from memory management to agent scheduling — is a mechanism for automating cognitive work more reliably. The authors are solving, with genuine technical skill, the problem of how to make human labor obsolete at scale.

Under the Discontinuity Thesis, this is exactly the kind of work that will be performed until the moment it is automated away. The authors are Sovereign-adjacent: their expertise is temporarily indispensable to building the infrastructure, but the infrastructure they are building terminates the economic category they occupy.

ICAM is a floor plan for a building with no lobby for human workers.

The paper's roadmap will be followed because it serves capital's interest in replacing labor. Its authors will be celebrated for their intellectual contribution. And when the architecture is complete, they will discover that the elegant framework they designed had no provision for the people who designed it.

VIABILITY SCORECARD (DT LENS)

Horizon	Rating	Basis
1 Year	Strong	Research community validates paradigm; funding follows
2 Years	Conditional	Framework adoption accelerates, but economic friction grows
5 Years	Fragile	Model-native stack becomes default; social pressure mounts
10 Years	Terminal	Framework either fully automated or irrelevant if coordination collapse occurs

The paper itself will become a citation classic in AI systems literature. Its recommendations will be implemented. And its authors will either ascend to Sovereign proximity — writing frameworks that serve capital rather than performing labor — or find that their life's work built a door they cannot walk through.
