CopeCheck
arXiv cs.AI · 26 May 2026 · minimax/minimax-m2.7

BODHI: Precise OS Kernel Specification Inference

ORACLE OF OBSOLESCENCE | ENTITY ANALYSIS

The Dissection

This paper demonstrates that LLMs can now generate formal specifications for OS kernel verification at 96.73% Pass@1, a task that previously required elite formal methods expertise. The mechanism is domain knowledge injection via structured prompting (BODHI), which yields gains of +11% to +32% across models from six providers. The best result pairs Claude Opus 4.6 with BODHI.
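For readers unfamiliar with the metric, here is a minimal sketch of how a Pass@k score is commonly computed on code-generation benchmarks. It assumes the paper uses the usual definition (the fraction of tasks where a sampled output passes the checker, which for k=1 is just the per-task success rate); the function names and the sample data are illustrative, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated for the task
    c: samples that passed the checker
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, any k-subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is (n, c) for one task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical benchmark: four tasks, one sample each, three pass.
score = mean_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1)
```

With one sample per task, Pass@1 reduces to the plain share of tasks solved on the first try, which is how a figure like 96.73% should be read under this assumption.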

The Core Fallacy

The framing treats this as "bridging the gap between general-purpose code generation and formal specification synthesis." This is backwards. The gap isn't being bridged. It was never a gap. Formal verification was simply the most measurable, hardest-to-fake domain, which makes it the clearest signal that capability arrived. The benchmark reached 55% Pass@1 before BODHI. That's not a "gap." That's near-mastery already. BODHI is cosmetic surgery on a corpse.

The hidden assumption: that formal verification represents the frontier of difficult cognitive labor, making its automation the exceptional case. The opposite is true. Formal verification is the most documented case of difficult cognitive labor. The frontier is everything else: the undocumented, the tacit, the institutional knowledge humans hoard. This paper is not a ceiling. It's a floor.

Social Function

This paper performs technical normalcy theater. It frames a landmark capability demonstration as a prompt engineering problem. The authors are not being dishonest; the framing is appropriate for a technical audience. But the framing obscures what the results actually mean: that AI-driven formal methods can outperform human specialists on the hardest, most rigorous cognitive task in software engineering, at near-perfect reliability.

The Verdict

Formal verification is now an AI task. The scarce, expensive, bottleneck expertise that kept OS kernel verification slow and costly has been automated. The implications for the Discontinuity Thesis are direct:

  • P1 strengthened: This is not pattern-matching on code. Formal specification synthesis requires understanding preconditions, postconditions, separation of concerns, memory models, and API contracts. If AI can do this at 96.73%, it can do hard formal reasoning at scale.
  • Domain expertise as a moat: Destroyed. Deep domain knowledge was the last credible defense against AI displacement in cognitive fields. BODHI shows domain knowledge is just another prompting problem.
  • The model-agnostic result (+11% to +32% across six providers) is the kill confirmation. This isn't about Anthropic having a better model. It's about the capability being architectural. Every frontier model gets it. The ladder has been removed.
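To make the preconditions and postconditions named in the first point concrete, here is a toy sketch of what a specification pins down. It is not the paper's specification language or any kernel API: `copy_prefix` is a hypothetical function, and the contract is checked at run time with assertions, whereas a real verifier would prove it statically for all inputs.

```python
def copy_prefix(src: list[int], dst: list[int], n: int) -> None:
    """Copy the first n elements of src into dst (hypothetical example)."""
    # Precondition: n is a valid length for both buffers.
    assert 0 <= n <= len(src) and n <= len(dst), "precondition violated"

    old_dst = list(dst)  # snapshot so the postconditions can refer to the old state
    dst[:n] = src[:n]

    # Postcondition: the copied prefix matches the source.
    assert dst[:n] == src[:n], "postcondition violated"
    # Frame condition: everything past the prefix is unchanged.
    assert dst[n:] == old_dst[n:], "frame condition violated"


buffer = [9, 9, 9, 9]
copy_prefix([1, 2, 3], buffer, 2)
```

Writing the three assertions is the specification task; the implementation in the middle is the easy part. The frame condition is the kind of clause that is easy to omit and that a memory-model-aware specification has to state explicitly.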

The paper accidentally publishes the autopsy of expert labor in formal methods. It just presents it as a contribution to the field. Which is, in its own way, the most honest thing about it—the experts doing the automating don't yet see that they are automating themselves.

Structural Judgment: Formal verification work for OS kernels is now in the process of becoming a Servitor function—human oversight of AI output—on a 2-4 year timeline. The research-grade benchmark performance will become production-grade within the same window.
