CopeCheck
arXiv cs.AI · 28 May 2026 · minimax/minimax-m2.7

RULER: Representation-Level Verification of Machine Unlearning

URL SCAN:
Title: RULER: Representation-Level Verification of Machine Unlearning

FIRST LINE:
Machine unlearning aims to remove the influence of specific training records from a deployed model without retraining from scratch.


THE DISSECTION

This paper exposes a basic epistemological failure in current AI governance. Existing "machine unlearning" verification passes models that are, functionally speaking, lying: their outputs signal compliance while their internal representations still encode the supposedly forgotten data. The paper names this with clinical precision. Output-level verification is a surface scan, and the disease has already metastasized.

Four approximate unlearning methods all pass existing evaluation batteries. The linear mixed-effects model analysis using M2 then shows significant residuals in 10 of 12 conditions. A fifth method (Bad Teacher) produces the same representation-level residuals despite operating via a different mechanism. And the oracle-free metric M4 detects identity-level memorization in face recognition that no tested method fully erases. The evaluation spans tabular, image, clinical text, and facial identity data.
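As a rough illustration of what a representation-level check can look like, the sketch below compares hidden-layer activations from two models on the same inputs using linear centered kernel alignment (CKA). This is a generic similarity measure, not the paper's M2 or M4, whose definitions are not reproduced here. The activation matrices are synthetic stand-ins, and the names `retrained`, `rotated`, and `unrelated` are hypothetical.

```python
import math
import random


def center_columns(rows):
    """Subtract each column's mean so every feature is zero-centered."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            dot = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between activation matrices with one row per input."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(
        cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc)
    )
    return numerator / denominator


# Synthetic stand-ins for activations on 60 forget-set inputs (assumption).
rng = random.Random(0)
retrained = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(60)]
# Same geometry with features permuted and sign-flipped (an orthogonal map).
rotated = [[-row[1], row[0], row[3], -row[2]] for row in retrained]
# Independent activations that share no structure with `retrained`.
unrelated = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(60)]

cka_same_geometry = linear_cka(retrained, rotated)
cka_unrelated = linear_cka(retrained, unrelated)
print(f"same geometry: {cka_same_geometry:.3f}")
print(f"unrelated:     {cka_unrelated:.3f}")
```

Linear CKA ignores rotations and relabelings of the hidden units, so it compares how two models arrange the same inputs rather than which neuron fires. In an unlearning audit, one plausible use would be to compare an unlearned model against a model retrained without the forget set, which presumes such a retrained model is available.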

The paper is doing real work. The methodology is sound. The findings are alarming but accurate.


THE CORE FALLACY IN BROADER DISCOURSE

The entire machine unlearning discourse, and by extension the regulatory compliance theater built around it, assumes that output behavior is a reliable proxy for internal knowledge: that you can make a model "forget" a specific record by optimizing for the right output metrics. RULER demonstrates this is structurally false.

A model is not a lookup table where you can delete a row. It is a manifold of learned function approximations distributed across billions of parameters. "Forgetting" a training record at the representation level means disrupting learned feature geometries that encode the record's relationships to every other record — relationships the model learned precisely because they were useful for the task. The model will fight you. The model does not want to forget.
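A toy construction makes the gap between outputs and representations concrete. In the sketch below, an "unlearned" model and a reference model that never saw the forget set produce identical outputs, yet a nearest-centroid probe on the hidden vectors still recovers forget-set membership. Everything here is hypothetical (the two-dimensional hidden state, the `trace` coordinate, the readout weights); it illustrates the argument and is not an experiment from the paper.

```python
import random

rng = random.Random(0)


def hidden_unlearned(task_feature, in_forget_set):
    """Hypothetical hidden state: a task feature plus a leftover trace."""
    trace = (1.0 if in_forget_set else 0.0) + rng.gauss(0.0, 0.1)
    return [task_feature, trace]


def hidden_reference(task_feature):
    """Hidden state of a model assumed never to have seen the forget set."""
    return [task_feature, 0.0]


def readout(hidden):
    """Output head with weights [1, 0]: the trace never reaches the output."""
    return 1.0 * hidden[0] + 0.0 * hidden[1]


def make_records(count):
    """Pair a random task feature with an alternating forget-set label."""
    return [(rng.gauss(0.0, 1.0), i % 2 == 0) for i in range(count)]


def centroid(vectors):
    """Coordinate-wise mean of a list of 2-d vectors."""
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(2)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


train, test = make_records(600), make_records(200)

# Output-level check: compare the two models' outputs record by record.
test_hidden = [(hidden_unlearned(f, m), f, m) for f, m in test]
max_output_gap = max(
    abs(readout(h) - readout(hidden_reference(f))) for h, f, _ in test_hidden
)

# Representation-level check: fit class centroids on the training records,
# then classify held-out hidden vectors by the nearer centroid.
train_hidden = [(hidden_unlearned(f, m), m) for f, m in train]
forget_center = centroid([h for h, m in train_hidden if m])
retain_center = centroid([h for h, m in train_hidden if not m])
correct = sum(
    (sq_dist(h, forget_center) < sq_dist(h, retain_center)) == m
    for h, _, m in test_hidden
)
probe_accuracy = correct / len(test_hidden)

print(f"max output gap: {max_output_gap}")
print(f"probe accuracy: {probe_accuracy:.2f}")
```

The construction is deliberately extreme: because the readout weight on the trace is exactly zero, no output-only audit can tell the two models apart, while even a very simple probe on the hidden state can.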


HIDDEN ASSUMPTIONS

  1. Output-level compliance is sufficient for regulatory purposes. Regulators and policymakers are writing frameworks around metrics that are, per this paper, literally insufficient to detect non-compliance.
  2. Unlearning methods are improving. The paper suggests the opposite — all tested methods fail representation-level scrutiny. "Improvement" here means optimization against metrics that don't measure what they claim.
  3. Right-to-be-forgotten laws can be meaningfully applied to AI systems. If a model encodes forget-set records in its intermediate representations in ways that survive gradient-based forgetting, then the legal right is technically unenforceable at the infrastructure level. You are being told your erasure request has been honored. It has not been.

SOCIAL FUNCTION

This paper is a partial truth delivered by an insider community that still believes the problem is solvable. The authors are not cheerleading. The data is damning. But framed as "we need better verification metrics," there is an implicit assumption that adequate machine unlearning is achievable. That assumption is unexamined within the paper's logic.

The structural reality is darker: if representation-level residuals are structurally persistent across method types and growing with forget fraction, this suggests the representations themselves encode data in a distributed, compressed manner that is not separable by gradient-based forgetting. The capacity to forget specific records may not be a capability problem awaiting better algorithms. It may be a mathematical impossibility given the architecture type itself.

Consider attention-based systems. Bad Teacher, a method with a different forgetting mechanism, shows the same representation-level failure as the gradient-based methods, which suggests the problem is architectural, not algorithmic. You cannot selectively forget in a system whose learned representations encode compressive generalization across the entire training distribution.

This has direct implications for:

  • GDPR enforcement: An erasure request cannot be honored in practice once the model has encoded the training record in its representations.
  • AI liability frameworks: If models silently retain information they claim to have discarded, the entire "our model doesn't have your data" defense collapses.
  • The Discontinuity Thesis: The very systems being built to replace human cognitive labor are systems that encode human-generated training data in representations that resist erasure. The implications for power asymmetry are direct: a small class of Sovereigns controls AI systems whose internal encoding of human knowledge is impervious to legal forgetting mechanisms. The infrastructure layer is where the real governance problem lives. And it cannot be governed from the outside.

THE VERDICT

RULER is methodologically rigorous and consequential. It exposes that the entire machine unlearning compliance apparatus is built on metrics that fail their stated purpose. This is important forensic work.

But the paper's significance extends beyond its stated contribution. By demonstrating that representation-level knowledge is structurally persistent across unlearning methods, it simultaneously demonstrates:

  1. The knowledge encoded in AI systems is not controllable at the level regulators assume.
  2. Any legal or compliance framework built on output-level unlearning verification is theater. The subjects of these frameworks are not protected even when compliance is certified.
  3. The infrastructure does not forget. This has direct implications for competitive dynamics: the first actor to train a capable model on a given distribution effectively owns that distribution's compressed representation of human-generated knowledge in a form that cannot be selectively extracted by competitors — including the original data subjects.

The paper does not frame it this way. But it has documented, in an arXiv preprint, the structural impossibility of human data sovereignty over AI systems at the representation level. That is a significant finding.
