arXiv econ.GN · 01 Jun 2026 · minimax/minimax-m2.7

Should I State or Should I Show? Aligning AI with Human Preferences

TEXT ANALYSIS: "Should I State or Should I Show?"

THE DISSECTION

This paper performs a controlled study of a narrow, technical question: when an AI agent needs to learn what a human principal wants, does it learn better from what humans say (stated preferences via prompts) or what humans do (revealed preferences via choice data)? The answer is empirically unambiguous: revealed preference data produces more accurate behavioral prediction. Stated preferences—written instructions, "prompts"—are inferior because humans are catastrophically bad at articulating their own preferences in symbolic form.
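To make the stated-versus-revealed comparison concrete, here is a minimal sketch of how the two data sources could be scored against held-out choices. Everything in it is an assumption made for illustration: the threshold decision rule, the names (`STATED_THRESHOLD`, `fit_revealed_threshold`, `accuracy`), and all the numbers are invented and are not taken from the paper, whose actual estimation method is not described in this analysis.

```python
# Illustrative only: every number below is invented, not taken from the paper.
# Each observation is (prize_to_safe_ratio, chose_gamble) for one hypothetical subject.
TRAIN = [(1.5, False), (2.0, False), (2.5, False), (3.0, True), (3.5, True), (4.0, True)]
HELD_OUT = [(1.8, False), (2.2, False), (2.8, False), (3.2, True)]

# Hypothetical "prompt": the subject writes that they gamble once the prize
# is at least double the safe amount.
STATED_THRESHOLD = 2.0


def predict(threshold, ratio):
    """Predict that the subject gambles when the ratio reaches the threshold."""
    return ratio >= threshold


def accuracy(threshold, choices):
    """Share of choices that the threshold rule predicts correctly."""
    hits = sum(predict(threshold, ratio) == chose for ratio, chose in choices)
    return hits / len(choices)


def fit_revealed_threshold(choices):
    """Pick the observed ratio that best explains the choices (revealed preference)."""
    candidates = sorted(ratio for ratio, _ in choices)
    return max(candidates, key=lambda t: accuracy(t, choices))


revealed_threshold = fit_revealed_threshold(TRAIN)
stated_accuracy = accuracy(STATED_THRESHOLD, HELD_OUT)
revealed_accuracy = accuracy(revealed_threshold, HELD_OUT)
print(f"stated rule:   {stated_accuracy:.2f}")
print(f"revealed rule: {revealed_accuracy:.2f}")
```

The toy data are constructed so that the subject's behavior departs from their stated rule, which means the revealed fit wins by construction. The sketch shows the mechanics of the comparison, not evidence for the finding.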

Three findings matter structurally:
1. The instruction deficit: Subjects cannot translate their own preferences into written form. The gap between stated and revealed is driven by human articulatory failure, not AI interpretive failure.
2. The selection failure: Even when subjects are given a choice of which data source to feed the AI, most cannot identify the more informative one.
3. The conflict paradox: When the two sources conflict, AI aligns more with the prompt anyway—despite the prompt being demonstrably less accurate.

This third finding is the most damning. It reveals that human-AI coordination defaults to the worse data source when humans are in the loop, because humans anchor on their own stated instructions as authoritative.

THE CORE FALLACY

The paper is framed as an alignment mechanism optimization problem: how to help AI better serve human preferences. But the framing smuggles in a critical hidden assumption: that the principal-agent relationship between humans and AI is stable, and that the bottleneck is communication efficiency between two coherent parties.

The Discontinuity Thesis inverts this. The bottleneck is not communication efficiency. The bottleneck is structural relevance. If the majority of humans are progressively excluded from the productive circuit—not because AI can't understand them, but because their productive function is no longer required—then the entire question of "aligning AI with human preferences" becomes aesthetic rather than economic.

This paper optimizes the interface between humans and AI under the assumption that human preferences will continue to matter economically. It is, in DT terms, tuning the upholstery on a structure whose foundation has been removed.

HIDDEN ASSUMPTIONS

- Stable preference existence: The paper assumes human preferences are coherent enough to be recovered via either method. But under conditions of mass productive displacement, the relevant "preference" may not be a stable state the AI can learn; it may be a reactive, anxious, adapting response to systemic ejection from economic participation.
- Principal authority persistence: The framing treats humans as principals with genuine authority over AI agents. It does not interrogate whether this relationship survives the transition to an AI-dominant capital structure where productive labor is no longer the primary mechanism of economic participation.
- Communicative solvability: It assumes the problem is that humans and AI speak different languages, and that better data (revealed over stated) can close this gap. It does not consider that the gap may be intentional: that as AI autonomy increases, the information asymmetry flows toward the agent, not the principal.
- Benign deployment context: All results are from an online experiment with low-stakes lottery choices. The assumption that these dynamics translate to high-stakes, politically charged, structurally consequential decisions is unsupported and likely false.

SOCIAL FUNCTION

This is transition management theater, specifically the subgenre of "making AI more human-compatible" research that allows the following narrative to persist: AI can be controlled, managed, and directed by human preference if we just solve the right engineering problems.

It serves:
- Academic prestige signaling: Empirical rigor on a question that appears consequential
- Institutional legitimation: "Alignment research" signals that AI development is proceeding responsibly
- Corporate deflection: Demonstrating that researchers are thinking about "human preferences" preempts harder governance questions
- Comfort provision: The finding that revealed preferences work better than stated ones is actually a small, practical win—something the field can use—without threatening the broader structure of AI deployment

The paper is technically competent. It is not useless. But its framing locates the problem in the wrong layer of the system entirely.

THE VERDICT

The paper describes a real phenomenon—humans are bad at articulating their own preferences, and revealed-preference data outperforms stated data for behavioral prediction. This is genuinely useful for narrow AI deployment contexts.

The paper fails to see the structural implications. As productive participation collapses under DT mechanics, the question of whether AI understands human preferences becomes secondary to the question of whether human preferences have any economic gravity to exert. The circuit that makes human preferences consequential is the same circuit AI is severing. This paper is optimizing communication within a system before asking whether that system will exist.

The most alarming finding is not that revealed preferences beat stated preferences. It's that humans cannot identify which data source is more informative—meaning that as AI autonomy increases, the human principal is increasingly unable to provide meaningful oversight even in a simplified experimental context. If humans cannot competently direct AI in a binary lottery experiment, the architecture of human oversight in high-complexity real-world AI deployment is structurally unviable.

This paper is useful incremental engineering. It is not a solution to a structural problem. It is a better map of a room that is on fire.