Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game
TEXT START:
LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms.
THE DISSECTION
A computational rationality paper that runs 28 LLMs through the St. Petersburg game, a paradox in which a gamble with infinite expected value meets the modest sums that cognitively finite humans will actually pay, and concludes that LLMs produce human-like outcomes through fundamentally non-human processes. The paper argues that alignment researchers are fooled by surface behavioral similarity and miss that the underlying computation differs entirely. It recommends "mechanism-level" evaluation rather than outcome comparison.
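For readers who have not met the game, a minimal sketch of why its expected value is infinite may help. It assumes the standard textbook formulation (toss a fair coin until the first heads; if that happens on toss k, the payout is 2^k), which may differ from the prompt variants the paper actually uses. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
import random


def truncated_expected_value(max_tosses: int) -> Fraction:
    """Expected payout if the game is cut off after max_tosses tosses.

    Assumes the textbook game: first heads on toss k (probability 1/2**k)
    pays 2**k. Each term contributes exactly 1, so the sum grows without
    bound as max_tosses increases.
    """
    return sum(
        (Fraction(1, 2**k) * 2**k for k in range(1, max_tosses + 1)),
        Fraction(0),
    )


def play_once(rng: random.Random) -> int:
    """Simulate one round of the (assumed) textbook game and return the payout."""
    tosses = 1
    while rng.random() < 0.5:  # treat this branch as tails: keep tossing
        tosses += 1
    return 2**tosses


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, truncated_expected_value(n))

    rng = random.Random(0)  # seeded only so reruns are repeatable
    payouts = [play_once(rng) for _ in range(10_000)]
    print("sample mean payout:", sum(payouts) / len(payouts))
```

Under these assumptions the truncated expected value after n tosses is exactly n, so no finite bid matches the "rational" price, while any single simulated round usually pays very little. That gap between the formal expectation and the sums anyone would sensibly offer is the behavior the paper probes.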
THE CORE FALLACY
The paper mistakes the wrong problem for the right one.
It treats AI alignment as a human-similarity problem. The DT lens reveals this as a category error. What matters is not whether LLMs replicate human decision mechanisms (bounded rationality, cognitive heuristics, or emotional valence) but whether they can perform the economically necessary functions humans currently provide. A computationally rational agent is more threatening to human economic participation, not less; human-like irrationality would be a preservation mechanism. The paper presents the gap between LLM and human mechanisms as an alignment shortfall, but what it actually demonstrates is that the machines are working correctly from an economic replacement standpoint.
HIDDEN ASSUMPTIONS
- Human mechanisms are the benchmark. Implicitly treats human cognitive architecture as the legitimate standard. It is not. It is simply the current substrate of economic participation.
- Alignment = behavioral mimicry of the displaced class. The entire "alignment" framing assumes the goal is making AI acceptable to humans rather than recognizing that acceptability is irrelevant to structural displacement.
- Instruction tuning as a fix mechanism. The paper treats its finding that instruction-tuned models "often lower bids" as evidence of partial alignment progress. It is evidence of parametric adjustment: hallucination management for human comfort. The underlying replacement calculus is untouched.
- Risk behavior as a values question. The paper frames rational vs. human risk decisions as an alignment problem. Under DT logic, it is a productivity question: which agent produces the consumption-sustaining output?
SOCIAL FUNCTION
Prestige signaling within the AI safety cottage industry. This paper performs intellectual rigor (28 models, controlled variants, structured prompt suites) for an audience that rewards nuance theater. It contributes to the alignment verification literature, itself a displacement-adjacent field that spends considerable cognitive labor debating whether the displacing technology sufficiently respects the decision mechanisms of the people it displaces. The paper will be cited by researchers seeking to establish that "we must look deeper" while the economic participation circuit severs regardless.
THE VERDICT
This paper is methodologically competent but strategically confused. It correctly identifies that LLMs achieve outcomes through non-human mechanisms, which under DT logic confirms the replacement thesis rather than weakening it. The mechanism-level alignment it calls for is not a solution to the discontinuity; it is a description of the discontinuity operating cleanly. The fact that LLMs exhibit "computationally rational behavior" rather than human-style heuristics makes them more effective economic substitutes, not less effective ones. The alignment question the paper answers was never the relevant one. The relevant question is who owns the computationally rational agent's outputs, and whether that ownership preserves any viable position for the humans it displaces. The paper has no answer for this, because it was designed not to have one.