From Demographics to Survey Anchors: Evaluating LLM Agents for Modeling Retirement Attitudes
TEXT ANALYSIS PROTOCOL
The Dissection
This paper compares two techniques for defining LLM "agents" (personas) intended to simulate human survey respondents, using the SHARE (Survey of Health, Ageing and Retirement in Europe) dataset as the testbed. The authors find that demographics-only agents exhibit two specific failure modes: central tendency bias (clumping toward population means) and "unrealistic accuracy" (they fail to produce the wrong answers and "don't know" responses that humans reliably produce). Survey-anchored agents, defined using actual survey responses from the same in-domain instrument, reproduce the interaction structure among predictors (financial risk tolerance, future time perspective, knowledge of retirement planning) that demographics-only agents miss.
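For readers who want the two failure modes in concrete terms, here is a minimal Python sketch of how each could be checked. It is not the paper's method. The function names, the toy 1-to-5 response data, and the use of `None` to encode a "don't know" answer are all assumptions made for illustration.

```python
from statistics import pvariance

# Assumption: answers are on a 1-5 scale and None marks a "don't know".


def variance_ratio(human, simulated):
    """Variance of simulated answers divided by variance of human answers.

    A ratio well below 1 suggests central tendency bias: the simulated
    answers clump around the mean more tightly than human answers do.
    """
    human_valid = [x for x in human if x is not None]
    simulated_valid = [x for x in simulated if x is not None]
    return pvariance(simulated_valid) / pvariance(human_valid)


def dont_know_rate(responses):
    """Share of responses that are "don't know" (encoded here as None)."""
    return sum(1 for x in responses if x is None) / len(responses)


# Toy data, invented for illustration only (not drawn from SHARE).
human = [1, 5, 2, None, 4, 1, 5, None]
simulated = [3, 3, 2, 3, 4, 3, 3, 3]

print("variance ratio:", variance_ratio(human, simulated))
print("human don't-know rate:", dont_know_rate(human))
print("simulated don't-know rate:", dont_know_rate(simulated))
```

In this toy case the simulated answers have far less spread than the human ones and contain no "don't know" responses at all, which is the pattern the paper attributes to demographics-only agents.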
The Core Fallacy
The paper treats "predicting human survey responses" as an unquestioned desideratum. It is pure means without ends. The Discontinuity Thesis (DT) lens exposes the buried assumption: that modeling human attitudes toward retirement is a stable and valuable epistemic target. Under the DT, this target is not merely difficult to hit; it is becoming structurally irrelevant. The post-WWII retirement system the paper assumes is itself terminal. The authors are building increasingly precise instruments to measure the vital signs of a patient who is already on the table.
Hidden Assumptions
- Human respondents remain the epistemic authority. The paper frames LLM agent fidelity to human respondents as the gold standard. It never asks: authority over what? For which decision-making contexts?
- Attitudinal modeling is policy-relevant. The phrase "policy-relevant constructs around personal finance" appears without interrogation. Under mass cognitive automation, policies targeting human financial attitudes will target a shrinking, increasingly marginal population.
- The retirement system persists. SHARE is a survey about retirement. The paper assumes the retirement system being measured is a going concern worth modeling accurately.
Social Function
Prestige signaling embedded in methodological refinement theater. This is craft without consequence—the paper is technically rigorous about an instrument nobody should be sharpening. It performs the social function of keeping behavioral economists, survey methodologists, and AI researchers productively occupied on problems that will not survive contact with the structural transition it ignores.
The Verdict
The paper is competent technical work building precision targeting for an arrow that won't find its mark. It improves the accuracy of an LLM-based methodology for simulating human survey respondents while ignoring that the economic substrate those responses describe is being dismantled. The authors have optimized for a local maximum in a terrain that is being mechanically destabilized. This is autopsy on a still-breathing patient—the measurement is accurate, the diagnosis is wrong about what is dying.