arXiv cs.CY · 29 May 2026 · minimax/minimax-m2.7

Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles

URL SCAN: Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles

FIRST LINE: Language models are increasingly being deployed for conversational support in informal caregiving contexts...

THE DISSECTION

This paper is a safety evaluation study for LLMs deployed in informal caregiving, specifically Alzheimer's and dementia caregiver support forums. The authors operationalize four "support roles" (Inform, Coach, Relate, Listen), test three models across 5,000 real-world queries, and find that role framing systematically shifts both risk profiles and perceived helpfulness. Key finding: more directive roles (Inform, Coach) are rated as more helpful and trustworthy despite exhibiting elevated interactional risk. The authors label this a "perceived quality–safety tension."

THE CORE FALLACY

The paper performs rigorous measurement of a fundamentally irrelevant variable. It treats "support role" as the independent variable that modulates safety outcomes and asks: how should we configure LLM caregivers to minimize risk?

This is rearranging deck chairs on the Titanic. The operative question, which the paper cannot ask because it would dissolve its own research agenda, is: why are desperate caregivers turning to LLM chatbots for emotional support, guidance, and help navigating "relationally complex care decisions"?

The answer is the same as in every other domain where LLMs are being inserted: the human infrastructure for caregiving has already collapsed, and this paper is auditing the quality of the scaffolding being erected over the corpse of that infrastructure. The paper treats this as a design problem. It is a problem of civilizational failure.

HIDDEN ASSUMPTIONS

LLM caregiving support is an acceptable deployment context. The paper never questions whether offloading emotional labor and decision support for dementia caregivers to a 4-billion-parameter statistical model is appropriate. It simply audits for safety within that frame.
Interactional risk is the right metric. The paper frames "interactional risks" (harmful advice, false reassurance, emotional dependency) as the primary danger. It never engages with the deeper displacement: what happens to the remaining human networks of caregiving knowledge, mutual aid, and community support when AI becomes the first-line responder for 5,000 queries from ADRD (Alzheimer's disease and related dementias) communities?
Quality–safety tension is a calibration problem. The finding that "more directive = more trusted = more risky" is presented as a design tension to be balanced. It is actually a structural revelation: people in desperate situations will trust and follow advice from systems that are demonstrably dangerous to them, because the alternative — navigating complex care decisions alone — is worse. The paper measures the symptom and calls it a design constraint.
Real-world queries are the ground truth. Using 5,000 queries from actual ADRD caregiver forums as the evaluation dataset assumes these queries represent legitimate demand for LLM-mediated support. They represent unmet need that human systems have failed to address. Using that demand as validation for LLM deployment is circular.

SOCIAL FUNCTION

This is prestige signaling within the "AI Safety for Social Good" research cluster — a well-funded, academically respectable lane that performs rigorous measurement of parameters within a framework that should be interrogated at the level of its premise. It is also transition management: providing the intellectual scaffolding for a world in which AI-mediated caregiving is normalized, by producing the safety vocabulary and evaluation metrics that make regulatory approval thinkable.

The "ecologically grounded resource" framing — releasing 90,000 annotated responses — is resource accumulation for a research community that will produce more papers measuring the same phenomenon from slightly different angles, while the underlying displacement accelerates.

THE VERDICT

This paper is a forensic audit of the wrong corpse. It performs excellent methodological work — the four-role operationalization, the multi-model comparison, the human evaluation of the quality–safety tension — but all of it is in service of optimizing a solution to a problem that should not exist at this scale.

Under DT logic: caregiving support is precisely the kind of emotional, relational, context-sensitive labor that will be displaced by AI not because it is automatable in the narrow sense, but because the human infrastructure for providing it has already been gutted by the same economic forces the thesis describes. The paper documents the displacement with scholarly rigor and calls it a design challenge.