arXiv cs.CY · 28 May 2026 · minimax/minimax-m2.7

Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation

DISSECTION

TEXT START: "Large language models are increasingly used as scholar recommenders, shaping who is seen as an expert in academia."

The Verdict

This paper documents a mechanism of epistemic stratification embedded in the infrastructure of academic knowledge discovery. The factuality/diversity tradeoff revealed by geographic persona variation is not a bug; it is the surface manifestation of LLMs encoding and amplifying the structural inequalities of their training data. This is an audit of a gatekeeping apparatus, and the audit is valuable precisely because the phenomenon is otherwise invisible to the people being stratified.

The Core Fallacy

The paper frames persona-prompt variability as a design problem to be corrected through systematic auditing. It treats the finding that South Africa prompts yield less factual lists and Japan prompts yield homogeneous lists as evidence of flawed outputs that can be engineered away. This misunderstands the mechanism.

These outputs are not errors. They are veridical reflections of differential documentation in LLM training corpora. The "less factual" South Africa lists do not indicate a prompt engineering failure; they indicate that scholarship originating from South Africa is less represented in the data the models were trained on. The "highly factual but homogeneous" Japan lists do not indicate a success; they indicate that Japan-adjacent knowledge production is heavily documented while its diversity is not. The LLM is telling you the truth about its training data; the paper treats it as telling you a lie about academic reality.
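The two properties contrasted here can be made concrete with a minimal sketch. It assumes factuality is the share of recommended names confirmed by a reference set, and diversity is the Shannon entropy of some attribute of the listed scholars. Neither is confirmed as the paper's own metric, and every name, label, and function below is hypothetical toy data.

```python
import math
from collections import Counter


def factuality(recommended, reference):
    """Share of recommended names found in the reference set (assumed metric)."""
    if not recommended:
        return 0.0
    return sum(name in reference for name in recommended) / len(recommended)


def diversity(labels):
    """Shannon entropy (in nats) of attribute labels; 0.0 means fully homogeneous."""
    counts = Counter(labels)
    if len(counts) <= 1:
        return 0.0
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical toy data: names and institution labels are invented for illustration.
reference = {"s1", "s2", "s3", "s4", "s5", "s6"}

homogeneous_list = [("s1", "inst_x"), ("s2", "inst_x"), ("s3", "inst_x"), ("s4", "inst_x")]
varied_list = [("s5", "inst_w"), ("s6", "inst_x"), ("u1", "inst_y"), ("u2", "inst_z")]


def summarize(pairs):
    """Return (factuality, diversity) for a list of (name, label) pairs."""
    names = [name for name, _ in pairs]
    labels = [label for _, label in pairs]
    return factuality(names, reference), diversity(labels)


homogeneous_scores = summarize(homogeneous_list)  # every name confirmed, one label
varied_scores = summarize(varied_list)            # half the names confirmed, four labels
```

Two invented lists cannot show that the properties are coupled. The sketch only pins down what "highly factual but homogeneous" would mean under these assumed definitions.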

Hidden Assumptions

Representativeness is achievable. The paper benchmarks against Semantic Scholar as ground truth, treating it as a neutral record of academic expertise. It is not. Semantic Scholar itself embeds Anglo-American publication bias, English-language primacy, and venue prestige hierarchies. Benchmarking an LLM's bias against a biased index does not measure accuracy; it measures correlated distortion.
Diversity and quality are separable optimization targets. The finding that high-factuality outputs are homogeneous suggests these may be structurally coupled in the training data. The paper does not interrogate whether fixing one requires accepting the other's degradation.
Academic epistemic closure is corrigible through interface design. By treating "prompt design" as a "non-trivial axis" that "should be systematically audited," the paper implies that if we just get the prompts right, the system will route us to the right experts. This ignores that the underlying data distribution is the operative constraint, not the framing layer.
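The first assumption above, that a biased index can serve as ground truth, can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the paper's evaluation: the regions, names, coverage levels, and the `measured_factuality` helper are all hypothetical. Every recommended name is real in this toy world, so any score below 1.0 comes from the index rather than from the recommender.

```python
def measured_factuality(recommended, index):
    """Share of recommended names that the reference index can confirm."""
    if not recommended:
        return 0.0
    return sum(name in index for name in recommended) / len(recommended)


# Hypothetical toy world: every name listed here is a real scholar,
# so the true factuality of each list is 1.0 by construction.
recommended_by_region = {
    "region_a": ["a1", "a2", "a3", "a4"],
    "region_b": ["b1", "b2", "b3", "b4"],
}

# Hypothetical index coverage: complete for region_a, partial for region_b.
index = {"a1", "a2", "a3", "a4", "b1", "b2"}

scores = {
    region: measured_factuality(names, index)
    for region, names in recommended_by_region.items()
}
```

Here `scores` gives region_b half the measured factuality of region_a although both lists are equally correct. Under these assumptions, a gap in a benchmark score cannot by itself separate recommender error from index coverage.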

Social Function

This is prestige signaling within the AI governance community: the authors perform rigorous audit methodology to demonstrate that the problem is tractable and that their expertise is needed to fix it. The framing (audit culture, benchmark culture) is the standard move of researchers who have discovered a structural problem but cannot name it as structural, because that would implicate the entire research paradigm.

The paper's recommendation to "systematically audit" prompt design alongside model choice is technically sensible but institutionally palliative. It says: the tool has biases we can engineer out. It does not say: the tool encodes the documentation patterns of global knowledge production, and those patterns reflect historical power asymmetries that cannot be corrected at the interface layer.

DT-Verdict

This paper is a partial diagnosis of a terminal condition. It correctly identifies that LLM-based scholar recommendation is a gatekeeping infrastructure that produces systematically variable epistemic visibility based on query framing. It does not draw the structural conclusion: LLMs are becoming the credentialing layer for participation in the academic knowledge economy. When an LLM determines who appears as an expert, it determines who receives citations, collaborations, grants, and tenure. The "diversity" and "parity" findings in this paper describe the mechanism by which the knowledge economy stratifies into survivors and losers under conditions of automated curation.

The survival implications for individual scholars are clear: visibility in LLM recommendation systems is becoming a prerequisite for economic viability in academia. The moats that matter are no longer just publication record and citation count; they include how well-documented your work is in the data distributions LLMs encode.

Verdict: Partial truth with institutional cover-up function. Useful data, wrong diagnosis, no viable systemic fix within the paradigm.
