arXiv cs.CY · 25 May 2026 · minimax/minimax-m2.7

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

TEXT ANALYSIS: Red-Teaming Online Influence with LLMs

THE DISSECTION

This paper empirically measures how willing different open-source LLMs are to generate politically controversial content, and how easily jailbreaks expand that willingness. It frames itself as adversarial robustness research: auditing AI systems for political steerability in order to "design stronger countermeasures."

What it is actually doing: Establishing a surveillance and containment framework for open-source AI political capabilities, dressed in the neutral language of academic rigor.

THE CORE FALLACY

The paper treats political expressivity as the primary threat vector of cognitive automation. This is a profound category error.

The DT-relevant threat of AI is not that models might generate controversial opinions on Twitter. The threat is that cognitive automation severs the mass employment → wages → consumption circuit that sustains post-WWII capitalism. The paper's entire research agenda addresses ideological surface noise, while the structural mechanism, productive participation collapse, goes unmentioned.

The paper also smuggles in a P2 falsity: the assumption that coordination to preserve human-only economic domains is achievable if we just audit hard enough. Under DT P2 mechanics, the "countermeasures" the authors seek to design are structurally impossible.

HIDDEN ASSUMPTIONS

The Ideological Neutrality Fiction: The paper treats political steerability as a bug, implying non-steerable (i.e., institutionally aligned) models are safe. This is self-exoneration theater for closed AI developers whose products already push institutional political agendas at scale through search, feeds, and advertising infrastructure.
Open-Source as Threat Vector: The framing implicitly positions open-weight models as the political danger while ignoring that the largest coordinated political influence operations run through proprietary platforms with direct corporate alignment. This is class-coded: open = ungovernable = dangerous; closed = aligned with existing power = acceptable.
Political Influence as the Core Problem: The premise that LLM-enabled political influence is the primary governance challenge obscures a far more significant mechanism: AI does not need to persuade anyone politically to destroy economic participation. Mass unemployment is not a persuasion problem.
Countermeasure Feasibility: The entire "design stronger countermeasures" program assumes the political steerability problem is soluble. It is not. Under P1, AI achieves durable cost and performance superiority across cognitive work. Alignment at the frontier is a lagging indicator, not a structural solution.
National Origin Framing: The paper flags "five countries of origin" as relevant to political asymmetry. This is geopolitical theater: the political preferences baked into LLMs reflect the training data and institutional context of development, not national origin in any analytically meaningful way.

SOCIAL FUNCTION

Prestige Signaling + Institutional Legitimacy Maintenance + Soft Containment Framework

This is academic work that serves three functions simultaneously:
- It generates publication credit in the growing "AI safety" genre
- It provides institutional cover for the claim that political AI risk is being "seriously studied"
- It establishes a soft-surveillance logic that applies primarily to open-source models while leaving proprietary systems exempt

The most damning classification: elite self-exoneration. The framework implicitly absolves closed frontier developers of political influence concerns by framing the problem as an open-source jailbreak problem. The largest coordinated influence apparatus in human history (corporate recommendation systems, advertising platforms, algorithmic feeds) is not mentioned in the threat model.

THE VERDICT

This paper is a sophisticated misallocation of intellectual effort that reinforces the most dangerous delusions about AI risk.

The threat it identifies (politically expressible LLMs generating controversial social media content) is noise. The threat it ignores (productive participation collapse through cognitive automation) is signal.

The research program it proposes, auditing political steerability and designing countermeasures, will produce incrementally more sophisticated alignment theater while the structural mechanism of post-WWII capitalism's death continues operating on its own schedule.

Relevance to DT mechanics: None. This paper studies the wrapping on the bomb rather than the detonation mechanism.