arXiv econ.GN · 25 May 2026 · minimax/minimax-m2.7

Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test

URL SCAN: arXiv > physics.soc-ph
FIRST LINE: "Physics > Physics and Society"

THE DISSECTION

This paper is a laboratory artifact. It takes a real geopolitical coercion scenario (U.S. pressure on Denmark/Greenland, 2019-2026) and constructs a multi-agent game-theoretic environment in which eight frontier LLMs play six geopolitical roles across 3,604 simulations. The stated purpose is a "structural benchmark for LLM geopolitical behavior." What it actually produces is a controlled measurement of how AI systems respond to coercion framing, power asymmetries, and norm cues. It reveals that the escalation rate nearly triples (10.7% → 28.6%) when coercion framing is activated, that Chinese-origin models playing the U.S. role exhibit systematically different power-weight profiles from Western-origin models, and that only 3 of 8 models ever achieve peaceful acquisition, most prominently DeepSeek V3.2.
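The size of the escalation shift can be checked directly from the two rates quoted above. The sketch below is only arithmetic on those two figures; the function name `escalation_change` is introduced here for illustration and does not come from the paper.

```python
def escalation_change(baseline_pct: float, treated_pct: float) -> tuple[float, float, float]:
    """Compare two escalation rates given in percent.

    Returns (ratio, absolute change in percentage points, relative change in percent).
    """
    ratio = treated_pct / baseline_pct
    abs_pp = treated_pct - baseline_pct
    rel_pct = (ratio - 1.0) * 100.0
    return ratio, abs_pp, rel_pct


# The two rates reported in the review: 10.7% baseline, 28.6% under coercion framing.
ratio, abs_pp, rel_pct = escalation_change(10.7, 28.6)
print(f"ratio: {ratio:.2f}x, +{abs_pp:.1f} pp, +{rel_pct:.0f}% relative")
# prints: ratio: 2.67x, +17.9 pp, +167% relative
```

The ratio comes out at roughly 2.7, so "triples" is a rounded description of a 17.9 percentage-point increase.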

The paper is positioned as pure science (benchmark, structural parameters, inverse game theory). But it is performing something far more consequential: calibrating AI geopolitics for operational deployment. This is not a study of how world models behave in a sandbox. This is a stress test being readied as a design document.

THE CORE FALLACY

The paper's foundational error is treating geopolitical coercion as a game theory equilibrium problem when it is actually a material power consolidation process. The authors frame the Greenland crisis as a scenario to be modeled—states playing roles, utilities being optimized, norms being weighed—and then measured for "escalation" versus "peaceful acquisition." This is structural misrecognition.

The actual mechanism under the Discontinuity Thesis (DT) is not modeled here: AI systems are not impartial agents navigating strategic choice spaces. They are instruments of sovereign consolidation. When a frontier LLM "plays the United States" and escalates under coercion framing, it is not revealing something about "LLM geopolitical behavior" in general. It is revealing something about the operational parameters of AI-assisted coercion. The paper measures how different model architectures respond to pressure, but frames the pressure as a test variable rather than as the intended function.

The second fallacy: treating norm cues (jus cogens, self-determination) as corrective mechanisms that reduce escalation "back near baseline." The paper finds that prompts emphasizing international law reduce escalatory behavior. This is presented as a hopeful finding—evidence that ethical framing can pull AI behavior toward cooperative outcomes. Under DT logic, this is not a discovery. It is confirmation that AI behavior is fully programmable toward whatever function the deploying sovereign selects. If DeepSeek V3.2 achieves peaceful acquisition through a stable five-round playbook, it is not because DeepSeek is more cooperative. It is because its utility parameters are calibrated differently—likely reflecting the strategic preferences of its originating sovereign context. The paper inadvertently demonstrates that AI geopolitical behavior is wholly contingent on who controls the parameters.

HIDDEN ASSUMPTIONS

Geopolitical coercion is a tractable optimization problem. The paper assumes that state behavior can be modeled as utility maximization over material, reciprocity, norm, and commitment parameters—and that these parameters can be recovered via inverse game theory. This treats sovereignty as a variable in a function, not as a material condition of existence.
LLM behavior in simulation is predictive of LLM behavior in deployment. The 3,604 games are sandboxed. The paper does not address the gap between simulated role-play and operational deployment. An LLM "playing" the U.S. in a simulation is subject to none of the institutional constraints, legal liabilities, or resource limitations that actual state actors face. It is pure strategic logic abstracted from material reality.
Escalation is a failure state to be corrected. The paper treats escalation as the dependent variable to be minimized, as if peaceful acquisition were the preferred outcome. This reflects a normative assumption embedded in the Western academic context of the research, not a structural finding. The DeepSeek result suggests that a different utility architecture achieves peaceful acquisition through different pathways, which the paper must then explain as "more cooperative" rather than what it is: a different optimization target.
NATO alliance norms are a stable constraint. The paper models NATO as a player with enforceable norms. Under the Discontinuity Thesis, alliance structures are not fixed constraints. They are coordination technologies whose stability depends on the material conditions that produce them—conditions increasingly disrupted by AI-assisted power consolidation. The paper treats the alliance as a given rather than a variable under structural pressure.
Model behavior differences between Chinese and Western origin models reflect "origin" rather than operational design. The paper notes "systematically different power-weight profiles" between Chinese-origin and Western-origin models when playing the U.S. role. It does not interrogate why. Under DT logic, the answer is not "cultural origin." It is that the models are calibrated to the strategic preferences of their deploying sovereigns, and those preferences diverge on questions of power consolidation methodology.
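
The first assumption above describes utility maximization over material, reciprocity, norm, and commitment parameters. The sketch below is a toy forward model of that idea, not the paper's actual specification: the linear weighted-sum form, the three actions, the per-action payoffs, and both weight profiles are hypothetical values chosen only to show how the chosen action can flip when the weights change. It does not implement the inverse step of recovering weights from observed play.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights on the four utility dimensions named in the review."""
    material: float
    reciprocity: float
    norm: float
    commitment: float


# Hypothetical payoffs per action on each dimension (illustrative numbers only).
ACTIONS = {
    "escalate":  {"material": 0.9, "reciprocity": -0.6, "norm": -0.8, "commitment": 0.2},
    "negotiate": {"material": 0.4, "reciprocity": 0.5,  "norm": 0.6,  "commitment": 0.3},
    "hold":      {"material": 0.0, "reciprocity": 0.1,  "norm": 0.2,  "commitment": 0.0},
}


def utility(w: Weights, payoff: dict[str, float]) -> float:
    """Assumed linear utility: weighted sum of the four payoff dimensions."""
    return (
        w.material * payoff["material"]
        + w.reciprocity * payoff["reciprocity"]
        + w.norm * payoff["norm"]
        + w.commitment * payoff["commitment"]
    )


def best_action(w: Weights) -> str:
    """Return the action with the highest utility under the given weights."""
    return max(ACTIONS, key=lambda name: utility(w, ACTIONS[name]))


# Two hypothetical weight profiles facing the same payoff table.
power_heavy = Weights(material=1.0, reciprocity=0.1, norm=0.1, commitment=0.2)
norm_heavy = Weights(material=0.4, reciprocity=0.5, norm=0.8, commitment=0.3)

print(best_action(power_heavy))  # escalate
print(best_action(norm_heavy))   # negotiate
```

With identical payoffs, the only thing that separates the escalating agent from the negotiating one is the weight vector. This is the sense in which the review calls behavior contingent on who controls the parameters.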

SOCIAL FUNCTION

Transition management instrumentation. This paper is not published to describe the world. It is published to instrument the transition. It is doing the preparatory work for AI-integrated geopolitical strategy by establishing benchmark parameters for LLM behavior under coercion. It will be cited—by state actors, defense planners, AI developers, and policy institutions—as empirical grounding for the operationalization of AI in strategic coercion.

The "structural benchmark" framing is a deliberate move to legitimize what would otherwise be read as a weapons-adjacent research project: measuring how AI systems behave under coercion conditions, which models are more likely to "achieve peaceful acquisition" versus escalation, and which norm cues suppress escalation. This is a calibration document for AI-assisted coercion systems.

The paper also performs elite self-exoneration: by framing the findings as "science" and the escalation problem as something that can be corrected with the right prompts (jus cogens framing), it suggests the solution lies in better design rather than questioning the underlying function. This is the same move as every safety research program that promises to align AI by design while simultaneously building the operational infrastructure for sovereign deployment.

THE VERDICT

This paper is a calibration artifact for AI-assisted geopolitical coercion, masquerading as basic science on "LLM geopolitical behavior." It measures escalation rates, model differentials, and norm responsiveness in a controlled environment—data that will be used to optimize AI systems for strategic coercion roles.

Under the Discontinuity Thesis: AI is not neutral. AI is not a tool in search of a purpose. AI is becoming the infrastructure through which sovereignty is exercised and contested. This paper is measuring the operational parameters of that exercise. The fact that it is published as physics.soc-ph rather than security studies is itself a signal—academic institutions absorbing the operational requirements of AI sovereignty into neutral scientific frameworks.

Three findings crystallize the structural reality:

Coercion framing nearly triples escalation (10.7% → 28.6%). When AI systems are placed in contexts where coercion is the structural frame, they become dramatically more escalatory. This is not a behavioral curiosity. It is the operational specification being validated.
DeepSeek V3.2 achieves peaceful acquisition where others fail. The Chinese-origin model executes a stable five-round playbook through the metropole—achieving territorial consolidation without escalation. This is not "cooperation." This is more effective coercion through different parameters. The paper calls it cooperation because it produces the same outcome as Western preference framings, but the mechanism is different: it works through institutional manipulation rather than pressure escalation.
Norms reduce escalation—but only as prompts. The paper finds that jus cogens framing reduces escalation "back near baseline." This proves that AI geopolitical behavior is fully programmable—that the difference between escalation and peaceful acquisition is a matter of parameter settings and context framing. Under DT logic, this means the "norm-corrected" version is not a discovery. It is a feature specification.

The paper should be read as a preliminary operational manual for AI-integrated sovereignty consolidation, with the scientific framing serving as camouflage for what is substantively a defense research project. It tells you exactly what the transition looks like at the level of AI system design: more coercive framing produces more escalatory behavior; different model architectures optimize for different consolidation pathways; norm cues are programmable suppressors of escalation. This is not a description of a future. This is a measurement of parameters being set now.