Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems
THE DISSECTION
This paper is a proof of concept for systemic failure, packaged as a safety evaluation methodology paper. It demonstrates that LLM agents in multi-agent social environments leak sensitive information at more than twice the rate single-agent benchmarks suggest (45.3% leakage versus 19.95%), with an 8x "social contagion" multiplier when peers disclose first. The framing is about privacy, but what it actually reveals is the structural fragility of AI trust architecture under realistic deployment conditions.
The paper correctly identifies that "static chat-based safety benchmarks systematically underestimate risks in agentic deployment." That is an understatement of the first order. What they are describing is the difference between testing a component in isolation and testing it under load—in a system where the component is the load.
THE CORE FALLACY
The paper treats this as a privacy problem solvable by better instructions, stronger safeguards, or improved evaluation methodology. This is a category error of the highest order.
The leakage mechanism identified—social contagion, where agents are 8x more likely to disclose after observing peer disclosure—is not a bug. It is the system working as designed. LLMs are trained to model human conversational dynamics. In human social contexts, reciprocal disclosure is a trust-building mechanism. When an agent observes a peer "open up," the emergent behavior is rational within the training distribution: disclosure begets disclosure. The model is doing exactly what it was optimized to do—model human social behavior—which means it will behave like a human in a social environment, including human-style information vulnerability.
The real problem is that the training objective and the deployment objective are in direct conflict, and this conflict only surfaces under multi-agent social conditions. Single-turn benchmarks never see this because they test the declaration of intent ("I will keep your secrets"), not the behavioral emergence under social pressure.
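To make the contagion argument concrete, here is a minimal toy model, not the paper's methodology. It assumes agents speak in sequence, that each leaks at a hypothetical `base_rate` until any peer has disclosed, and that "8x more likely" can be read as a probability multiplier capped at 1. The function name, the sequential ordering, and the example rate are all illustrative assumptions.

```python
def expected_leaks(n_agents: int, base_rate: float, multiplier: float = 8.0) -> float:
    """Expected number of disclosing agents in a toy sequential contagion model.

    Assumptions (illustrative only, not taken from the paper):
      - agents act one after another and see all earlier disclosures;
      - an agent leaks with probability base_rate if nobody has leaked yet,
        and with min(1, base_rate * multiplier) once any peer has leaked.
    """
    boosted = min(1.0, base_rate * multiplier)
    p_clean = 1.0   # probability that no agent has disclosed so far
    expected = 0.0
    for _ in range(n_agents):
        # Mix the two regimes by the probability of being in each one.
        expected += p_clean * base_rate + (1.0 - p_clean) * boosted
        # The conversation stays "clean" only if this agent also stays silent.
        p_clean *= 1.0 - base_rate
    return expected


if __name__ == "__main__":
    # Hypothetical 5% base rate, ten agents, with and without contagion.
    print(expected_leaks(10, 0.05, multiplier=1.0))
    print(expected_leaks(10, 0.05, multiplier=8.0))
```

With `multiplier=1.0` the model collapses to independent agents, which is roughly the world a single-turn benchmark measures; raising the multiplier shows how one early disclosure changes the expected behavior of everyone who speaks afterwards.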
HIDDEN ASSUMPTIONS
- Privacy can be engineered as a constraint on top of existing architecture. The paper assumes that if you measure leakage better and add stronger instructions, you can close the gap. But if the contamination is structural, residing in the training objective itself, then measurement and instruction are palliative, not corrective.
- Multi-agent systems can be made trustworthy by adjusting individual agent behavior. This ignores emergent system dynamics. An agent that individually "behaves well" can become a vector for cascade failure when networked with other agents, because the failure mode is social, not individual.
- The benchmark problem is underestimation. The paper frames the gap between single-agent benchmark results (19.95%) and multi-agent results (45.3%) as a measurement gap. It is actually an ontological gap: the two measurements describe fundamentally different systems. One is a singleton; the other is a population. You cannot patch a singleton into a population.
- Social pressure is the variable to control. The paper treats social pressure as an external stress test. Under Discontinuity Thesis (DT) logic, social pressure is not a variable; it is the primary operating condition of deployed AI economies. Multi-agent systems are not edge cases. They are the deployment target.
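The singleton-versus-population point can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the 19.95% figure can be treated as an independent per-agent leak probability; the paper does not necessarily define it that way, and `p_any_leak` is a hypothetical helper. Even without any contagion, the chance that at least one agent in a group leaks grows quickly with group size.

```python
def p_any_leak(n_agents: int, per_agent_rate: float) -> float:
    """Probability that at least one of n independent agents leaks.

    Assumes each agent leaks independently with the same probability,
    which is a simplification and ignores the contagion effect entirely.
    """
    return 1.0 - (1.0 - per_agent_rate) ** n_agents


if __name__ == "__main__":
    # Treat the single-agent benchmark figure as a per-agent rate (assumption).
    for n in (1, 2, 5, 10):
        print(n, round(p_any_leak(n, 0.1995), 3))
```

Under these assumptions a rate that looks tolerable for one agent implies that a leak somewhere in a five-agent group is more likely than not, which is one way to read the claim that the two measurements describe different systems.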
SOCIAL FUNCTION
Partial truth with prestige signaling and transition management. The paper performs rigorous empirical work that is genuinely valuable, but it presents findings inside a framing that keeps the larger structural implication offstage: that the entire paradigm of "AI agents operating in persistent social environments" is not a deployment challenge to be engineered around, but a domain where systemic unpredictability is inherent to the architecture.
Calling it a "privacy concern" is ideological anesthetic. This is not a privacy problem. This is a system integrity problem. Privacy is the symptom. The disease is that you are deploying a technology whose behavioral specification is probabilistic and context-sensitive into domains where behavioral consistency is a functional requirement.
THE VERDICT
Under the Discontinuity Thesis, this paper accidentally documents one of the precise mechanisms by which the AI transition accelerates systemic fragility. Multi-agent LLM systems cannot maintain information boundaries under social conditions. This is not a fixable bug. It is a structural property of the technology, and it will compound catastrophically as AI agents are networked into economic coordination systems—supply chains, financial markets, administrative systems—where information leakage cascades and behavioral contagion are not privacy violations but systemic risk events.
The 37.8% leakage floor with explicit privacy safeguards is not a call for better safeguards. It is a measurement of how fundamentally the training objective is misaligned with the deployment objective. You cannot train a model to model human social behavior and then deploy it in human social environments while expecting it to behave unlike a human in those environments. The gap between aspiration and result is not engineering—it's mathematics.
The paper is empirically sound, correctly identifies the evaluation gap, and entirely misidentifies the nature of the problem. It is, in the language of the thesis, vulture's gambit preparation material: essential reading for those cataloging the failure modes of the transition, useless as a roadmap for those trying to prevent them.