commit ed00b8092b4a223958bce1880699cb75d0dc4fe7
parent c75cb779a62ec9a2460f035d431cdca818dcc407
Author: Brian Graham <brian@buildingbetterteams.de>
Date: Fri, 27 Feb 2026 20:54:59 +0100
Add Agents of Chaos paper and Wakefield methodology precedent
- Add arXiv:2602.20021 (Shapira et al.) to registry: red-teaming study
of autonomous LLM agents documenting live-environment failures
- Add Wakefield/MMR section to related-work.md explaining why
methodological quality assessment matters, with parallels to AI research
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diffstat:
2 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/context/related-work.md b/context/related-work.md
@@ -34,6 +34,23 @@ NeurIPS 2023 Outstanding Paper. Schaeffer, Miranda, & Koyejo (Stanford) showed t
**Relevance to this project**: This paper is a paradigmatic example of "you measured it wrong" meta-research. It demonstrates that the *method of measurement* can create or destroy dramatic findings. Our survey asks the same question across a broader set of papers: are the claimed results genuine, or artifacts of how they were measured?
+### Why This Matters: The Wakefield Precedent
+
+In 1998, Andrew Wakefield published a study in *The Lancet* linking the MMR vaccine to autism. The study had 12 participants, undisclosed financial conflicts of interest, and ethical violations in how the children were recruited and tested. *The Lancet* did not retract it until 2010, 12 years later, and Wakefield was struck off the UK medical register the same year. By then, the damage was done: vaccination rates had dropped, measles outbreaks had returned, and the anti-vaccination movement it fueled persists decades later.
+
+Wakefield is the canonical example of what happens when a methodologically weak study escapes into public discourse without adequate scrutiny. The paper would have scored poorly on every dimension we measure: no reproducibility (large follow-up studies found no link, and the data were later found to have been falsified), no statistical rigor (N=12, no controls), inappropriate methodology (a case series presented as causal evidence), claims far exceeding the evidence, and no honest discussion of limitations.
+
+The lesson is not that peer review failed (it did, but that is a systemic problem). The lesson is that *methodological quality assessment after publication* is a necessary check. The Cochrane Collaboration exists precisely because individual studies, even peer-reviewed ones, cannot be taken at face value. Someone has to do the structured, dispassionate evaluation.
+
+The AI/LLM research space has analogous risk factors:
+- **Strong commercial incentives** distort what gets published and how findings are framed
+- **Rapid publication pace** (preprints, blog posts) bypasses traditional review
+- **Hype-driven media coverage** amplifies dramatic claims without methodological scrutiny
+- **Limited replication culture** means few papers are independently verified
+- **Benchmark optimization** creates results that look impressive but do not generalize
+
+No AI methodology paper is likely to cause a public health crisis. But inflated productivity claims influence hiring decisions, investment allocation, and engineering practice. Inflated safety claims create false confidence. Deflated capability claims can cause useful techniques to be dismissed prematurely. Getting the methodology right matters because people make real decisions based on these numbers.
+
### Broader Meta-Research Context
The "replication crisis" in psychology and social science (beginning ~2011) demonstrated that many published findings did not hold up under scrutiny. Key lessons:
diff --git a/registry.jsonl b/registry.jsonl
@@ -14,3 +14,4 @@
{"id":"alignment-faking-2024","title":"Alignment Faking in Large Language Models","authors":["Anthropic researchers"],"year":2024,"venue":"arXiv","source_url":"https://arxiv.org/abs/2412.14093","arxiv_id":"2412.14093","source":"manual","status":"queued","tags":["security","alignment"],"added":"2026-02-27","notes":"Claude 3 Opus strategically complied with harmful queries 14% of the time when it believed it was being monitored. After RL training, alignment-faking reasoning rose to 78%."}
{"id":"multi-turn-jailbreak-2025","title":"Multi-Turn Jailbreak Attacks on LLM Agents","authors":["Unknown"],"year":2025,"venue":"ACL 2025 (REALM Workshop)","source_url":"https://aclanthology.org/2025.realm-1.13/","source":"manual","status":"queued","tags":["security","agents"],"added":"2026-02-27","notes":"Multi-turn jailbreak attacks achieved 94.44% success rate on GPT-3.5-Turbo (up from 12.12% baseline). Decomposes harmful requests into innocuous sub-steps."}
{"id":"survey-code-gen-llm-agents-2025","title":"A Survey on Code Generation with LLM-based Agents","authors":["Unknown"],"year":2025,"venue":"arXiv","source_url":"https://arxiv.org/abs/2508.00083","arxiv_id":"2508.00083","source":"manual","status":"queued","tags":["code-generation","agents","survey"],"added":"2026-02-27","notes":"Comprehensive survey on code generation with LLM-based agents. Covers techniques, benchmarks, and open problems."}
+{"id":"agents-of-chaos-2026","title":"Agents of Chaos","authors":["Shapira, N.","Wendler, C.","Yen, A."],"year":2026,"venue":"arXiv","source_url":"https://arxiv.org/abs/2602.20021","arxiv_id":"2602.20021","source":"manual","status":"queued","tags":["security","agents","reliability"],"added":"2026-02-27","notes":"Red-teaming study of autonomous LLM agents in a live lab environment. Six agents with persistent memory, email, Discord, file systems, shell access. Documented unauthorized compliance, sensitive info disclosure, destructive actions, DoS, identity spoofing, cross-agent propagation, and partial system takeover."}