ai-research-survey

Systematic scan of agentic development research. What's signal, what's noise.
git clone https://git.shiptheloop.com/ai-research-survey.git

commit 208801951bbe904415b4a651bd792bc95c8f9241
parent ddde6369343ce6a1c7129bb5e8093318815ae2a7
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Tue, 24 Mar 2026 06:51:58 +0100

Add explanatory descriptions to each tension section

Each tension now has 2 sentences below the title explaining what the
tension is and why it matters. E.g., Security Arms Race: "Defense
papers claim their mitigations work; attack papers show they can be
bypassed. Neither side engages seriously with the other."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M explorer/src/views/tensions.ts | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/explorer/src/views/tensions.ts b/explorer/src/views/tensions.ts
@@ -1,33 +1,39 @@
 import { loadTensions, type TensionClaim } from '../data';
 
-const TENSION_META: Record<string, { title: string; positive: string; nuanced: string }> = {
+const TENSION_META: Record<string, { title: string; desc: string; positive: string; nuanced: string }> = {
   productivity: {
     title: 'Productivity Paradox',
+    desc: 'Does AI actually make developers faster? Studies claiming large speedups tend to use weaker methodology than those finding mixed or negative results. The only RCT with experienced developers found a 19% slowdown.',
     positive: 'AI increases productivity',
     nuanced: 'Effects are mixed or negative',
   },
   benchmarks: {
     title: 'Benchmark Validity Crisis',
+    desc: 'Papers simultaneously build on benchmarks and distrust them. SOTA claims proliferate, but fewer than half of benchmark papers discuss whether their benchmark actually measures the claimed capability.',
     positive: 'Benchmark success = capability',
     nuanced: 'Benchmarks are flawed or gamed',
   },
   agents: {
     title: 'Agent Capability Gap',
+    desc: 'Success claims outnumber limitation findings, but the limitations come from more rigorous papers. Agents succeed in sandboxes; failures are found in deployment. The gap between demo and production is the real story.',
     positive: 'Agents succeed at tasks',
     nuanced: 'Agents fail in deployment',
   },
   security: {
     title: 'Security Arms Race',
+    desc: 'Defense papers claim their mitigations work; attack papers show they can be bypassed. Attacks outnumber defenses and the cycle repeats with each new technique. Neither side engages seriously with the other.',
     positive: 'Defenses work',
     nuanced: 'Attacks succeed',
   },
   code_quality: {
     title: 'Code Quality Paradox',
+    desc: 'LLMs simultaneously repair bugs and introduce new ones. The same tools that fix code also generate insecure configurations, hallucinate APIs, and increase cognitive complexity. Which effect dominates depends on the task.',
     positive: 'LLMs improve code',
     nuanced: 'LLMs introduce defects',
   },
   scaling: {
     title: 'Scaling Debate',
+    desc: 'Smaller efficient models claim to match larger ones at a fraction of the cost. But scaling skeptics find diminishing returns, prohibitive inference costs, and capability gaps that distillation cannot close. The field is split nearly evenly.',
     positive: 'Scaling is efficient',
     nuanced: 'Scaling hits limits',
   },
@@ -63,6 +69,7 @@ export async function renderTensions(app: HTMLElement) {
     return `<div class="tension-group section">
       <h2>${meta.title}</h2>
+      <p style="font-size:0.85rem;color:var(--text-dim);margin-bottom:0.75rem">${meta.desc}</p>
       <div class="tension-stat">Positive claims: ${sides.positive.length} (mean score ${posMean}%) \u00b7 Nuanced claims: ${sides.nuanced.length} (mean score ${nuaMean}%)</div>
       ${renderButterfly(sides.positive, sides.nuanced, meta)}
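
The pattern this commit introduces (a per-tension `desc` string rendered as a paragraph under the title) can be sketched in isolation. This is a minimal illustration and not the repository's code: the `TensionMeta` interface name, the `escapeHtml` and `renderTensionHeader` helpers, and the `tension-desc` class are hypothetical. Only the field names `title`, `desc`, `positive`, and `nuanced` come from the diff. The sketch escapes the text before interpolating it into the template, which the commit itself does not do; that is an assumption about what a hardened version might look like.

```typescript
// Hypothetical shape; only the four field names are taken from the diff.
interface TensionMeta {
  title: string;
  desc: string;
  positive: string;
  nuanced: string;
}

// Hypothetical helper: escape text before placing it inside HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: render the heading followed by the description paragraph.
// A CSS class stands in for the inline style used in the actual commit.
function renderTensionHeader(meta: TensionMeta): string {
  return [
    `<h2>${escapeHtml(meta.title)}</h2>`,
    `<p class="tension-desc">${escapeHtml(meta.desc)}</p>`,
  ].join('\n');
}

// Example entry; the description text is the one quoted in the commit message.
const example: TensionMeta = {
  title: 'Security Arms Race',
  desc: 'Defense papers claim their mitigations work; attack papers show they can be bypassed.',
  positive: 'Defenses work',
  nuanced: 'Attacks succeed',
};

console.log(renderTensionHeader(example));
```

Because `desc` is a required field in the record's value type, the compiler would flag any tension entry that omits it, which is presumably why the type annotation changes in the same commit as the strings.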
