loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 0f44859d3eac85c7dc11272eacc4e57fda2b14c0
parent 3b81eb9246542dee665795aeb510ae2ced79f03b
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Thu, 16 Apr 2026 16:45:12 +0200

Project runs across all dashboard pages

Apply the projectRunForIndex pattern from index.astro to the insights,
explore, compare, and surprises pages. All four need only summary fields
already covered by projectRunForIndex, so no new projectors are required.
pca.astro passes pre-computed JSON rather than runs, so it needs no change.

Before (raw / gzipped):
  insights:  34.0 MB /  ~3.1 MB
  explore:   50.8 MB /  ~5.1 MB
  compare:    8.5 MB /  ~800 KB
  surprises:  8.4 MB /  ~800 KB
  dist/ total: 344 MB

After (raw / gzipped):
  insights:   6.0 MB / 222 KB
  explore:    8.8 MB / 318 KB
  compare:    1.5 MB /  57 KB
  surprises:  1.4 MB /  55 KB
  dist/ total: 263 MB
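
To put the raw figures above in relative terms, here is a small sketch that computes the percentage reduction per page. The numbers are copied from the Before/After lists; the helper name `reductionPercent` is made up for this illustration. Gzipped sizes are left out because the "before" values are approximate and use mixed units.

```typescript
// Raw page sizes in MB, copied from the commit message: [before, after].
const rawSizesMB: Record<string, [number, number]> = {
  insights: [34.0, 6.0],
  explore: [50.8, 8.8],
  compare: [8.5, 1.5],
  surprises: [8.4, 1.4],
  "dist/ total": [344, 263],
};

// Hypothetical helper: percentage by which `after` is smaller than `before`.
function reductionPercent(before: number, after: number): number {
  return (1 - after / before) * 100;
}

for (const [page, [before, after]] of Object.entries(rawSizesMB)) {
  console.log(`${page}: ${reductionPercent(before, after).toFixed(1)}% smaller`);
}
```

Each of the four pages shrinks by a little over 80% raw, while dist/ as a whole shrinks by roughly a quarter, since the rest of the build output is unchanged by this commit.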

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
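
The idea described in the commit message, projecting each run down to summary fields before handing it to a client:load island, can be sketched roughly as follows. This is not the repository's projectRunForIndex, whose body is not shown in this diff. The `FullRun` shape, the field names other than `cell_id` and `eval_results`, and the function `projectRunSketch` are assumptions made for illustration.

```typescript
// Hypothetical full run record. Only `cell_id` and `eval_results` are named
// in the commit; the other fields are assumptions for this sketch.
interface FullRun {
  run_id: string;
  cell_id: string;
  task: string;
  cost: number;
  turns: number;
  outcome: number;
  eval_results: { check: string; passed: boolean; log: string }[];
}

// Slim shape: everything except the large eval_results payload.
type SlimRun = Omit<FullRun, "eval_results">;

// Allowlist projection: copy only the fields the charts are assumed to read.
function projectRunSketch(run: FullRun): SlimRun {
  return {
    run_id: run.run_id,
    cell_id: run.cell_id,
    task: run.task,
    cost: run.cost,
    turns: run.turns,
    outcome: run.outcome,
  };
}

// One synthetic run with a deliberately bulky eval_results array.
const runs: FullRun[] = [
  {
    run_id: "r1",
    cell_id: "c1",
    task: "demo",
    cost: 0.42,
    turns: 12,
    outcome: 0.9,
    eval_results: Array.from({ length: 50 }, (_, i) => ({
      check: `check-${i}`,
      passed: i % 2 === 0,
      log: "x".repeat(200),
    })),
  },
];

const slimRuns = runs.map(projectRunSketch);

// Props passed to a hydrated island are serialized into the page, so the
// JSON length is a rough proxy for what each island adds to the HTML.
console.log("full bytes:", JSON.stringify(runs).length);
console.log("slim bytes:", JSON.stringify(slimRuns).length);
```

An allowlist is used here instead of deleting `eval_results`, so any large field added to the run record later stays out of the page unless it is explicitly projected.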

Diffstat:
M dashboard/src/pages/compare.astro   |  7 +++++--
M dashboard/src/pages/explore.astro   | 18 +++++++++++-------
M dashboard/src/pages/insights.astro  | 14 +++++++++-----
M dashboard/src/pages/surprises.astro |  8 ++++++--
4 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/dashboard/src/pages/compare.astro b/dashboard/src/pages/compare.astro
@@ -1,6 +1,6 @@
 ---
 import Base from "../layouts/Base.astro";
-import { loadAllRuns, getAxisValues, getTaskNames, AXIS_NAMES } from "../lib/data";
+import { loadAllRuns, getAxisValues, getTaskNames, AXIS_NAMES, projectRunForIndex } from "../lib/data";
 import type { Run, AxisName } from "../lib/data";
 import VariabilityViolin from "../components/VariabilityViolin";
 
@@ -8,6 +8,9 @@ const runs = loadAllRuns();
 const axisValues = getAxisValues(runs);
 const tasks = getTaskNames(runs);
 
+// Projected slim runs for the client:load VariabilityViolin island.
+const slimRuns = runs.map(projectRunForIndex);
+
 // Build comparison data using cell-based aggregation.
 // A "cell" is a unique configuration (cell_id). Multiple runs share a cell_id
 // when they are repeat trials of the same config. Averaging per-cell first,
@@ -201,6 +204,6 @@ for (const axis of AXIS_NAMES) {
   )}
 
   <div style="margin-top: 24px;">
-    <VariabilityViolin client:load runs={runs} />
+    <VariabilityViolin client:load runs={slimRuns} />
   </div>
 </Base>
diff --git a/dashboard/src/pages/explore.astro b/dashboard/src/pages/explore.astro
@@ -1,6 +1,6 @@
 ---
 import Base from "../layouts/Base.astro";
-import { loadAllRuns } from "../lib/data";
+import { loadAllRuns, projectRunForIndex } from "../lib/data";
 import HeatmapMatrix from "../components/HeatmapMatrix";
 import RadarComparison from "../components/RadarComparison";
 import BumpChart from "../components/BumpChart";
@@ -9,6 +9,10 @@ import EfficiencyFrontier from "../components/EfficiencyFrontier";
 import CorrelationMatrix from "../components/CorrelationMatrix";
 
 const runs = loadAllRuns();
+
+// Project down to the fields these islands actually read, avoiding repeated
+// serialization of large eval_results payloads into the page HTML.
+const slimRuns = runs.map(projectRunForIndex);
 ---
 
 <Base title="Explore">
@@ -18,18 +22,18 @@ const runs = loadAllRuns();
   </p>
 
   <div style="display: flex; flex-direction: column; gap: 32px;">
-    <CorrelationMatrix client:load runs={runs} />
+    <CorrelationMatrix client:load runs={slimRuns} />
 
     <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 16px;">
-      <EfficiencyFrontier client:load runs={runs} />
-      <BumpChart client:load runs={runs} />
+      <EfficiencyFrontier client:load runs={slimRuns} />
+      <BumpChart client:load runs={slimRuns} />
     </div>
 
-    <HeatmapMatrix client:load runs={runs} />
+    <HeatmapMatrix client:load runs={slimRuns} />
 
     <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 16px;">
-      <RadarComparison client:load runs={runs} />
-      <ConfigTreemap client:load runs={runs} />
+      <RadarComparison client:load runs={slimRuns} />
+      <ConfigTreemap client:load runs={slimRuns} />
     </div>
   </div>
 </Base>
diff --git a/dashboard/src/pages/insights.astro b/dashboard/src/pages/insights.astro
@@ -1,11 +1,15 @@
 ---
 import Base from "../layouts/Base.astro";
-import { loadAllRuns } from "../lib/data";
+import { loadAllRuns, projectRunForIndex } from "../lib/data";
 import Insights from "../components/Insights";
 import ScatterPlot from "../components/ScatterPlot";
 import Variability from "../components/Variability";
 
 const runs = loadAllRuns();
+
+// Project down to the fields these islands actually read, avoiding repeated
+// serialization of large eval_results payloads into the page HTML.
+const slimRuns = runs.map(projectRunForIndex);
 ---
 
 <Base title="Insights">
@@ -14,14 +18,14 @@ const runs = loadAllRuns();
     Which variables move the needle? Where do weaker configs win? How consistent are the results?
   </p>
 
-  <Variability client:load runs={runs} />
+  <Variability client:load runs={slimRuns} />
 
   <div style="margin-top: 32px; display: grid; grid-template-columns: 1fr 1fr; gap: 16px;">
-    <ScatterPlot client:load runs={runs} defaultX="cost" defaultY="outcome" />
-    <ScatterPlot client:load runs={runs} defaultX="turns" defaultY="outcome" />
+    <ScatterPlot client:load runs={slimRuns} defaultX="cost" defaultY="outcome" />
+    <ScatterPlot client:load runs={slimRuns} defaultX="turns" defaultY="outcome" />
   </div>
 
   <div style="margin-top: 32px;">
-    <Insights client:load runs={runs} />
+    <Insights client:load runs={slimRuns} />
   </div>
 </Base>
diff --git a/dashboard/src/pages/surprises.astro b/dashboard/src/pages/surprises.astro
@@ -1,9 +1,13 @@
 ---
 import Base from "../layouts/Base.astro";
-import { loadAllRuns } from "../lib/data";
+import { loadAllRuns, projectRunForIndex } from "../lib/data";
 import SurprisesPage from "../components/SurprisesPage";
 
 const runs = loadAllRuns();
+
+// Project down to the fields SurprisesPage actually reads, avoiding
+// serialization of large eval_results payloads into the page HTML.
+const slimRuns = runs.map(projectRunForIndex);
 ---
 
 <Base title="Surprises">
@@ -12,5 +16,5 @@ const runs = loadAllRuns();
     Where weaker configs outperformed stronger ones, and conventional assumptions broke down.
   </p>
 
-  <SurprisesPage client:load runs={runs} />
+  <SurprisesPage client:load runs={slimRuns} />
 </Base>
