insights.astro - loop-benchmarking - Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.

insights.astro (1207B)

---
import Base from "../layouts/Base.astro";
import { loadAllRuns, projectRunForIndex } from "../lib/data";
import Insights from "../components/Insights";
import ScatterPlot from "../components/ScatterPlot";
import Variability from "../components/Variability";

const runs = loadAllRuns();

// Project down to the fields these islands actually read, avoiding repeated
// serialization of large eval_results payloads into the page HTML.
const slimRuns = runs.map(projectRunForIndex);
---

<Base title="Insights">
  <h1 style="margin-bottom: 8px;">Insights</h1>
  <p style="color: var(--text-muted); margin-bottom: 24px; font-size: 11px; text-transform: uppercase; letter-spacing: 0.5px;">
    Which variables move the needle? Where do weaker configs win? How consistent are the results?
  </p>

  <Variability client:load runs={slimRuns} />

  <div style="margin-top: 32px; display: grid; grid-template-columns: 1fr 1fr; gap: 16px;">
    <ScatterPlot client:load runs={slimRuns} defaultX="cost" defaultY="outcome" />
    <ScatterPlot client:load runs={slimRuns} defaultX="turns" defaultY="outcome" />
  </div>

  <div style="margin-top: 32px;">
    <Insights client:load runs={slimRuns} />
  </div>
</Base>
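
The frontmatter comment describes projecting each run down to the fields the islands read, so the large eval_results payload is not serialized into the page once per island. The real `projectRunForIndex` lives in `../lib/data` and is not shown in this file. What follows is only a minimal sketch of what such a projection might look like. The `Run` and `SlimRun` shapes, the `id` and `config` fields, and the name `projectRunSketch` are assumptions made for illustration. Only `cost`, `turns`, `outcome`, and `eval_results` are named in the page itself.

```typescript
// Hypothetical run shape. The actual type is defined in ../lib/data.
interface Run {
  id: string;            // assumed identifier
  config: string;        // assumed configuration label
  cost: number;          // used as a ScatterPlot axis on the page
  turns: number;         // used as a ScatterPlot axis on the page
  outcome: number;       // used as a ScatterPlot axis on the page
  eval_results: unknown; // large payload the islands do not read
}

// The slim shape keeps every field except the heavy payload.
type SlimRun = Omit<Run, "eval_results">;

// Copy fields explicitly so that properties added to Run later are not
// serialized into the page by accident.
function projectRunSketch(run: Run): SlimRun {
  return {
    id: run.id,
    config: run.config,
    cost: run.cost,
    turns: run.turns,
    outcome: run.outcome,
  };
}
```

Listing the kept fields explicitly, rather than deleting `eval_results` from a copy, means a new heavy field on `Run` stays out of the page HTML until someone opts it in.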

	git clone https://git.shiptheloop.com/loop-benchmarking.git