loop-benchmarking

Controlled experiments across agentic coding configurations: same task, one variable changed at a time, to see what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 90cd4760c89de4e0c3848c36ebdb25249b1ae3ac
parent fec57ee83892b809897124d42887597de29fa9b8
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Mon,  6 Apr 2026 15:57:52 +0200

Grid expansion: 7 new axes, migrate all run IDs to abbreviated format

New axes: tests_provided, strategy (replaces sub_agents), design_guidance,
architecture, error_checking, context_noise, renderer. Playwright expanded
from on/off to off/available/instructed.
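
To make the axis expansion concrete, here is a minimal sketch of how a grid of cells could be built from named axes. It assumes a full Cartesian product, which the harness may not actually use. Only the three playwright values come from this commit message; the `AXES` name, the `expand_grid` helper, and the `tests_provided` values are hypothetical.

```python
from itertools import product

# Only the playwright values are taken from the commit message.
# The tests_provided values are placeholders for illustration.
AXES = {
    "playwright": ["off", "available", "instructed"],
    "tests_provided": ["no", "yes"],
}


def expand_grid(axes):
    """Return one dict per cell: the Cartesian product of all axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


cells = expand_grid(AXES)
print(len(cells))  # 3 playwright values * 2 tests_provided values = 6
```

Each added axis multiplies the cell count, which is one reason a benchmark like this might run only a subset of the grid.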

- Modular prompt builder with PROMPT_SNIPPETS (EN + ES) for all new axes
- Strategy axis controls Agent tool availability (replaces sub_agents on/off)
- VALUE_ABBREV dict shortens multi-word values in cell IDs
- Migrated all 173 existing runs to new abbreviated ID format
- Dashboard: updated types, data normalization for old schema, axis labels
- Migration script: harness/migrate-run-ids.py
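
The abbreviation step described in the list above might look roughly like the sketch below. The entries in `VALUE_ABBREV`, the `cell_id` helper, the `plan_then_build` value, and the hyphen-joined ID layout are all assumptions for illustration; the real dict contents and ID format are not visible in this commit view.

```python
# Hypothetical entries; the real VALUE_ABBREV contents are not shown here.
VALUE_ABBREV = {
    "plan_then_build": "ptb",
    "not_provided": "np",
}


def cell_id(cell):
    """Build an ID from axis values in sorted axis order.

    Values found in VALUE_ABBREV are shortened; all others pass through.
    The hyphen-separated layout is an assumed format.
    """
    parts = []
    for axis in sorted(cell):
        value = str(cell[axis])
        parts.append(VALUE_ABBREV.get(value, value))
    return "-".join(parts)


print(cell_id({"playwright": "off", "strategy": "plan_then_build"}))  # off-ptb
```

Under these assumptions, a migration of existing runs would recompute each run's ID with the same function and rename the stored record, so old and new IDs stay derivable from the same axis values.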

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diff is too large, output suppressed.
