loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 711df365354d81be00d01bce2428e7c283e0ec2b
parent f801efc9b7f7880049fdeeeed53d55f0ecae5ecc
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Thu, 16 Apr 2026 11:58:46 +0200

Retag 176 pre-provider anthropic runs with prov=anth in cell_id

These runs were created before the 'provider' axis was introduced. The
earlier legacy migration added provider='anthropic' to each meta.json
but didn't regenerate cell_ids to include the prov=anth segment, leaving
them invisible to the current main_effects coverage check even though
the run data itself was intact.

This pass rebuilds each cell_id with the current AXIS_ABBREV/VALUE_ABBREV
logic, renames run and artifact directories, updates meta.json's cell_id
and run_id, and rewrites results/index.jsonl. Collisions where a
post-provider run already occupied the target slot were resolved by
bumping run_num (Option C: kept both as additional replicates).
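
A minimal sketch of the retag-and-bump step described above, not the
repository's actual code. The abbreviation tables, the axis=value layout
joined by underscores, the "-r<N>" run_id suffix, and the meta fields
"axes" and "run_num" are all assumptions made for illustration; only the
names AXIS_ABBREV, VALUE_ABBREV, cell_id, run_id, run_num and the
prov=anth segment come from this commit message.

```python
# Hypothetical tables: the real AXIS_ABBREV/VALUE_ABBREV are not shown here.
AXIS_ABBREV = {"provider": "prov", "model": "model"}
VALUE_ABBREV = {"anthropic": "anth"}


def build_cell_id(axes: dict) -> str:
    """Join abbreviated axis=value segments in sorted axis order (assumed layout)."""
    parts = []
    for axis in sorted(axes):
        value = axes[axis]
        parts.append(f"{AXIS_ABBREV.get(axis, axis)}={VALUE_ABBREV.get(value, value)}")
    return "_".join(parts)


def retag(meta: dict, taken: set) -> dict:
    """Return a copy of meta with a regenerated cell_id and a collision-free run_id."""
    # Fold the provider recorded by the earlier migration into the axes.
    axes = dict(meta["axes"], provider=meta["provider"])
    cell_id = build_cell_id(axes)
    run_num = meta["run_num"]
    # Bump run_num until the slot is free, so both runs survive as replicates.
    while f"{cell_id}-r{run_num}" in taken:
        run_num += 1
    run_id = f"{cell_id}-r{run_num}"
    taken.add(run_id)
    return {**meta, "cell_id": cell_id, "run_num": run_num, "run_id": run_id}
```

In this sketch, a legacy run whose target slot is already held by a
post-provider run is given the next free run_num instead of overwriting
it, which mirrors the "kept both as additional replicates" choice.
Renaming directories and rewriting results/index.jsonl are left out.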

Impact:
- haiku-4.5: 73 retagged
- sonnet-4.6: 52 retagged
- opus-4.6: 51 retagged
- 194 total anthropic runs preserved, none deleted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diff is too large, output suppressed.
