loop-benchmarking

Controlled experiments across agentic coding configurations: the same task, one variable changed at a time, to measure what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit b8491ac11f96ce6c64fd1a0f448899f846076ea6
parent 9120b36fa6bb57482b2691ca494a03b8d266d8b0
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Sat,  4 Apr 2026 10:39:06 +0200

Add new haiku and sonnet runs (72 total, 0 bad)

57 haiku, 12 sonnet, 3 opus. All runs validated (no null costs,
no timeouts, no 1-turn failures). 70/72 have workspace artifacts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diff is too large, output suppressed.
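The commit message names three validation checks: no null costs, no timeouts, and no 1-turn failures. It also tallies runs per model and counts those with workspace artifacts. The sketch below shows one way such a filter could be written. It is not the repository's code, since the diff is not shown here. The `Run` record, its field names, and the reading of "1-turn failure" as `num_turns <= 1` are all hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    # Hypothetical run record; field names are assumptions, not the repo's schema.
    model: str                  # e.g. "haiku", "sonnet", "opus"
    cost_usd: Optional[float]   # None stands in for a "null cost"
    timed_out: bool
    num_turns: int
    has_workspace: bool         # whether workspace artifacts were saved


def is_valid(run: Run) -> bool:
    """Apply the three checks named in the commit message (assumed semantics)."""
    if run.cost_usd is None:    # null cost
        return False
    if run.timed_out:           # timeout
        return False
    if run.num_turns <= 1:      # assumed meaning of a "1-turn failure"
        return False
    return True


def summarize(runs: list[Run]) -> dict:
    """Count total and bad runs, valid runs per model, and valid runs with artifacts."""
    valid = [r for r in runs if is_valid(r)]
    return {
        "total": len(runs),
        "bad": len(runs) - len(valid),
        "by_model": dict(Counter(r.model for r in valid)),
        "with_workspace": sum(r.has_workspace for r in valid),
    }


if __name__ == "__main__":
    # Made-up illustrative records, not benchmark results.
    runs = [
        Run("haiku", 0.12, False, 8, True),
        Run("sonnet", 0.40, False, 5, True),
        Run("opus", None, False, 6, False),   # null cost -> bad
        Run("haiku", 0.03, False, 1, False),  # 1-turn failure -> bad
    ]
    print(summarize(runs))
```

Keeping each check as a separate early return makes it easy to log *why* a run was rejected if that is ever needed. A summary of "72 total, 0 bad" would correspond to `bad == 0` under checks like these.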
