loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit c83477982174d7a2e8f28ff283b3710b64dc5707
parent 73a420e39a0fea662055acc0face0660b7c4b03c
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Sat,  4 Apr 2026 06:49:04 +0200

93 good runs: 54 haiku, 36 sonnet, 3 opus

Removed 12 runs that failed because of token expiry. Main effects sweep
nearly complete for both haiku and sonnet baselines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diff is too large, output suppressed.
