loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 07408995fff0b06d234c4a19fdff6a2dae5b5028
parent baa0f5098d7e7bd291146968e751bbfe5ee255e4
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Thu, 16 Apr 2026 07:59:34 +0200

Remove 39 invalid glm-4.7 runs and add new sweep results

Purged zero-turn runs that failed with HTTP 429 from the glm-4.7
sweep (Z.AI heavily rate-limited the model during a ~7.5h window).
Also includes the successful glm-4.7 runs from the same sweep and
fresh glm-5.1 runs.

glm-5.1: 123 clean runs, 0 bad
glm-4.7: 55 clean runs retained, 39 bad removed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
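The purge criterion in the commit message can be sketched as a simple filter: drop runs that recorded zero turns because the provider answered with HTTP 429. This is a minimal sketch that assumes a hypothetical `Run` record with `turns` and `http_status` fields. The repository's actual result format and cleanup script are not shown in this view. The example data is synthetic and matches only the glm-4.7 totals above (55 retained, 39 removed).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Run:
    # Hypothetical record shape; the real run format is not shown in this commit view.
    model: str
    turns: int                  # number of agent turns the run completed
    http_status: Optional[int]  # last API error status seen, None if no error


def is_zero_turn_429(run: Run) -> bool:
    """True if the run never completed a turn and was rejected with HTTP 429."""
    return run.turns == 0 and run.http_status == 429


def purge(runs: List[Run]) -> Tuple[List[Run], List[Run]]:
    """Split runs into (kept, removed) using the zero-turn 429 criterion."""
    kept = [r for r in runs if not is_zero_turn_429(r)]
    removed = [r for r in runs if is_zero_turn_429(r)]
    return kept, removed


if __name__ == "__main__":
    # Synthetic data: only the 55/39 split comes from the commit message;
    # the turn count of 3 for clean runs is arbitrary.
    runs = [Run("glm-4.7", 3, None)] * 55 + [Run("glm-4.7", 0, 429)] * 39
    kept, removed = purge(runs)
    print(len(kept), len(removed))
```

The filter deliberately keeps runs that hit a 429 after completing at least one turn. Those runs did real work, so they are not zero-turn failures. A stricter or looser rule would be a separate decision.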

Diff is too large, output suppressed.
