Re-eval 222 runs (10 glm-4.5-air, 26 glm-4.7, 9 glm-5.1, 74 haiku, 51 opus, 52 sonnet) - loop-benchmarking - Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.

commit e30aeecc15293102af4d7abb3bd2319767625628
parent d42b0c8c388e6c37c066c61580ddc83ca243222d
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Tue,  7 Apr 2026 13:04:40 +0200

Re-eval 222 runs (10 glm-4.5-air, 26 glm-4.7, 9 glm-5.1, 74 haiku, 51 opus, 52 sonnet)

Diff is too large, output suppressed.

	git clone https://git.shiptheloop.com/loop-benchmarking.git