loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 229c8b30f154600bea9a132ac18e68eb987ef195
parent 7dd5ec90a306d461240d147b5312e9fddc153ba1
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Fri,  3 Apr 2026 20:32:51 +0200

Add all-on and all-off anchor profiles

Two extreme configurations for establishing the performance range:
- all-on: every tool, linter, playwright, sub-agents, web search, context file
- all-off: Bash only, no tools, no extras, low budget

3 runs each = 6 total runs. Run both to see the full spread before
sweeping individual variables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M grid.yaml | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+), 0 deletions(-)

diff --git a/grid.yaml b/grid.yaml
@@ -93,6 +93,48 @@ profiles:
       max_budget: [high]
     runs_per_cell: 3
 
+  all-on:
+    description: "Everything enabled -- max tooling"
+    axes:
+      model: [haiku]
+      effort: [high]
+      prompt_style: [simple]
+      language: [typescript]
+      human_language: [en]
+      tool_read: ["on"]
+      tool_write: ["on"]
+      tool_edit: ["on"]
+      tool_glob: ["on"]
+      tool_grep: ["on"]
+      linter: ["on"]
+      playwright: ["on"]
+      context_file: [provided]
+      sub_agents: ["on"]
+      web_search: ["on"]
+      max_budget: [high]
+    runs_per_cell: 3
+
+  all-off:
+    description: "Everything disabled -- bare minimum (Bash only)"
+    axes:
+      model: [haiku]
+      effort: [high]
+      prompt_style: [simple]
+      language: [typescript]
+      human_language: [en]
+      tool_read: ["off"]
+      tool_write: ["off"]
+      tool_edit: ["off"]
+      tool_glob: ["off"]
+      tool_grep: ["off"]
+      linter: ["off"]
+      playwright: ["off"]
+      context_file: [none]
+      sub_agents: ["off"]
+      web_search: ["off"]
+      max_budget: [low]
+    runs_per_cell: 3
+
   full:
     description: "Full grid -- all dimensions"
     # Uses top-level axes definition
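
The run counts in the commit message can be checked with a small sketch. It assumes the harness expands a profile into one cell per combination of axis values and repeats each cell runs_per_cell times; that expansion rule is a guess, since the harness code is not part of this commit. The helper names (make_profile, expand_cells, total_runs, differing_axes) are hypothetical, and the axis values are copied by hand from the diff above rather than parsed from grid.yaml.

```python
from itertools import product

# Axis values copied by hand from the all-on / all-off profiles in the diff.
SHARED = {
    "model": ["haiku"],
    "effort": ["high"],
    "prompt_style": ["simple"],
    "language": ["typescript"],
    "human_language": ["en"],
}
TOGGLES = ["tool_read", "tool_write", "tool_edit", "tool_glob", "tool_grep",
           "linter", "playwright", "sub_agents", "web_search"]


def make_profile(switch, context_file, max_budget):
    """Build a profile dict shaped like one entry under `profiles:` (hypothetical helper)."""
    axes = dict(SHARED)
    axes.update({name: [switch] for name in TOGGLES})
    axes["context_file"] = [context_file]
    axes["max_budget"] = [max_budget]
    return {"axes": axes, "runs_per_cell": 3}


PROFILES = {
    "all-on": make_profile("on", "provided", "high"),
    "all-off": make_profile("off", "none", "low"),
}


def expand_cells(axes):
    """Assumed expansion rule: one cell per combination of axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


def total_runs(profile):
    """Number of cells times the repetitions per cell."""
    return len(expand_cells(profile["axes"])) * profile["runs_per_cell"]


def differing_axes(a, b):
    """Names of the axes whose value lists differ between two profiles."""
    return sorted(name for name in a["axes"] if a["axes"][name] != b["axes"][name])


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        print(name, total_runs(profile))
    print(differing_axes(PROFILES["all-on"], PROFILES["all-off"]))
```

Because every axis in both profiles holds a single value, each profile collapses to one cell. Under the assumed rule that gives 3 runs per profile and 6 in total, which matches the commit message. The two anchors share model, effort, prompt_style, language, and human_language, and differ on the remaining eleven axes, including max_budget.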
