loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit d9ac9ec841282c2363015d0c156b3dedc60d7ce5
parent 4bb4cb64ef5efb430c6a2757632ee77720e7b741
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Mon,  6 Apr 2026 20:20:15 +0200

Fix main_effects for provider filtering

- Set the baseline provider from the --provider flag so the baseline
  cell matches the provider filter
- Apply exclusion rules in main_effects_plan (previously missing, which
  allowed invalid combos such as provider=zai + model=haiku)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M harness/lib/experiment_design.py | 2 ++
M harness/run.py                   | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/harness/lib/experiment_design.py b/harness/lib/experiment_design.py
@@ -87,6 +87,8 @@ def main_effects_plan(grid, baseline=None, tasks=None):
                     continue
                 varied = dict(base_cell)
                 varied[axis_name] = value
+                if _is_excluded(varied, grid):
+                    continue
                 key = _cell_key(task, varied)
                 if key not in seen:
                     seen.add(key)
diff --git a/harness/run.py b/harness/run.py
@@ -841,6 +841,8 @@ def main():
     axes = {name: spec["values"] for name, spec in grid["axes"].items()}
     baseline = {name: values[0] for name, values in axes.items()}
     baseline["model"] = baseline_model
+    if provider_filter:
+        baseline["provider"] = provider_filter
 
     # Determine cell generation strategy
     if profile == "main_effects":
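
The two hunks can be read together as a small standalone sketch: build a baseline cell, pin its provider when a filter is given, then vary one axis at a time and skip any excluded combination. The code below is an illustration, not the repository's implementation. The `grid["axes"][name]["values"]` layout comes from the diff. The `exclusions` key, the rule format (a cell is excluded when it matches every key/value pair of a rule), the functions `is_excluded`, `build_baseline` and `main_effects_cells`, the `effort` axis, and the values `anthropic` and `glm` are all hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional

Cell = dict  # maps axis name -> chosen value


def is_excluded(cell: Cell, grid: dict) -> bool:
    """Hypothetical rule check: a cell is excluded if it matches all
    key/value pairs of at least one rule in grid["exclusions"]."""
    return any(
        all(cell.get(axis) == value for axis, value in rule.items())
        for rule in grid.get("exclusions", [])
    )


def build_baseline(
    grid: dict, baseline_model: str, provider_filter: Optional[str] = None
) -> Cell:
    """First value of every axis, with the model (and optionally the
    provider) overridden, mirroring the run.py hunk."""
    baseline: Cell = {name: spec["values"][0] for name, spec in grid["axes"].items()}
    baseline["model"] = baseline_model
    if provider_filter:
        baseline["provider"] = provider_filter
    return baseline


def main_effects_cells(grid: dict, baseline: Cell) -> list:
    """Baseline plus every cell that differs from it on exactly one axis,
    skipping excluded combinations and duplicates."""
    cells = [dict(baseline)]
    seen: set = {tuple(sorted(baseline.items()))}
    for axis_name, spec in grid["axes"].items():
        for value in spec["values"]:
            if value == baseline[axis_name]:
                continue
            varied: Any = dict(baseline)
            varied[axis_name] = value
            if is_excluded(varied, grid):
                continue  # the check the commit adds to main_effects_plan
            key = tuple(sorted(varied.items()))
            if key not in seen:
                seen.add(key)
                cells.append(varied)
    return cells


# Hypothetical grid: only provider=zai and model=haiku appear in the commit message.
grid = {
    "axes": {
        "provider": {"values": ["anthropic", "zai"]},
        "model": {"values": ["haiku", "glm"]},
        "effort": {"values": ["low", "high"]},
    },
    "exclusions": [
        {"provider": "zai", "model": "haiku"},
        {"provider": "anthropic", "model": "glm"},
    ],
}

baseline = build_baseline(grid, baseline_model="glm", provider_filter="zai")
cells = main_effects_cells(grid, baseline)
```

In this sketch, omitting the `is_excluded` check would let the plan emit a cell with provider=zai and model=haiku, which is the invalid combination the commit message describes.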
