Fix main_effects for provider filtering - loop-benchmarking - Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.

commit d9ac9ec841282c2363015d0c156b3dedc60d7ce5
parent 4bb4cb64ef5efb430c6a2757632ee77720e7b741
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Mon,  6 Apr 2026 20:20:15 +0200

Fix main_effects for provider filtering

- Set baseline provider from --provider flag so baseline matches
- Apply exclusion rules in main_effects_plan (was missing, allowed
  invalid combos like provider=zai + model=haiku)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M harness/lib/experiment_design.py  | 2 ++
M harness/run.py  | 2 ++

2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/harness/lib/experiment_design.py b/harness/lib/experiment_design.py
@@ -87,6 +87,8 @@ def main_effects_plan(grid, baseline=None, tasks=None):
                     continue
                 varied = dict(base_cell)
                 varied[axis_name] = value
+                if _is_excluded(varied, grid):
+                    continue
                 key = _cell_key(task, varied)
                 if key not in seen:
                     seen.add(key)
diff --git a/harness/run.py b/harness/run.py
@@ -841,6 +841,8 @@ def main():
         axes = {name: spec["values"] for name, spec in grid["axes"].items()}
         baseline = {name: values[0] for name, values in axes.items()}
         baseline["model"] = baseline_model
+        if provider_filter:
+            baseline["provider"] = provider_filter
 
     # Determine cell generation strategy
     if profile == "main_effects":
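
The two changes work together. A main-effects plan varies one axis at a time away from a baseline cell, so the baseline's `provider` decides which provider/model pairs are generated. The exclusion check then drops any varied cell that is not a valid combination. The sketch below is a minimal illustration of that flow, not the harness's code, and it rests on several assumptions. The real `_is_excluded` rule format is not shown in this diff, so the `grid["exclude"]` list of partial cells is hypothetical. So are the helper names `is_excluded`, `build_baseline`, and `sketch_main_effects_plan`, and the axis values `"anthropic"` and `"sonnet"`. Only the `grid["axes"][name]["values"]` shape and the provider=zai + model=haiku example come from the commit.

```python
from typing import Any, Optional

Cell = dict[str, Any]

# Hypothetical grid: "axes" mirrors the shape read in run.py; the "exclude"
# key and the "anthropic"/"sonnet" values are illustrative assumptions.
EXAMPLE_GRID: dict = {
    "axes": {
        "provider": {"values": ["anthropic", "zai"]},
        "model": {"values": ["sonnet", "haiku"]},
    },
    "exclude": [{"provider": "zai", "model": "haiku"}],
}


def is_excluded(cell: Cell, grid: dict) -> bool:
    """Assumed rule format: a cell is excluded if it matches every
    key/value pair of at least one partial cell in grid["exclude"]."""
    return any(
        all(cell.get(k) == v for k, v in rule.items())
        for rule in grid.get("exclude", [])
    )


def build_baseline(grid: dict, baseline_model: str,
                   provider_filter: Optional[str] = None) -> Cell:
    """Mirrors the run.py change: the first value of each axis, the chosen
    model, and (the fix) the --provider filter when one is given."""
    axes = {name: spec["values"] for name, spec in grid["axes"].items()}
    baseline = {name: values[0] for name, values in axes.items()}
    baseline["model"] = baseline_model
    if provider_filter:
        baseline["provider"] = provider_filter
    return baseline


def sketch_main_effects_plan(grid: dict, baseline: Cell,
                             tasks: list[str]) -> list[tuple[str, Cell]]:
    """One-factor-at-a-time plan: the baseline plus one cell per
    non-baseline axis value, skipping excluded varied cells."""
    plan: list[tuple[str, Cell]] = []
    seen: set = set()
    for task in tasks:
        candidates = [dict(baseline)]
        for axis_name, spec in grid["axes"].items():
            for value in spec["values"]:
                if value == baseline.get(axis_name):
                    continue  # this is the baseline value itself
                varied = dict(baseline)
                varied[axis_name] = value
                if is_excluded(varied, grid):
                    continue  # the check this commit adds
                candidates.append(varied)
        for cell in candidates:
            key = (task, tuple(sorted(cell.items())))  # dedupe per task
            if key not in seen:
                seen.add(key)
                plan.append((task, cell))
    return plan
```

With `provider_filter="zai"` the baseline becomes `{"provider": "zai", "model": "sonnet"}`. Varying the model to `"haiku"` then yields the invalid zai/haiku pair, which the exclusion check removes. Without the filter, the baseline provider would be `"anthropic"`, so the baseline would not match the filtered provider at all.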

	git clone https://git.shiptheloop.com/loop-benchmarking.git