loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 476b885bb47439b71b3967aaed0a7f15c66699d8
parent 6b806ab1e52180e1c3fdf0cefba9d1c9e4abd7a9
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Mon,  6 Apr 2026 19:09:39 +0200

Accept actual model names with --model for non-anthropic providers

--model glm-4.5-air with --provider zai now works. Harness reverse-maps
to the Claude CLI arg (haiku) internally. No more confusing --model haiku
when you mean GLM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M harness/run.py | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/harness/run.py b/harness/run.py
@@ -829,11 +829,19 @@ def main():
         sys.exit(1)
 
     # Build baseline override from --model flag
+    # For non-anthropic providers, accept actual model names (e.g., glm-4.5-air)
+    # and reverse-map to the Claude arg (e.g., haiku)
     baseline = None
     if baseline_model:
+        provider_cfg = (providers_config.get(provider_filter) or {})
+        model_map = provider_cfg.get("model_map", {})
+        reverse_map = {v: k for k, v in model_map.items()}
+        resolved_model = reverse_map.get(baseline_model, baseline_model)
+        if resolved_model != baseline_model:
+            print(f"Model: {baseline_model} (mapped to {resolved_model} for Claude CLI)")
         axes = {name: spec["values"] for name, spec in grid["axes"].items()}
         baseline = {name: values[0] for name, values in axes.items()}
-        baseline["model"] = baseline_model
+        baseline["model"] = resolved_model
 
     # Determine cell generation strategy
     if profile == "main_effects":
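
The reverse-mapping step in the hunk above can be tried in isolation. The following is a minimal sketch, not the harness code itself: the function name `resolve_model` and the sample `providers_config` contents are assumptions made for illustration. Only the `model_map` key and the glm-4.5-air/haiku pairing come from the commit.

```python
def resolve_model(baseline_model, provider_filter, providers_config):
    """Map a provider's real model name back to the Claude CLI arg.

    Hypothetical helper that mirrors the logic added in the diff.
    Names that are not found in the reverse map are returned unchanged.
    """
    # A provider entry may be missing or None, so fall back to an empty dict.
    provider_cfg = providers_config.get(provider_filter) or {}
    model_map = provider_cfg.get("model_map", {})
    # model_map is assumed to go from Claude arg to provider model name,
    # so it is inverted here to look up by provider model name.
    reverse_map = {v: k for k, v in model_map.items()}
    return reverse_map.get(baseline_model, baseline_model)


# Assumed config shape, for illustration only.
providers_config = {
    "zai": {"model_map": {"haiku": "glm-4.5-air"}},
    "anthropic": None,
}

print(resolve_model("glm-4.5-air", "zai", providers_config))
print(resolve_model("haiku", "zai", providers_config))
print(resolve_model("sonnet", "anthropic", providers_config))
```

One property of this approach worth noting: if two Claude args map to the same provider model, the inverted dict keeps only the last pair, so the reverse lookup is only unambiguous when `model_map` values are unique.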
