Discard runs with 0 turns before eval/commit - loop-benchmarking - Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.

commit 7cad71376c1153a15b0ef0e87176d2c5ee230578
parent 7489b45ee21ecf7850b2b3f445534eae15f26b10
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Tue,  7 Apr 2026 10:39:16 +0200

Discard runs with 0 turns before eval/commit

If Claude produced nothing (num_turns=0), delete the run directory
immediately instead of evaluating and persisting garbage data.
Resume logic will retry the cell next time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M harness/run.py  | 13 +++++++++++++

1 file changed, 13 insertions(+), 0 deletions(-)
diff --git a/harness/run.py b/harness/run.py
@@ -714,6 +714,19 @@ def run_single(
     meta["completed_at"] = datetime.now(timezone.utc).isoformat()
     (run_dir / "meta.json").write_text(json.dumps(meta, indent=2))
 
+    # Guard: if claude produced nothing (0 turns), discard the run
+    output_path = run_dir / "claude_output.json"
+    if output_path.exists():
+        try:
+            output = json.loads(output_path.read_text())
+            if (output.get("num_turns") or 0) == 0:
+                log(f"  DISCARD: {run_id} - 0 turns (no work done)")
+                shutil.rmtree(run_dir, ignore_errors=True)
+                shutil.rmtree(workspace, ignore_errors=True)
+                return "failed"
+        except Exception:
+            pass
+
     # Evaluate
     task_dir = project_dir / "tasks" / task
     evaluate(task_dir, workspace, cell, run_dir)
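
For readers who want to try the guard outside the harness, here is a minimal standalone sketch of the same check. The function name `discard_if_empty` and the demo directory layout are hypothetical and not part of `harness/run.py`. Only the `claude_output.json` filename, the `num_turns` key, and the two `shutil.rmtree` calls come from the diff above. Unlike the commit, which catches every `Exception`, this sketch narrows the handler to read and parse errors. That is a design choice assumed here, not the author's.

```python
import json
import shutil
import tempfile
from pathlib import Path


def discard_if_empty(run_dir: Path, workspace: Path) -> bool:
    """Delete run_dir and workspace if the recorded run has 0 turns.

    Hypothetical helper: returns True when the run was discarded,
    False when it should go on to evaluation.
    """
    output_path = run_dir / "claude_output.json"
    if not output_path.exists():
        return False
    try:
        output = json.loads(output_path.read_text())
    except (OSError, ValueError):
        # Unreadable or malformed output: keep the run, as the commit does.
        return False
    if not isinstance(output, dict):
        return False
    # A missing or null num_turns is treated as 0, matching the diff.
    if (output.get("num_turns") or 0) != 0:
        return False
    shutil.rmtree(run_dir, ignore_errors=True)
    shutil.rmtree(workspace, ignore_errors=True)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run"
        workspace = Path(tmp) / "workspace"
        run_dir.mkdir()
        workspace.mkdir()
        (run_dir / "claude_output.json").write_text(json.dumps({"num_turns": 0}))
        print(discard_if_empty(run_dir, workspace))  # True
        print(run_dir.exists())  # False
```

Returning a boolean lets the caller decide what to report. In the commit itself, the equivalent branch logs a `DISCARD` line and returns `"failed"` from `run_single`.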

	git clone https://git.shiptheloop.com/loop-benchmarking.git