Wire SonarQube into eval pipeline - loop-benchmarking - Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.

commit 68c55df2846dad69bc4b1247c072387f21e32909
parent b087659035807b7db06e48ad8dd9b9bd6d911aaa
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Mon,  6 Apr 2026 09:26:58 +0200

Wire SonarQube into eval pipeline

A SonarQube scan now runs automatically during evaluation when the
task provides eval/sonarqube-scan.py and SonarQube is running at
localhost:9000. Each run gets a unique project key. Results are
stored in eval_results.sonarqube.

Degrades gracefully if SonarQube isn't running: the result is
recorded as score=0 with an error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M harness/run.py  | 16 ++++++++++++++++

1 file changed, 16 insertions(+), 0 deletions(-)
diff --git a/harness/run.py b/harness/run.py
@@ -380,6 +380,22 @@ def evaluate(task_dir: Path, workspace: Path, cell: dict, run_dir: Path):
                     "error": str(e),
                 }
 
+    # SonarQube analysis (if SonarQube is running)
+    sonar_script = task_dir / "eval" / "sonarqube-scan.py"
+    if sonar_script.exists():
+        # Use cell_id + run_number as unique project key
+        project_key = f"tetris-{run_dir.name}"[:250].replace("=", "-").replace("/", "-")
+        try:
+            result = subprocess.run(
+                ["python3", str(sonar_script), str(workspace), project_key],
+                capture_output=True, text=True, timeout=90,
+            )
+            results["sonarqube"] = safe_parse_json(result.stdout.strip())
+        except subprocess.TimeoutExpired:
+            results["sonarqube"] = {"error": "SonarQube scan timed out", "score": 0}
+        except Exception as e:
+            results["sonarqube"] = {"error": str(e), "score": 0}
+
     # Compute weighted score from scoring.yaml
     try:
         scoring_file = task_dir / "scoring.yaml"
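
The error handling in the hunk above can be read in isolation as a small helper. The following is a minimal sketch, not the harness code: `run_sonar_scan` is a hypothetical name, and `safe_parse_json` here is a stand-in written for illustration, since the real helper in harness/run.py is not shown in this diff. It also uses `sys.executable` rather than the literal `python3`, which is a swapped-in choice.

```python
import json
import subprocess
import sys
from pathlib import Path


def safe_parse_json(text: str) -> dict:
    """Hypothetical stand-in for the harness helper of the same name."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as e:
        return {"error": f"invalid JSON from scan: {e}", "score": 0}
    if not isinstance(parsed, dict):
        return {"error": "scan output was not a JSON object", "score": 0}
    return parsed


def run_sonar_scan(script: Path, workspace: Path, project_key: str,
                   timeout: float = 90) -> dict:
    """Run the scan script; on any failure return score 0 plus an error."""
    try:
        proc = subprocess.run(
            [sys.executable, str(script), str(workspace), project_key],
            capture_output=True, text=True, timeout=timeout,
        )
        # The scan script is expected to print one JSON object on stdout.
        return safe_parse_json(proc.stdout.strip())
    except subprocess.TimeoutExpired:
        return {"error": "SonarQube scan timed out", "score": 0}
    except Exception as e:
        return {"error": str(e), "score": 0}
```

Every failure path returns a dict with `score` set to 0, so the weighted scoring step that follows can treat the SonarQube entry uniformly whether or not the scan succeeded.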

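The commit message says the scan runs only when SonarQube is reachable at localhost:9000, but the diff itself only checks that the scan script exists, so the reachability check presumably lives in the scan script. One way such a check could look is sketched below. It assumes SonarQube's `api/system/status` Web API endpoint, which reports `"status": "UP"` once the server is ready; `sonarqube_is_up` is a hypothetical name and not part of the harness.

```python
import http.client
import json
import urllib.error
import urllib.request


def sonarqube_is_up(base_url: str = "http://localhost:9000",
                    timeout: float = 3.0) -> bool:
    """Return True only if the server answers and reports status UP."""
    url = f"{base_url.rstrip('/')}/api/system/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError,
            http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON body:
        # treat all of these as "not running".
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"
```

A caller could use this to skip the scan quickly instead of waiting for the 90-second subprocess timeout when no server is listening.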
	git clone https://git.shiptheloop.com/loop-benchmarking.git