loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 68c55df2846dad69bc4b1247c072387f21e32909
parent b087659035807b7db06e48ad8dd9b9bd6d911aaa
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Mon,  6 Apr 2026 09:26:58 +0200

Wire SonarQube into eval pipeline

SonarQube scan runs automatically during evaluation when the scan
script exists and SonarQube is running at localhost:9000. Each run
gets a unique project key. Results stored in eval_results.sonarqube.
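The diff below builds the key as `f"tetris-{run_dir.name}"`, capped at 250 characters, with only `=` and `/` replaced by `-`. Below is a minimal sketch of a stricter variant. It assumes SonarQube's documented project-key rule: letters, digits, `-`, `_`, `.` and `:`, with at least one non-digit. The helper name `make_project_key` and its `prefix`/`max_len` parameters are hypothetical. The 250-character cap is carried over from the commit.

```python
import re

# Characters assumed valid in a SonarQube project key:
# letters, digits, '-', '_', '.', ':'. Everything else is disallowed.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_.:-]")


def make_project_key(run_name: str, prefix: str = "tetris", max_len: int = 250) -> str:
    """Build a per-run SonarQube project key (hypothetical helper).

    Every disallowed character (not only '=' and '/') becomes '-'.
    The non-numeric prefix guarantees at least one non-digit, and the
    result is truncated to max_len, matching the cap used in the harness.
    """
    key = _DISALLOWED.sub("-", f"{prefix}-{run_name}")
    return key[:max_len]
```

For example, `make_project_key("cell=opus/run 3")` yields `tetris-cell-opus-run-3`. Because each substitution replaces exactly one character with one character, truncating before or after the replacement gives the same result.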

Gracefully skips if SonarQube isn't running (score=0 with error).
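The harness itself only checks that `sonarqube-scan.py` exists. The localhost:9000 check presumably happens inside that script, which this commit does not show. Below is a minimal sketch of such a reachability check. It assumes SonarQube's `GET /api/system/status` Web API endpoint, which reports `"status": "UP"` once the server is ready. The function name `sonarqube_is_up` is hypothetical.

```python
import http.client
import json
import urllib.request


def sonarqube_is_up(base_url: str = "http://localhost:9000", timeout: float = 5.0) -> bool:
    """Return True only if the server at base_url reports status "UP".

    Connection errors, timeouts, HTTP errors (all OSError subclasses via
    urllib), protocol errors, and non-JSON bodies all count as "not running",
    so a caller can skip the scan instead of failing the evaluation.
    """
    url = base_url.rstrip("/") + "/api/system/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        return False
    # Statuses such as "STARTING" are treated as not ready.
    return isinstance(payload, dict) and payload.get("status") == "UP"
```

A caller could then record the skip in the same shape the commit describes, for example `{"error": "SonarQube not running", "score": 0}`, whenever `sonarqube_is_up()` returns False.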

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M harness/run.py | 16 ++++++++++++++++
1 file changed, 16 insertions(+), 0 deletions(-)

diff --git a/harness/run.py b/harness/run.py
@@ -380,6 +380,22 @@ def evaluate(task_dir: Path, workspace: Path, cell: dict, run_dir: Path):
                 "error": str(e),
             }
 
+    # SonarQube analysis (if SonarQube is running)
+    sonar_script = task_dir / "eval" / "sonarqube-scan.py"
+    if sonar_script.exists():
+        # Use cell_id + run_number as unique project key
+        project_key = f"tetris-{run_dir.name}"[:250].replace("=", "-").replace("/", "-")
+        try:
+            result = subprocess.run(
+                ["python3", str(sonar_script), str(workspace), project_key],
+                capture_output=True, text=True, timeout=90,
+            )
+            results["sonarqube"] = safe_parse_json(result.stdout.strip())
+        except subprocess.TimeoutExpired:
+            results["sonarqube"] = {"error": "SonarQube scan timed out", "score": 0}
+        except Exception as e:
+            results["sonarqube"] = {"error": str(e), "score": 0}
+
     # Compute weighted score from scoring.yaml
     try:
         scoring_file = task_dir / "scoring.yaml"
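In the hunk, a non-zero exit or empty stdout from the scan script is only visible through whatever `safe_parse_json` returns, and that helper is not part of this diff. Below is a minimal, self-contained sketch of a wrapper that folds those cases into the same `{"error": ..., "score": 0}` shape. It assumes the scan script prints a single JSON object on stdout. The names `run_sonar_scan` and `_failure`, and the `interpreter` parameter, are hypothetical. The harness itself invokes `python3`.

```python
import json
import subprocess
import sys
from pathlib import Path


def _failure(reason: str) -> dict:
    # Same shape the harness stores on failure: an error message and score 0.
    return {"error": reason, "score": 0}


def run_sonar_scan(script: Path, workspace: Path, project_key: str,
                   timeout: float = 90, interpreter: str = sys.executable) -> dict:
    """Run the scan script and return its JSON result (hypothetical helper)."""
    try:
        proc = subprocess.run(
            [interpreter, str(script), str(workspace), project_key],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return _failure("SonarQube scan timed out")
    except OSError as e:
        # e.g. the interpreter could not be found or executed
        return _failure(str(e))

    stdout = proc.stdout.strip()
    try:
        parsed = json.loads(stdout) if stdout else None
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict):
        # Trust the script's own JSON, including its own error/score=0 payloads.
        return parsed
    # No usable JSON: keep stderr (or the exit code) so the reason is not lost.
    detail = proc.stderr.strip() or f"exit code {proc.returncode}"
    return _failure(f"no JSON result from scan script: {detail}")
```

Called as `results["sonarqube"] = run_sonar_scan(sonar_script, workspace, project_key)`, it could stand in for the try/except block in the hunk. Defaulting to `sys.executable` runs the script under the same interpreter as the harness, which avoids depending on whichever `python3` happens to be first on `PATH`.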
