loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 00055378a50253cc949795147e20b64ed2a2767f
parent 42321c004e708af0b099db58e9cbedf54e03e145
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Sun, 12 Apr 2026 17:38:24 +0200

Update calibration: cbbff570 CW rotation works, e2e04e75 scores on clear,
9805c24a has game over overlay

cbbff570: CW rotation (Up) works, CCW (Z) is broken. Updated rotate=true.
e2e04e75: Bot was right, score increases by 100 on line clear. Updated
score_increases_on_clear=true.
9805c24a: Game over shows overlay with GAME OVER text + Play Again button.
Updated game_over_display=true.

V2 agreement now 95% (102/107). 5 remaining disagreements: 3 from trail
rendering bug (4949d521), 1 game_over_display detection (9805c24a), 1
all_pieces_rotate edge case (cbbff570).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
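
The agreement figure in the message above (102/107, about 95%) can be read as the share of checks on which the bot's verdict matches the human calibration. The following is a minimal sketch of that calculation, not the repository's actual scoring code. It assumes that `null` human verdicts are excluded from the denominator and that bot results are a flat mapping with the same keys as `human_tests`. The function name `agreement` and the sample `bot_results` values are hypothetical.

```python
from typing import Optional


def agreement(human: dict[str, Optional[bool]], bot: dict[str, bool]) -> tuple[int, int]:
    """Return (matches, compared) over checks the human actually judged."""
    matches = 0
    compared = 0
    for check, verdict in human.items():
        # Assumption: a null (None) human verdict means "not tested" and is skipped.
        if verdict is None or check not in bot:
            continue
        compared += 1
        if bot[check] == verdict:
            matches += 1
    return matches, compared


# Illustrative values in the shape of the calibration files' "human_tests" object.
human_tests = {
    "rotate": True,
    "all_pieces_rotate": False,
    "line_clear": None,
    "score_increases_on_clear": True,
}

# Hypothetical bot output, invented for this example.
bot_results = {
    "rotate": True,
    "all_pieces_rotate": True,
    "line_clear": True,
    "score_increases_on_clear": True,
}

matches, compared = agreement(human_tests, bot_results)
print(f"{matches}/{compared} = {matches / compared:.0%}")  # prints "2/3 = 67%"
```

Skipping `null` entries keeps untested checks from counting as either agreement or disagreement, which is one plausible reading of how 107 comparisons arise from the calibration set.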

Diffstat:
M tasks/tetris/eval/gameplay-bot/calibration/9805c24a.json | 7 ++++---
M tasks/tetris/eval/gameplay-bot/calibration/cbbff570.json | 9 +++++----
M tasks/tetris/eval/gameplay-bot/calibration/e2e04e75.json | 9 +++++----
3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/tasks/tetris/eval/gameplay-bot/calibration/9805c24a.json b/tasks/tetris/eval/gameplay-bot/calibration/9805c24a.json
@@ -2,8 +2,8 @@
   "run_id": "tetris_arch=none_ctx=none_noise=clean_dsgn=none_eff=high_echk=none_hlang=en_lang=ts_lint=on_budget=low_model=haiku45_pw=avail_prompt=simple_rndr=none_strat=usub_tst=none_tedit=off_tglob=on_tgrep=on_tread=on_twrite=on_web=on_run2",
   "short_id": "9805c24a",
   "label": "DOM game (haiku-4.5, en)",
-  "notes": "Very ugly misaligned UI. Rotation partially broken: only does 1 of 4 rotations for some blocks. Poor randomizer: first 8-10 blocks were only 2 different tetrominoes. Next piece preview and game over display present. Soft drop works correctly.",
-  "human_tested_at": "2026-04-09",
+  "notes": "Very ugly misaligned UI. Rotation partially broken: only 1 of 4 states for some blocks. Poor randomizer. Play starts instantly (no start button). Game over shows overlay with GAME OVER text and Play Again button.",
+  "human_tested_at": "2026-04-12",
   "human_tests": {
     "game_loads": true,
     "game_starts": true,
@@ -31,4 +31,4 @@
     "counter_clockwise_rotation": false,
     "soft_drop_distinct": true
   }
-}
+}
+\ No newline at end of file
diff --git a/tasks/tetris/eval/gameplay-bot/calibration/cbbff570.json b/tasks/tetris/eval/gameplay-bot/calibration/cbbff570.json
@@ -2,8 +2,8 @@
   "run_id": "tetris_arch=none_ctx=none_noise=clean_dsgn=none_eff=high_echk=none_hlang=en_lang=ts_lint=on_budget=low_model=haiku45_pw=avail_prompt=detailed_rndr=none_strat=usub_tst=none_tedit=on_tglob=on_tgrep=on_tread=on_twrite=on_web=on_run1",
   "short_id": "cbbff570",
   "label": "DOM game (haiku-4.5, en)",
-  "notes": "Rotation is flaky: at best rotates once per piece, sometimes stalls the game or causes blocks to vanish. Next piece preview shows a shaded box outline instead of the actual upcoming block shape. Line clear works but at one point randomly cleared a line incorrectly (spurious clear). Game over can be triggered by spamming space and shows a proper game_over_modal with restart button. Multi-line clear and soft drop work.",
-  "human_tested_at": "2026-04-11",
+  "notes": "CW rotation (Up arrow) works normally. CCW rotation (Z key) is buggy: at best once, sometimes stalls or causes blocks to vanish. Next piece preview shows a shaded box outline. Line clear works but at one point randomly cleared incorrectly.",
+  "human_tested_at": "2026-04-12",
   "human_tests": {
     "game_loads": true,
     "game_starts": true,
@@ -11,7 +11,7 @@
     "move_left": true,
     "move_right": true,
     "move_down": true,
-    "rotate": false,
+    "rotate": true,
     "hard_drop": true,
     "all_pieces_rotate": false,
     "piece_locks": true,
@@ -31,4 +31,4 @@
     "counter_clockwise_rotation": false,
     "soft_drop_distinct": true
   }
-}
+}
+\ No newline at end of file
diff --git a/tasks/tetris/eval/gameplay-bot/calibration/e2e04e75.json b/tasks/tetris/eval/gameplay-bot/calibration/e2e04e75.json
@@ -2,8 +2,8 @@
   "run_id": "tetris_arch=none_ctx=none_noise=clean_dsgn=none_eff=high_echk=none_hlang=es_lang=ts_lint=on_budget=low_model=haiku45_pw=avail_prompt=simple_rndr=none_strat=usub_tst=none_tedit=on_tglob=on_tgrep=on_tread=on_twrite=on_web=on_run1",
   "short_id": "e2e04e75",
   "label": "Spanish basic play",
-  "notes": "Spanish game. Basic play works fine. Score does not change during play.",
-  "human_tested_at": "2026-04-09",
+  "notes": "Spanish game. Basic play works fine. Score increases by 100 on line clear (bot was right, human was wrong on initial test).",
+  "human_tested_at": "2026-04-12",
   "human_tests": {
     "game_loads": true,
     "game_starts": true,
@@ -18,7 +18,7 @@
     "new_piece_spawns": true,
     "multiple_pieces": true,
     "line_clear": null,
-    "score_increases_on_clear": false,
+    "score_increases_on_clear": true,
     "score_element_visible": null,
     "game_over": true,
     "playable_30s": true,
@@ -31,4 +31,4 @@
     "counter_clockwise_rotation": null,
     "soft_drop_distinct": null
   }
-}
+}
+\ No newline at end of file
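
The `run_id` strings in these calibration files pack the whole experiment configuration into one underscore-separated name. Below is a rough sketch of how such an ID could be split into key/value pairs and two runs compared. It assumes that keys and values never contain underscores and that the trailing token without `=` is a run counter, which holds for the IDs in this diff but is not confirmed as a general rule. The helper names `parse_run_id` and `differing_keys` are hypothetical.

```python
def parse_run_id(run_id: str) -> dict[str, str]:
    """Split a run_id into a config dict (assumes no underscores inside keys or values)."""
    task, *parts = run_id.split("_")
    config = {"task": task}
    for part in parts:
        if "=" in part:
            key, value = part.split("=", 1)
            config[key] = value
        else:
            # Assumption: a bare trailing token such as "run2" is the run counter.
            config["run"] = part
    return config


def differing_keys(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return the sorted config keys whose values differ between two runs."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# run_id values copied from the 9805c24a and cbbff570 calibration files.
run_9805 = parse_run_id(
    "tetris_arch=none_ctx=none_noise=clean_dsgn=none_eff=high_echk=none_hlang=en"
    "_lang=ts_lint=on_budget=low_model=haiku45_pw=avail_prompt=simple_rndr=none"
    "_strat=usub_tst=none_tedit=off_tglob=on_tgrep=on_tread=on_twrite=on_web=on_run2"
)
run_cbbf = parse_run_id(
    "tetris_arch=none_ctx=none_noise=clean_dsgn=none_eff=high_echk=none_hlang=en"
    "_lang=ts_lint=on_budget=low_model=haiku45_pw=avail_prompt=detailed_rndr=none"
    "_strat=usub_tst=none_tedit=on_tglob=on_tgrep=on_tread=on_twrite=on_web=on_run1"
)

print(differing_keys(run_9805, run_cbbf))  # ['prompt', 'run', 'tedit']
```

Comparing parsed IDs this way makes it easy to see which variables separate two runs that share a label, such as the two "DOM game (haiku-4.5, en)" entries here.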
