loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 17a4bada036386de83204177a3af6db3546666c3
parent bfd97a203969e55db8b901dd4e9779c27b575264
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Tue,  7 Apr 2026 07:27:33 +0200

Add spec for gameplay bot rewrite (falling piece detection)

Start detection based on detecting a falling piece instead of pixel
changes. Conditional phase execution to prevent false positives.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
A  tasks/tetris/eval/gameplay-bot/NEXT_SESSION_SPEC.md | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+), 0 deletions(-)

diff --git a/tasks/tetris/eval/gameplay-bot/NEXT_SESSION_SPEC.md b/tasks/tetris/eval/gameplay-bot/NEXT_SESSION_SPEC.md
@@ -0,0 +1,66 @@
+# Gameplay Bot Rewrite Spec
+
+## Problem
+Bot has false positives because it thinks the game started when it didn't.
+Current start detection clicks canvas and checks if any pixel changed --
+this triggers on title screens, hover effects, animations.
+
+## New Start Detection
+
+Universal signal: **a piece is falling**. After each trigger attempt,
+run a falling piece detector instead of screenshot comparison.
+
+### Trigger sequence (try each, check for falling piece after each):
+1. Wait 3s (auto-start)
+2. Click canvas center
+3. Press Enter
+4. Press Space
+5. Click body at various positions
+6. Press various keys (arrow down, Z, etc.)
+
+### Falling piece detector:
+- Take 3 screenshots ~1s apart
+- Find a rectangular cluster of colored pixels (~4 cells) that moved downward
+- "Roughly square-ish" -- tetromino bounding box is 2x2 to 4x1
+- May have rounded edges, glows, shadows -- look for the bounding box
+- Works for canvas, DOM, SVG, WebGL -- any rendering approach
+- If piece already at bottom, detect new piece spawning at top instead
+- Consider: games might render pieces as individual DOM divs, SVG rects,
+  canvas fills, or WebGL quads
+
+### If no falling piece after all triggers:
+- Game did not start
+- All downstream tests: "skipped: game did not start"
+- Zero false positives
+
+## Conditional Phase Execution
+
+Each phase depends on the previous succeeding:
+
+1. **Load + calibrate**: always runs
+2. **Start detection**: try triggers, confirm falling piece
+3. **Mechanics test**: only if game started (piece detected)
+4. **Gameplay (play to win)**: only if mechanics worked
+5. **Game over**: only if pieces can be placed. Must stack pieces to top
+   and verify via grid reader (filled cells in top rows), NOT screenshot comparison
+6. **Endurance**: only if gameplay phase succeeded
+
+Failed prerequisites -> "skipped: [prerequisite] failed" on all downstream tests.
+No more false positives from static screens.
+
+## Game Over Fix
+
+Current: screenshot comparison (nothing changed = game over).
+This false-positives on static start screens.
+
+New:
+1. Actually place pieces (hard drop repeatedly)
+2. Verify via grid reader that filled cells reach top rows
+3. Then check if inputs stop having effect (piece doesn't spawn)
+4. Optionally look for "game over" text in DOM
+
+## Notes
+- Games might auto-start (no button needed)
+- Start buttons might be canvas-rendered (no DOM button to find)
+- Some games have splash screens with animations (pixel change != game start)
+- The key insight: a FALLING PIECE is the only universal signal that gameplay began
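The spec's trigger sequence amounts to a first-success loop: fire one trigger, run the falling piece detector, and stop as soon as it fires. Below is a minimal Python sketch. It assumes each trigger is wrapped in a zero-argument callable (for example, around a browser-automation call such as Playwright's `page.keyboard.press("Enter")`) and that `falling_piece_seen` wraps the detector. `find_start_trigger`, `Trigger` and both callables are hypothetical names, not code from this repository.

```python
from typing import Callable, List, Optional, Tuple

# A trigger is (human-readable name, action). The actions are hypothetical
# wrappers around whatever drives the browser (wait, click, key press).
Trigger = Tuple[str, Callable[[], None]]


def find_start_trigger(
    triggers: List[Trigger],
    falling_piece_seen: Callable[[], bool],
) -> Optional[str]:
    """Fire triggers in order; after each one, ask the falling piece detector.

    Returns the name of the first trigger after which a falling piece was
    seen, or None if the game never started (downstream phases should then
    be reported as "skipped: game did not start").
    """
    for name, fire in triggers:
        fire()
        # Check for a falling piece, not for "any pixel changed".
        if falling_piece_seen():
            # Stop immediately: later triggers such as Space or arrow down
            # are also gameplay keys and could drop or move the piece.
            return name
    return None
```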
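The falling piece detector from the spec can be prototyped on a coarse occupancy grid instead of raw pixels. This sketch assumes that a hypothetical calibration step has already converted each screenshot into a `grid[row][col]` of booleans (True = cell drawn). That conversion is what absorbs the rounded edges, glows, and canvas/DOM/SVG/WebGL differences the spec lists. All function names are illustrative. The sketch reports a falling piece in two cases:

- A 4-cell cluster reappears lower in the next frame, in the same columns, and its old position is no longer fully occupied.
- A new 4-cell cluster appears in the top rows, which is the spec's "new piece spawning" case.

```python
from collections import deque
from typing import List, Set, Tuple

Grid = List[List[bool]]  # grid[row][col], row 0 is the top; True = occupied
Cell = Tuple[int, int]


def clusters(grid: Grid) -> List[Set[Cell]]:
    """Return the 4-connected clusters of occupied cells (BFS flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen: Set[Cell] = set()
    found: List[Set[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or (r, c) in seen:
                continue
            cluster: Set[Cell] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                cluster.add((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            found.append(cluster)
    return found


def is_tetromino(cluster: Set[Cell]) -> bool:
    # Any 4-connected cluster of exactly 4 cells is a tetromino, so its
    # bounding box is automatically 2x2, 2x3/3x2 or 1x4/4x1.
    return len(cluster) == 4


def moved_down(piece: Set[Cell], after: Grid) -> bool:
    """True if `piece` shows up in `after` shifted straight down by >= 1 row
    and its old position is no longer fully occupied."""
    rows = len(after)
    for dy in range(1, rows):
        shifted = {(r + dy, c) for r, c in piece}
        if max(r for r, _ in shifted) >= rows:
            break  # shifted off the bottom of the board
        if all(after[r][c] for r, c in shifted) and any(
            not after[r][c] for r, c in piece - shifted
        ):
            return True
    return False


def spawned_at_top(before: Grid, after: Grid, top_rows: int = 2) -> bool:
    """True if a tetromino appears in the top rows where `before` was empty."""
    return any(
        is_tetromino(cl)
        and min(r for r, _ in cl) < top_rows
        and not any(before[r][c] for r, c in cl)
        for cl in clusters(after)
    )


def falling_piece_detected(frames: List[Grid]) -> bool:
    """Check consecutive frames (the spec suggests 3 taken ~1s apart)."""
    for before, after in zip(frames, frames[1:]):
        if any(is_tetromino(cl) and moved_down(cl, after) for cl in clusters(before)):
            return True
        if spawned_at_top(before, after):
            return True
    return False
```

A static board, or a large animated blob on a title screen, never yields a 4-cell cluster that moves down. That is the property the spec relies on to reach zero false positives.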
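Conditional phase execution reduces to a small dependency table plus propagation of the first failure. This Python sketch assumes each phase is a callable that returns True on success. `Phase` and `run_phases` are illustrative names. Items 5 and 6 of the spec name their own prerequisites ("pieces can be placed", "gameplay phase succeeded"), so each phase declares the phase it depends on. Treating "pieces can be placed" as the mechanics test is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Phase:
    name: str
    run: Callable[[], bool]               # True = phase succeeded
    requires: Optional[str] = None        # name of the prerequisite phase
    failure_reason: Optional[str] = None  # wording used in downstream skips


def run_phases(phases: List[Phase]) -> Dict[str, str]:
    """Run phases in list order (prerequisites must come first).

    A phase whose prerequisite did not pass is never executed; it is
    reported as "skipped: <reason>", where <reason> is carried forward
    from the first phase that actually failed.
    """
    status: Dict[str, str] = {}
    cause: Dict[str, str] = {}  # why phases depending on this one get skipped
    for phase in phases:
        if phase.requires is not None and status[phase.requires] != "passed":
            cause[phase.name] = cause[phase.requires]
            status[phase.name] = f"skipped: {cause[phase.name]}"
            continue
        if phase.run():
            status[phase.name] = "passed"
        else:
            status[phase.name] = "failed"
            cause[phase.name] = phase.failure_reason or f"{phase.name} failed"
    return status


if __name__ == "__main__":
    # Wiring that mirrors the spec; the lambdas stand in for real phase code.
    phases = [
        Phase("load + calibrate", lambda: True),
        Phase("start detection", lambda: False, requires="load + calibrate",
              failure_reason="game did not start"),
        Phase("mechanics test", lambda: True, requires="start detection"),
        Phase("gameplay", lambda: True, requires="mechanics test"),
        Phase("game over", lambda: True, requires="mechanics test"),
        Phase("endurance", lambda: True, requires="gameplay"),
    ]
    for name, result in run_phases(phases).items():
        print(f"{name}: {result}")
```

When start detection fails, every later phase reports `skipped: game did not start` without running. A static start screen therefore cannot produce a passing mechanics, gameplay, or game-over result.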
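The new game-over check can be sketched the same way. The sketch makes two hypothetical assumptions:

- `hard_drop()` sends one hard drop and waits for the board to settle.
- `read_grid()` is the grid reader and returns `grid[row][col]` booleans with row 0 at the top.

A freshly spawned piece also occupies the top rows, so filled top rows alone are not treated as game over. The sketch concludes game over only when three conditions hold: the board has changed at least once (pieces were really placed), filled cells reach the top rows, and a few further hard drops leave the board unchanged. The optional DOM text check from step 4 is left out.

```python
from typing import Callable, List

Grid = List[List[bool]]  # grid[row][col], row 0 is the top; True = filled


def check_game_over(
    hard_drop: Callable[[], None],
    read_grid: Callable[[], Grid],
    top_rows: int = 2,
    max_drops: int = 200,
    probes: int = 3,
) -> bool:
    """Stack pieces with hard drops, then confirm inputs stop having effect."""

    def drop_and_read() -> Grid:
        hard_drop()
        return read_grid()

    previous = read_grid()
    placed_pieces = False  # step 1: the board must actually change
    for _ in range(max_drops):
        grid = drop_and_read()
        placed_pieces = placed_pieces or grid != previous
        previous = grid
        # Step 2: filled cells must reach the top rows.
        if not placed_pieces or not any(any(row) for row in grid[:top_rows]):
            continue
        # Step 3: a spawned piece also sits in the top rows, so only call it
        # game over if further drops no longer change the board.
        if all(drop_and_read() == grid for _ in range(probes)):
            return True
    return False  # never observed a frozen, topped-out board
```

On a static title screen the board never changes, so `placed_pieces` stays False and the function returns False instead of reporting a false game over.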
