GAMEPLAY_BOT_SPEC.md (8318B)
# Tetris Gameplay Bot Spec

## Purpose

A Playwright-based bot that can load any Tetris implementation, figure out how to interact with it, play the game, and report which game mechanics work and which don't. It must handle wildly different implementations -- different DOM structures, canvas vs DOM rendering, different control schemes, start buttons vs auto-start, etc.

## Architecture

Three phases: **Calibration**, **Play**, **Report**.

### Phase 1: Calibration

The bot loads the page and figures out how to interact with this specific implementation.

**1a. Start the game**

Try multiple start mechanisms in order, checking after each whether the game state changed:
1. Wait 3 seconds (some games auto-start)
2. Click the canvas or game container
3. Press Enter
4. Press Space
5. Look for a button with text matching `/start|play|begin|new game/i` and click it
6. Press any key

After each attempt, take a screenshot and compare it to the previous one. If pixels changed, the game has started.

**1b. Locate the game grid**

The grid could be:
- A `<canvas>` element
- A grid of `<div>` or `<td>` elements
- An SVG

Detection strategy:
1. Check for a `<canvas>` element. If found, use `getImageData()` to read pixels.
2. If no canvas, look for a grid-like DOM structure (many sibling elements in a container with grid/flex layout, or a table).
3. Take a screenshot and look for a rectangular region with a grid pattern.

Once found, determine:
- Grid pixel bounds (x, y, width, height)
- Cell size (width / 10, height / 20 for standard Tetris)
- Sample one pixel per cell to build a 10x20 boolean matrix

**1c. Detect controls**

Default to standard controls: ArrowLeft, ArrowRight, ArrowDown, ArrowUp (rotate), Space (hard drop).

Verify by:
1. Read the page text/HTML for control instructions (look for "arrow", "wasd", "z", "x", "space", "rotate" etc.)
2. Press ArrowLeft, take screenshot, check if a piece moved. If not, try "a".
3. Press ArrowUp, take screenshot, check if a piece rotated. If not, try "z" or "x".

Store the working key mappings.

**1d. Locate score display**

Scan the page for elements containing the text "score" (case insensitive) or elements that contain only a number that changes during gameplay.

### Phase 2: Play

A deterministic play session that exercises all game mechanics. Not trying to play well -- trying to test everything.

**2a. Test Suite (sequential, do not stop on failure)**

Each test captures before/after state and reports pass/fail independently.

| # | Test | Method | Pass condition |
|---|------|--------|----------------|
| 1 | Game loads | Page loads without console errors | No uncaught exceptions in first 3s |
| 2 | Game starts | Run calibration start sequence | Screenshot changes after start |
| 3 | Auto-drop | Wait 5s with no input after start | Grid state changes (piece fell) |
| 4 | Move left | Press left key | Grid state differs from before |
| 5 | Move right | Press right key | Grid state differs from before |
| 6 | Move down | Press down key | Grid state differs from before |
| 7 | Rotate | Press rotate key | Grid state differs, piece shape changed |
| 8 | Hard drop | Press hard drop key | Piece immediately at bottom, new piece appears |
| 9 | Piece locks | Wait for a piece to reach bottom via auto-drop (no input for ~15s) | Grid has filled cells at bottom that persist |
| 10 | New piece spawns | After piece locks, check top of grid | New piece appears at top |
| 11 | Multiple pieces | Play 10 pieces (hard drop each) | Grid accumulates filled cells |
| 12 | Line clear | Fill a complete row by strategic placement | At least one row disappears, cells above shift down |
| 13 | Score changes | Check score element before and after line clear | Score value increased |
| 14 | Game over | Stack pieces to the top rapidly | Game stops, some game-over indication |
| 15 | Playable for 30s | Play normally for 30 seconds | No crashes, console errors, or freezes |

**2b. Playing Strategy**

For tests that require actual gameplay (11, 12, 15), use the 4-heuristic algorithm:

```
score = -0.51 * aggregateHeight + 0.76 * completeLines - 0.36 * holes - 0.18 * bumpiness
```

For each piece:
1. Read current grid state (10x20 boolean matrix)
2. Read current piece (detect from grid -- the moving cells)
3. Try all (rotation, column) placements
4. Score each resulting board
5. Execute: rotate N times, move left/right, hard drop

If the bot can't read the grid reliably, fall back to random inputs: cycle through left, right, rotate, down in a fixed pattern.

**2c. Grid Reading**

For canvas-based games:
```js
// Reads a 10x20 occupancy grid by sampling the center pixel of each cell.
// bounds.x / bounds.y are assumed to be in canvas pixel coordinates.
async function readGrid(page, bounds, cellW, cellH) {
  return await page.evaluate(({ x, y, cellW, cellH }) => {
    const canvas = document.querySelector('canvas');
    const ctx = canvas.getContext('2d');
    const grid = [];
    for (let row = 0; row < 20; row++) {
      const rowData = [];
      for (let col = 0; col < 10; col++) {
        // Center of the cell, rounded down so the sample lands on a whole pixel
        const px = Math.floor(x + col * cellW + cellW / 2);
        const py = Math.floor(y + row * cellH + cellH / 2);
        const pixel = ctx.getImageData(px, py, 1, 1).data;
        // Consider a cell filled if it's not the background color
        const brightness = pixel[0] + pixel[1] + pixel[2];
        rowData.push(brightness > 100); // threshold
      }
      grid.push(rowData);
    }
    return grid;
  }, { x: bounds.x, y: bounds.y, cellW, cellH });
}
```

For DOM-based games:
```js
// Find cells by their grid position, check background color or class
```

The background color threshold should be calibrated during Phase 1 by reading the empty grid.
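
The board score from 2b can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names (`columnHeights`, `boardFeatures`, `scoreBoard`) are hypothetical, the grid is assumed to be an array of rows with row 0 at the top, and the features are assumed to be computed on the board as given, without first removing completed rows.

```javascript
// Weights copied from the formula in 2b.
const WEIGHTS = { aggregateHeight: -0.51, completeLines: 0.76, holes: -0.36, bumpiness: -0.18 };

// Height of each column: distance from its topmost filled cell to the floor.
function columnHeights(grid) {
  const rows = grid.length;
  const cols = grid[0].length;
  const heights = new Array(cols).fill(0);
  for (let col = 0; col < cols; col++) {
    for (let row = 0; row < rows; row++) {
      if (grid[row][col]) {
        heights[col] = rows - row;
        break;
      }
    }
  }
  return heights;
}

// Computes the four features used by the scoring formula.
function boardFeatures(grid) {
  const rows = grid.length;
  const heights = columnHeights(grid);
  const aggregateHeight = heights.reduce((sum, h) => sum + h, 0);
  const completeLines = grid.filter((row) => row.every(Boolean)).length;

  // A hole is an empty cell with at least one filled cell above it in its column.
  let holes = 0;
  heights.forEach((h, col) => {
    for (let row = rows - h; row < rows; row++) {
      if (!grid[row][col]) holes++;
    }
  });

  // Bumpiness: sum of absolute height differences between adjacent columns.
  let bumpiness = 0;
  for (let col = 0; col + 1 < heights.length; col++) {
    bumpiness += Math.abs(heights[col] - heights[col + 1]);
  }
  return { aggregateHeight, completeLines, holes, bumpiness };
}

// Higher is better; the bot would pick the placement with the largest score.
function scoreBoard(grid) {
  const f = boardFeatures(grid);
  return (
    WEIGHTS.aggregateHeight * f.aggregateHeight +
    WEIGHTS.completeLines * f.completeLines +
    WEIGHTS.holes * f.holes +
    WEIGHTS.bumpiness * f.bumpiness
  );
}
```

In step 4 of the per-piece loop, `scoreBoard` would be called once per simulated (rotation, column) placement, with the highest-scoring placement chosen for execution.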

### Phase 3: Report

Output a JSON report:

```json
{
  "implementation": {
    "renderer": "canvas|dom|svg",
    "grid_detected": true,
    "grid_bounds": { "x": 0, "y": 0, "width": 300, "height": 600 },
    "controls": { "left": "ArrowLeft", "right": "ArrowRight", "rotate": "ArrowUp", "drop": "Space" },
    "start_mechanism": "button|auto|keypress",
    "score_element_found": true
  },
  "tests": [
    { "name": "game_loads", "pass": true, "detail": "no console errors" },
    { "name": "game_starts", "pass": true, "detail": "started via button click" },
    { "name": "auto_drop", "pass": false, "detail": "piece did not move in 5 seconds" },
    ...
  ],
  "summary": {
    "total": 15,
    "passed": 12,
    "failed": 3,
    "score": 0.80
  },
  "gameplay": {
    "pieces_placed": 47,
    "lines_cleared": 3,
    "max_score_observed": 400,
    "play_duration_seconds": 30,
    "errors_during_play": 0
  }
}
```

## Error Handling

- NEVER crash on a single test failure. Each test is independent.
- If grid detection fails, skip grid-dependent tests but still test basic page load, console errors, and input response via screenshots.
- If a test times out (e.g., waiting for auto-drop), mark it as failed and move on.
- Capture all console errors throughout the session and include them in the report.
- If the game page itself fails to load, report all tests as failed with the error.
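
One way to get the isolation and timeout behavior described above is a small sequential runner. The sketch below is an assumption about how this might look, not the spec's required design: `withTimeout`, `runSuite`, and the `{ name, run, timeoutMs }` test shape are hypothetical, and a test is assumed to pass when `run()` resolves and fail when it throws, rejects, or times out.

```javascript
// Rejects if the promise does not settle within ms; always clears its timer.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs tests one after another; a failure or timeout never stops the suite.
async function runSuite(tests, defaultTimeoutMs = 10_000) {
  const results = [];
  for (const test of tests) {
    try {
      // Promise.resolve().then(...) turns synchronous throws into rejections.
      const detail = await withTimeout(
        Promise.resolve().then(() => test.run()),
        test.timeoutMs ?? defaultTimeoutMs
      );
      results.push({ name: test.name, pass: true, detail: String(detail ?? '') });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      results.push({ name: test.name, pass: false, detail });
    }
  }
  const passed = results.filter((r) => r.pass).length;
  const total = results.length;
  return {
    tests: results,
    summary: { total, passed, failed: total - passed, score: total ? passed / total : 0 },
  };
}
```

Note that a timed-out `run()` is not cancelled here; it is only abandoned, so a real implementation would likely want each test to tolerate leftover activity from the previous one.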

## File Structure

```
tasks/tetris/eval/
  gameplay-bot/
    index.ts          # Main entry point, orchestrates calibration + play + report
    calibrate.ts      # Phase 1: detect grid, controls, start mechanism
    grid-reader.ts    # Read grid state from canvas or DOM
    player.ts         # Phase 2: heuristic AI + move execution
    tests.ts          # Individual test implementations
    types.ts          # Shared types
  playwright.config.ts
```

## Dependencies

- `@playwright/test` (already in the project)
- No other dependencies. Pure Playwright + vanilla JS evaluation.

## Integration

The harness calls:
```bash
npx playwright test --config=tasks/tetris/eval/playwright.config.ts
```

The Playwright test:
1. Starts an HTTP server for the workspace (serve static files)
2. Runs the bot against the served game
3. Writes the JSON report to a specified output path
4. Exit code 0 regardless of test results (the report contains pass/fail)

## Constraints

- Must work with canvas-based AND DOM-based Tetris implementations
- Must handle games that auto-start and games with start buttons
- Must handle different control schemes
- Must not depend on any specific DOM structure, class names, or IDs
- Each test has a timeout (default 10 seconds per test, 30 seconds for the play test)
- Total bot runtime should be under 2 minutes per game
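
A quick arithmetic check of the timeout constraints may be useful. Assuming all 15 tests from the suite hit their timeouts (14 at the default, 1 at the play-test limit), the worst case is 14 × 10 s + 30 s = 170 s, which is above the 2-minute target even before calibration time is counted. One possible way to reconcile the two, sketched below with hypothetical names (`worstCaseMs`, `effectiveTimeoutMs`), is to clamp each test's timeout to whatever remains of the total budget.

```javascript
const DEFAULT_TIMEOUT_MS = 10_000; // "default 10 seconds per test"
const PLAY_TIMEOUT_MS = 30_000;    // "30 seconds for the play test"
const TOTAL_BUDGET_MS = 120_000;   // "under 2 minutes per game"

// Worst-case runtime if every test runs until its timeout.
function worstCaseMs(testCount, playTestCount = 1) {
  return (testCount - playTestCount) * DEFAULT_TIMEOUT_MS + playTestCount * PLAY_TIMEOUT_MS;
}

// Timeout to actually use for the next test, given time already spent.
// Returns 0 once the budget is exhausted, meaning the test should be
// reported as failed without being started.
function effectiveTimeoutMs(requestedMs, elapsedMs) {
  return Math.max(0, Math.min(requestedMs, TOTAL_BUDGET_MS - elapsedMs));
}

console.log(worstCaseMs(15) / 1000); // 170
```

Under this scheme a test that needs longer than its limit, such as the roughly 15-second wait in the piece-lock test, would also need its own `requestedMs` rather than the default.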