COMPETITIVE_PLAY_SPEC.md (7765B)
# Competitive Play Phase -- Bot Upgrade Spec

## Overview

Add a new Phase 7 after the existing 16 pass/fail tests. This phase plays the game competitively for 60 seconds, recording detailed gameplay data. It's not pass/fail -- it produces metrics that reveal bugs the basic tests miss.

## When it runs

- Only if Phase 4 (gameplay) succeeded: the bot can place pieces
- After Phase 6 (endurance), on a fresh page reload
- If Phase 4 failed, skip with empty competitive_play data

## What it does

1. Reload the page, calibrate, start the game
2. Play using the AI player for 60 seconds (or until game over)
3. Record everything that happens

## Data recorded (added to the gameplay bot report as `competitive_play`)

```json
{
  "duration_seconds": 45,
  "pieces_placed": 62,
  "total_lines_cleared": 18,
  "single_clears": 12,
  "double_clears": 2,
  "triple_clears": 1,
  "tetris_clears": 0,
  "max_combo": 3,
  "score_readings": [0, 100, 200, 500, 800, 1300, ...],
  "score_final": 4200,
  "score_increases": [100, 100, 300, 300, 500, ...],
  "level_readings": [1, 1, 1, 2, 2, 3],
  "level_final": 3,
  "lines_display_readings": [0, 1, 2, 4, 5, 8, ...],
  "game_over_reached": true,
  "game_over_text_found": "Game Over",
  "restart_available": true,
  "next_piece_visible": true,
  "speed_increased": true,
  "bugs_detected": [
    "multi_line_clear_only_removes_one_row",
    "score_does_not_scale_with_simultaneous_clears",
    "level_does_not_increase"
  ]
}
```

## Bug detection logic

During play, the bot watches for specific anomalies:

### 1. Multi-line clear bug
- When the grid reader detects 2+ complete rows simultaneously, watch how many rows disappear
- If only 1 row disappears when 2+ were complete, flag: `multi_line_clear_only_removes_one_row`

### 2. Score scaling bug
- Track score before and after each line clear event
- For single clears, record the score delta
- For multi-line clears (2+ rows), check if the delta is larger than a single clear
- If multi-line clear gives the same delta as a single, flag: `score_does_not_scale_with_simultaneous_clears`

### 3. Level progression bug
- Track score/lines and level readings throughout the session
- If lines_cleared reaches 10+ but level stays at 1, flag: `level_does_not_increase`

### 4. Speed progression bug
- Measure time between auto-drops at the start vs after 10+ lines cleared
- If the interval doesn't decrease, flag: `speed_does_not_increase`

### 5. Next piece preview
- Check for a "next piece" display area (look for a small canvas/div near the main grid showing a single piece)
- Record: `next_piece_visible: true/false`

### 6. Game over handling
- When the grid fills to the top, check if:
  - Game stops accepting input
  - "Game Over" or similar text appears
  - A restart button/prompt appears
- Record each separately

### 7. Counter-clockwise rotation
- During play, occasionally press Z key instead of Up arrow
- Check if the piece rotates the opposite direction
- Record: `counter_clockwise_rotation_works: true/false`

### 8. Soft drop vs hard drop
- Verify Down arrow moves piece one row (soft drop) vs Space drops to bottom (hard drop)
- If Down arrow drops to bottom same as Space, flag: `soft_drop_acts_as_hard_drop`

## Implementation approach

### New function: `runCompetitivePlayPhase()`

```typescript
async function runCompetitivePlayPhase(
  page: Page,
  cal: CalibrationResult,
  session: GameSession,
  gameplay: GameplayStats
): Promise<CompetitivePlayResult> {
  const result: CompetitivePlayResult = { ... };

  // Play using AI with integrated monitoring
  const startTime = Date.now();
  let lastScore = 0;
  let lastLevel = 1;
  let lastLines = 0;

  // Use playGame but with a callback on each piece placement
  // that reads score, level, lines, and checks for anomalies

  // After each piece placement:
  // 1. Read score element
  // 2. Read grid for complete rows (before they clear)
  // 3. Wait for clear animation
  // 4. Read grid again (after clear)
  // 5. Count how many rows actually disappeared
  // 6. Compare to how many were complete

  // Every 10 pieces, try a Z-key rotation
  // Every 5 pieces, check level display

  return result;
}
```

### How to detect multi-line clears

The critical measurement. Between piece placements:
1. Read grid immediately after piece locks (before clear animation)
2. Count complete rows (all cells filled)
3. Wait 200-500ms for clear animation
4. Read grid again
5. Count how many rows actually disappeared
6. If complete_rows > disappeared_rows, it's the multi-line bug

The grid reader can count complete rows with `countCompleteRows()` which already exists in grid-reader.ts.

### Score monitoring

During competitive play, read the score element on every piece placement (the gameplay phase already does this with integrated score tracking). Track every score delta. Group deltas by the number of lines cleared in that event. Check if deltas scale:
- Single clear delta D
- Double clear should be ~3x D
- Triple should be ~5x D
- Tetris should be ~8x D

Exact ratios depend on the game's scoring formula, but they should NOT all be equal.

### Speed monitoring

Record timestamps of auto-drops (piece moving down without input). At level 1, the interval should be ~800ms. After level increases, it should decrease. Compare average intervals: first 10 pieces vs last 10 pieces.
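
The speed comparison can be sketched as follows. This is a minimal illustration, assuming the bot has already collected a list of auto-drop intervals in milliseconds; the names `averageInterval` and `checkSpeedProgression`, the three-way result type, and the 5% noise tolerance are hypothetical choices for this sketch and do not exist in the current codebase.

```typescript
type SpeedCheck = "pass" | "fail" | "skip";

// Hypothetical helper: mean of a list of intervals in milliseconds.
function averageInterval(intervals: number[]): number {
  if (intervals.length === 0) return NaN;
  return intervals.reduce((sum, v) => sum + v, 0) / intervals.length;
}

// Compares the first `sample` auto-drop intervals with the last `sample`.
// - "skip": fewer than 10 lines cleared, or not enough intervals for
//   two non-overlapping windows
// - "pass": the late average is lower than the early one by more than `tolerance`
// - "fail": otherwise (corresponds to the speed_does_not_increase flag)
function checkSpeedProgression(
  dropIntervalsMs: number[],
  linesCleared: number,
  sample = 10,
  tolerance = 0.05 // assumed margin for timing noise, not taken from the spec
): SpeedCheck {
  if (linesCleared < 10 || dropIntervalsMs.length < sample * 2) return "skip";
  const early = averageInterval(dropIntervalsMs.slice(0, sample));
  const late = averageInterval(dropIntervalsMs.slice(-sample));
  return late < early * (1 - tolerance) ? "pass" : "fail";
}

// Usage with made-up interval data
const steady = new Array<number>(20).fill(800);
const faster = [
  ...new Array<number>(10).fill(800),
  ...new Array<number>(10).fill(600),
];
console.log(checkSpeedProgression(steady, 12)); // "fail"
console.log(checkSpeedProgression(faster, 12)); // "pass"
```

Returning "skip" when too little data was collected keeps a short session (for example, an early game over) from being reported as a speed bug.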

## Integration with existing report

Add `competitive_play` as a new field in the gameplay bot report, alongside `implementation`, `tests`, `summary`, `gameplay`, `session`:

```json
{
  "implementation": { ... },
  "tests": [ ... ],
  "summary": { ... },
  "gameplay": { ... },
  "session": { ... },
  "competitive_play": { ... }  // NEW
}
```

## Additional tests (new pass/fail tests added to the test suite)

The 8 bug checks become additional tests (17-24) with three possible outcomes:
- **pass**: we tested it and it works correctly
- **fail**: we tested it and found a bug
- **skip**: we didn't get an opportunity to test (e.g., no multi-line clear happened)

New test names:
- `multi_line_clear`: multiple complete rows clear simultaneously
- `score_scaling`: score increases proportionally with multi-line clears
- `level_progression`: level increases after clearing 10+ lines
- `speed_progression`: drop speed increases with level
- `next_piece_preview`: next piece display is visible
- `game_over_display`: game over message and restart option shown
- `counter_clockwise_rotation`: Z key rotates opposite to Up arrow
- `soft_drop_distinct`: Down arrow moves one row, not same as hard drop

These tests are appended to the existing 16. The total becomes up to 24 tests. Score scaling data is tracked regardless (score_readings, score_increases arrays) for analysis.

## Dashboard display

On the run detail page, add a "Competitive Play" detail card showing:
- Duration, pieces placed, lines cleared breakdown
- Score progression (small sparkline)
- Bugs detected (red badges)
- Next piece / game over / restart status

## Files to modify

1. `tests.ts` -- add Phase 7 (`runCompetitivePlayPhase`), add `CompetitivePlayResult` type, call it from `runAllTests`, include data in return value
2. `types.ts` -- add `CompetitivePlayResult` interface
3. `player.ts` -- may need a variant of `playGame` that calls back after each piece for monitoring (or just use the existing one and do monitoring externally via polling)
4. `index.ts` -- include competitive_play data in the report output
5. `dashboard/src/components/RunDetail.tsx` -- add competitive play detail card

## What NOT to change

- The 16 existing tests and their pass/fail logic
- The grid reader or calibrate modules
- The AI player heuristics

Note that the gameplay bot score calculation does change: it updates to include the new tests (out of 24 total).