loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

REFACTOR_SPEC.md (30796B)


# Two-Tier Refactor Spec: Driver + Bot

## Problem Statement

The gameplay bot is ~3500 lines across 6 files, with two distinct concerns tangled
together: understanding the webpage (finding grids, clicking buttons, reading pixels,
sending keystrokes) and playing Tetris (phase orchestration, AI decisions, test
derivation, bug detection). The boundary between them is blurred:

- `calibrate.ts` handles grid detection, start mechanism detection, control detection,
  overlay detection, interactivity verification, screenshot sampling, visual change
  detection, and page surveying -- all in one 1300-line file.
- `tests.ts` does phase orchestration, BUT ALSO calls `readGrid` directly during
  mechanics tests, reads score elements, detects game over text, measures drop
  intervals, detects next piece previews, and reads level displays.
- `player.ts` calls both `readGrid` and `page.keyboard.press` directly, coupling
  AI logic to the Playwright API.
- `grid-reader.ts` is the cleanest module but still exports low-level grid analysis
  utilities (bounding boxes, cell counts, piece identification) that the bot calls
  directly instead of going through an abstraction.

The result: any change to how the page is read ripples through all files. You cannot
test the AI player without a live Playwright page. You cannot swap the grid reader
without touching the test orchestrator.

## Proposed Architecture

```
                 +------------------+
                 |    index.ts      |  Entry point: HTTP server, Playwright test,
                 |                  |  report output. Unchanged.
                 +--------+---------+
                          |
                          v
                 +------------------+
                 |     bot.ts       |  Layer 2: "The Brain"
                 |                  |  Phase orchestration, AI decisions, test
                 |                  |  derivation, competitive play, bug detection.
                 |                  |  Calls only the Driver interface.
                 +--------+---------+
                          |
                          v
                 +------------------+
                 |    driver.ts     |  Layer 1: "The Eyes and Hands"
                 |                  |  Abstracts the webpage. Exposes a clean API.
                 |                  |  Handles grid reading, start detection,
                 |                  |  control detection, keyboard input.
                 +--------+---------+
                          |
                +---------+---------+
                |                   |
                v                   v
        +-----------+       +------------+
        | types.ts  |       | player.ts  |  Pure Tetris logic: AI heuristics,
        |           |       |            |  board simulation, placement finding.
        +-----------+       |            |  NO Playwright imports. NO page access.
                            +------------+
```

### What goes where

**driver.ts** -- "I can see and interact with this webpage"
- Grid detection (finding the grid on the page)
- Grid reading (10x20 boolean matrix from canvas/DOM/SVG)
- Start mechanism detection (the 5-phase cascade)
- Control detection (which keys the game responds to)
- Score/level/lines reading
- Keyboard input (move, rotate, drop)
- Screenshot capture
- Interactivity verification
- Page surveying (pre-test data collection)
- Background color sampling
- Visual change detection
- Next piece preview detection
- Game over text detection
- Re-calibration

**bot.ts** -- "I know Tetris rules and test logic"
- Phase orchestration (the 8 conditional phases)
- Test derivation from session data (the 24 tests)
- Score/timing/event tracking (GameSession bookkeeping)
- Competitive play with bug detection
- Line clear detection logic (watching grid state transitions)
- Game over triggering strategy (stack pieces to fill grid)
- Endurance testing
- Report assembly (BotReport construction)

**player.ts** -- "I know where to put pieces" (pure computation, no I/O)
- 4-heuristic scoring (aggregate height, lines, holes, bumpiness)
- Piece definitions (rotations, dimensions)
- Board simulation (drop piece, clear lines)
- Best placement finding
- No `Page` import, no `readGrid` call, no `keyboard.press`
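
A minimal sketch of what the 4-heuristic scoring could look like as pure functions. The function names `aggregateHeight`, `countHoles`, and `bumpiness` come from this spec; their bodies, the `columnHeights` and `countCompleteLines` helpers, the row-major `grid[row][col]` layout with row 0 at the top, and the `WEIGHTS` values are all assumptions for illustration, not the existing `player.ts` code.

```typescript
// Assumption: Grid is row-major (grid[row][col]) with row 0 at the top.
type Grid = boolean[][];

/** Height of each column: distance from its topmost filled cell to the floor. */
function columnHeights(grid: Grid): number[] {
  const rows = grid.length;
  const cols = grid[0]?.length ?? 0;
  const heights: number[] = [];
  for (let c = 0; c < cols; c++) {
    let h = 0;
    for (let r = 0; r < rows; r++) {
      if (grid[r][c]) {
        h = rows - r;
        break;
      }
    }
    heights.push(h);
  }
  return heights;
}

function aggregateHeight(grid: Grid): number {
  return columnHeights(grid).reduce((sum, h) => sum + h, 0);
}

/** Empty cells with at least one filled cell above them in the same column. */
function countHoles(grid: Grid): number {
  const cols = grid[0]?.length ?? 0;
  let holes = 0;
  for (let c = 0; c < cols; c++) {
    let seenFilled = false;
    for (let r = 0; r < grid.length; r++) {
      if (grid[r][c]) seenFilled = true;
      else if (seenFilled) holes++;
    }
  }
  return holes;
}

/** Sum of absolute height differences between adjacent columns. */
function bumpiness(grid: Grid): number {
  const h = columnHeights(grid);
  let total = 0;
  for (let c = 0; c + 1 < h.length; c++) total += Math.abs(h[c] - h[c + 1]);
  return total;
}

function countCompleteLines(grid: Grid): number {
  return grid.filter((row) => row.length > 0 && row.every(Boolean)).length;
}

// Hypothetical weights, chosen only for illustration.
const WEIGHTS = { height: -0.51, lines: 0.76, holes: -0.36, bumpiness: -0.18 };

/** Higher is better: rewards cleared lines, penalizes height, holes, bumps. */
function scoreBoard(grid: Grid): number {
  return (
    WEIGHTS.height * aggregateHeight(grid) +
    WEIGHTS.lines * countCompleteLines(grid) +
    WEIGHTS.holes * countHoles(grid) +
    WEIGHTS.bumpiness * bumpiness(grid)
  );
}
```

Because every function takes a plain `Grid` and returns a number, the scoring can be unit tested with hand-written boards and no browser.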

**types.ts** -- unchanged, all interfaces stay

**grid-reader.ts** -- absorbed into driver.ts (see migration plan)

**index.ts** -- unchanged except it calls bot.ts instead of tests.ts

---

## Driver Interface

```typescript
import type { Page } from "@playwright/test";
import type {
  Grid,
  GridBounds,
  RendererType,
  Controls,
  StartMechanism,
  SurveyData,
  PieceType,
} from "./types";

// ---------------------------------------------------------------------------
// Configuration returned by calibration, passed through subsequent calls.
// Replaces CalibrationResult for internal use within the Driver.
// ---------------------------------------------------------------------------

export interface DriverCalibration {
  renderer: RendererType;
  gridDetected: boolean;
  gridBounds: GridBounds | null;
  cellWidth: number;
  cellHeight: number;
  controls: Controls;
  startMechanism: StartMechanism;
  scoreElementSelector: string | null;
  levelElementSelector: string | null;
  backgroundColor: [number, number, number] | null;
  consoleErrors: string[];
  gridConfidence: number;
  startButton?: {
    selector: string;
    text: string;
    disappeared: boolean;
    position: { x: number; y: number };
  };
}

// ---------------------------------------------------------------------------
// Grid snapshot: the grid state plus derived information the bot needs.
// ---------------------------------------------------------------------------

export interface GridSnapshot {
  /** The 10x20 boolean grid. null if reading failed. */
  grid: Grid | null;
  /** Total filled cells. 0 if grid is null. */
  filledCount: number;
  /** Filled cells in the bottom N rows. */
  filledInBottom(rows: number): number;
  /** Whether any cell in the top N rows is filled. */
  hasFilledInTop(rows: number): boolean;
  /** Number of fully complete rows. */
  completeRows: number;
  /** Active piece cells (diff against settled grid). null if undetectable. */
  activePieceCells: [number, number][] | null;
  /** Identified piece type from active piece cells. null if no active piece. */
  activePieceType: PieceType | null;
}

// ---------------------------------------------------------------------------
// The Driver interface. This is what the Bot sees.
// ---------------------------------------------------------------------------

export interface TetrisDriver {
  // -- Lifecycle --

  /**
   * Navigate to the game URL, wait for load, begin console error collection.
   * Resolves with `loaded: false` if the page failed to load.
   */
  loadPage(url: string): Promise<{ loaded: boolean; detail: string; errorsOnLoad: number }>;

  /**
   * Survey the page structure before any interaction.
   * Returns information about overlays, canvas elements, DOM grids, visible text.
   */
  surveyPage(): Promise<SurveyData>;

  /**
   * Run full calibration: grid detection, start mechanism detection,
   * control detection, score element detection, grid confidence measurement.
   * Includes re-calibration fallback if initial detection fails.
   * Never throws.
   */
  calibrate(): Promise<DriverCalibration>;

  /**
   * Re-run calibration after the game state may have changed
   * (e.g., after starting, grid might appear that wasn't there before).
   * Keeps the current calibration if re-calibration finds nothing better.
   */
  recalibrate(): Promise<DriverCalibration>;

  /**
   * Get the current calibration. Throws if calibrate() hasn't been called.
   */
  getCalibration(): DriverCalibration;

  // -- Grid Reading --

  /**
   * Read the current grid state. Returns a GridSnapshot with the raw grid
   * and derived metrics. If a settled grid is provided, the active piece is
   * detected by diffing against it.
   *
   * Returns a snapshot with grid: null if reading fails.
   */
  readGrid(settledGrid?: Grid | null): Promise<GridSnapshot>;

  /**
   * Compare two grids. Returns true if they differ.
   */
  gridsAreDifferent(a: Grid | null, b: Grid | null): boolean;

  // -- Input --

  /**
   * Press a game control key. Uses the controls detected during calibration.
   */
  pressKey(action: "left" | "right" | "down" | "rotate" | "drop"): Promise<void>;

  /**
   * Press an arbitrary key (for testing CCW rotation with 'z', etc.).
   */
  pressRawKey(key: string): Promise<void>;

  /**
   * Wait for a specified duration (milliseconds).
   */
  wait(ms: number): Promise<void>;

  // -- Score/Level/Lines Reading --

  /**
   * Read the current score from the detected score element.
   * Returns null if no score element was found or reading fails.
   */
  readScore(): Promise<number | null>;

  /**
   * Read the current level from the page.
   * Returns null if no level display found or reading fails.
   */
  readLevel(): Promise<number | null>;

  // -- Page State Queries --

  /**
   * Check if "Game Over" (or equivalent) text is visible on the page.
   * Returns the matched text, or null if not found.
   */
  detectGameOverText(): Promise<string | null>;

  /**
   * Check if a restart button/prompt is visible.
   */
  detectRestartOption(): Promise<boolean>;

  /**
   * Check if a next piece preview display exists.
   */
  detectNextPiecePreview(): Promise<boolean>;

  /**
   * Get all console errors collected since loadPage() was called.
   */
  getConsoleErrors(): string[];

  // -- Screenshots --

  /**
   * Take a screenshot. Returns the raw PNG buffer.
   */
  screenshot(): Promise<Buffer>;

  // -- Timing --

  /**
   * Measure the auto-drop interval (time between gravity-driven grid changes
   * with no input). Returns average interval in ms, or 0 if unmeasurable.
   */
  measureDropInterval(): Promise<number>;
}
```
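
To illustrate how the derived fields of `GridSnapshot` relate to the raw grid, here is a hedged sketch of a snapshot builder. `makeSnapshot` and `BasicSnapshot` are hypothetical names, the row-major layout with row 0 at the top is an assumption, and the active-piece fields are left out because they depend on diffing logic this spec does not detail.

```typescript
// Assumption: Grid is row-major (grid[row][col]) with row 0 at the top.
type Grid = boolean[][];

// Hypothetical subset of GridSnapshot (no active-piece fields).
interface BasicSnapshot {
  grid: Grid | null;
  filledCount: number;
  filledInBottom(rows: number): number;
  hasFilledInTop(rows: number): boolean;
  completeRows: number;
}

function makeSnapshot(grid: Grid | null): BasicSnapshot {
  // A failed read is treated as an empty board for every derived metric.
  const rows: Grid = grid ?? [];
  const countIn = (slice: Grid): number =>
    slice.reduce((n, row) => n + row.filter(Boolean).length, 0);

  return {
    grid,
    filledCount: countIn(rows),
    // slice(-0) would return every row, so guard the n <= 0 case explicitly.
    filledInBottom: (n) => countIn(n > 0 ? rows.slice(-n) : []),
    hasFilledInTop: (n) =>
      rows.slice(0, Math.max(0, n)).some((row) => row.some(Boolean)),
    completeRows: rows.filter((row) => row.length > 0 && row.every(Boolean)).length,
  };
}
```

With this shape, `makeSnapshot(null)` yields `filledCount: 0` and `hasFilledInTop(n) === false`, so callers never need a null check before reading the metrics.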

### Method-to-Source Mapping

Each Driver method maps to existing code as follows:

| Driver Method | Current Source | Current Function(s) |
|---|---|---|
| `loadPage()` | tests.ts:277-303 | `loadAndCheckPage()`, `loadGamePage()` |
| `surveyPage()` | calibrate.ts:1300-1393 | `surveyPage()` |
| `calibrate()` | calibrate.ts:24-94 | `calibrate()`, `detectGrid()`, `detectStartMechanism()`, `detectControls()`, `detectScoreElement()`, `measureGridConfidence()` |
| `recalibrate()` | tests.ts:152-163 | inline re-calibration after start |
| `readGrid()` | grid-reader.ts:15-38, 46-118, 142-364 | `readGrid()`, `readCanvasGrid()`, `readDomGrid()`, plus `countFilled()`, `countFilledInBottomRows()`, `hasFilledInTopRows()`, `countCompleteRows()`, `detectActivePieceCells()`, `identifyPieceType()` |
| `gridsAreDifferent()` | grid-reader.ts:400-410 | `gridsAreDifferent()` |
| `pressKey()` | player.ts:251-277 | inline `page.keyboard.press()` calls using `cal.controls` |
| `pressRawKey()` | tests.ts:841-842 | inline `page.keyboard.press("z")` |
| `wait()` | everywhere | `page.waitForTimeout()` |
| `readScore()` | tests.ts:490-497, 529-538, 743-749 | inline score element reading |
| `readLevel()` | tests.ts:1597-1630 | `readLevelFromPage()` |
| `detectGameOverText()` | tests.ts:929-940 | inline `page.evaluate()` for game over text |
| `detectRestartOption()` | tests.ts:943-955 | inline `page.evaluate()` for restart buttons |
| `detectNextPiecePreview()` | tests.ts:1669-1717 | `detectNextPiecePreview()` |
| `getConsoleErrors()` | tests.ts:94-98 | `consoleErrors` array |
| `screenshot()` | player.ts:370-371 | `page.screenshot()` |
| `measureDropInterval()` | tests.ts:1636-1664 | `measureDropInterval()` |

### How the Driver handles different renderers

The Driver encapsulates renderer differences entirely. The Bot never knows or cares
whether the game uses canvas, DOM, SVG, or WebGL.

```
readGrid() internally:
  if renderer === "canvas" && gridBounds:
    -> readCanvasGrid() via page.evaluate(getImageData)
  if renderer === "dom":
    -> readDomGrid() via page.evaluate(DOM traversal)
  if renderer === "svg":
    -> future: readSvgGrid()
  fallback:
    -> try canvas if bounds exist, then try DOM
```

The `GridSnapshot` returned to the Bot is always the same shape regardless of renderer.
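
The dispatch above could be sketched in TypeScript roughly as follows. This is an illustration under stated assumptions: `readGridFor`, `GridReader`, and `Readers` are hypothetical names, the `RendererType` union is a guess at what `types.ts` defines, and the two readers are injected (and assumed to return `null` on failure) so the example runs without Playwright.

```typescript
type Grid = boolean[][];
// Assumption: the real RendererType in types.ts may have other members.
type RendererType = "canvas" | "dom" | "svg" | "unknown";
// Hypothetical reader signature: resolves to null when the read fails.
type GridReader = () => Promise<Grid | null>;

interface Readers {
  canvas: GridReader;
  dom: GridReader;
}

async function readGridFor(
  renderer: RendererType,
  hasBounds: boolean,
  readers: Readers
): Promise<Grid | null> {
  // Known renderer with what it needs: use its dedicated reader.
  if (renderer === "canvas" && hasBounds) return readers.canvas();
  if (renderer === "dom") return readers.dom();

  // Fallback (svg, unknown, or canvas without bounds):
  // try canvas if bounds exist, then try DOM.
  if (hasBounds) {
    const fromCanvas = await readers.canvas();
    if (fromCanvas) return fromCanvas;
  }
  return readers.dom();
}
```

Whichever branch runs, the caller only ever sees `Grid | null`, which is what lets the Bot stay renderer-agnostic.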

### Re-calibration

The Driver maintains mutable internal state:

```typescript
class PlaywrightDriver implements TetrisDriver {
  private page: Page;
  private cal: DriverCalibration | null = null;
  private consoleErrors: string[] = [];
}
```

`recalibrate()` re-runs grid detection and start detection, but preserves
the existing calibration if the new one is worse (e.g., grid detection fails
on re-calibration but worked initially). This handles:

- Games where the grid appears only after clicking "Start"
- Games where the grid is rebuilt on game restart (new DOM elements)
- Games where the canvas resizes after initialization
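
The spec does not define what makes a calibration "worse", so the following is only one possible reading: prefer a detected grid, then higher `gridConfidence`, then a known start mechanism. `CalibrationSummary`, `isBetter`, and `keepBetterCalibration` are hypothetical names, and `startMechanism` is typed as a plain string here for self-containment.

```typescript
// Hypothetical subset of DriverCalibration used for the comparison.
interface CalibrationSummary {
  gridDetected: boolean;
  gridConfidence: number;
  startMechanism: string; // assumption: "unknown" means nothing was detected
}

/** Assumed ordering: grid detection first, then confidence, then start mechanism. */
function isBetter(candidate: CalibrationSummary, current: CalibrationSummary): boolean {
  if (candidate.gridDetected !== current.gridDetected) return candidate.gridDetected;
  if (candidate.gridConfidence !== current.gridConfidence) {
    return candidate.gridConfidence > current.gridConfidence;
  }
  return current.startMechanism === "unknown" && candidate.startMechanism !== "unknown";
}

/** Keep the existing calibration unless the new one is strictly better. */
function keepBetterCalibration<T extends CalibrationSummary>(
  current: T | null,
  candidate: T
): T {
  return current === null || isBetter(candidate, current) ? candidate : current;
}
```

Under this ordering, a re-calibration that loses the grid never overwrites one that found it, which matches the example given above.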

### Error handling

| Scenario | Driver behavior |
|---|---|
| Grid read returns null | `readGrid()` returns `GridSnapshot` with `grid: null`, `filledCount: 0` |
| Grid read throws | Same as null -- caught internally, never thrown to Bot |
| No score element found | `readScore()` returns `null` |
| Score element disappeared | `readScore()` returns `null` (caught internally) |
| Console error during play | Accumulated in `consoleErrors`, accessible via `getConsoleErrors()` |
| Page navigation fails | `loadPage()` returns `{ loaded: false, detail: "..." }` |
| Canvas getImageData all zeros (no GPU) | Grid validation rejects (>60% filled), returns null |
| Calibration finds nothing | Returns calibration with `gridDetected: false`, `startMechanism: "unknown"` |

The Driver never throws on page or read failures. All such errors are represented in
return values. The one exception is `getCalibration()`, which throws if `calibrate()`
hasn't been called.
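
Two of the rows above can be sketched as small helpers: converting a thrown error into `null`, and rejecting an implausibly full grid. `orNull`, `filledFraction`, and `validateGrid` are hypothetical names. The 60% threshold comes from the table, while the exact comparison (strictly greater than) is an assumption.

```typescript
type Grid = boolean[][];

/** Run an async read and convert any thrown error into null. */
async function orNull<T>(op: () => Promise<T>): Promise<T | null> {
  try {
    return await op();
  } catch {
    return null;
  }
}

/** Fraction of cells that are filled, or 0 for an empty grid. */
function filledFraction(grid: Grid): number {
  const total = grid.reduce((n, row) => n + row.length, 0);
  if (total === 0) return 0;
  const filled = grid.reduce((n, row) => n + row.filter(Boolean).length, 0);
  return filled / total;
}

/** Reject grids that are missing, empty, or more than maxFilled full. */
function validateGrid(grid: Grid | null, maxFilled = 0.6): Grid | null {
  if (!grid || grid.length === 0) return null;
  return filledFraction(grid) > maxFilled ? null : grid;
}
```

A Driver method built on these would look like `validateGrid(await orNull(() => rawRead()))`, where `rawRead` stands in for whichever renderer-specific reader applies.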

---

## Bot Interface

### How the Bot calls the Driver

The Bot receives a `TetrisDriver` instance. It never imports `Page` or
anything from Playwright. It never calls `page.evaluate()`, `page.keyboard`,
or `page.screenshot()` directly.

```typescript
// bot.ts
import type { TetrisDriver, DriverCalibration, GridSnapshot } from "./driver";
import type {
  TestResult,
  GameplayStats,
  GameSession,
  CompetitivePlayResult,
  SurveyData,
  BotReport,
  Grid,
} from "./types";
import { findBestPlacement } from "./player";

export async function runAllTests(
  driver: TetrisDriver,
  serverUrl: string
): Promise<{
  testResults: TestResult[];
  calibration: DriverCalibration;
  gameplay: GameplayStats;
  session: GameSession;
  survey: SurveyData;
  competitivePlay: CompetitivePlayResult | null;
}> {
  // Phase 1: Load
  const loadResult = await driver.loadPage(serverUrl);
  // ...

  // Phase 2: Calibrate
  const cal = await driver.calibrate();
  // ...

  // Phase 3-8: Use only driver.readGrid(), driver.pressKey(), etc.
}
```

### Phase execution flow using Driver methods

**Phase 1: Page Load**
```
driver.loadPage(url) -> { loaded, detail, errorsOnLoad }
driver.wait(3000)
```

**Phase 2: Calibrate + Start**
```
survey = driver.surveyPage()
cal = driver.calibrate()
  // Internally: detectStartMechanism(), detectGrid(), etc.
if cal.startMechanism === "unknown" || !cal.gridDetected:
  cal = driver.recalibrate()
```

**Phase 3: Basic Mechanics**
```
// Auto-drop test
snap0 = driver.readGrid()
driver.wait(5000)
snap1 = driver.readGrid()
gridChanged = driver.gridsAreDifferent(snap0.grid, snap1.grid)

// Movement tests
for dir in [left, right, down]:
  snapBefore = driver.readGrid()
  driver.pressKey(dir)
  driver.wait(300)
  snapAfter = driver.readGrid()
  // compare

// Rotation test
snapBefore = driver.readGrid()
driver.pressKey("rotate")
driver.wait(300)
snapAfter = driver.readGrid()
// compare bounding boxes of active piece cells

// Hard drop test
driver.pressKey("drop")
driver.wait(500)
snapAfter = driver.readGrid()
// check bottom rows
```

**Phase 4: Piece Lifecycle**
```
// Already tested during Phase 3 mechanics
// Piece locks: bottom cells persist across reads
// New piece spawns: top rows have cells after drop
// Multiple pieces: piecesLocked counter >= 3
```

**Phase 5: Gameplay**
```
driver.loadPage(url)
cal = driver.calibrate()
initialScore = driver.readScore()
// Play loop (60 pieces / 45s):
while pieces < 60 && elapsed < 45s:
  snap = driver.readGrid(settledGrid)
  if snap.activePieceCells:
    placement = findBestPlacement(settledGrid, snap.activePieceType)
    // Execute placement using driver.pressKey()
    for i in 0..placement.rotations:
      driver.pressKey("rotate")
      driver.wait(50)
    // Move to column
    driver.pressKey("left" or "right") * N
    driver.pressKey("drop")
    driver.wait(100)
    settledGrid = driver.readGrid().grid
  driver.wait(60)
finalScore = driver.readScore()
```
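
The "execute placement" step of the play loop could look roughly like this once it calls the Driver instead of Playwright. The `Placement` shape (`rotations`, `targetColumn`), the `currentColumn` argument, and the narrowed `InputDriver` interface are assumptions for illustration; the real `Placement` interface in `player.ts` may differ. The sketch also ignores that rotating a piece can shift its leftmost column.

```typescript
type Action = "left" | "right" | "down" | "rotate" | "drop";

// Hypothetical subset of TetrisDriver: only what placement execution needs.
interface InputDriver {
  pressKey(action: Action): Promise<void>;
  wait(ms: number): Promise<void>;
}

// Hypothetical shape; the real Placement interface is not shown in this spec.
interface Placement {
  rotations: number;
  targetColumn: number;
}

async function executePlacement(
  driver: InputDriver,
  placement: Placement,
  currentColumn: number
): Promise<void> {
  // Rotate first, pausing briefly so the game registers each press.
  for (let i = 0; i < placement.rotations; i++) {
    await driver.pressKey("rotate");
    await driver.wait(50);
  }

  // Shift horizontally by the signed column difference.
  const shift = placement.targetColumn - currentColumn;
  const direction: Action = shift < 0 ? "left" : "right";
  for (let i = 0; i < Math.abs(shift); i++) {
    await driver.pressKey(direction);
  }

  // Lock the piece and give the game time to spawn the next one.
  await driver.pressKey("drop");
  await driver.wait(100);
}
```

Since the function only depends on `pressKey` and `wait`, a test can pass an object that records key presses and assert on the resulting sequence.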

**Phase 6: Game Over**
```
driver.loadPage(url)
driver.calibrate()
// Hard drop up to 40 times, checking the grid on every 5th drop
for i in 0..40:
  driver.pressKey("drop")
  driver.wait(150)
  if i % 5 === 0:
    snap = driver.readGrid()
    if snap.hasFilledInTop(4):
      driver.pressKey("drop")
      driver.wait(300)
      snap2 = driver.readGrid()
      if !driver.gridsAreDifferent(snap.grid, snap2.grid):
        // Game over detected: the grid no longer changes
        break
gameOverText = driver.detectGameOverText()
```
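
A hedged TypeScript rendering of the stacking loop above, written against a narrowed driver interface so it can run against a fake. `stackUntilFrozen` and `GameOverDriver` are hypothetical names, and treating a failed grid read as "not yet game over" is an assumption the pseudocode does not spell out.

```typescript
type Grid = boolean[][];

// Hypothetical subset of TetrisDriver used by the game-over phase.
interface GameOverDriver {
  pressKey(action: "drop"): Promise<void>;
  wait(ms: number): Promise<void>;
  readGrid(): Promise<{ grid: Grid | null; hasFilledInTop(rows: number): boolean }>;
  gridsAreDifferent(a: Grid | null, b: Grid | null): boolean;
}

/** Returns true once a drop no longer changes a grid whose top rows are filled. */
async function stackUntilFrozen(driver: GameOverDriver, maxDrops = 40): Promise<boolean> {
  for (let i = 0; i < maxDrops; i++) {
    await driver.pressKey("drop");
    await driver.wait(150);
    if (i % 5 !== 0) continue;

    const before = await driver.readGrid();
    // Assumption: an unreadable grid is never counted as game over.
    if (before.grid === null || !before.hasFilledInTop(4)) continue;

    await driver.pressKey("drop");
    await driver.wait(300);
    const after = await driver.readGrid();
    if (after.grid !== null && !driver.gridsAreDifferent(before.grid, after.grid)) {
      return true;
    }
  }
  return false;
}
```

The boolean result would then be combined with `driver.detectGameOverText()` by the caller, as in the pseudocode.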

**Phase 7: Endurance**
```
driver.loadPage(url)
driver.calibrate()
// Play for 30 seconds using same play loop as Phase 5
```

**Phase 8: Competitive Play**
```
driver.loadPage(url)
driver.calibrate()
initialDropInterval = driver.measureDropInterval()
initialLevel = driver.readLevel()
// Play for 60 seconds with detailed tracking
// Every 5th poll: driver.readScore()
// Every 10th poll: driver.readLevel()
// Periodic: driver.pressRawKey("z") for CCW test
// Periodic: soft drop test via driver.pressKey("down")
finalDropInterval = driver.measureDropInterval()
nextPieceVisible = driver.detectNextPiecePreview()
gameOverText = driver.detectGameOverText()
restartAvailable = driver.detectRestartOption()
```

### Test derivation

`deriveTestResults()` lives in bot.ts. It receives the `GameSession` data
that the Bot accumulated during the phases and produces the array of 24 `TestResult`
entries. It does not need the Driver at all -- it operates on pure data.

The function signature keeps its current parameters, with `cal` typed as
`DriverCalibration`:

```typescript
function deriveTestResults(
  session: GameSession,
  cal: DriverCalibration,
  loadResult: LoadResult,
  consoleErrors: string[],
  gameplay: GameplayStats,
  phaseState: PhaseState,
  competitivePlay: CompetitivePlayResult | null
): TestResult[]
```

### Where the AI player logic lives

`player.ts` becomes a pure computation module. It keeps:

- `PIECES` definitions
- `findBestPlacement()` (exported)
- `findBestPlacementGeneric()`
- `simulateDropPiece()`
- `clearLines()`
- `aggregateHeight()`, `countHoles()`, `bumpiness()`
- `stripActivePiece()` (exported)
- `Placement` interface (exported)

It loses:

- `playGame()` -- moves to bot.ts (it orchestrates grid reads + AI + key presses)
- `hardDrop()` -- replaced by `driver.pressKey("drop")`
- `playRandomMove()` -- moves to bot.ts
- `playRandomForDuration()` -- moves to bot.ts
- `tryFillRow()` -- moves to bot.ts
- `stackToGameOver()` -- moves to bot.ts
- `executePlacement()` -- moves to bot.ts (it calls driver.pressKey)
- `countTotalFilled()` -- redundant with GridSnapshot.filledCount

After refactor, `player.ts` has zero Playwright imports.

---

## Migration Plan

### New files created

| File | Purpose | Est. lines |
|---|---|---|
| `driver.ts` | TetrisDriver interface + PlaywrightDriver implementation | ~900 |
| `bot.ts` | Phase orchestration, play loops, test derivation | ~1100 |

### Files modified

| File | Change | Est. lines |
|---|---|---|
| `player.ts` | Remove all Playwright-dependent functions, keep pure AI logic | ~350 -> ~250 |
| `types.ts` | Add `DriverCalibration`, `GridSnapshot` interfaces (or keep in driver.ts). Minor additions. | ~205 -> ~220 |
| `index.ts` | Change import from `tests.ts` to `bot.ts`, instantiate `PlaywrightDriver`, pass to `runAllTests`. | ~260 -> ~270 |

### Files deleted

| File | Reason |
|---|---|
| `calibrate.ts` | Absorbed into `driver.ts` |
| `grid-reader.ts` | Absorbed into `driver.ts` |
| `tests.ts` | Replaced by `bot.ts` |

### What stays

- `types.ts` -- interfaces stay the same, report format unchanged
- `index.ts` -- HTTP server, Playwright test structure, report writing all stay
- `SPEC.md` -- unchanged
- `COMPETITIVE_PLAY_SPEC.md` -- unchanged
- Report format (`BotReport`) -- identical JSON output

### Incremental migration (4 phases)

**Phase A: Create driver.ts with the interface + implementation (no callers yet)**

1. Create `driver.ts` with `TetrisDriver` interface and `PlaywrightDriver` class.
2. Move into it from `calibrate.ts`:
   - `detectStartMechanism()` and its sub-functions (`tryKeyboardTriggers`, `tryDomButtons`, `tryCanvasClicks`)
   - `detectGrid()`
   - `detectControls()`
   - `detectScoreElement()`
   - `measureGridConfidence()`
   - `surveyPage()`
   - `sampleScreenshot()`
   - `detectVisualChange()`
   - `verifyInteractivity()`
   - `clusterPoints()`
   - `recalibrateWithRetry()`
3. Move into it from `grid-reader.ts`:
   - `readGrid()`, `readCanvasGrid()`, `readDomGrid()`
   - `sampleBackgroundColor()`
   - `validateGridBounds()`
   - `gridsAreDifferent()`
   - `countFilled()`, `countFilledInBottomRows()`, `hasFilledInTopRows()`
   - `countCompleteRows()`, `isRowComplete()`
   - `getColumnHeights()`
   - `detectActivePieceCells()`, `identifyPieceType()`
4. Move into it from `tests.ts`:
   - `readLevelFromPage()`
   - `measureDropInterval()`
   - `detectNextPiecePreview()`
   - `extractScoreFromText()` (internal helper)
5. Wrap everything behind `PlaywrightDriver` methods.
6. Export both the interface and the class.
7. At this point, old code still works -- `calibrate.ts`, `grid-reader.ts`, and `tests.ts` are unchanged.

**Commit A**: "Add driver.ts: TetrisDriver interface and PlaywrightDriver implementation"

**Phase B: Create bot.ts (calls driver.ts, replaces tests.ts)**

1. Create `bot.ts` with the new `runAllTests()` that accepts `TetrisDriver`.
2. Move into it from `tests.ts` (rewritten or replaced where noted):
   - `runAllTests()` (rewritten to call Driver instead of Playwright directly)
   - `runBasicMechanicsPhase()`
   - `runGameplayPhase()`
   - `runGameOverPhase()`
   - `runEndurancePhase()`
   - `runCompetitivePlayPhase()`
   - `deriveTestResults()`
   - `ALL_TEST_NAMES`
   - `emptyCalibration()` (adapted to return `DriverCalibration`)
   - `loadAndCheckPage()` (replaced by `driver.loadPage()`)
   - `boundingBox()` helper
   - `countFilledInTopRows()` helper (local in tests.ts, replaced by GridSnapshot method)
3. Move into it from `player.ts`:
   - `playGame()` (rewritten to call Driver)
   - `executePlacement()` (rewritten to call Driver)
   - `playRandomMove()` (rewritten to call Driver)
   - `playRandomForDuration()` (rewritten to call Driver)
   - `tryFillRow()` (rewritten to call Driver)
   - `stackToGameOver()` (rewritten to call Driver)
4. bot.ts imports `findBestPlacement`, `stripActivePiece`, and `Placement` from `player.ts`,
   the Driver types from `driver.ts`, and the data interfaces from `types.ts`.

**Commit B**: "Add bot.ts: phase orchestration using TetrisDriver"

**Phase C: Rewire index.ts, slim player.ts**

1. Update `index.ts`:
   - Import `PlaywrightDriver` from `./driver`
   - Import `runAllTests` from `./bot` (not `./tests`)
   - In the test body: `const driver = new PlaywrightDriver(page); const results = await runAllTests(driver, serverUrl);`
2. Remove from `player.ts`:
   - `playGame()`, `hardDrop()`, `executePlacement()`, `playRandomMove()`, `playRandomForDuration()`, `tryFillRow()`, `stackToGameOver()`
   - `import type { Page }` and `import { readGrid, ... }` from grid-reader
   - `countTotalFilled()` (redundant)
3. `player.ts` now exports only:
   - `findBestPlacement()` (accepts `Grid` and `PieceType`, returns `Placement | null`)
   - `stripActivePiece()` (accepts `Grid` and cells, returns `Grid`)
   - `Placement` interface

**Commit C**: "Rewire index.ts to use bot.ts + driver.ts, slim player.ts"

**Phase D: Delete old files**

1. Delete `calibrate.ts`
2. Delete `grid-reader.ts`
3. Delete `tests.ts`
4. Verify all imports resolve
5. Run the full eval pipeline against a known artifact to confirm identical report output

**Commit D**: "Remove old calibrate.ts, grid-reader.ts, tests.ts"

### Backwards compatibility

The report format (`BotReport`) does not change. The JSON output is byte-identical
for the same game input. The summary score calculation is unchanged. The test names
are unchanged. The competitive play data structure is unchanged.

The only change is to the internal file structure. Nothing downstream
(the scoring pipeline, the dashboard, the harness) needs to change.

---

## File Structure After Refactor

```
gameplay-bot/
  types.ts          ~220 lines   Interfaces (unchanged)
  driver.ts         ~900 lines   TetrisDriver interface + PlaywrightDriver class
  player.ts         ~250 lines   Pure AI: heuristics, simulation, placement finding
  bot.ts           ~1100 lines   Phases, play loops, test derivation, competitive play
  index.ts          ~270 lines   Playwright test entry, HTTP server, report output
  SPEC.md                        Unchanged
  COMPETITIVE_PLAY_SPEC.md       Unchanged
  REFACTOR_SPEC.md               This document
```

Total: ~2740 lines (down from ~3500 because of deduplication and removing
redundant helpers that now live behind the Driver).

### Import/dependency graph

```
index.ts
  -> driver.ts (PlaywrightDriver constructor)
  -> bot.ts (runAllTests)
  -> types.ts (BotReport)

bot.ts
  -> driver.ts (TetrisDriver interface, DriverCalibration, GridSnapshot)
  -> player.ts (findBestPlacement, stripActivePiece, Placement)
  -> types.ts (all data interfaces)

driver.ts
  -> types.ts (Grid, GridBounds, RendererType, Controls, etc.)
  -> @playwright/test (Page)

player.ts
  -> types.ts (Grid, PieceType)
  (NO @playwright/test import)
```

Key constraint: `bot.ts` does NOT import `@playwright/test`. It depends on the
`TetrisDriver` interface, not the implementation. This means the Bot can be tested
with a mock driver that returns canned grid states -- no browser needed.
    760 
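As a rough illustration of that constraint, a mock driver might look like the sketch below. `MiniDriver` is a hypothetical, trimmed-down stand-in for `TetrisDriver` (the real interface has more methods and may use different signatures), and `MockDriver` and `countFilledCells` are illustrative names that are not part of the codebase. The cell encoding (0 for empty) is also an assumption.

```typescript
// Hypothetical, trimmed-down stand-in for the real TetrisDriver interface.
type Grid = number[][]; // assumption: 0 = empty cell, non-zero = filled cell

interface GridSnapshot {
  grid: Grid | null;
}

interface MiniDriver {
  readGrid(): Promise<GridSnapshot>;
  pressRawKey(key: string): Promise<void>;
}

// Replays canned grid states and records key presses; no browser involved.
class MockDriver implements MiniDriver {
  readonly pressedKeys: string[] = [];
  private readonly frames: (Grid | null)[];
  private index = 0;

  constructor(frames: (Grid | null)[]) {
    this.frames = frames;
  }

  async readGrid(): Promise<GridSnapshot> {
    // Repeat the last frame once the canned sequence is exhausted.
    const i = Math.min(this.index, this.frames.length - 1);
    this.index += 1;
    return { grid: this.frames[i] ?? null };
  }

  async pressRawKey(key: string): Promise<void> {
    this.pressedKeys.push(key);
  }
}

// Example of Bot-side logic that depends only on the interface.
async function countFilledCells(driver: MiniDriver): Promise<number> {
  const { grid } = await driver.readGrid();
  if (grid === null) return 0;
  let filled = 0;
  for (const row of grid) {
    for (const cell of row) {
      if (cell !== 0) filled += 1;
    }
  }
  return filled;
}
```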
---

## Edge Cases

### Games that need re-calibration mid-session

**Scenario**: Grid appears only after clicking "Start". On page load, there is no
canvas and no DOM grid -- just a splash screen.

**Current behavior**: `calibrate()` runs on the splash screen, finds nothing.
Then `tests.ts` tries start mechanisms, and after starting, re-runs `calibrate()`.

**Driver behavior**: `calibrate()` includes start detection. If it starts the game
but finds no grid, it waits and re-scans. `recalibrate()` is also available for the
Bot to call explicitly after any phase reload.

**Bot flow**:
```typescript
// Driver methods are assumed to be async, since they wrap Playwright calls.
let cal = await driver.calibrate();
if (cal.gridDetected === false && cal.startMechanism !== "unknown") {
  // Game started but grid not found yet -- wait and retry
  await driver.wait(500);
  cal = await driver.recalibrate();
}
```
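If a single retry turns out to be too optimistic for slow-loading games, the same flow could be wrapped in a bounded retry loop. The sketch below is one possible shape, not part of the spec: `MiniCalibration` and `CalibratingDriver` are hypothetical cut-down types, and `calibrateWithRetry`, `maxRetries`, and `delayMs` are illustrative names with assumed defaults.

```typescript
// Hypothetical minimal shapes; the real DriverCalibration has more fields.
interface MiniCalibration {
  gridDetected: boolean;
  startMechanism: string; // "unknown" when no start mechanism was found
}

interface CalibratingDriver {
  calibrate(): Promise<MiniCalibration>;
  recalibrate(): Promise<MiniCalibration>;
  wait(ms: number): Promise<void>;
}

// Calibrate once, then re-scan up to maxRetries times while the game has
// started (start mechanism known) but the grid has not appeared yet.
async function calibrateWithRetry(
  driver: CalibratingDriver,
  maxRetries = 3,
  delayMs = 500,
): Promise<MiniCalibration> {
  let cal = await driver.calibrate();
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    if (cal.gridDetected || cal.startMechanism === "unknown") break;
    await driver.wait(delayMs);
    cal = await driver.recalibrate();
  }
  return cal;
}
```

The loop stops early when the start mechanism is `"unknown"`, because re-scanning a game that never started is unlikely to find a grid.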

### Games where the Driver cannot read the grid at all

**Scenario**: Canvas game without GPU access. `getImageData()` returns all zeros.

**Driver behavior**: `readGrid()` returns `GridSnapshot { grid: null }` every time.
The Bot sees grid failures accumulate.

**Bot flow**: Phase 3 (mechanics) detects that `gridReadSuccess === 0`. The Bot
marks all grid-dependent tests as failed with detail "grid reader unavailable".
It does NOT fall back to screenshot-only testing (per the "NO FALSE POSITIVES" rule).
Competitive play is skipped.

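The failure-marking step could be sketched roughly as follows. `TestResult` is a hypothetical shape (the real result type lives in `types.ts` and may differ), `failGridDependentTests` is an illustrative name, and the test names in the usage note are placeholders.

```typescript
// Hypothetical result shape; the real test result type may differ.
interface TestResult {
  name: string;
  passed: boolean;
  detail: string;
}

// If no grid read ever succeeded, fail every grid-dependent test with an
// explicit reason instead of guessing from screenshots.
function failGridDependentTests(
  gridReadSuccess: number,
  gridDependentTests: string[],
): TestResult[] {
  if (gridReadSuccess > 0) return []; // grid is readable; run the tests normally
  return gridDependentTests.map((name) => ({
    name,
    passed: false,
    detail: "grid reader unavailable",
  }));
}
```

For example, `failGridDependentTests(0, ["move_left", "rotate"])` yields two failed results carrying the "grid reader unavailable" detail, while any positive `gridReadSuccess` yields an empty list.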
### Games that pause themselves

**Scenario**: Player accidentally triggers a pause menu (Escape key, or a pause
button that overlaps with the game area).

**Driver behavior**: `readGrid()` may return null (if an overlay covers the grid)
or return a static grid (same state on every read). The Driver does not know about
pausing -- it just reports what it sees.

**Bot flow**: The play loop in `bot.ts` already handles stale grids. If the grid
hasn't changed for 8 seconds, it tries pressing the drop key (which may unpause).
If grid reads start returning null, the Bot counts consecutive failures. After 10
consecutive null reads, it falls back to random key presses for a brief period,
then re-reads.

The Bot could also try pressing Escape or P to dismiss a pause screen:
```typescript
// Driver methods are assumed to be async, since they wrap Playwright calls.
if (consecutiveUnchanged > 80) { // 80 polls * 60ms = ~5 seconds
  await driver.pressRawKey("Escape");
  await driver.wait(500);
  await driver.pressRawKey("p");
  await driver.wait(500);
}
```
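The counters referred to above could be kept in a small helper like the one sketched here. `StallTracker` and its method names are hypothetical, and comparing grids via `JSON.stringify` is an assumption chosen for brevity rather than the bot's actual comparison. The thresholds reuse the figures quoted in this section.

```typescript
// Tracks consecutive unchanged and consecutive null grid reads across polls.
class StallTracker {
  consecutiveUnchanged = 0;
  consecutiveNull = 0;
  private lastKey: string | null = null;

  // Feed one poll result; pass null when the grid read failed.
  record(grid: number[][] | null): void {
    if (grid === null) {
      this.consecutiveNull += 1;
      return;
    }
    this.consecutiveNull = 0;
    const key = JSON.stringify(grid);
    this.consecutiveUnchanged =
      key === this.lastKey ? this.consecutiveUnchanged + 1 : 0;
    this.lastKey = key;
  }

  // More than 80 unchanged polls (about 5 seconds at 60ms per poll).
  shouldTryUnpause(): boolean {
    return this.consecutiveUnchanged > 80;
  }

  // 10 or more null reads in a row.
  shouldFallBackToRandomKeys(): boolean {
    return this.consecutiveNull >= 10;
  }
}
```

Keeping the counters in one place means the play loop only has to call `record()` once per poll and then ask the two questions.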

### Games with overlays that block gameplay

**Scenario**: A modal overlay (tutorial, cookie consent, "enter your name" dialog)
appears on top of the game, blocking input.

**Driver behavior**: `surveyPage()` detects overlays (positioned elements covering
more than 50% of the viewport). The start mechanism detection already tries clicking
overlays and pressing Escape to dismiss them.

**Bot flow**: If the game started but mechanics tests show no response to input
(`movementsObserved === 0`), the Bot can request a recalibrate, which may re-run
start detection and dismiss a new overlay.

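The 50% coverage rule can be illustrated with a small geometry helper. This is a sketch under assumptions: `Rect`, `viewportCoverage`, and `looksLikeOverlay` are hypothetical names, and the actual `surveyPage()` logic may measure coverage differently (for example, it also has to check that the element is positioned).

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Viewport {
  width: number;
  height: number;
}

// Fraction of the viewport covered by the element, clipped to the viewport.
function viewportCoverage(el: Rect, viewport: Viewport): number {
  const left = Math.max(0, el.x);
  const top = Math.max(0, el.y);
  const right = Math.min(viewport.width, el.x + el.width);
  const bottom = Math.min(viewport.height, el.y + el.height);
  const visibleArea = Math.max(0, right - left) * Math.max(0, bottom - top);
  return visibleArea / (viewport.width * viewport.height);
}

// True when the element covers strictly more than the threshold fraction.
function looksLikeOverlay(el: Rect, viewport: Viewport, threshold = 0.5): boolean {
  return viewportCoverage(el, viewport) > threshold;
}
```

Clipping to the viewport matters because a large element that is mostly scrolled off-screen should not count as blocking the game.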
### Games in different languages

**Scenario**: The game UI is in Spanish, Japanese, or any non-English language.
"Start", "Game Over", "Score" have different text.

**Driver behavior**: Start mechanism detection is already fully language-agnostic
(visual change detection + interactivity verification, no text matching). Score
element detection falls back from labeled text ("Score: 0") to structural heuristics
(leaf element containing a standalone number). Game over text detection checks
multiple languages ("game over", "fin del juego", etc.) or falls back to
grid-state-based detection (grid frozen after filling to top).

**Bot flow**: The Bot does not do any text matching. It delegates all text-based
detection to the Driver. Tests like `game_over` use `driver.detectGameOverText()`
which is the Driver's responsibility. The Bot adds a grid-based game over check
(frozen grid after stacking) as a secondary signal that doesn't depend on language.

The `detectGameOverText()` method could be extended with more languages:
```typescript
// Inside driver.ts
const gameOverPatterns = [
  "game over", "gameover", "you lose", "try again",
  "play again", "restart", "fin del juego", "juego terminado",
  "ゲームオーバー", "游戏结束"
];
```

But the primary game over detection in bot.ts (Phase 6) does not depend on text --
it watches the grid freeze after filling to the top.

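How the patterns are applied to page text is not spelled out here. One plausible approach is a case-insensitive substring match after collapsing whitespace, sketched below. `containsGameOverText` and `GAME_OVER_PATTERNS` are hypothetical names; the pattern list is copied from the snippet above. Note that broad patterns such as "restart" would also match a permanently visible restart button, so a real implementation would likely need to restrict which elements it inspects.

```typescript
// Pattern list copied from the spec; all entries are lower-case.
const GAME_OVER_PATTERNS: string[] = [
  "game over", "gameover", "you lose", "try again",
  "play again", "restart", "fin del juego", "juego terminado",
  "ゲームオーバー", "游戏结束",
];

// Case-insensitive substring match after collapsing runs of whitespace.
function containsGameOverText(pageText: string): boolean {
  const normalized = pageText.toLowerCase().replace(/\s+/g, " ");
  return GAME_OVER_PATTERNS.some((pattern) => normalized.includes(pattern));
}
```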
---

## What This Spec Does NOT Cover

- WebGL grid reading (not implemented yet, out of scope)
- New tests beyond the existing 24
- Changes to the report format or scoring
- Dashboard changes
- Harness changes
- Performance optimization of grid reading
- Testability improvements beyond the Driver/Bot split (e.g., mock Driver tests)

These are natural follow-ups after the refactor lands, but they are separate work items.
