commit 67bd49c6e259f78aade3caeae40c3418dedf8071
parent 7fbe88ce2a1febb0954305d10f4e1878570e0f14
Author: Brian Graham <brian@buildingbetterteams.de>
Date: Thu, 9 Apr 2026 12:56:29 +0200
Add two-tier architecture refactor spec for gameplay bot
Driver (webpage abstraction) + Bot (game logic) separation.
17-method TetrisDriver interface, 4-commit incremental migration plan,
~2740 lines (down from 3500). Bot never imports Playwright.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat:
1 file changed, 877 insertions(+), 0 deletions(-)
diff --git a/tasks/tetris/eval/gameplay-bot/REFACTOR_SPEC.md b/tasks/tetris/eval/gameplay-bot/REFACTOR_SPEC.md
@@ -0,0 +1,877 @@
+# Two-Tier Refactor Spec: Driver + Bot
+
+## Problem Statement
+
+The gameplay bot is ~3500 lines across 6 files, with two distinct concerns tangled
+together: understanding the webpage (finding grids, clicking buttons, reading pixels,
+sending keystrokes) and playing Tetris (phase orchestration, AI decisions, test
+derivation, bug detection). The boundary between them is blurred:
+
+- `calibrate.ts` handles grid detection, start mechanism detection, control detection,
+ overlay detection, interactivity verification, screenshot sampling, visual change
+ detection, and page surveying -- all in one 1300-line file.
+- `tests.ts` does phase orchestration but also calls `readGrid` directly during
+ mechanics tests, reads score elements, detects game over text, measures drop
+ intervals, detects next piece previews, and reads level displays.
+- `player.ts` calls both `readGrid` and `page.keyboard.press` directly, coupling
+ AI logic to the Playwright API.
+- `grid-reader.ts` is the cleanest module but still exports low-level grid analysis
+ utilities (bounding boxes, cell counts, piece identification) that the bot calls
+ directly instead of going through an abstraction.
+
+The result: any change to how the page is read ripples through all files. You cannot
+test the AI player without a live Playwright page. You cannot swap the grid reader
+without touching the test orchestrator.
+
+## Proposed Architecture
+
+```
+ +------------------+
+ | index.ts | Entry point: HTTP server, Playwright test,
+ | | report output. Unchanged.
+ +--------+---------+
+ |
+ v
+ +------------------+
+ | bot.ts | Layer 2: "The Brain"
+ | | Phase orchestration, AI decisions, test
+ | | derivation, competitive play, bug detection.
+ | | Calls only the Driver interface.
+ +--------+---------+
+ |
+ v
+ +------------------+
+ | driver.ts | Layer 1: "The Eyes and Hands"
+ | | Abstracts the webpage. Exposes a clean API.
+ | | Handles grid reading, start detection,
+ | | control detection, keyboard input.
+ +--------+---------+
+ |
+ +---------+---------+
+ | |
+ v v
+ +-----------+ +------------+
+ | types.ts | | player.ts | Pure Tetris logic: AI heuristics,
+ | | | | board simulation, placement finding.
+ +-----------+ | | NO Playwright imports. NO page access.
+ +------------+
+```
+
+### What goes where
+
+**driver.ts** -- "I can see and interact with this webpage"
+- Grid detection (finding the grid on the page)
+- Grid reading (10x20 boolean matrix from canvas/DOM/SVG)
+- Start mechanism detection (the 5-phase cascade)
+- Control detection (which keys the game responds to)
+- Score/level/lines reading
+- Keyboard input (move, rotate, drop)
+- Screenshot capture
+- Interactivity verification
+- Page surveying (pre-test data collection)
+- Background color sampling
+- Visual change detection
+- Next piece preview detection
+- Game over text detection
+- Re-calibration
+
+**bot.ts** -- "I know Tetris rules and test logic"
+- Phase orchestration (the 8 conditional phases)
+- Test derivation from session data (the 24 tests)
+- Score/timing/event tracking (GameSession bookkeeping)
+- Competitive play with bug detection
+- Line clear detection logic (watching grid state transitions)
+- Game over triggering strategy (stack pieces to fill grid)
+- Endurance testing
+- Report assembly (BotReport construction)
+
+**player.ts** -- "I know where to put pieces" (pure computation, no I/O)
+- 4-heuristic scoring (aggregate height, lines, holes, bumpiness)
+- Piece definitions (rotations, dimensions)
+- Board simulation (drop piece, clear lines)
+- Best placement finding
+- No `Page` import, no `readGrid` call, no `keyboard.press`
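+
+A minimal sketch of the four heuristics listed above, assuming `Grid` is a row-major
+`boolean[][]` with row 0 at the top. `columnHeights`, `countCompleteLines`,
+`scoreBoard`, and the values in `WEIGHTS` are illustrative placeholders, not the
+names or numbers used in the current `player.ts`.
+
+```typescript
+// Assumption: Grid is row-major, grid[row][col], with row 0 at the top.
+type Grid = boolean[][];
+
+// Height of each column: distance from its highest filled cell to the floor.
+function columnHeights(grid: Grid): number[] {
+  const rows = grid.length;
+  const cols = grid[0]?.length ?? 0;
+  const heights = new Array<number>(cols).fill(0);
+  for (let c = 0; c < cols; c++) {
+    for (let r = 0; r < rows; r++) {
+      if (grid[r][c]) {
+        heights[c] = rows - r;
+        break;
+      }
+    }
+  }
+  return heights;
+}
+
+function aggregateHeight(grid: Grid): number {
+  return columnHeights(grid).reduce((sum, h) => sum + h, 0);
+}
+
+// A hole is an empty cell with at least one filled cell above it in its column.
+function countHoles(grid: Grid): number {
+  let holes = 0;
+  const cols = grid[0]?.length ?? 0;
+  for (let c = 0; c < cols; c++) {
+    let seenFilled = false;
+    for (let r = 0; r < grid.length; r++) {
+      if (grid[r][c]) seenFilled = true;
+      else if (seenFilled) holes++;
+    }
+  }
+  return holes;
+}
+
+// Sum of absolute height differences between adjacent columns.
+function bumpiness(grid: Grid): number {
+  const h = columnHeights(grid);
+  let total = 0;
+  for (let c = 1; c < h.length; c++) total += Math.abs(h[c] - h[c - 1]);
+  return total;
+}
+
+function countCompleteLines(grid: Grid): number {
+  return grid.filter((row) => row.length > 0 && row.every(Boolean)).length;
+}
+
+// Hypothetical weights, for illustration only.
+const WEIGHTS = { height: -0.5, lines: 0.75, holes: -0.35, bumpiness: -0.2 };
+
+function scoreBoard(grid: Grid): number {
+  return (
+    WEIGHTS.height * aggregateHeight(grid) +
+    WEIGHTS.lines * countCompleteLines(grid) +
+    WEIGHTS.holes * countHoles(grid) +
+    WEIGHTS.bumpiness * bumpiness(grid)
+  );
+}
+```
+
+Every function here takes a plain `Grid` and returns a number, so none of it needs
+a `Page` or a Driver.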
+
+**types.ts** -- unchanged, all interfaces stay
+
+**grid-reader.ts** -- absorbed into driver.ts (see migration plan)
+
+**index.ts** -- unchanged except it calls bot.ts instead of tests.ts
+
+---
+
+## Driver Interface
+
+```typescript
+import type { Page } from "@playwright/test";
+import type {
+ Grid,
+ GridBounds,
+ RendererType,
+ Controls,
+ StartMechanism,
+ SurveyData,
+ PieceType,
+} from "./types";
+
+// ---------------------------------------------------------------------------
+// Configuration returned by calibration, passed through subsequent calls.
+// Replaces CalibrationResult for internal use within the Driver.
+// ---------------------------------------------------------------------------
+
+export interface DriverCalibration {
+ renderer: RendererType;
+ gridDetected: boolean;
+ gridBounds: GridBounds | null;
+ cellWidth: number;
+ cellHeight: number;
+ controls: Controls;
+ startMechanism: StartMechanism;
+ scoreElementSelector: string | null;
+ levelElementSelector: string | null;
+ backgroundColor: [number, number, number] | null;
+ consoleErrors: string[];
+ gridConfidence: number;
+ startButton?: {
+ selector: string;
+ text: string;
+ disappeared: boolean;
+ position: { x: number; y: number };
+ };
+}
+
+// ---------------------------------------------------------------------------
+// Grid snapshot: the grid state plus derived information the bot needs.
+// ---------------------------------------------------------------------------
+
+export interface GridSnapshot {
+ /** The 10x20 boolean grid. null if reading failed. */
+ grid: Grid | null;
+ /** Total filled cells. 0 if grid is null. */
+ filledCount: number;
+ /** Filled cells in the bottom N rows. */
+ filledInBottom(rows: number): number;
+ /** Whether any cell in the top N rows is filled. */
+ hasFilledInTop(rows: number): boolean;
+ /** Number of fully complete rows. */
+ completeRows: number;
+ /** Active piece cells (diff against settled grid). null if undetectable. */
+ activePieceCells: [number, number][] | null;
+ /** Identified piece type from active piece cells. null if no active piece. */
+ activePieceType: PieceType | null;
+}
+
+// ---------------------------------------------------------------------------
+// The Driver interface. This is what the Bot sees.
+// ---------------------------------------------------------------------------
+
+export interface TetrisDriver {
+ // -- Lifecycle --
+
+ /**
+ * Navigate to the game URL, wait for load, begin console error collection.
+ * The result's `loaded` field is false if the page failed to load.
+ */
+ loadPage(url: string): Promise<{ loaded: boolean; detail: string; errorsOnLoad: number }>;
+
+ /**
+ * Survey the page structure before any interaction.
+ * Returns information about overlays, canvas elements, DOM grids, visible text.
+ */
+ surveyPage(): Promise<SurveyData>;
+
+ /**
+ * Run full calibration: grid detection, start mechanism detection,
+ * control detection, score element detection, grid confidence measurement.
+ * Includes re-calibration fallback if initial detection fails.
+ * Never throws.
+ */
+ calibrate(): Promise<DriverCalibration>;
+
+ /**
+ * Re-run calibration after the game state may have changed
+ * (e.g., after starting, grid might appear that wasn't there before).
+ * Keeps the current calibration if re-calibration finds nothing better.
+ */
+ recalibrate(): Promise<DriverCalibration>;
+
+ /**
+ * Get the current calibration. Throws if calibrate() hasn't been called.
+ */
+ getCalibration(): DriverCalibration;
+
+ // -- Grid Reading --
+
+ /**
+ * Read the current grid state. Returns a GridSnapshot with the raw grid
+ * and derived metrics. If `settledGrid` is provided, the active piece is
+ * detected by diffing the current grid against it.
+ *
+ * Returns a snapshot with grid: null if reading fails.
+ */
+ readGrid(settledGrid?: Grid | null): Promise<GridSnapshot>;
+
+ /**
+ * Compare two grids. Returns true if they differ, false if they are equal.
+ */
+ gridsAreDifferent(a: Grid | null, b: Grid | null): boolean;
+
+ // -- Input --
+
+ /**
+ * Press a game control key. Uses the controls detected during calibration.
+ */
+ pressKey(action: "left" | "right" | "down" | "rotate" | "drop"): Promise<void>;
+
+ /**
+ * Press an arbitrary key (for testing CCW rotation with 'z', etc.).
+ */
+ pressRawKey(key: string): Promise<void>;
+
+ /**
+ * Wait for a specified duration (milliseconds).
+ */
+ wait(ms: number): Promise<void>;
+
+ // -- Score/Level/Lines Reading --
+
+ /**
+ * Read the current score from the detected score element.
+ * Returns null if no score element was found or reading fails.
+ */
+ readScore(): Promise<number | null>;
+
+ /**
+ * Read the current level from the page.
+ * Returns null if no level display found or reading fails.
+ */
+ readLevel(): Promise<number | null>;
+
+ // -- Page State Queries --
+
+ /**
+ * Check if "Game Over" (or equivalent) text is visible on the page.
+ * Returns the matched text, or null if not found.
+ */
+ detectGameOverText(): Promise<string | null>;
+
+ /**
+ * Check if a restart button/prompt is visible.
+ */
+ detectRestartOption(): Promise<boolean>;
+
+ /**
+ * Check if a next piece preview display exists.
+ */
+ detectNextPiecePreview(): Promise<boolean>;
+
+ /**
+ * Get all console errors collected since loadPage() was called.
+ */
+ getConsoleErrors(): string[];
+
+ // -- Screenshots --
+
+ /**
+ * Take a screenshot. Returns raw PNG buffer.
+ */
+ screenshot(): Promise<Buffer>;
+
+ // -- Timing --
+
+ /**
+ * Measure the auto-drop interval (time between gravity-driven grid changes
+ * with no input). Returns average interval in ms, or 0 if unmeasurable.
+ */
+ measureDropInterval(): Promise<number>;
+}
+```
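+
+As a rough illustration of how the derived `GridSnapshot` fields could be computed
+from a raw grid, here is a sketch. `SnapshotMetrics` and `makeSnapshot` are
+hypothetical names, the active-piece fields are left out, and `Grid` is assumed to
+be a row-major `boolean[][]` with row 0 at the top.
+
+```typescript
+type Grid = boolean[][];
+
+// Local subset of the GridSnapshot shape; active-piece fields are omitted.
+interface SnapshotMetrics {
+  grid: Grid | null;
+  filledCount: number;
+  completeRows: number;
+  filledInBottom(rows: number): number;
+  hasFilledInTop(rows: number): boolean;
+}
+
+const countRow = (row: boolean[]): number => row.filter(Boolean).length;
+
+function makeSnapshot(grid: Grid | null): SnapshotMetrics {
+  // A failed read is treated as an empty list of rows, so every metric is 0/false.
+  const rows = grid ?? [];
+  return {
+    grid,
+    filledCount: rows.reduce((sum, row) => sum + countRow(row), 0),
+    completeRows: rows.filter((row) => row.length > 0 && row.every(Boolean)).length,
+    // slice(-0) would return the whole array, so guard n <= 0 explicitly.
+    filledInBottom: (n) =>
+      n <= 0 ? 0 : rows.slice(-n).reduce((sum, row) => sum + countRow(row), 0),
+    hasFilledInTop: (n) =>
+      rows.slice(0, Math.max(0, n)).some((row) => row.some(Boolean)),
+  };
+}
+```
+
+With this shape, `makeSnapshot(null)` yields `filledCount: 0`, which matches the
+contract stated in the `GridSnapshot` comments.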
+
+### Method-to-Source Mapping
+
+Each Driver method maps to existing code as follows (`getCalibration()` is omitted
+because it only returns state held by the Driver):
+
+| Driver Method | Current Source | Current Function(s) |
+|---|---|---|
+| `loadPage()` | tests.ts:277-303 | `loadAndCheckPage()`, `loadGamePage()` |
+| `surveyPage()` | calibrate.ts:1300-1393 | `surveyPage()` |
+| `calibrate()` | calibrate.ts:24-94 | `calibrate()`, `detectGrid()`, `detectStartMechanism()`, `detectControls()`, `detectScoreElement()`, `measureGridConfidence()` |
+| `recalibrate()` | tests.ts:152-163 | inline re-calibration after start |
+| `readGrid()` | grid-reader.ts:15-38, 46-118, 142-364 | `readGrid()`, `readCanvasGrid()`, `readDomGrid()`, plus `countFilled()`, `countFilledInBottomRows()`, `hasFilledInTopRows()`, `countCompleteRows()`, `detectActivePieceCells()`, `identifyPieceType()` |
+| `gridsAreDifferent()` | grid-reader.ts:400-410 | `gridsAreDifferent()` |
+| `pressKey()` | player.ts:251-277 | inline `page.keyboard.press()` calls using `cal.controls` |
+| `pressRawKey()` | tests.ts:841-842 | inline `page.keyboard.press("z")` |
+| `wait()` | everywhere | `page.waitForTimeout()` |
+| `readScore()` | tests.ts:490-497, 529-538, 743-749 | inline score element reading |
+| `readLevel()` | tests.ts:1597-1630 | `readLevelFromPage()` |
+| `detectGameOverText()` | tests.ts:929-940 | inline `page.evaluate()` for game over text |
+| `detectRestartOption()` | tests.ts:943-955 | inline `page.evaluate()` for restart buttons |
+| `detectNextPiecePreview()` | tests.ts:1669-1717 | `detectNextPiecePreview()` |
+| `getConsoleErrors()` | tests.ts:94-98 | `consoleErrors` array |
+| `screenshot()` | player.ts:370-371 | `page.screenshot()` |
+| `measureDropInterval()` | tests.ts:1636-1664 | `measureDropInterval()` |
+
+### How the Driver handles different renderers
+
+The Driver encapsulates renderer differences entirely. The Bot never knows or cares
+whether the game uses canvas, DOM, SVG, or WebGL.
+
+```
+readGrid() internally:
+ if renderer === "canvas" && gridBounds:
+ -> readCanvasGrid() via page.evaluate(getImageData)
+ if renderer === "dom":
+ -> readDomGrid() via page.evaluate(DOM traversal)
+ if renderer === "svg":
+ -> future: readSvgGrid()
+ fallback:
+ -> try canvas if bounds exist, then try DOM
+```
+
+The `GridSnapshot` returned to the Bot is always the same shape regardless of renderer.
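+
+The dispatch above could be sketched as follows. This is an assumption-laden
+outline: the `RendererType` values, the `Readers` object, and `readRawGrid` are
+hypothetical, and the readers are injected so the example runs without a page.
+An unimplemented renderer such as SVG simply lands in the fallback path.
+
+```typescript
+type Grid = boolean[][];
+// Assumed union; the real RendererType lives in types.ts.
+type RendererType = "canvas" | "dom" | "svg" | "webgl" | "unknown";
+type GridReader = () => Promise<Grid | null>;
+
+interface Readers {
+  canvas: GridReader;
+  dom: GridReader;
+}
+
+async function readRawGrid(
+  renderer: RendererType,
+  hasBounds: boolean,
+  readers: Readers,
+): Promise<Grid | null> {
+  // Treat a throwing reader the same as a failed read.
+  const attempt = async (read: GridReader): Promise<Grid | null> => {
+    try {
+      return await read();
+    } catch {
+      return null;
+    }
+  };
+
+  if (renderer === "canvas" && hasBounds) return attempt(readers.canvas);
+  if (renderer === "dom") return attempt(readers.dom);
+
+  // Fallback: canvas first (only when bounds exist), then DOM.
+  if (hasBounds) {
+    const fromCanvas = await attempt(readers.canvas);
+    if (fromCanvas) return fromCanvas;
+  }
+  return attempt(readers.dom);
+}
+```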
+
+### Re-calibration
+
+The Driver maintains mutable internal state:
+
+```typescript
+class PlaywrightDriver implements TetrisDriver {
+ private page: Page;
+ private cal: DriverCalibration | null = null;
+ private consoleErrors: string[] = [];
+}
+```
+
+`recalibrate()` re-runs grid detection and start detection, but preserves
+the existing calibration if the new one is worse (e.g., grid detection fails
+on re-calibration but worked initially). This handles:
+
+- Games where the grid appears only after clicking "Start"
+- Games where the grid is rebuilt on game restart (new DOM elements)
+- Games where the canvas resizes after initialization
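+
+One way to express the "keep the existing calibration if the new one is worse" rule
+is a small comparison function. The ranking below (detected grid first, then
+confidence, then a known start mechanism) is an assumption for illustration;
+`CalibrationSummary`, `isBetter`, and `keepBest` are hypothetical names.
+
+```typescript
+// Local subset of DriverCalibration with only the fields compared here.
+interface CalibrationSummary {
+  gridDetected: boolean;
+  gridConfidence: number;
+  startMechanism: string;
+}
+
+// Assumed ranking: a detected grid beats none, then higher confidence wins,
+// then a known start mechanism beats "unknown".
+function isBetter(next: CalibrationSummary, prev: CalibrationSummary): boolean {
+  if (next.gridDetected !== prev.gridDetected) return next.gridDetected;
+  if (next.gridConfidence !== prev.gridConfidence) {
+    return next.gridConfidence > prev.gridConfidence;
+  }
+  return prev.startMechanism === "unknown" && next.startMechanism !== "unknown";
+}
+
+// Returns whichever calibration should be stored after a re-calibration attempt.
+function keepBest<T extends CalibrationSummary>(prev: T | null, next: T): T {
+  return prev === null || isBetter(next, prev) ? next : prev;
+}
+```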
+
+### Error handling
+
+| Scenario | Driver behavior |
+|---|---|
+| Grid read returns null | `readGrid()` returns `GridSnapshot` with `grid: null`, `filledCount: 0` |
+| Grid read throws | Same as null -- caught internally, never thrown to Bot |
+| No score element found | `readScore()` returns `null` |
+| Score element disappeared | `readScore()` returns `null` (caught internally) |
+| Console error during play | Accumulated in `consoleErrors`, accessible via `getConsoleErrors()` |
+| Page navigation fails | `loadPage()` returns `{ loaded: false, detail: "..." }` |
+| Canvas getImageData all zeros (no GPU) | Grid validation rejects (>60% filled), returns null |
+| Calibration finds nothing | Returns calibration with `gridDetected: false`, `startMechanism: "unknown"` |
+
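+The Driver's page-facing methods never throw; all runtime errors are represented in
+return values. The one exception is `getCalibration()`, which throws if `calibrate()`
+has not been called yet.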
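+
+A minimal sketch of how errors can be folded into return values, using the score
+read as the example. `safely`, `extractNumber`, and the injected `readText` callback
+are hypothetical; `readText` stands in for whatever reads the score element.
+
+```typescript
+// Run an async page operation and convert any failure into a fallback value.
+async function safely<T>(op: () => Promise<T>, fallback: T): Promise<T> {
+  try {
+    return await op();
+  } catch {
+    return fallback;
+  }
+}
+
+// Hypothetical helper: pull the first integer out of text like "Score: 1,200".
+function extractNumber(text: string | null): number | null {
+  if (text === null) return null;
+  const match = text.replace(/,/g, "").match(/\d+/);
+  return match ? Number(match[0]) : null;
+}
+
+// A missing element, a thrown error, and non-numeric text all become null.
+async function readScore(
+  readText: () => Promise<string | null>,
+): Promise<number | null> {
+  return extractNumber(await safely(readText, null));
+}
+```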
+
+---
+
+## Bot Interface
+
+### How the Bot calls the Driver
+
+The Bot receives a `TetrisDriver` instance. It never imports `Page` or
+anything from Playwright. It never calls `page.evaluate()`, `page.keyboard`,
+or `page.screenshot()` directly.
+
+```typescript
+// bot.ts
+import type { TetrisDriver, DriverCalibration, GridSnapshot } from "./driver";
+import type {
+ TestResult,
+ GameplayStats,
+ GameSession,
+ CompetitivePlayResult,
+ SurveyData,
+ BotReport,
+ Grid,
+} from "./types";
+import { findBestPlacement } from "./player";
+
+export async function runAllTests(
+ driver: TetrisDriver,
+ serverUrl: string
+): Promise<{
+ testResults: TestResult[];
+ calibration: DriverCalibration;
+ gameplay: GameplayStats;
+ session: GameSession;
+ survey: SurveyData;
+ competitivePlay: CompetitivePlayResult | null;
+}> {
+ // Phase 1: Load
+ const loadResult = await driver.loadPage(serverUrl);
+ // ...
+
+ // Phase 2: Calibrate
+ const cal = await driver.calibrate();
+ // ...
+
+ // Phase 3-8: Use only driver.readGrid(), driver.pressKey(), etc.
+}
+```
+
+### Phase execution flow using Driver methods
+
+**Phase 1: Page Load**
+```
+driver.loadPage(url) -> { loaded, detail, errorsOnLoad }
+driver.wait(3000)
+```
+
+**Phase 2: Calibrate + Start**
+```
+survey = driver.surveyPage()
+cal = driver.calibrate()
+ // Internally: detectStartMechanism(), detectGrid(), etc.
+if cal.startMechanism === "unknown" || !cal.gridDetected:
+ cal = driver.recalibrate()
+```
+
+**Phase 3: Basic Mechanics**
+```
+// Auto-drop test
+snap0 = driver.readGrid()
+driver.wait(5000)
+snap1 = driver.readGrid()
+gridChanged = driver.gridsAreDifferent(snap0.grid, snap1.grid)
+
+// Movement tests
+for dir in [left, right, down]:
+ snapBefore = driver.readGrid()
+ driver.pressKey(dir)
+ driver.wait(300)
+ snapAfter = driver.readGrid()
+ // compare
+
+// Rotation test
+snapBefore = driver.readGrid()
+driver.pressKey("rotate")
+driver.wait(300)
+snapAfter = driver.readGrid()
+// compare bounding boxes of active piece cells
+
+// Hard drop test
+driver.pressKey("drop")
+driver.wait(500)
+snapAfter = driver.readGrid()
+// check bottom rows
+```
+
+**Phase 4: Piece Lifecycle**
+```
+// Already tested during Phase 3 mechanics
+// Piece locks: bottom cells persist across reads
+// New piece spawns: top rows have cells after drop
+// Multiple pieces: piecesLocked counter >= 3
+```
+
+**Phase 5: Gameplay**
+```
+driver.loadPage(url)
+cal = driver.calibrate()
+initialScore = driver.readScore()
+// Play loop (60 pieces / 45s):
+while pieces < 60 && elapsed < 45s:
+ snap = driver.readGrid(settledGrid)
+ if snap.activePieceCells:
+ placement = findBestPlacement(settledGrid, snap.activePieceType)
+ // Execute placement using driver.pressKey()
+ for i in 0..placement.rotations:
+ driver.pressKey("rotate")
+ driver.wait(50)
+ // Move to column
+ driver.pressKey("left" or "right") * N
+ driver.pressKey("drop")
+ driver.wait(100)
+ settledGrid = driver.readGrid().grid
+ pieces += 1
+ driver.wait(60)
+finalScore = driver.readScore()
+```
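+
+The "execute placement" step of the play loop might look like the sketch below.
+The `Placement` shape (a `rotations` count plus a target `column`), the
+`currentColumn` parameter, and the 50 ms wait between moves are assumptions; the
+real `Placement` interface is defined in `player.ts`. The sketch also ignores any
+column shift caused by rotation.
+
+```typescript
+type Action = "left" | "right" | "down" | "rotate" | "drop";
+
+// Only the two Driver methods this routine needs.
+interface InputDriver {
+  pressKey(action: Action): Promise<void>;
+  wait(ms: number): Promise<void>;
+}
+
+// Hypothetical Placement shape: rotation count plus target column.
+interface Placement {
+  rotations: number;
+  column: number;
+}
+
+async function executePlacement(
+  driver: InputDriver,
+  placement: Placement,
+  currentColumn: number,
+): Promise<void> {
+  for (let i = 0; i < placement.rotations; i++) {
+    await driver.pressKey("rotate");
+    await driver.wait(50);
+  }
+  // Move horizontally by the signed column difference.
+  const delta = placement.column - currentColumn;
+  const direction: Action = delta < 0 ? "left" : "right";
+  for (let i = 0; i < Math.abs(delta); i++) {
+    await driver.pressKey(direction);
+    await driver.wait(50);
+  }
+  await driver.pressKey("drop");
+  await driver.wait(100);
+}
+```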
+
+**Phase 6: Game Over**
+```
+driver.loadPage(url)
+driver.calibrate()
+// Hard drop 40 times, checking grid after every 5
+for i in 0..40:
+ driver.pressKey("drop")
+ driver.wait(150)
+ if (i + 1) % 5 === 0:
+ snap = driver.readGrid()
+ if snap.hasFilledInTop(4):
+ driver.pressKey("drop")
+ driver.wait(300)
+ snap2 = driver.readGrid()
+ if !driver.gridsAreDifferent(snap.grid, snap2.grid):
+ // Game over detected
+gameOverText = driver.detectGameOverText()
+```
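+
+One possible shape for the stacking loop, written against a narrow slice of the
+Driver. `StackDriver` is a hypothetical local interface, and the return value
+(number of drops used, or null) is an assumption. The null checks guard against
+counting two failed reads as a "frozen" grid.
+
+```typescript
+type Grid = boolean[][];
+
+interface StackDriver {
+  pressKey(action: "drop"): Promise<void>;
+  wait(ms: number): Promise<void>;
+  readGrid(): Promise<{ grid: Grid | null; hasFilledInTop(rows: number): boolean }>;
+  gridsAreDifferent(a: Grid | null, b: Grid | null): boolean;
+}
+
+// Returns the number of drops used when a frozen, topped-out grid is seen,
+// or null if the drop budget runs out first.
+async function stackToGameOver(
+  driver: StackDriver,
+  maxDrops = 40,
+): Promise<number | null> {
+  for (let drops = 1; drops <= maxDrops; drops++) {
+    await driver.pressKey("drop");
+    await driver.wait(150);
+    if (drops % 5 !== 0) continue; // only inspect the grid after every 5th drop
+
+    const before = await driver.readGrid();
+    if (before.grid === null || !before.hasFilledInTop(4)) continue;
+
+    // The stack has reached the top: one more drop should change nothing
+    // if the game has really ended.
+    await driver.pressKey("drop");
+    await driver.wait(300);
+    const after = await driver.readGrid();
+    if (after.grid !== null && !driver.gridsAreDifferent(before.grid, after.grid)) {
+      return drops;
+    }
+  }
+  return null;
+}
+```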
+
+**Phase 7: Endurance**
+```
+driver.loadPage(url)
+driver.calibrate()
+// Play for 30 seconds using same play loop as Phase 5
+```
+
+**Phase 8: Competitive Play**
+```
+driver.loadPage(url)
+driver.calibrate()
+initialDropInterval = driver.measureDropInterval()
+initialLevel = driver.readLevel()
+// Play for 60 seconds with detailed tracking
+// Every 5th poll: driver.readScore()
+// Every 10th poll: driver.readLevel()
+// Periodic: driver.pressRawKey("z") for CCW test
+// Periodic: soft drop test via driver.pressKey("down")
+finalDropInterval = driver.measureDropInterval()
+nextPieceVisible = driver.detectNextPiecePreview()
+gameOverText = driver.detectGameOverText()
+restartAvailable = driver.detectRestartOption()
+```
+
+### Test derivation
+
+`deriveTestResults()` stays in bot.ts. It receives the `GameSession` data
+that the Bot accumulated during phases, and produces the 24 `TestResult[]` array.
+It does not need the Driver at all -- it operates on pure data.
+
+The function signature is unchanged, except that `cal` is now a `DriverCalibration`:
+
+```typescript
+function deriveTestResults(
+ session: GameSession,
+ cal: DriverCalibration,
+ loadResult: LoadResult,
+ consoleErrors: string[],
+ gameplay: GameplayStats,
+ phaseState: PhaseState,
+ competitivePlay: CompetitivePlayResult | null
+): TestResult[]
+```
+
+### Where the AI player logic lives
+
+`player.ts` becomes a pure computation module. It keeps:
+
+- `PIECES` definitions
+- `findBestPlacement()` (exported)
+- `findBestPlacementGeneric()`
+- `simulateDropPiece()`
+- `clearLines()`
+- `aggregateHeight()`, `countHoles()`, `bumpiness()`
+- `stripActivePiece()` (exported)
+- `Placement` interface (exported)
+
+It loses:
+
+- `playGame()` -- moves to bot.ts (it orchestrates grid reads + AI + key presses)
+- `hardDrop()` -- replaced by `driver.pressKey("drop")`
+- `playRandomMove()` -- moves to bot.ts
+- `playRandomForDuration()` -- moves to bot.ts
+- `tryFillRow()` -- moves to bot.ts
+- `stackToGameOver()` -- moves to bot.ts
+- `executePlacement()` -- moves to bot.ts (it calls driver.pressKey)
+- `countTotalFilled()` -- redundant with GridSnapshot.filledCount
+
+After refactor, `player.ts` has zero Playwright imports.
+
+---
+
+## Migration Plan
+
+### New files created
+
+| File | Purpose | Est. lines |
+|---|---|---|
+| `driver.ts` | TetrisDriver interface + PlaywrightDriver implementation | ~900 |
+| `bot.ts` | Phase orchestration, play loops, test derivation | ~1100 |
+
+### Files modified
+
+| File | Change | Est. lines |
+|---|---|---|
+| `player.ts` | Remove all Playwright-dependent functions, keep pure AI logic | ~350 -> ~250 |
+| `types.ts` | Add `DriverCalibration`, `GridSnapshot` interfaces (or keep in driver.ts). Minor additions. | ~205 -> ~220 |
+| `index.ts` | Change import from `tests.ts` to `bot.ts`, instantiate `PlaywrightDriver`, pass to `runAllTests`. | ~260 -> ~270 |
+
+### Files deleted
+
+| File | Reason |
+|---|---|
+| `calibrate.ts` | Absorbed into `driver.ts` |
+| `grid-reader.ts` | Absorbed into `driver.ts` |
+| `tests.ts` | Replaced by `bot.ts` |
+
+### What stays
+
+- `types.ts` -- interfaces stay the same, report format unchanged
+- `index.ts` -- HTTP server, Playwright test structure, report writing all stay
+- `SPEC.md` -- unchanged
+- `COMPETITIVE_PLAY_SPEC.md` -- unchanged
+- Report format (`BotReport`) -- identical JSON output
+
+### Incremental migration (4 phases)
+
+**Phase A: Create driver.ts with the interface + implementation (no callers yet)**
+
+1. Create `driver.ts` with `TetrisDriver` interface and `PlaywrightDriver` class.
+2. Move into it from `calibrate.ts`:
+ - `detectStartMechanism()` and its sub-functions (`tryKeyboardTriggers`, `tryDomButtons`, `tryCanvasClicks`)
+ - `detectGrid()`
+ - `detectControls()`
+ - `detectScoreElement()`
+ - `measureGridConfidence()`
+ - `surveyPage()`
+ - `sampleScreenshot()`
+ - `detectVisualChange()`
+ - `verifyInteractivity()`
+ - `clusterPoints()`
+ - `recalibrateWithRetry()`
+3. Move into it from `grid-reader.ts`:
+ - `readGrid()`, `readCanvasGrid()`, `readDomGrid()`
+ - `sampleBackgroundColor()`
+ - `validateGridBounds()`
+ - `gridsAreDifferent()`
+ - `countFilled()`, `countFilledInBottomRows()`, `hasFilledInTopRows()`
+ - `countCompleteRows()`, `isRowComplete()`
+ - `getColumnHeights()`
+ - `detectActivePieceCells()`, `identifyPieceType()`
+4. Move into it from `tests.ts`:
+ - `readLevelFromPage()`
+ - `measureDropInterval()`
+ - `detectNextPiecePreview()`
+ - `extractScoreFromText()` (internal helper)
+5. Wrap everything behind `PlaywrightDriver` methods.
+6. Export both the interface and the class.
+7. At this point, old code still works: the functions above are copied, not removed, so `calibrate.ts`, `grid-reader.ts`, and `tests.ts` are unchanged.
+
+**Commit A**: "Add driver.ts: TetrisDriver interface and PlaywrightDriver implementation"
+
+**Phase B: Create bot.ts (calls driver.ts, replaces tests.ts)**
+
+1. Create `bot.ts` with the new `runAllTests()` that accepts `TetrisDriver`.
+2. Move into it from `tests.ts`:
+ - `runAllTests()` (rewritten to call Driver instead of Playwright directly)
+ - `runBasicMechanicsPhase()`
+ - `runGameplayPhase()`
+ - `runGameOverPhase()`
+ - `runEndurancePhase()`
+ - `runCompetitivePlayPhase()`
+ - `deriveTestResults()`
+ - `ALL_TEST_NAMES`
+ - `emptyCalibration()` (adapted to return `DriverCalibration`)
+ - `loadAndCheckPage()` (replaced by `driver.loadPage()`)
+ - `boundingBox()` helper
+ - `countFilledInTopRows()` helper (local in tests.ts, replaced by GridSnapshot method)
+3. Move into it from `player.ts`:
+ - `playGame()` (rewritten to call Driver)
+ - `executePlacement()` (rewritten to call Driver)
+ - `playRandomMove()` (rewritten to call Driver)
+ - `playRandomForDuration()` (rewritten to call Driver)
+ - `tryFillRow()` (rewritten to call Driver)
+ - `stackToGameOver()` (rewritten to call Driver)
+4. bot.ts imports `findBestPlacement`, `stripActivePiece`, `Placement` from `player.ts`
+ and everything else from `driver.ts`.
+
+**Commit B**: "Add bot.ts: phase orchestration using TetrisDriver"
+
+**Phase C: Rewire index.ts, slim player.ts**
+
+1. Update `index.ts`:
+ - Import `PlaywrightDriver` from `./driver`
+ - Import `runAllTests` from `./bot` (not `./tests`)
+ - In the test body: `const driver = new PlaywrightDriver(page); const results = await runAllTests(driver, serverUrl);`
+2. Remove from `player.ts`:
+ - `playGame()`, `hardDrop()`, `executePlacement()`, `playRandomMove()`, `playRandomForDuration()`, `tryFillRow()`, `stackToGameOver()`
+ - `import type { Page }` and `import { readGrid, ... }` from grid-reader
+ - `countTotalFilled()` (redundant)
+3. `player.ts` now exports only:
+ - `findBestPlacement()` (accepts `Grid` and `PieceType`, returns `Placement | null`)
+ - `stripActivePiece()` (accepts `Grid` and cells, returns `Grid`)
+ - `Placement` interface
+
+**Commit C**: "Rewire index.ts to use bot.ts + driver.ts, slim player.ts"
+
+**Phase D: Delete old files**
+
+1. Delete `calibrate.ts`
+2. Delete `grid-reader.ts`
+3. Delete `tests.ts`
+4. Verify all imports resolve
+5. Run the full eval pipeline against a known artifact to confirm identical report output
+
+**Commit D**: "Remove old calibrate.ts, grid-reader.ts, tests.ts"
+
+### Backwards compatibility
+
+The report format (`BotReport`) does not change. The JSON output is byte-identical
+for the same game input. The summary score calculation is unchanged. The test names
+are unchanged. The competitive play data structure is unchanged.
+
+The only external-facing change is the internal file structure. Nothing downstream
+(the scoring pipeline, the dashboard, the harness) needs to change.
+
+---
+
+## File Structure After Refactor
+
+```
+gameplay-bot/
+ types.ts ~220 lines Interfaces (unchanged)
+ driver.ts ~900 lines TetrisDriver interface + PlaywrightDriver class
+ player.ts ~250 lines Pure AI: heuristics, simulation, placement finding
+ bot.ts ~1100 lines Phases, play loops, test derivation, competitive play
+ index.ts ~270 lines Playwright test entry, HTTP server, report output
+ SPEC.md Unchanged
+ COMPETITIVE_PLAY_SPEC.md Unchanged
+ REFACTOR_SPEC.md This document
+```
+
+Total: ~2740 lines (down from ~3500 because of deduplication and removing
+redundant helpers that now live behind the Driver).
+
+### Import/dependency graph
+
+```
+index.ts
+ -> driver.ts (PlaywrightDriver constructor)
+ -> bot.ts (runAllTests)
+ -> types.ts (BotReport)
+
+bot.ts
+ -> driver.ts (TetrisDriver interface, DriverCalibration, GridSnapshot)
+ -> player.ts (findBestPlacement, stripActivePiece, Placement)
+ -> types.ts (all data interfaces)
+
+driver.ts
+ -> types.ts (Grid, GridBounds, RendererType, Controls, etc.)
+ -> @playwright/test (Page)
+
+player.ts
+ -> types.ts (Grid, PieceType)
+ (NO @playwright/test import)
+```
+
+Key constraint: `bot.ts` does NOT import `@playwright/test`. It depends on the
+`TetrisDriver` interface, not the implementation. This means the Bot can be tested
+with a mock driver that returns canned grid states -- no browser needed.
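+
+To illustrate the constraint, here is a sketch of a mock driver that replays canned
+grids and records key presses. `MovementDriver`, `respondsTo`, and
+`createMockDriver` are hypothetical names; the example implements only the slice of
+`TetrisDriver` that the function under test uses.
+
+```typescript
+type Grid = boolean[][];
+type Action = "left" | "right" | "down" | "rotate" | "drop";
+
+// Narrow slice of TetrisDriver used by the function under test.
+interface MovementDriver {
+  readGrid(): Promise<{ grid: Grid | null }>;
+  pressKey(action: Action): Promise<void>;
+  wait(ms: number): Promise<void>;
+  gridsAreDifferent(a: Grid | null, b: Grid | null): boolean;
+}
+
+// Hypothetical Bot-side check: does pressing a key change the grid?
+async function respondsTo(driver: MovementDriver, action: Action): Promise<boolean> {
+  const before = await driver.readGrid();
+  await driver.pressKey(action);
+  await driver.wait(300);
+  const after = await driver.readGrid();
+  return driver.gridsAreDifferent(before.grid, after.grid);
+}
+
+// Mock driver: replays canned grids in order (repeating the last one)
+// and records key presses. No browser involved.
+function createMockDriver(frames: Grid[]): MovementDriver & { pressed: Action[] } {
+  let index = 0;
+  const pressed: Action[] = [];
+  return {
+    pressed,
+    async readGrid() {
+      const grid = frames[Math.min(index, frames.length - 1)] ?? null;
+      index++;
+      return { grid };
+    },
+    async pressKey(action) {
+      pressed.push(action);
+    },
+    async wait() {
+      // No real delay in tests.
+    },
+    gridsAreDifferent: (a, b) => JSON.stringify(a) !== JSON.stringify(b),
+  };
+}
+```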
+
+---
+
+## Edge Cases
+
+### Games that need re-calibration mid-session
+
+**Scenario**: Grid appears only after clicking "Start". On page load, there is no
+canvas and no DOM grid -- just a splash screen.
+
+**Current behavior**: `calibrate()` runs on the splash screen, finds nothing.
+Then `tests.ts` tries start mechanisms, and after starting, re-runs `calibrate()`.
+
+**Driver behavior**: `calibrate()` includes start detection. If it starts the game
+but finds no grid, it waits and re-scans. `recalibrate()` is also available for the
+Bot to call explicitly after any phase reload.
+
+**Bot flow**:
+```
+cal = driver.calibrate()
+if cal.gridDetected === false && cal.startMechanism !== "unknown":
+ // Game started but grid not found yet -- wait and retry
+ driver.wait(500)
+ cal = driver.recalibrate()
+```
+
+### Games where the Driver cannot read the grid at all
+
+**Scenario**: Canvas game without GPU access. `getImageData()` returns all zeros.
+
+**Driver behavior**: `readGrid()` returns `GridSnapshot { grid: null }` every time.
+The Bot sees grid failures accumulate.
+
+**Bot flow**: Phase 3 (mechanics) detects that `gridReadSuccess === 0`. The Bot
+marks all grid-dependent tests as failed with detail "grid reader unavailable".
+It does NOT fall back to screenshot-only testing (per the "NO FALSE POSITIVES" rule).
+Competitive play is skipped.
+
+### Games that pause themselves
+
+**Scenario**: Player accidentally triggers a pause menu (Escape key, or a pause
+button that overlaps with the game area).
+
+**Driver behavior**: `readGrid()` may return null (if an overlay covers the grid)
+or return a static grid (same state on every read). The Driver does not know about
+pausing -- it just reports what it sees.
+
+**Bot flow**: The play loop in bot.ts already handles stale grids. If the grid
+hasn't changed for 8 seconds, it tries pressing the drop key (which may unpause).
+If grid reads start returning null, the Bot counts consecutive failures. After 10
+consecutive null reads, it falls back to random key presses for a brief period,
+then re-reads.
+
+The Bot could also try pressing Escape or P to dismiss a pause screen:
+```
+if consecutiveUnchanged > 80: // 80 polls * 60ms = ~5 seconds
+ driver.pressRawKey("Escape")
+ driver.wait(500)
+ driver.pressRawKey("p")
+ driver.wait(500)
+```
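+
+The counting logic behind these recovery steps could be kept in a small tracker
+like the sketch below. `StallTracker`, `StallAction`, and the way the two
+thresholds are combined are assumptions; the default values (80 unchanged polls,
+10 consecutive null reads) come from the text above.
+
+```typescript
+type StallAction = "none" | "unpause" | "random-input";
+
+class StallTracker {
+  private unchanged = 0;
+  private nullReads = 0;
+  private readonly maxUnchanged: number;
+  private readonly maxNullReads: number;
+
+  constructor(maxUnchanged = 80, maxNullReads = 10) {
+    this.maxUnchanged = maxUnchanged; // 80 polls * 60 ms = ~4.8 s
+    this.maxNullReads = maxNullReads;
+  }
+
+  // Record one poll. `changed` is ignored when the read failed.
+  observe(readOk: boolean, changed: boolean): StallAction {
+    if (!readOk) {
+      this.nullReads++;
+      if (this.nullReads >= this.maxNullReads) {
+        this.nullReads = 0;
+        return "random-input";
+      }
+      return "none";
+    }
+    this.nullReads = 0;
+    this.unchanged = changed ? 0 : this.unchanged + 1;
+    if (this.unchanged > this.maxUnchanged) {
+      this.unchanged = 0;
+      return "unpause";
+    }
+    return "none";
+  }
+}
+```
+
+The play loop would call `observe()` once per poll and act only when the result is
+not `"none"`, which keeps the recovery policy in one place.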
+
+### Games with overlays that block gameplay
+
+**Scenario**: A modal overlay (tutorial, cookie consent, "enter your name" dialog)
+appears on top of the game, blocking input.
+
+**Driver behavior**: `surveyPage()` detects overlays (positioned elements covering
+more than 50% of the viewport). The start mechanism detection already tries clicking
+overlays and pressing Escape to dismiss them.
+
+**Bot flow**: If the game started but mechanics tests show no response to input
+(movementsObserved === 0), the Bot can request a recalibrate, which may re-run
+start detection and dismiss a new overlay.
+
+### Games in different languages
+
+**Scenario**: The game UI is in Spanish, Japanese, or any non-English language.
+"Start", "Game Over", "Score" have different text.
+
+**Driver behavior**: Start mechanism detection is already fully language-agnostic
+(visual change detection + interactivity verification, no text matching). Score
+element detection falls back from labeled text ("Score: 0") to structural heuristics
+(leaf element containing a standalone number). Game over text detection checks
+multiple languages ("game over", "fin del juego", etc.) or falls back to
+grid-state-based detection (grid frozen after filling to top).
+
+**Bot flow**: The Bot does not do any text matching. It delegates all text-based
+detection to the Driver. Tests like `game_over` use `driver.detectGameOverText()`
+which is the Driver's responsibility. The Bot adds a grid-based game over check
+(frozen grid after stacking) as a secondary signal that doesn't depend on language.
+
+The `detectGameOverText()` method could be extended with more languages:
+```typescript
+// Inside driver.ts
+const gameOverPatterns = [
+ "game over", "gameover", "you lose", "try again",
+ "play again", "restart", "fin del juego", "juego terminado",
+ "ゲームオーバー", "游戏结束"
+];
+```
+
+But the primary game over detection in bot.ts (Phase 6) does not depend on text --
+it watches the grid freeze after filling to the top.
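+
+A sketch of the text-matching half, assuming the Driver has already collected the
+page's visible text as a string. `matchGameOverText` and `GAME_OVER_PATTERNS` are
+hypothetical names, and the default list is only a subset of the patterns shown
+above.
+
+```typescript
+const GAME_OVER_PATTERNS = [
+  "game over", "gameover", "fin del juego", "juego terminado",
+  "ゲームオーバー", "游戏结束",
+];
+
+// Returns the first pattern found in the visible text, or null if none match.
+function matchGameOverText(
+  visibleText: string,
+  patterns: string[] = GAME_OVER_PATTERNS,
+): string | null {
+  // Lowercase and collapse whitespace so "GAME   OVER" still matches.
+  const normalized = visibleText.toLowerCase().replace(/\s+/g, " ");
+  return patterns.find((p) => normalized.includes(p.toLowerCase())) ?? null;
+}
+```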
+
+---
+
+## What This Spec Does NOT Cover
+
+- WebGL grid reading (not implemented yet, out of scope)
+- New tests beyond the existing 24
+- Changes to the report format or scoring
+- Dashboard changes
+- Harness changes
+- Performance optimization of grid reading
+- Testability improvements beyond the Driver/Bot split (e.g., mock Driver tests)
+
+These are natural follow-ups after the refactor lands, but they are separate work items.