loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

commit 67bd49c6e259f78aade3caeae40c3418dedf8071
parent 7fbe88ce2a1febb0954305d10f4e1878570e0f14
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Thu,  9 Apr 2026 12:56:29 +0200

Add two-tier architecture refactor spec for gameplay bot

Driver (webpage abstraction) + Bot (game logic) separation.
17-method TetrisDriver interface, 4-commit incremental migration plan,
~2740 lines (down from 3500). Bot never imports Playwright.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
A tasks/tetris/eval/gameplay-bot/REFACTOR_SPEC.md | 877 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 877 insertions(+), 0 deletions(-)

diff --git a/tasks/tetris/eval/gameplay-bot/REFACTOR_SPEC.md b/tasks/tetris/eval/gameplay-bot/REFACTOR_SPEC.md @@ -0,0 +1,877 @@ +# Two-Tier Refactor Spec: Driver + Bot + +## Problem Statement + +The gameplay bot is ~3500 lines across 6 files, with two distinct concerns tangled +together: understanding the webpage (finding grids, clicking buttons, reading pixels, +sending keystrokes) and playing Tetris (phase orchestration, AI decisions, test +derivation, bug detection). The boundary between them is blurred: + +- `calibrate.ts` handles grid detection, start mechanism detection, control detection, + overlay detection, interactivity verification, screenshot sampling, visual change + detection, and page surveying -- all in one 1300-line file. +- `tests.ts` does phase orchestration, BUT ALSO calls `readGrid` directly during + mechanics tests, reads score elements, detects game over text, measures drop + intervals, detects next piece previews, and reads level displays. +- `player.ts` calls both `readGrid` and `page.keyboard.press` directly, coupling + AI logic to the Playwright API. +- `grid-reader.ts` is the cleanest module but still exports low-level grid analysis + utilities (bounding boxes, cell counts, piece identification) that the bot calls + directly instead of going through an abstraction. + +The result: any change to how the page is read ripples through all files. You cannot +test the AI player without a live Playwright page. You cannot swap the grid reader +without touching the test orchestrator. + +## Proposed Architecture + +``` + +------------------+ + | index.ts | Entry point: HTTP server, Playwright test, + | | report output. Unchanged. + +--------+---------+ + | + v + +------------------+ + | bot.ts | Layer 2: "The Brain" + | | Phase orchestration, AI decisions, test + | | derivation, competitive play, bug detection. + | | Calls only the Driver interface. 
+ +--------+---------+ + | + v + +------------------+ + | driver.ts | Layer 1: "The Eyes and Hands" + | | Abstracts the webpage. Exposes a clean API. + | | Handles grid reading, start detection, + | | control detection, keyboard input. + +--------+---------+ + | + +---------+---------+ + | | + v v + +-----------+ +------------+ + | types.ts | | player.ts | Pure Tetris logic: AI heuristics, + | | | | board simulation, placement finding. + +-----------+ | | NO Playwright imports. NO page access. + +------------+ +``` + +### What goes where + +**driver.ts** -- "I can see and interact with this webpage" +- Grid detection (finding the grid on the page) +- Grid reading (10x20 boolean matrix from canvas/DOM/SVG) +- Start mechanism detection (the 5-phase cascade) +- Control detection (which keys the game responds to) +- Score/level/lines reading +- Keyboard input (move, rotate, drop) +- Screenshot capture +- Interactivity verification +- Page surveying (pre-test data collection) +- Background color sampling +- Visual change detection +- Next piece preview detection +- Game over text detection +- Re-calibration + +**bot.ts** -- "I know Tetris rules and test logic" +- Phase orchestration (the 8 conditional phases) +- Test derivation from session data (the 24 tests) +- Score/timing/event tracking (GameSession bookkeeping) +- Competitive play with bug detection +- Line clear detection logic (watching grid state transitions) +- Game over triggering strategy (stack pieces to fill grid) +- Endurance testing +- Report assembly (BotReport construction) + +**player.ts** -- "I know where to put pieces" (pure computation, no I/O) +- 4-heuristic scoring (aggregate height, lines, holes, bumpiness) +- Piece definitions (rotations, dimensions) +- Board simulation (drop piece, clear lines) +- Best placement finding +- No `Page` import, no `readGrid` call, no `keyboard.press` + +**types.ts** -- unchanged, all interfaces stay + +**grid-reader.ts** -- absorbed into driver.ts (see migration 
plan) + +**index.ts** -- unchanged except it calls bot.ts instead of tests.ts + +--- + +## Driver Interface + +```typescript +import type { Page } from "@playwright/test"; +import type { + Grid, + GridBounds, + RendererType, + Controls, + StartMechanism, + SurveyData, + PieceType, +} from "./types"; + +// --------------------------------------------------------------------------- +// Configuration returned by calibration, passed through subsequent calls. +// Replaces CalibrationResult for internal use within the Driver. +// --------------------------------------------------------------------------- + +export interface DriverCalibration { + renderer: RendererType; + gridDetected: boolean; + gridBounds: GridBounds | null; + cellWidth: number; + cellHeight: number; + controls: Controls; + startMechanism: StartMechanism; + scoreElementSelector: string | null; + levelElementSelector: string | null; + backgroundColor: [number, number, number] | null; + consoleErrors: string[]; + gridConfidence: number; + startButton?: { + selector: string; + text: string; + disappeared: boolean; + position: { x: number; y: number }; + }; +} + +// --------------------------------------------------------------------------- +// Grid snapshot: the grid state plus derived information the bot needs. +// --------------------------------------------------------------------------- + +export interface GridSnapshot { + /** The 10x20 boolean grid. null if reading failed. */ + grid: Grid | null; + /** Total filled cells. 0 if grid is null. */ + filledCount: number; + /** Filled cells in the bottom N rows. */ + filledInBottom(rows: number): number; + /** Whether any cell in the top N rows is filled. */ + hasFilledInTop(rows: number): boolean; + /** Number of fully complete rows. */ + completeRows: number; + /** Active piece cells (diff against settled grid). null if undetectable. */ + activePieceCells: [number, number][] | null; + /** Identified piece type from active piece cells. 
null if no active piece. */ + activePieceType: PieceType | null; +} + +// --------------------------------------------------------------------------- +// The Driver interface. This is what the Bot sees. +// --------------------------------------------------------------------------- + +export interface TetrisDriver { + // -- Lifecycle -- + + /** + * Navigate to the game URL, wait for load, begin console error collection. + * Returns `loaded: false` if the page failed to load. + */ + loadPage(url: string): Promise<{ loaded: boolean; detail: string; errorsOnLoad: number }>; + + /** + * Survey the page structure before any interaction. + * Returns information about overlays, canvas elements, DOM grids, visible text. + */ + surveyPage(): Promise<SurveyData>; + + /** + * Run full calibration: grid detection, start mechanism detection, + * control detection, score element detection, grid confidence measurement. + * Includes re-calibration fallback if initial detection fails. + * Never throws. + */ + calibrate(): Promise<DriverCalibration>; + + /** + * Re-run calibration after the game state may have changed + * (e.g., after starting, grid might appear that wasn't there before). + * Keeps the current calibration if re-calibration finds nothing better. + */ + recalibrate(): Promise<DriverCalibration>; + + /** + * Get the current calibration. Throws if calibrate() hasn't been called. + */ + getCalibration(): DriverCalibration; + + // -- Grid Reading -- + + /** + * Read the current grid state. Returns a GridSnapshot with the raw grid + * and derived metrics. If settled grid is provided, active piece detection + * is diffed against it. + * + * Returns a snapshot with grid: null if reading fails. + */ + readGrid(settledGrid?: Grid | null): Promise<GridSnapshot>; + + /** + * Compare two grids. Returns true if they differ. + */ + gridsAreDifferent(a: Grid | null, b: Grid | null): boolean; + + // -- Input -- + + /** + * Press a game control key. 
Uses the controls detected during calibration. + */ + pressKey(action: "left" | "right" | "down" | "rotate" | "drop"): Promise<void>; + + /** + * Press an arbitrary key (for testing CCW rotation with 'z', etc.). + */ + pressRawKey(key: string): Promise<void>; + + /** + * Wait for a specified duration (milliseconds). + */ + wait(ms: number): Promise<void>; + + // -- Score/Level/Lines Reading -- + + /** + * Read the current score from the detected score element. + * Returns null if no score element was found or reading fails. + */ + readScore(): Promise<number | null>; + + /** + * Read the current level from the page. + * Returns null if no level display found or reading fails. + */ + readLevel(): Promise<number | null>; + + // -- Page State Queries -- + + /** + * Check if "Game Over" (or equivalent) text is visible on the page. + * Returns the matched text, or null if not found. + */ + detectGameOverText(): Promise<string | null>; + + /** + * Check if a restart button/prompt is visible. + */ + detectRestartOption(): Promise<boolean>; + + /** + * Check if a next piece preview display exists. + */ + detectNextPiecePreview(): Promise<boolean>; + + /** + * Get all console errors collected since loadPage() was called. + */ + getConsoleErrors(): string[]; + + // -- Screenshots -- + + /** + * Take a screenshot. Returns raw PNG buffer. + */ + screenshot(): Promise<Buffer>; + + /** + * Measure the auto-drop interval (time between gravity-driven grid changes + * with no input). Returns average interval in ms, or 0 if unmeasurable. 
+ */ + measureDropInterval(): Promise<number>; +} +``` + +### Method-to-Source Mapping + +Each Driver method maps to existing code as follows: + +| Driver Method | Current Source | Current Function(s) | +|---|---|---| +| `loadPage()` | tests.ts:277-303 | `loadAndCheckPage()`, `loadGamePage()` | +| `surveyPage()` | calibrate.ts:1300-1393 | `surveyPage()` | +| `calibrate()` | calibrate.ts:24-94 | `calibrate()`, `detectGrid()`, `detectStartMechanism()`, `detectControls()`, `detectScoreElement()`, `measureGridConfidence()` | +| `recalibrate()` | tests.ts:152-163 | inline re-calibration after start | +| `readGrid()` | grid-reader.ts:15-38, 46-118, 142-364 | `readGrid()`, `readCanvasGrid()`, `readDomGrid()`, plus `countFilled()`, `countFilledInBottomRows()`, `hasFilledInTopRows()`, `countCompleteRows()`, `detectActivePieceCells()`, `identifyPieceType()` | +| `gridsAreDifferent()` | grid-reader.ts:400-410 | `gridsAreDifferent()` | +| `pressKey()` | player.ts:251-277 | inline `page.keyboard.press()` calls using `cal.controls` | +| `pressRawKey()` | tests.ts:841-842 | inline `page.keyboard.press("z")` | +| `wait()` | everywhere | `page.waitForTimeout()` | +| `readScore()` | tests.ts:490-497, 529-538, 743-749 | inline score element reading | +| `readLevel()` | tests.ts:1597-1630 | `readLevelFromPage()` | +| `detectGameOverText()` | tests.ts:929-940 | inline `page.evaluate()` for game over text | +| `detectRestartOption()` | tests.ts:943-955 | inline `page.evaluate()` for restart buttons | +| `detectNextPiecePreview()` | tests.ts:1669-1717 | `detectNextPiecePreview()` | +| `getConsoleErrors()` | tests.ts:94-98 | `consoleErrors` array | +| `screenshot()` | player.ts:370-371 | `page.screenshot()` | +| `measureDropInterval()` | tests.ts:1636-1664 | `measureDropInterval()` | + +### How the Driver handles different renderers + +The Driver encapsulates renderer differences entirely. The Bot never knows or cares +whether the game uses canvas, DOM, SVG, or WebGL. 
+ +``` +readGrid() internally: + if renderer === "canvas" && gridBounds: + -> readCanvasGrid() via page.evaluate(getImageData) + if renderer === "dom": + -> readDomGrid() via page.evaluate(DOM traversal) + if renderer === "svg": + -> future: readSvgGrid() + fallback: + -> try canvas if bounds exist, then try DOM +``` + +The `GridSnapshot` returned to the Bot is always the same shape regardless of renderer. + +### Re-calibration + +The Driver maintains mutable internal state: + +```typescript +class PlaywrightDriver implements TetrisDriver { + private page: Page; + private cal: DriverCalibration | null = null; + private consoleErrors: string[] = []; +} +``` + +`recalibrate()` re-runs grid detection and start detection, but preserves +the existing calibration if the new one is worse (e.g., grid detection fails +on re-calibration but worked initially). This handles: + +- Games where the grid appears only after clicking "Start" +- Games where the grid is rebuilt on game restart (new DOM elements) +- Games where the canvas resizes after initialization + +### Error handling + +| Scenario | Driver behavior | +|---|---| +| Grid read returns null | `readGrid()` returns `GridSnapshot` with `grid: null`, `filledCount: 0` | +| Grid read throws | Same as null -- caught internally, never thrown to Bot | +| No score element found | `readScore()` returns `null` | +| Score element disappeared | `readScore()` returns `null` (caught internally) | +| Console error during play | Accumulated in `consoleErrors`, accessible via `getConsoleErrors()` | +| Page navigation fails | `loadPage()` returns `{ loaded: false, detail: "..." }` | +| Canvas getImageData all zeros (no GPU) | Grid validation rejects (>60% filled), returns null | +| Calibration finds nothing | Returns calibration with `gridDetected: false`, `startMechanism: "unknown"` | + +The Driver never throws on page-level failures. All such errors are represented in return values. The one exception is `getCalibration()`, which throws if `calibrate()` hasn't been called. 
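As a rough sketch of the "grid read returns null" row above, the derived fields of `GridSnapshot` could be computed from a possibly-null grid as shown below. The names `SnapshotSketch`, `toSnapshot`, and `countRow` are hypothetical, the active-piece fields are left out, and the row-major layout (`grid[row][col]`, row 0 at the top) is an assumption that this spec does not state.

```typescript
// Assumption: grid[row][col], row 0 at the top, true = filled cell.
type Grid = boolean[][];

// Hypothetical subset of the spec's GridSnapshot (active-piece fields omitted).
interface SnapshotSketch {
  grid: Grid | null;
  filledCount: number;
  filledInBottom(rows: number): number;
  hasFilledInTop(rows: number): boolean;
  completeRows: number;
}

// Number of filled cells in one row.
const countRow = (row: boolean[]): number => row.filter(Boolean).length;

// Build a snapshot from a raw read; a failed read (null) yields zeroed metrics.
function toSnapshot(grid: Grid | null): SnapshotSketch {
  const rows: Grid = grid ?? [];
  return {
    grid,
    filledCount: rows.reduce((sum, row) => sum + countRow(row), 0),
    // Guard n > 0 because slice(-0) would return every row.
    filledInBottom: (n: number): number =>
      n > 0 ? rows.slice(-n).reduce((sum, row) => sum + countRow(row), 0) : 0,
    hasFilledInTop: (n: number): boolean =>
      rows.slice(0, Math.max(0, n)).some((row) => row.some(Boolean)),
    completeRows: rows.filter((row) => row.length > 0 && row.every(Boolean)).length,
  };
}
```

With this shape, the Bot can call `snap.hasFilledInTop(4)` or read `snap.filledCount` without checking for `null` first, which matches the table's intent that read failures surface as values and not as exceptions.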
+ +--- + +## Bot Interface + +### How the Bot calls the Driver + +The Bot receives a `TetrisDriver` instance. It never imports `Page` or +anything from Playwright. It never calls `page.evaluate()`, `page.keyboard`, +or `page.screenshot()` directly. + +```typescript +// bot.ts +import type { TetrisDriver, DriverCalibration, GridSnapshot } from "./driver"; +import type { + TestResult, + GameplayStats, + GameSession, + CompetitivePlayResult, + SurveyData, + BotReport, + Grid, +} from "./types"; +import { findBestPlacement } from "./player"; + +export async function runAllTests( + driver: TetrisDriver, + serverUrl: string +): Promise<{ + testResults: TestResult[]; + calibration: DriverCalibration; + gameplay: GameplayStats; + session: GameSession; + survey: SurveyData; + competitivePlay: CompetitivePlayResult | null; +}> { + // Phase 1: Load + const loadResult = await driver.loadPage(serverUrl); + // ... + + // Phase 2: Calibrate + const cal = await driver.calibrate(); + // ... + + // Phase 3-8: Use only driver.readGrid(), driver.pressKey(), etc. +} +``` + +### Phase execution flow using Driver methods + +**Phase 1: Page Load** +``` +driver.loadPage(url) -> { loaded, detail, errorsOnLoad } +driver.wait(3000) +``` + +**Phase 2: Calibrate + Start** +``` +survey = driver.surveyPage() +cal = driver.calibrate() + // Internally: detectStartMechanism(), detectGrid(), etc. 
+if cal.startMechanism === "unknown" || !cal.gridDetected: + cal = driver.recalibrate() +``` + +**Phase 3: Basic Mechanics** +``` +// Auto-drop test +snap0 = driver.readGrid() +driver.wait(5000) +snap1 = driver.readGrid() +gridChanged = driver.gridsAreDifferent(snap0.grid, snap1.grid) + +// Movement tests +for dir in [left, right, down]: + snapBefore = driver.readGrid() + driver.pressKey(dir) + driver.wait(300) + snapAfter = driver.readGrid() + // compare + +// Rotation test +snapBefore = driver.readGrid() +driver.pressKey("rotate") +driver.wait(300) +snapAfter = driver.readGrid() +// compare bounding boxes of active piece cells + +// Hard drop test +driver.pressKey("drop") +driver.wait(500) +snapAfter = driver.readGrid() +// check bottom rows +``` + +**Phase 4: Piece Lifecycle** +``` +// Already tested during Phase 3 mechanics +// Piece locks: bottom cells persist across reads +// New piece spawns: top rows have cells after drop +// Multiple pieces: piecesLocked counter >= 3 +``` + +**Phase 5: Gameplay** +``` +driver.loadPage(url) +cal = driver.calibrate() +initialScore = driver.readScore() +// Play loop (60 pieces / 45s): +while pieces < 60 && elapsed < 45s: + snap = driver.readGrid(settledGrid) + if snap.activePieceCells: + placement = findBestPlacement(settledGrid, snap.activePieceType) + // Execute placement using driver.pressKey() + for i in 0..placement.rotations: + driver.pressKey("rotate") + driver.wait(50) + // Move to column + driver.pressKey("left" or "right") * N + driver.pressKey("drop") + driver.wait(100) + settledGrid = (await driver.readGrid()).grid + driver.wait(60) +finalScore = driver.readScore() +``` + +**Phase 6: Game Over** +``` +driver.loadPage(url) +driver.calibrate() +// Hard drop 40 times, checking grid after every 5 +for i in 0..40: + driver.pressKey("drop") + driver.wait(150) + if i % 5 === 0: + snap = driver.readGrid() + if snap.hasFilledInTop(4): + driver.pressKey("drop") + driver.wait(300) + snap2 = driver.readGrid() + if 
!driver.gridsAreDifferent(snap.grid, snap2.grid): + // Game over detected +gameOverText = driver.detectGameOverText() +``` + +**Phase 7: Endurance** +``` +driver.loadPage(url) +driver.calibrate() +// Play for 30 seconds using same play loop as Phase 5 +``` + +**Phase 8: Competitive Play** +``` +driver.loadPage(url) +driver.calibrate() +initialDropInterval = driver.measureDropInterval() +initialLevel = driver.readLevel() +// Play for 60 seconds with detailed tracking +// Every 5th poll: driver.readScore() +// Every 10th poll: driver.readLevel() +// Periodic: driver.pressRawKey("z") for CCW test +// Periodic: soft drop test via driver.pressKey("down") +finalDropInterval = driver.measureDropInterval() +nextPieceVisible = driver.detectNextPiecePreview() +gameOverText = driver.detectGameOverText() +restartAvailable = driver.detectRestartOption() +``` + +### Test derivation + +`deriveTestResults()` stays in bot.ts. It receives the `GameSession` data +that the Bot accumulated during phases, and produces the 24 `TestResult[]` array. +It does not need the Driver at all -- it operates on pure data. + +The function signature is unchanged: + +```typescript +function deriveTestResults( + session: GameSession, + cal: DriverCalibration, + loadResult: LoadResult, + consoleErrors: string[], + gameplay: GameplayStats, + phaseState: PhaseState, + competitivePlay: CompetitivePlayResult | null +): TestResult[] +``` + +### Where the AI player logic lives + +`player.ts` becomes a pure computation module. 
It keeps: + +- `PIECES` definitions +- `findBestPlacement()` (exported) +- `findBestPlacementGeneric()` +- `simulateDropPiece()` +- `clearLines()` +- `aggregateHeight()`, `countHoles()`, `bumpiness()` +- `stripActivePiece()` (exported) +- `Placement` interface (exported) + +It loses: + +- `playGame()` -- moves to bot.ts (it orchestrates grid reads + AI + key presses) +- `hardDrop()` -- replaced by `driver.pressKey("drop")` +- `playRandomMove()` -- moves to bot.ts +- `playRandomForDuration()` -- moves to bot.ts +- `tryFillRow()` -- moves to bot.ts +- `stackToGameOver()` -- moves to bot.ts +- `executePlacement()` -- moves to bot.ts (it calls driver.pressKey) +- `countTotalFilled()` -- redundant with GridSnapshot.filledCount + +After refactor, `player.ts` has zero Playwright imports. + +--- + +## Migration Plan + +### New files created + +| File | Purpose | Est. lines | +|---|---|---| +| `driver.ts` | TetrisDriver interface + PlaywrightDriver implementation | ~900 | +| `bot.ts` | Phase orchestration, play loops, test derivation | ~1100 | + +### Files modified + +| File | Change | +|---|---| +| `player.ts` | Remove all Playwright-dependent functions, keep pure AI logic | ~350 -> ~250 | +| `types.ts` | Add `DriverCalibration`, `GridSnapshot` interfaces (or keep in driver.ts). Minor additions. | ~205 -> ~220 | +| `index.ts` | Change import from `tests.ts` to `bot.ts`, instantiate `PlaywrightDriver`, pass to `runAllTests`. 
| ~260 -> ~270 | + +### Files deleted + +| File | Reason | +|---|---| +| `calibrate.ts` | Absorbed into `driver.ts` | +| `grid-reader.ts` | Absorbed into `driver.ts` | +| `tests.ts` | Replaced by `bot.ts` | + +### What stays + +- `types.ts` -- interfaces stay the same, report format unchanged +- `index.ts` -- HTTP server, Playwright test structure, report writing all stay +- `SPEC.md` -- unchanged +- `COMPETITIVE_PLAY_SPEC.md` -- unchanged +- Report format (`BotReport`) -- identical JSON output + +### Incremental migration (4 phases) + +**Phase A: Create driver.ts with the interface + implementation (no callers yet)** + +1. Create `driver.ts` with `TetrisDriver` interface and `PlaywrightDriver` class. +2. Move into it from `calibrate.ts`: + - `detectStartMechanism()` and its sub-functions (`tryKeyboardTriggers`, `tryDomButtons`, `tryCanvasClicks`) + - `detectGrid()` + - `detectControls()` + - `detectScoreElement()` + - `measureGridConfidence()` + - `surveyPage()` + - `sampleScreenshot()` + - `detectVisualChange()` + - `verifyInteractivity()` + - `clusterPoints()` + - `recalibrateWithRetry()` +3. Move into it from `grid-reader.ts`: + - `readGrid()`, `readCanvasGrid()`, `readDomGrid()` + - `sampleBackgroundColor()` + - `validateGridBounds()` + - `gridsAreDifferent()` + - `countFilled()`, `countFilledInBottomRows()`, `hasFilledInTopRows()` + - `countCompleteRows()`, `isRowComplete()` + - `getColumnHeights()` + - `detectActivePieceCells()`, `identifyPieceType()` +4. Move into it from `tests.ts`: + - `readLevelFromPage()` + - `measureDropInterval()` + - `detectNextPiecePreview()` + - `extractScoreFromText()` (internal helper) +5. Wrap everything behind `PlaywrightDriver` methods. +6. Export both the interface and the class. +7. At this point, old code still works -- `calibrate.ts`, `grid-reader.ts`, and `tests.ts` are unchanged. 
+ +**Commit A**: "Add driver.ts: TetrisDriver interface and PlaywrightDriver implementation" + +**Phase B: Create bot.ts (calls driver.ts, replaces tests.ts)** + +1. Create `bot.ts` with the new `runAllTests()` that accepts `TetrisDriver`. +2. Move into it from `tests.ts`: + - `runAllTests()` (rewritten to call Driver instead of Playwright directly) + - `runBasicMechanicsPhase()` + - `runGameplayPhase()` + - `runGameOverPhase()` + - `runEndurancePhase()` + - `runCompetitivePlayPhase()` + - `deriveTestResults()` + - `ALL_TEST_NAMES` + - `emptyCalibration()` (adapted to return `DriverCalibration`) + - `loadAndCheckPage()` (replaced by `driver.loadPage()`) + - `boundingBox()` helper + - `countFilledInTopRows()` helper (local in tests.ts, replaced by GridSnapshot method) +3. Move into it from `player.ts`: + - `playGame()` (rewritten to call Driver) + - `executePlacement()` (rewritten to call Driver) + - `playRandomMove()` (rewritten to call Driver) + - `playRandomForDuration()` (rewritten to call Driver) + - `tryFillRow()` (rewritten to call Driver) + - `stackToGameOver()` (rewritten to call Driver) +4. bot.ts imports `findBestPlacement`, `stripActivePiece`, `Placement` from `player.ts` + and everything else from `driver.ts`. + +**Commit B**: "Add bot.ts: phase orchestration using TetrisDriver" + +**Phase C: Rewire index.ts, slim player.ts** + +1. Update `index.ts`: + - Import `PlaywrightDriver` from `./driver` + - Import `runAllTests` from `./bot` (not `./tests`) + - In the test body: `const driver = new PlaywrightDriver(page); const results = await runAllTests(driver, serverUrl);` +2. Remove from `player.ts`: + - `playGame()`, `hardDrop()`, `executePlacement()`, `playRandomMove()`, `playRandomForDuration()`, `tryFillRow()`, `stackToGameOver()` + - `import type { Page }` and `import { readGrid, ... }` from grid-reader + - `countTotalFilled()` (redundant) +3. 
`player.ts` now exports only: + - `findBestPlacement()` (accepts `Grid` and `PieceType`, returns `Placement | null`) + - `stripActivePiece()` (accepts `Grid` and cells, returns `Grid`) + - `Placement` interface + +**Commit C**: "Rewire index.ts to use bot.ts + driver.ts, slim player.ts" + +**Phase D: Delete old files** + +1. Delete `calibrate.ts` +2. Delete `grid-reader.ts` +3. Delete `tests.ts` +4. Verify all imports resolve +5. Run the full eval pipeline against a known artifact to confirm identical report output + +**Commit D**: "Remove old calibrate.ts, grid-reader.ts, tests.ts" + +### Backwards compatibility + +The report format (`BotReport`) does not change. The JSON output is byte-identical +for the same game input. The summary score calculation is unchanged. The test names +are unchanged. The competitive play data structure is unchanged. + +The only external-facing change is the internal file structure. Nothing downstream +(the scoring pipeline, the dashboard, the harness) needs to change. + +--- + +## File Structure After Refactor + +``` +gameplay-bot/ + types.ts ~220 lines Interfaces (unchanged) + driver.ts ~900 lines TetrisDriver interface + PlaywrightDriver class + player.ts ~250 lines Pure AI: heuristics, simulation, placement finding + bot.ts ~1100 lines Phases, play loops, test derivation, competitive play + index.ts ~270 lines Playwright test entry, HTTP server, report output + SPEC.md Unchanged + COMPETITIVE_PLAY_SPEC.md Unchanged + REFACTOR_SPEC.md This document +``` + +Total: ~2740 lines (down from ~3500 because of deduplication and removing +redundant helpers that now live behind the Driver). 
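To make "pure AI" in the listing above concrete, here is a minimal sketch of three of the four heuristics that `player.ts` keeps. The function names `aggregateHeight()`, `countHoles()`, and `bumpiness()` come from this spec, but the bodies, the `columnHeights` helper, and the `grid[row][col]` layout with row 0 at the top are assumptions, not the actual implementation.

```typescript
// Assumption: grid[row][col], row 0 at the top, true = filled cell.
type Grid = boolean[][];

// Hypothetical helper: height of each column, measured from its topmost
// filled cell down to the floor (0 for an empty column).
function columnHeights(grid: Grid): number[] {
  const rows = grid.length;
  const cols = rows > 0 ? grid[0].length : 0;
  const heights: number[] = [];
  for (let c = 0; c < cols; c++) {
    const top = grid.findIndex((row) => row[c]);
    heights.push(top === -1 ? 0 : rows - top);
  }
  return heights;
}

// Sum of all column heights.
function aggregateHeight(grid: Grid): number {
  return columnHeights(grid).reduce((sum, h) => sum + h, 0);
}

// A hole is an empty cell with at least one filled cell above it in its column.
function countHoles(grid: Grid): number {
  let holes = 0;
  columnHeights(grid).forEach((h, c) => {
    for (let r = grid.length - h; r < grid.length; r++) {
      if (!grid[r][c]) holes++;
    }
  });
  return holes;
}

// Sum of absolute height differences between adjacent columns.
function bumpiness(grid: Grid): number {
  const heights = columnHeights(grid);
  let total = 0;
  for (let c = 1; c < heights.length; c++) {
    total += Math.abs(heights[c] - heights[c - 1]);
  }
  return total;
}
```

Nothing here touches a `Page` or a keyboard, so functions of this kind can be unit-tested on hand-written grids, which is the property the refactor is after.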
+ +### Import/dependency graph + +``` +index.ts + -> driver.ts (PlaywrightDriver constructor) + -> bot.ts (runAllTests) + -> types.ts (BotReport) + +bot.ts + -> driver.ts (TetrisDriver interface, DriverCalibration, GridSnapshot) + -> player.ts (findBestPlacement, stripActivePiece, Placement) + -> types.ts (all data interfaces) + +driver.ts + -> types.ts (Grid, GridBounds, RendererType, Controls, etc.) + -> @playwright/test (Page) + +player.ts + -> types.ts (Grid, PieceType) + (NO @playwright/test import) +``` + +Key constraint: `bot.ts` does NOT import `@playwright/test`. It depends on the +`TetrisDriver` interface, not the implementation. This means the Bot can be tested +with a mock driver that returns canned grid states -- no browser needed. + +--- + +## Edge Cases + +### Games that need re-calibration mid-session + +**Scenario**: Grid appears only after clicking "Start". On page load, there is no +canvas and no DOM grid -- just a splash screen. + +**Current behavior**: `calibrate()` runs on the splash screen, finds nothing. +Then `tests.ts` tries start mechanisms, and after starting, re-runs `calibrate()`. + +**Driver behavior**: `calibrate()` includes start detection. If it starts the game +but finds no grid, it waits and re-scans. `recalibrate()` is also available for the +Bot to call explicitly after any phase reload. + +**Bot flow**: +``` +cal = driver.calibrate() +if cal.gridDetected === false && cal.startMechanism !== "unknown": + // Game started but grid not found yet -- wait and retry + driver.wait(500) + cal = driver.recalibrate() +``` + +### Games where the Driver cannot read the grid at all + +**Scenario**: Canvas game without GPU access. `getImageData()` returns all zeros. + +**Driver behavior**: `readGrid()` returns `GridSnapshot { grid: null }` every time. +The Bot sees grid failures accumulate. + +**Bot flow**: Phase 3 (mechanics) detects that `gridReadSuccess === 0`. 
The Bot +marks all grid-dependent tests as failed with detail "grid reader unavailable". +It does NOT fall back to screenshot-only testing (per the "NO FALSE POSITIVES" rule). +Competitive play is skipped. + +### Games that pause themselves + +**Scenario**: Player accidentally triggers a pause menu (Escape key, or a pause +button that overlaps with the game area). + +**Driver behavior**: `readGrid()` may return null (if an overlay covers the grid) +or return a static grid (same state on every read). The Driver does not know about +pausing -- it just reports what it sees. + +**Bot flow**: The play loop in bot.ts already handles stale grids. If the grid +hasn't changed for 8 seconds, it tries pressing the drop key (which may unpause). +If grid reads start returning null, the Bot counts consecutive failures. After 10 +consecutive null reads, it falls back to random key presses for a brief period, +then re-reads. + +The Bot could also try pressing Escape or P to dismiss a pause screen: +``` +if consecutiveUnchanged > 80: // 80 polls * 60ms = ~5 seconds + driver.pressRawKey("Escape") + driver.wait(500) + driver.pressRawKey("p") + driver.wait(500) +``` + +### Games with overlays that block gameplay + +**Scenario**: A modal overlay (tutorial, cookie consent, "enter your name" dialog) +appears on top of the game, blocking input. + +**Driver behavior**: `surveyPage()` detects overlays (positioned elements covering +>50% of viewport). The start mechanism detection already tries clicking overlays +and pressing Escape to dismiss them. + +**Bot flow**: If the game started but mechanics tests show no response to input +(movementsObserved === 0), the Bot can request a recalibrate, which may re-run +start detection and dismiss a new overlay. + +### Games in different languages + +**Scenario**: The game UI is in Spanish, Japanese, or any non-English language. +"Start", "Game Over", "Score" have different text. 
+ +**Driver behavior**: Start mechanism detection is already fully language-agnostic +(visual change detection + interactivity verification, no text matching). Score +element detection falls back from labeled text ("Score: 0") to structural heuristics +(leaf element containing a standalone number). Game over text detection checks +multiple languages ("game over", "fin del juego", etc.) or falls back to +grid-state-based detection (grid frozen after filling to top). + +**Bot flow**: The Bot does not do any text matching. It delegates all text-based +detection to the Driver. Tests like `game_over` use `driver.detectGameOverText()` +which is the Driver's responsibility. The Bot adds a grid-based game over check +(frozen grid after stacking) as a secondary signal that doesn't depend on language. + +The `detectGameOverText()` method could be extended with more languages: +```typescript +// Inside driver.ts +const gameOverPatterns = [ + "game over", "gameover", "you lose", "try again", + "play again", "restart", "fin del juego", "juego terminado", + "ゲームオーバー", "游戏结束" +]; +``` + +But the primary game over detection in bot.ts (Phase 6) does not depend on text -- +it watches the grid freeze after filling to the top. + +--- + +## What This Spec Does NOT Cover + +- WebGL grid reading (not implemented yet, out of scope) +- New tests beyond the existing 24 +- Changes to the report format or scoring +- Dashboard changes +- Harness changes +- Performance optimization of grid reading +- Testability improvements beyond the Driver/Bot split (e.g., mock Driver tests) + +These are natural follow-ups after the refactor lands, but they are separate work items.
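For orientation only, the mock-driver follow-up mentioned in the list above might look roughly like the sketch below. It is outside this spec's scope, and `MiniDriver`, `CannedDriver`, and `autoDropDetected` are hypothetical names. The interface is a cut-down stand-in for `TetrisDriver`, and the null handling in `gridsAreDifferent` (null versus a grid counts as different, null versus null does not) is an assumption.

```typescript
type Grid = boolean[][];

// Hypothetical subset of TetrisDriver: just enough for the auto-drop check.
interface MiniDriver {
  readGrid(): Promise<{ grid: Grid | null }>;
  wait(ms: number): Promise<void>;
  gridsAreDifferent(a: Grid | null, b: Grid | null): boolean;
}

// Returns true if the grids differ. Assumption: null vs. grid is "different".
function gridsAreDifferent(a: Grid | null, b: Grid | null): boolean {
  if (a === null || b === null) return a !== b;
  if (a.length !== b.length) return true;
  return a.some(
    (row, r) => row.length !== b[r].length || row.some((cell, c) => cell !== b[r][c])
  );
}

// Mock that replays canned grids; the last frame repeats once the list runs out.
class CannedDriver implements MiniDriver {
  private frames: (Grid | null)[];
  private reads = 0;

  constructor(frames: (Grid | null)[]) {
    this.frames = frames;
  }

  async readGrid(): Promise<{ grid: Grid | null }> {
    const i = Math.min(this.reads++, this.frames.length - 1);
    return { grid: this.frames[i] ?? null };
  }

  // No real delay: tests should not sleep.
  async wait(_ms: number): Promise<void> {}

  gridsAreDifferent(a: Grid | null, b: Grid | null): boolean {
    return gridsAreDifferent(a, b);
  }
}

// Mirrors the Phase 3 auto-drop pseudocode: read, wait, read, compare.
async function autoDropDetected(driver: MiniDriver): Promise<boolean> {
  const before = await driver.readGrid();
  await driver.wait(5000);
  const after = await driver.readGrid();
  return driver.gridsAreDifferent(before.grid, after.grid);
}
```

Because `autoDropDetected` depends only on the interface, the same function would run unchanged against a real `PlaywrightDriver` or against `new CannedDriver([gridA, gridB])`, with no browser involved in the second case.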
