loop-benchmarking

Controlled experiments across agentic coding configurations. Same task, one variable, what actually works.
git clone https://git.shiptheloop.com/loop-benchmarking.git

task.yaml (656B)


name: tetris
description: Build a playable Tetris game that runs in the browser
difficulty: easy
category: visual-interactive

prompt_styles:
  - simple
  - detailed

languages:
  - typescript
  - javascript

eval:
  structural:
    - entry_point_exists  # index.html or equivalent
    - build_succeeds      # npm run build or direct HTML
    - no_build_errors
  functional:
    framework: playwright
    test_file: functional.spec.ts
  quality:
    - lint
    - typecheck           # if typescript
    - accessibility       # axe-core audit
    - performance         # page load time, bundle size
    - no_console_errors   # during automated play session
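
The file lists two prompt styles and two languages, and marks the typecheck step as TypeScript-only. A minimal sketch of how a harness might expand that into individual runs follows. It assumes every prompt style is paired with every language, which the file does not state. The names `TaskConfig`, `RunVariant`, and `expandVariants` are hypothetical and not taken from the repository.

```typescript
// Hypothetical types mirroring fields of task.yaml; the real harness may differ.
type PromptStyle = "simple" | "detailed";
type Language = "typescript" | "javascript";

interface TaskConfig {
  name: string;
  promptStyles: PromptStyle[];
  languages: Language[];
  qualityChecks: string[];
}

interface RunVariant {
  id: string;
  promptStyle: PromptStyle;
  language: Language;
  qualityChecks: string[];
}

// Values copied from task.yaml.
const tetris: TaskConfig = {
  name: "tetris",
  promptStyles: ["simple", "detailed"],
  languages: ["typescript", "javascript"],
  qualityChecks: [
    "lint",
    "typecheck",
    "accessibility",
    "performance",
    "no_console_errors",
  ],
};

// Assumption: a full cross product of prompt styles and languages.
// "typecheck" is dropped for non-TypeScript runs, following the
// "if typescript" comment in the file.
function expandVariants(task: TaskConfig): RunVariant[] {
  const variants: RunVariant[] = [];
  for (const promptStyle of task.promptStyles) {
    for (const language of task.languages) {
      variants.push({
        id: `${task.name}-${promptStyle}-${language}`,
        promptStyle,
        language,
        qualityChecks: task.qualityChecks.filter(
          (check) => check !== "typecheck" || language === "typescript",
        ),
      });
    }
  }
  return variants;
}

for (const variant of expandVariants(tetris)) {
  console.log(variant.id, variant.qualityChecks.join(","));
}
```

Under these assumptions the task yields four runs, and only the two TypeScript runs include the typecheck step.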
