CLAUDE.md - ai-research-survey - Systematic scan of agentic development research. What's signal, what's noise.

# AI Research Survey - Project Rules

## What This Is
Systematic review of ~1,000 research papers in the agentic AI / LLM programming space, evaluating the methodological quality of each paper.

## Project Structure
- `registry.jsonl`: One JSON object per line, one per paper. Source of truth for paper inventory.
- `papers/<slug>/`: One directory per paper. Contains `paper.pdf` (local only), `scan.json`, and optionally `calibration.json` and `deep_eval.json`.
- `schema/`: JSON Schemas that agent outputs must conform to.
- `agents/`: Prompt files for each agent type.
- `context/`: Project requirements, methodology, related work.
- `scripts/`: Pipeline tooling (harvest-citations, etc.).
- `inbox/`: Drop PDFs here for the inbox-sorter agent to process.

## Registry Conventions
- **ID slugs**: lowercase, hyphen-separated, concise. Include year. E.g., `metr-rct-2025`.
- **ID slugs must be unique** across the entire registry.
- **Status values**: `queued` → `downloaded` → `scanned` → `deep_eval`. Also `excluded`.
- **Source values**: `manual`, `arxiv`, `huggingface`, `semantic_scholar`, `inbox`.
- **Dedup before inserting**: Check `arxiv_id`, `doi`, and title (case-insensitive) against existing entries.
- **Never delete registry entries**. Set status to `excluded` with a note explaining why.

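The slug and dedup rules above could be checked with something like the following sketch. It is only an illustration: the field names `id` and `title` are assumptions (only `arxiv_id` and `doi` are named in these rules), and the regular expressions are one possible reading of "lowercase, hyphen-separated, include year".

```python
import json
import re

# Assumed reading of the slug rule: lowercase alphanumeric parts joined by
# hyphens, with one part being a four-digit year.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
YEAR_RE = re.compile(r"(?:^|-)(?:19|20)\d{2}(?:-|$)")


def is_valid_slug(slug):
    """Return True if the slug matches the assumed pattern and contains a year."""
    return bool(SLUG_RE.match(slug)) and bool(YEAR_RE.search(slug))


def find_duplicate(candidate, registry):
    """Return the first registry entry that clashes with candidate, else None.

    Checks the slug (assumed field `id`), then `arxiv_id` and `doi`, then the
    case-insensitive title (assumed field `title`).
    """
    title = (candidate.get("title") or "").strip().lower()
    for entry in registry:
        if candidate.get("id") and candidate.get("id") == entry.get("id"):
            return entry
        for key in ("arxiv_id", "doi"):
            value = candidate.get(key)
            if value and value == entry.get(key):
                return entry
        if title and title == (entry.get("title") or "").strip().lower():
            return entry
    return None


def load_registry(path):
    """Read a JSONL registry file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A caller would typically run `find_duplicate(candidate, load_registry("registry.jsonl"))` and append the candidate only when the result is `None`.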
## Model Assignments
- **Harvester agent**: Sonnet (structured metadata extraction, no deep reasoning needed)
- **Scan agent**: Opus (calibration showed persistent Sonnet generosity bias: 36% of disagreements)
- **Audit/calibration agent**: Opus only (independent re-evaluation to measure inter-rater agreement; run via the `/audit` command)
- **Deep-eval agent**: Opus (requires careful verification work)
- **Inbox sorter**: Sonnet

## PDFs
- PDFs are stored locally for analysis but **never committed to git**.
- `.gitignore` excludes `papers/*/*.pdf` and `inbox/*.pdf`.
- Do not redistribute PDFs. Only structured outputs (`scan.json`, `deep_eval.json`) are publishable.

## Scan Output
- Must conform to `schema/scan.schema.json`.
- V1: 50 base questions. V2: 50 base questions plus up to 15 conditional questions from the `experimental_rigor`, `data_leakage`, and `survey_methodology` modules.
- Each checklist item has two boolean fields: `applies` (is the criterion relevant?) and `answer` (does the paper satisfy it?), plus a `justification` string.
- V2 scans include `scan_version: 2` and an `active_modules` array.
- The `cited_papers` array is required: extract 3-15 survey-relevant references per paper for citation chasing.
- Validate with `python3 scripts/validate-scan.py` (supports both v1 and v2).
- Run `scripts/harvest-citations.py` after scanning to discover new candidates.

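As a rough illustration of the field rules above, the sketch below checks one checklist item and the top-level fields of a scan. It is not a substitute for `schema/scan.schema.json` or `scripts/validate-scan.py`, whose contents are not shown here. The function names are hypothetical, and treating the 3-15 range as a hard limit is an assumption.

```python
def check_item(item):
    """Return a list of problems for one checklist item (empty list means OK)."""
    problems = []
    for field in ("applies", "answer"):
        # Both fields must be real booleans, not strings like "yes".
        if not isinstance(item.get(field), bool):
            problems.append(f"{field} must be a boolean")
    if not isinstance(item.get("justification"), str):
        problems.append("justification must be a string")
    return problems


def check_scan_envelope(scan):
    """Return a list of problems with the top-level scan fields."""
    problems = []
    cited = scan.get("cited_papers")
    if not isinstance(cited, list):
        problems.append("cited_papers must be an array")
    elif not 3 <= len(cited) <= 15:
        # Assumption: the 3-15 guideline is enforced as a strict range.
        problems.append("cited_papers should hold 3-15 references")
    if scan.get("scan_version") == 2 and not isinstance(scan.get("active_modules"), list):
        problems.append("v2 scans need an active_modules array")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a scan at once.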
## Calibration / Audit
- `/audit` runs Opus calibration on existing scans. It always uses Opus, never Sonnet.
- Opus independently answers the same 50-question checklist, then compares its answers with `scan.json`.
- Output: `papers/<slug>/calibration.json` with agreement rate, per-question disagreements, and Opus's full checklist.
- Purpose: measure inter-rater reliability for the published paper. Target: >95% agreement.
- Round 3 results (60 papers): 97.0% agreement. Existing `calibration.json` files compare Sonnet scans vs Opus.

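The exact agreement formula used by `/audit` is not stated here. One plausible sketch, assuming agreement means that both raters gave the same `applies` and `answer` values for a question, and assuming each checklist is available as a dict keyed by a hypothetical question ID:

```python
def agreement(scan_items, audit_items):
    """Compare two checklists keyed by question ID.

    Returns (rate, disagreements). A question counts as agreed only when both
    `applies` and `answer` match (an assumed definition). The rate is None
    when the two checklists share no questions.
    """
    shared = sorted(set(scan_items) & set(audit_items))
    if not shared:
        return None, []
    disagreements = [
        qid
        for qid in shared
        if (scan_items[qid]["applies"], scan_items[qid]["answer"])
        != (audit_items[qid]["applies"], audit_items[qid]["answer"])
    ]
    rate = (len(shared) - len(disagreements)) / len(shared)
    return rate, disagreements
```

Under this definition, a paper with 50 shared questions and 2 disagreements would score 48/50 = 96% agreement.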
## Code Style
- Scripts in Python 3. No external dependencies unless unavoidable.
- JSON output: `ensure_ascii=False`, one object per line for JSONL.
- Dates in ISO 8601 (`YYYY-MM-DD`).

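A minimal standard-library sketch of these conventions follows. The record fields (`id`, `title`, `added`) and the helper name are made up for illustration and are not taken from the registry schema.

```python
import json
from datetime import date


def to_jsonl_line(record):
    """Serialize one record as a single JSONL line, keeping non-ASCII readable."""
    # json.dumps without `indent` never emits raw newlines, so each record
    # stays on one line; ensure_ascii=False keeps characters like "É" as-is.
    return json.dumps(record, ensure_ascii=False) + "\n"


record = {
    "id": "example-2025",                     # hypothetical slug
    "title": "Étude d'exemple",               # non-ASCII survives unescaped
    "added": date(2025, 1, 31).isoformat(),   # ISO 8601: YYYY-MM-DD
}
line = to_jsonl_line(record)
```

When writing such lines to disk, opening the file with `encoding="utf-8"` avoids platform-dependent defaults mangling the unescaped characters.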
## Git
- Commit messages: imperative mood, concise first line, body for detail.
- Never commit PDFs.
- Never amend published commits.
	git clone https://git.shiptheloop.com/ai-research-survey.git