commit 9168d67f29d824ef189647a7b5049acf0e55cdca
parent 1c6a723f27a3efff840ec524ce2f155237b0429a
Author: Brian Graham <brian@buildingbetterteams.de>
Date: Fri, 27 Feb 2026 21:05:53 +0100
Add .gitignore and CLAUDE.md project rules
- .gitignore: exclude PDFs from papers/ and inbox/, OS files, Python cache
- CLAUDE.md: registry conventions (slug format, dedup rules, status flow),
model assignments per agent, code style, git rules
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diffstat:
2 files changed, 61 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,14 @@
+# Paper PDFs (local only, never publish)
+papers/*/paper.pdf
+papers/*/*.pdf
+inbox/*.pdf
+
+# OS files
+.DS_Store
+Thumbs.db
+
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.venv/
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,47 @@
+# AI Research Survey - Project Rules
+
+## What This Is
+Systematic review evaluating the methodological quality of ~1,000 research papers in the agentic AI / LLM programming space.
+
+## Project Structure
+- `registry.jsonl` — One JSON object per line, one per paper. Source of truth for paper inventory.
+- `papers/<slug>/` — One directory per paper. Contains `paper.pdf` (local only), `scan.json`, optionally `deep_eval.json`.
+- `schema/` — JSON Schemas that agent outputs must conform to.
+- `agents/` — Prompt files for each agent type.
+- `context/` — Project requirements, methodology, related work.
+- `scripts/` — Pipeline tooling (harvest-citations, etc.).
+- `inbox/` — Drop PDFs here for the inbox-sorter agent to process.
+
+## Registry Conventions
+- **ID slugs**: lowercase, hyphen-separated, concise. Include year. E.g., `metr-rct-2025`.
+- **ID slugs must be unique** across the entire registry.
+- **Status values**: `queued` → `downloaded` → `scanned` → `deep_eval`. `excluded` is also a valid status, outside this flow.
+- **Source values**: `manual`, `arxiv`, `huggingface`, `semantic_scholar`, `inbox`.
+- **Dedup before inserting**: Check `arxiv_id`, `doi`, and title (case-insensitive) against existing entries.
+- **Never delete registry entries**. Set status to `excluded` with a note explaining why.
+
+## Model Assignments
+- **Harvester agent**: Sonnet (structured metadata extraction, no deep reasoning needed)
+- **Scan agent**: Opus (requires judgment on methodology quality)
+- **Deep-eval agent**: Opus (requires careful verification work)
+- **Inbox sorter**: Sonnet
+
+## PDFs
+- PDFs are stored locally for analysis but **never committed to git**.
+- `.gitignore` excludes `papers/*/*.pdf` and `inbox/*.pdf`.
+- Do not redistribute PDFs. Only structured outputs (scan.json, deep_eval.json) are publishable.
+
+## Scan Output
+- Must conform to `schema/scan.schema.json`.
+- `cited_papers` array is required — extract 3-15 survey-relevant references per paper for citation chasing.
+- Run `scripts/harvest-citations.py` after scanning to discover new candidates.
+
+## Code Style
+- Scripts in Python 3. No external dependencies unless unavoidable.
+- JSON output: `ensure_ascii=False`, one object per line for JSONL.
+- Dates in ISO 8601 (`YYYY-MM-DD`).
+
+## Git
+- Commit messages: imperative mood, concise first line, body for detail.
+- Never commit PDFs.
+- Never amend published commits.
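
The registry conventions in CLAUDE.md (slug format, dedup before inserting, JSONL output style) could be checked by a small stdlib-only helper along these lines. This is a sketch, not part of the commit: the `id` and `title` field names and every function name are assumptions for illustration (the rules name only `arxiv_id` and `doi`), and the slug grammar is one plausible reading of "lowercase, hyphen-separated, include year".

```python
import json
import re

# Assumed slug grammar: lowercase alphanumeric tokens joined by single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Assumed year rule: one token must be a four-digit year starting with 19 or 20.
YEAR_RE = re.compile(r"(?:^|-)(?:19|20)\d{2}(?:-|$)")


def is_valid_slug(slug):
    """Check a slug against the assumed format (lowercase, hyphens, year)."""
    return bool(SLUG_RE.match(slug)) and bool(YEAR_RE.search(slug))


def find_duplicate(candidate, entries):
    """Return the first existing entry that collides with candidate, else None.

    Compares the slug (hypothetical `id` field), `arxiv_id`, `doi`, and the
    title (hypothetical `title` field, compared case-insensitively).
    """
    title = (candidate.get("title") or "").strip().casefold()
    for entry in entries:
        # Exact match on any identifier that the candidate actually carries.
        for key in ("id", "arxiv_id", "doi"):
            value = candidate.get(key)
            if value and value == entry.get(key):
                return entry
        # Case-insensitive title match.
        if title and title == (entry.get("title") or "").strip().casefold():
            return entry
    return None


def to_jsonl_line(entry):
    """Serialize one registry entry as a single line, keeping non-ASCII text."""
    return json.dumps(entry, ensure_ascii=False)


def load_registry(path="registry.jsonl"):
    """Read all entries from a JSONL registry file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

A caller would run `find_duplicate(candidate, load_registry())` and append `to_jsonl_line(candidate)` plus a newline to `registry.jsonl` only when the result is `None`. Existing entries are never rewritten or deleted, in line with the "never delete registry entries" rule.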