ai-research-survey

Systematic scan of agentic development research. What's signal, what's noise.
git clone https://git.shiptheloop.com/ai-research-survey.git

commit 9168d67f29d824ef189647a7b5049acf0e55cdca
parent 1c6a723f27a3efff840ec524ce2f155237b0429a
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Fri, 27 Feb 2026 21:05:53 +0100

Add .gitignore and CLAUDE.md project rules

- .gitignore: exclude PDFs from papers/ and inbox/, OS files, Python cache
- CLAUDE.md: registry conventions (slug format, dedup rules, status flow),
  model assignments per agent, code style, git rules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Diffstat:
A .gitignore | 14 ++++++++++++++
A CLAUDE.md  | 47 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,14 @@
+# Paper PDFs (local only, never publish)
+papers/*/paper.pdf
+papers/*/*.pdf
+inbox/*.pdf
+
+# OS files
+.DS_Store
+Thumbs.db
+
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.venv/
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,47 @@
+# AI Research Survey - Project Rules
+
+## What This Is
+Systematic review of ~1,000 research papers evaluating methodological quality in the agentic AI / LLM programming space.
+
+## Project Structure
+- `registry.jsonl` — One JSON object per line, one per paper. Source of truth for paper inventory.
+- `papers/<slug>/` — One directory per paper. Contains `paper.pdf` (local only), `scan.json`, optionally `deep_eval.json`.
+- `schema/` — JSON Schemas that agent outputs must conform to.
+- `agents/` — Prompt files for each agent type.
+- `context/` — Project requirements, methodology, related work.
+- `scripts/` — Pipeline tooling (harvest-citations, etc.).
+- `inbox/` — Drop PDFs here for the inbox-sorter agent to process.
+
+## Registry Conventions
+- **ID slugs**: lowercase, hyphen-separated, concise. Include year. E.g., `metr-rct-2025`.
+- **ID slugs must be unique** across the entire registry.
+- **Status values**: `queued` → `downloaded` → `scanned` → `deep_eval`. Also `excluded`.
+- **Source values**: `manual`, `arxiv`, `huggingface`, `semantic_scholar`, `inbox`.
+- **Dedup before inserting**: Check `arxiv_id`, `doi`, and title (case-insensitive) against existing entries.
+- **Never delete registry entries**. Set status to `excluded` with a note explaining why.
+
+## Model Assignments
+- **Harvester agent**: Sonnet (structured metadata extraction, no deep reasoning needed)
+- **Scan agent**: Opus (requires judgment on methodology quality)
+- **Deep-eval agent**: Opus (requires careful verification work)
+- **Inbox sorter**: Sonnet
+
+## PDFs
+- PDFs are stored locally for analysis but **never committed to git**.
+- `.gitignore` excludes `papers/*/*.pdf` and `inbox/*.pdf`.
+- Do not redistribute PDFs. Only structured outputs (scan.json, deep_eval.json) are publishable.
+
+## Scan Output
+- Must conform to `schema/scan.schema.json`.
+- `cited_papers` array is required — extract 3-15 survey-relevant references per paper for citation chasing.
+- Run `scripts/harvest-citations.py` after scanning to discover new candidates.
+
+## Code Style
+- Scripts in Python 3. No external dependencies unless unavoidable.
+- JSON output: `ensure_ascii=False`, one object per line for JSONL.
+- Dates in ISO 8601 (`YYYY-MM-DD`).
+
+## Git
+- Commit messages: imperative mood, concise first line, body for detail.
+- Never commit PDFs.
+- Never amend published commits.
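The registry conventions in CLAUDE.md (slug format, unique slugs, dedup on `arxiv_id`, `doi` and case-insensitive title, JSONL written with `ensure_ascii=False`) can be sketched as a stdlib-only Python 3 helper, in line with the project's code-style rule. This is a hypothetical illustration, not a script from the repository. The `id` key name, the slug regex (which assumes the year comes last, as in `metr-rct-2025`), and the function names are all assumptions.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# Assumption: lowercase alphanumeric words joined by single hyphens, ending in a
# four-digit year (CLAUDE.md only says "include year"; e.g. `metr-rct-2025`).
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*-\d{4}$")


def load_registry(path: Path) -> list[dict]:
    """Read registry.jsonl (one JSON object per line), skipping blank lines."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def find_duplicate(entry: dict, registry: list[dict]) -> dict | None:
    """Return an existing entry with the same arxiv_id or doi (exact match)
    or the same title (case-insensitive), else None."""
    title = (entry.get("title") or "").strip().casefold()
    for existing in registry:
        for key in ("arxiv_id", "doi"):
            if entry.get(key) and entry.get(key) == existing.get(key):
                return existing
        if title and title == (existing.get("title") or "").strip().casefold():
            return existing
    return None


def add_entry(entry: dict, path: Path) -> bool:
    """Append `entry` to the registry. Returns False (and writes nothing) if the
    paper is already present; raises ValueError on a malformed or reused slug."""
    if not SLUG_RE.match(entry["id"]):
        raise ValueError(f"malformed slug: {entry['id']!r}")
    registry = load_registry(path)
    if find_duplicate(entry, registry) is not None:
        return False
    if any(e.get("id") == entry["id"] for e in registry):
        raise ValueError(f"slug already in use: {entry['id']!r}")
    # ensure_ascii=False keeps non-ASCII titles readable; one object per line.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return True
```

Returning `False` on a duplicate lets a harvester loop quietly skip papers it has already seen, while reusing a slug for a different paper is treated as a hard error.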
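The status flow (`queued` → `downloaded` → `scanned` → `deep_eval`, plus `excluded`) and the never-delete rule could be enforced with a small transition check. This is again a hypothetical sketch. CLAUDE.md gives only the order of states, so the one-step-at-a-time rule, treating `excluded` as terminal, and the `note` field name are assumptions.

```python
from __future__ import annotations

# Forward pipeline order from CLAUDE.md; `excluded` is handled as a side exit.
PIPELINE = ["queued", "downloaded", "scanned", "deep_eval"]


def can_transition(current: str, new: str) -> bool:
    """Assumed rule: advance exactly one pipeline step, or exclude from any
    non-excluded state. Nothing leaves `excluded`."""
    if current == "excluded":
        return False
    if new == "excluded":
        return True
    if current not in PIPELINE or new not in PIPELINE:
        return False
    return PIPELINE.index(new) == PIPELINE.index(current) + 1


def exclude(entry: dict, reason: str) -> dict:
    """Soft-delete: return a copy marked `excluded` with an explanatory note
    (hypothetical `note` key) instead of removing the entry."""
    if not reason.strip():
        raise ValueError("an exclusion needs a note explaining why")
    if not can_transition(entry["status"], "excluded"):
        raise ValueError(f"cannot exclude from status {entry['status']!r}")
    return {**entry, "status": "excluded", "note": reason}
```

Returning a new dict rather than mutating in place keeps the caller in control of when the updated entry is written back to `registry.jsonl`.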
