ai-research-survey

Systematic scan of agentic development research. What's signal, what's noise.
git clone https://git.shiptheloop.com/ai-research-survey.git

commit 69c92da1bfbb276ddd27e9ba8256d0087e01c43a
parent a6c809bdf74b788c3f3285e3a37f649be5193fe9
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Fri, 27 Feb 2026 21:11:39 +0100

Add downstream pipeline context to harvester agent prompt

Explain how registry entries feed into download, scan, and citation
chasing so the agent understands why arxiv_id matters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Diffstat:
M agents/harvester-agent.md | 11 +++++++++++
1 file changed, 11 insertions(+), 0 deletions(-)

diff --git a/agents/harvester-agent.md b/agents/harvester-agent.md
@@ -71,8 +71,19 @@ Skip papers that:
 - Don't make falsifiable claims
 - Are outside scope (non-code domains, unless methodology is transferable)
 
+## Downstream Pipeline
+
+After you add entries to the registry, the rest of the pipeline handles them automatically:
+
+1. **Download**: `python scripts/download-arxiv.py` downloads PDFs for all queued entries that have an `arxiv_id`. So always include the `arxiv_id` when available -- it's the key that makes automated download work.
+2. **Scan**: The scan agent reads each downloaded paper and produces `scan.json`.
+3. **Citation chasing**: `python scripts/harvest-citations.py` extracts cited papers from scan results and proposes new registry entries, feeding back into discovery.
+
+You don't run these steps. Just know that your output feeds them, so complete and accurate `arxiv_id` fields are important.
+
 ## Guidelines
 
 - Discovery only. Do not download PDFs or access full paper text.
 - When in doubt about relevance, include it with a note explaining the uncertainty.
 - Log your search queries and results for reproducibility.
+- Always include `arxiv_id` when the paper is on arXiv. This enables automated PDF download.
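The commit's central claim is that `arxiv_id` lets a queued registry entry pass through download and citation chasing without manual help. The sketch below illustrates that claim in Python, the language of the pipeline's own scripts. It is a minimal illustration under stated assumptions. It does not reproduce `scripts/download-arxiv.py` or `scripts/harvest-citations.py`, because the commit does not show either script. The following are all hypothetical:

- the `RegistryEntry` fields,
- the `"queued"` status value,
- the `(title, arxiv_id)` shape of citations pulled from `scan.json`,
- the `plan_downloads` and `propose_from_citations` helpers.

The only external fact the sketch relies on is arXiv's public `https://arxiv.org/pdf/<id>` URL pattern.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


# Hypothetical registry entry: the real registry schema is not shown in this commit.
@dataclass
class RegistryEntry:
    title: str
    arxiv_id: Optional[str] = None  # e.g. "2401.01234"; None if the paper is not on arXiv
    status: str = "queued"          # assumed status name for "awaiting download"


def _key(title: str, arxiv_id: Optional[str]) -> str:
    """Dedup key: prefer the arXiv ID, fall back to a case-folded title."""
    return arxiv_id.strip() if arxiv_id else title.casefold()


def plan_downloads(entries: Iterable[RegistryEntry]) -> Tuple[List[str], List[str]]:
    """Download step: PDF URLs for queued entries, plus titles skipped for lacking arxiv_id."""
    urls: List[str] = []
    skipped: List[str] = []
    for e in entries:
        if e.status != "queued":
            continue  # already downloaded/scanned, nothing to do
        if e.arxiv_id:
            # https://arxiv.org/pdf/<id> is arXiv's public PDF location.
            urls.append(f"https://arxiv.org/pdf/{e.arxiv_id.strip()}")
        else:
            skipped.append(e.title)  # no ID, so automated download cannot locate the PDF
    return urls, skipped


def propose_from_citations(
    registry: Iterable[RegistryEntry],
    cited: Iterable[Tuple[str, Optional[str]]],
) -> List[RegistryEntry]:
    """Citation chasing: turn hypothetical (title, arxiv_id) pairs from scan results
    into new queued entries, skipping anything already registered or already proposed."""
    known = {_key(e.title, e.arxiv_id) for e in registry}
    proposals: List[RegistryEntry] = []
    for title, arxiv_id in cited:
        key = _key(title, arxiv_id)
        if key in known:
            continue
        known.add(key)  # avoid proposing the same citation twice
        proposals.append(RegistryEntry(title=title, arxiv_id=arxiv_id))
    return proposals


if __name__ == "__main__":
    registry = [
        RegistryEntry("Paper A", arxiv_id="2401.01234"),
        RegistryEntry("Paper B"),  # missing arxiv_id
        RegistryEntry("Paper C", arxiv_id="2312.00001", status="scanned"),
    ]
    urls, skipped = plan_downloads(registry)
    print(urls)     # ['https://arxiv.org/pdf/2401.01234']
    print(skipped)  # ['Paper B']

    cited = [("Paper C", "2312.00001"), ("Paper D", "2402.05678"), ("Paper D", "2402.05678")]
    print([e.title for e in propose_from_citations(registry, cited)])  # ['Paper D']
```

Entries without an ID end up in `skipped`, which is the gap the new guideline asks the harvester to close. Deduplicating proposals by `arxiv_id`, with a title fallback, is one plausible way to stop citation chasing from re-queuing papers that are already registered. The real script may use a different rule.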
