commit 69c92da1bfbb276ddd27e9ba8256d0087e01c43a
parent a6c809bdf74b788c3f3285e3a37f649be5193fe9
Author: Brian Graham <brian@buildingbetterteams.de>
Date: Fri, 27 Feb 2026 21:11:39 +0100
Add downstream pipeline context to harvester agent prompt
Explain how registry entries feed into download, scan, and citation
chasing so the agent understands why arxiv_id matters.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diffstat:
1 file changed, 11 insertions(+), 0 deletions(-)
diff --git a/agents/harvester-agent.md b/agents/harvester-agent.md
@@ -71,8 +71,19 @@ Skip papers that:
- Don't make falsifiable claims
- Are outside scope (non-code domains, unless methodology is transferable)
+## Downstream Pipeline
+
+After you add entries to the registry, the rest of the pipeline handles them automatically:
+
+1. **Download**: `python scripts/download-arxiv.py` downloads PDFs for all queued entries that have an `arxiv_id`. So always include the `arxiv_id` when available -- it's the key that makes automated download work.
+2. **Scan**: The scan agent reads each downloaded paper and produces `scan.json`.
+3. **Citation chasing**: `python scripts/harvest-citations.py` extracts cited papers from scan results and proposes new registry entries, feeding back into discovery.
+
+You don't run these steps. Just know that your output feeds them, so complete and accurate `arxiv_id` fields are important.
+
## Guidelines
- Discovery only. Do not download PDFs or access full paper text.
- When in doubt about relevance, include it with a note explaining the uncertainty.
- Log your search queries and results for reproducibility.
+- Always include `arxiv_id` when the paper is on arXiv. This enables automated PDF download.
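
Illustration only, not part of this commit: the selection rule described in the Download step might look roughly like the sketch below. The registry shape (a list of dicts with `status` and `arxiv_id` keys), the `"queued"` status value, and the helper name `downloadable_entries` are assumptions made for this example; they are not taken from `scripts/download-arxiv.py`.

```python
from typing import Any


def downloadable_entries(registry: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return queued entries that carry a non-empty arxiv_id.

    Hypothetical schema: each entry is a dict with optional
    "status" and "arxiv_id" keys.
    """
    return [
        entry
        for entry in registry
        if entry.get("status") == "queued" and entry.get("arxiv_id")
    ]


# Made-up sample registry, used only to show the filtering behaviour.
registry = [
    {"title": "Paper A", "status": "queued", "arxiv_id": "2401.00001"},
    {"title": "Paper B", "status": "queued"},  # no arxiv_id, so it is skipped
    {"title": "Paper C", "status": "scanned", "arxiv_id": "2312.12345"},
]

selected = downloadable_entries(registry)
print([entry["title"] for entry in selected])
```

Under these assumptions, an entry without an `arxiv_id` never reaches the download step, which is why the prompt asks the harvester to fill the field in whenever the paper is on arXiv.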