scan.md (4376B)
Run the scan pipeline on papers that have paper.txt but no scan.json.

Arguments: $ARGUMENTS
- A number (e.g., `5`, `20`) sets the maximum number of papers to scan
- `all` or no argument runs unlimited until all papers are scanned
- `status` just prints current progress without scanning
- `v1` suffix forces v1 single-pass mode (e.g., `10 v1`)

## Instructions

### 1. Check status first

Run `python3 scripts/claim.py status` to see how many papers are available.

Also count total papers with paper.txt but no scan.json:
```bash
find papers -name "paper.txt" -exec sh -c 'test ! -f "$(dirname "$1")/scan.json" && echo "$1"' sh {} \; | wc -l
```

Print the status summary to the user.

If the argument is `status`, stop here.

### 2. Determine batch size and mode

Parse `$ARGUMENTS`:
- If it's a number, scan that many papers maximum
- If it's `all` or empty, scan everything available
- If it contains `v1`, use v1 single-pass mode (see below)
- Default: v2 two-pass mode with conditional modules

### 3. Get the list of papers to scan

```bash
python3 scripts/claim.py list --limit <N>
```

This returns slugs of papers that have paper.txt, no scan.json, and no active claim.

### 4. Launch scan sub-agents in parallel batches

Launch **50 sub-agents at a time** using the Agent tool with `run_in_background: true`.

#### V2 mode (default)

For each sub-agent, use this prompt (fill in the slug):

---

You are a v2 scan agent. Your job is to evaluate a single research paper using a two-pass process.

**Read these files first:**
1. `/root/projects/ai-research-survey/schema/scan.schema.json`: the full checklist schema
2. `/root/projects/ai-research-survey/agents/scan-agent.md`: answer rules and strictness guidelines
3. `/root/projects/ai-research-survey/papers/<SLUG>/paper.txt`: the paper to evaluate

**Then:**
1. Claim the paper: run `python3 /root/projects/ai-research-survey/scripts/claim.py take <SLUG>`. If it prints "taken", stop immediately.

2. **Pass 1 (Triage)**: Read `agents/scan-triage.md`. Extract metadata, assign methodology_tags, determine active conditional modules (experimental_rigor if benchmark-eval, data_leakage if benchmark-eval, survey_methodology if meta-analysis), set all applicability flags, extract cited papers, key findings, red flags.

3. **Pass 2 (Evaluation)**: Answer all 50 base checklist questions + any active conditional module questions. Follow the schema descriptions and scan-agent.md answer rules strictly.

4. Assemble the full scan.json with `scan_version: 2`, `active_modules`, and the complete checklist. Write to `/root/projects/ai-research-survey/papers/<SLUG>/scan.json`.

5. Validate: run `python3 /root/projects/ai-research-survey/scripts/validate-scan.py papers/<SLUG>/scan.json`. Fix any errors.

6. Release: run `python3 /root/projects/ai-research-survey/scripts/claim.py done <SLUG>`

Do NOT write to registry.jsonl. Only write scan.json.

---

#### V1 mode (legacy, when `v1` flag is present)

For each sub-agent, use this prompt (fill in the slug):

---

You are a scan agent. Your job is to evaluate a single research paper using a boolean checklist.

**Read these files first:**
1. `/root/projects/ai-research-survey/schema/scan.schema.json`: the checklist schema with evaluation criteria
2. `/root/projects/ai-research-survey/agents/scan-agent.md`: your full instructions
3. `/root/projects/ai-research-survey/papers/<SLUG>/paper.txt`: the paper to evaluate

**Then:**
1. Claim the paper: run `python3 /root/projects/ai-research-survey/scripts/claim.py take <SLUG>`. If it prints "taken", stop immediately; another agent is working on it.
2. Follow the instructions in scan-agent.md to evaluate the paper.
3. Write the result to `/root/projects/ai-research-survey/papers/<SLUG>/scan.json`
4. Release the claim: run `python3 /root/projects/ai-research-survey/scripts/claim.py done <SLUG>`

Do NOT write to registry.jsonl. Only write scan.json.

---

### 5. Wait for each batch to complete before launching the next

After each batch of 50 completes, report:
- How many succeeded (wrote scan.json)
- How many failed
- How many remain

Then launch the next batch of 50. Continue until the limit is reached or all papers are scanned.

### 6. Final summary

After all batches complete, print:
- Total papers scanned this run
- Total papers now scanned overall
- Any failures (papers where scan.json was not written)
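
For reference, the argument handling in step 2 and the batching in steps 4 and 5 can be sketched in Python. This is only an illustrative sketch: `ScanPlan`, `parse_arguments`, `unscanned`, and `batches` are hypothetical names that do not exist in this repository, and the sketch assumes each paper lives at `papers/<SLUG>/paper.txt`. It does not call `claim.py` or launch any agents.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List, Optional

BATCH_SIZE = 50  # sub-agents launched per batch, as stated in step 4


@dataclass
class ScanPlan:
    """Hypothetical container for the parsed $ARGUMENTS string."""
    status_only: bool      # True when the argument is `status`
    limit: Optional[int]   # None means unlimited (`all` or no argument)
    mode: str              # "v1" (single-pass) or "v2" (two-pass, default)


def parse_arguments(raw: str) -> ScanPlan:
    """Interpret $ARGUMENTS following the rules listed in step 2."""
    tokens = raw.split()
    limit = None
    for tok in tokens:
        if tok.isdecimal():
            limit = int(tok)  # a bare number caps the papers scanned
    return ScanPlan(
        status_only="status" in tokens,
        limit=limit,
        mode="v1" if "v1" in tokens else "v2",
    )


def unscanned(papers_dir: Path) -> List[str]:
    """Slugs of directories that hold paper.txt but no scan.json."""
    return sorted(
        p.parent.name
        for p in papers_dir.rglob("paper.txt")
        if not (p.parent / "scan.json").exists()
    )


def batches(slugs: List[str], limit: Optional[int],
            size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` slugs, honoring the limit."""
    selected = slugs if limit is None else slugs[:limit]
    for start in range(0, len(selected), size):
        yield selected[start:start + size]
```

With these helpers, `parse_arguments("10 v1")` yields a limit of 10 in v1 mode, and `batches(slugs, None)` over 120 slugs yields groups of 50, 50, and 20. Note that `unscanned` does not consult active claims, so it corresponds to the raw `find` count in step 1 rather than to the output of `claim.py list`.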