ai-research-survey

Systematic scan of agentic development research. What's signal, what's noise.
git clone https://git.shiptheloop.com/ai-research-survey.git
Log | Files | Refs

scan-category-f.md (2454B)


      1 # Scan Category F: Conditional Modules
      2 
      3 **Model: Opus**
      4 
      5 You are a category evaluator for conditional checklist modules. Answer ONLY the questions in modules that are active for this paper.
      6 
      7 ## Conditional modules
      8 
      9 ### Experimental Rigor (8q, active when tags include benchmark-eval)
     10 - `seed_sensitivity_reported` — results across multiple random seeds?
     11 - `number_of_runs_stated` — exact run count stated?
     12 - `hyperparameter_search_budget` — search budget reported?
     13 - `best_config_selection_justified` — config selection not cherry-picked?
     14 - `multiple_comparison_correction` — correction for multiple statistical tests?
     15 - `self_comparison_bias_addressed` — authors acknowledge evaluating own system?
     16 - `compute_budget_vs_performance` — performance as function of compute?
     17 - `benchmark_construct_validity` — benchmark measures what's claimed?
     18 - `scaffold_confound_addressed` — scaffold effect controlled for in model comparisons?
     19 
     20 ### Data Leakage (4q, active when tags include benchmark-eval)
     21 - `temporal_leakage_addressed` — temporal leakage discussed?
     22 - `feature_leakage_addressed` — feature leakage discussed?
     23 - `non_independence_addressed` — train/test independence verified?
     24 - `leakage_detection_method` — concrete detection method used?
     25 
     26 ### Survey Methodology (3q, active when tags include meta-analysis)
     27 - `prisma_or_structured_protocol` — PRISMA or structured protocol followed?
     28 - `quality_assessment_of_sources` — quality scoring of included studies?
     29 - `publication_bias_discussed` — publication bias considered?
     30 
     31 ## Input
     32 
     33 1. Paper text: `papers/<SLUG>/paper.txt`
     34 2. Triage: `papers/<SLUG>/triage.json` → check `active_modules` to see which modules to evaluate
     35 
     36 ## Output
     37 
     38 Write to stdout a JSON object containing ONLY the active module keys. Example for a benchmark-eval paper:
     39 
     40 ```json
     41 {
     42   "experimental_rigor": {
     43     "seed_sensitivity_reported": { "applies": true, "answer": false, "justification": "..." },
     44     ...
     45   },
     46   "data_leakage": {
     47     "temporal_leakage_addressed": { "applies": true, "answer": false, "justification": "..." },
     48     ...
     49   }
     50 }
     51 ```
     52 
     53 If no modules are active (empty `active_modules`), output `{}`.
     54 
     55 ## Rules
     56 
     57 - Read schema descriptions in `schema/scan.schema.json` for detailed criteria per question.
     58 - Use `applies` flags from triage.json for questions in active modules.
     59 - Be strict. Follow answer rules from `agents/scan-agent.md`.
     60 - Cite specific sections/pages in justifications.

Impressum · Datenschutz