ai-research-survey

Systematic scan of agentic development research. What's signal, what's noise.
git clone https://git.shiptheloop.com/ai-research-survey.git

commit 4b8436506afa1c261f8cd6e046caa136ce386732
parent 06cbf721cea34bee65e068cd7363caff35325a3b
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Tue, 14 Apr 2026 21:45:34 +0200

stats: include v1 scans with graceful degradation

The scan_version < 2 filter was excluding 558 papers (~28% of the
scanned corpus). Inspection showed the v1 rubric is a proper subset
of v2+: 50 identical questions across 11 identical categories, none
dropped or changed. The v2+ additions (proxy_outcome_distinction,
data_leakage, experimental_rigor, survey_methodology: 7 questions
in one new field and 3 new conditional modules) are purely additive.
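
A minimal sketch of the kind of subset check described above. It assumes a rubric can be represented as a mapping from category name to a set of question ids. That shape, the function name is_additive_extension, and the category and question ids "reproducibility", "code_released", "data_released" and "contamination_checked" are all hypothetical and not taken from the repository; only data_leakage appears in this commit.

```python
# Hypothetical rubric shape: {category_name: set_of_question_ids}.
# This is an illustration, not the repository's actual schema.

def is_additive_extension(old, new):
    """Return True if `new` keeps every category and question of `old`
    unchanged and adds at least one question on top."""
    for category, questions in old.items():
        # A dropped category or a dropped/renamed question breaks the subset.
        if category not in new or not questions <= new[category]:
            return False
    old_total = sum(len(q) for q in old.values())
    new_total = sum(len(q) for q in new.values())
    # "Proper" subset: the new rubric must be strictly larger.
    return new_total > old_total


# Toy rubrics with made-up ids.
v1 = {"reproducibility": {"code_released", "data_released"}}
v2 = {
    "reproducibility": {"code_released", "data_released"},
    "data_leakage": {"contamination_checked"},
}
```

With these toy rubrics, is_additive_extension(v1, v2) holds while the reverse direction does not, which is the property the inspection relied on.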

compute_overall_score already uses passed/applicable over present
questions, so v1 papers degrade gracefully: their 50 applicable
questions are scored normally and the 7 v2+-only questions are
treated as absent. classify_archetype only touches categories in
the shared 11, and detect_games only references questions within
those same 11 categories. No scoring bias is introduced.
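
To illustrate the graceful degradation, here is a stand-in for the passed/applicable calculation. It is not the repository's compute_overall_score: the name overall_score, the checklist shape (question id mapped to True, False, or None for "not applicable") and the question ids are assumptions made for this sketch.

```python
def overall_score(checklist):
    """Percentage of passed questions among those present and applicable.

    checklist: {question_id: True | False | None}, where None marks a
    question that is present but not applicable (hypothetical encoding).
    Returns None when there is nothing applicable to score.
    """
    applicable = [v for v in checklist.values() if v is not None]
    if not applicable:
        return None  # mirrors a "no applicable questions" exclusion
    return 100.0 * sum(applicable) / len(applicable)


# A v1-style scan: only the shared questions exist.
v1_scan = {"q1": True, "q2": False}

# A v2-style scan of the same paper where the extra module does not apply.
v2_scan_na = {"q1": True, "q2": False, "leak_q": None}

# A v2-style scan where the extra module applies and passes.
v2_scan_pass = {"q1": True, "q2": False, "leak_q": True}
```

Under these assumptions the v1 scan and the v2 scan with a non-applicable module get the same score, because absent and non-applicable questions both drop out of the denominator. Only a module that actually applies changes the result.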

Effect: n rises from 1,047 to 1,531 (+484 v1 papers that had
scorable data; 74 more v1 scans still excluded via the
"no applicable questions" check). Median moves 47.2 -> 49.1,
all game_pcts within 2 points of prior values.
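
The counts above can be reconciled with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones quoted in this message.

```python
excluded_by_old_filter = 558  # v1 scans dropped by scan_version < 2
n_before = 1047               # scored papers before this change
n_after = 1531                # scored papers after this change

# v1 scans that now contribute a score.
newly_included = n_after - n_before

# v1 scans that pass the version check but have nothing applicable to score.
still_excluded = excluded_by_old_filter - newly_included
```

This gives 484 newly included and 74 still excluded, matching the figures in the paragraph.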

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
M scripts/build-explorer-data.py | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/scripts/build-explorer-data.py b/scripts/build-explorer-data.py
@@ -263,8 +263,13 @@ def build():
         with open(scan_path) as f:
             scan = json.load(f)
 
-        if scan.get("scan_version", 1) < 2:
-            continue
+        # Include all scans regardless of version. The v1 rubric (50 questions)
+        # is a proper subset of v2+ (57 questions, adding data_leakage,
+        # experimental_rigor, and survey_methodology modules). compute_overall_score
+        # uses passed/applicable over present questions, so v1 papers degrade
+        # gracefully: their 50 applicable questions are scored normally and the
+        # 7 v2+-only questions are treated as absent (same as any paper where
+        # a conditional module doesn't apply).
 
         checklist = scan.get("checklist", {})
         paper_meta = scan.get("paper", {})
