commit 4b8436506afa1c261f8cd6e046caa136ce386732
parent 06cbf721cea34bee65e068cd7363caff35325a3b
Author: Brian Graham <brian@buildingbetterteams.de>
Date: Tue, 14 Apr 2026 21:45:34 +0200
stats: include v1 scans with graceful degradation
The scan_version < 2 filter was excluding 558 papers (~28% of the
scanned corpus). Inspection showed the v1 rubric is a proper subset
of v2+: 50 identical questions across 11 identical categories, zero
dropped or changed. The v2+ additions are purely additive: one new
field (proxy_outcome_distinction) plus three new conditional modules
(data_leakage, experimental_rigor, survey_methodology), 7 questions
in total.
compute_overall_score already uses passed/applicable over present
questions, so v1 papers degrade gracefully: their 50 applicable
questions are scored normally and the 7 v2+-only questions are
treated as absent. classify_archetype only touches categories in
the shared 11, and detect_games only references questions within
those same 11 categories, so the scoring logic itself introduces
no bias for v1 scans.
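A minimal sketch of the passed/applicable behaviour described above, not
the actual compute_overall_score. The answer labels, the question ids,
the "reproducibility" category name, the 0-100 scale, and the None
return for papers with no applicable questions are all assumptions made
for illustration.

```python
from typing import Optional

# Hypothetical answer labels; the real scan schema may differ.
PASS, FAIL, NOT_APPLICABLE = "pass", "fail", "n/a"


def overall_score(checklist: dict[str, dict[str, str]]) -> Optional[float]:
    """Return percent passed over the applicable questions that are present.

    Questions missing from the checklist (for example v2+-only questions
    in a v1 scan) never enter the numerator or the denominator.
    """
    passed = 0
    applicable = 0
    for questions in checklist.values():
        for answer in questions.values():
            if answer == NOT_APPLICABLE:
                continue  # skipped, same as a question that is absent
            applicable += 1
            if answer == PASS:
                passed += 1
    if applicable == 0:
        # Assumed stand-in for the "no applicable questions" exclusion.
        return None
    return 100.0 * passed / applicable


# A v1-style scan that simply lacks the newer module.
v1_scan = {"reproducibility": {"q1": PASS, "q2": FAIL}}

# A v2+-style scan where the conditional module does not apply.
v2_scan = {
    "reproducibility": {"q1": PASS, "q2": FAIL},
    "data_leakage": {"q51": NOT_APPLICABLE},
}
```

Under these assumptions both scans get the same score, because an absent
question and a not-applicable question are equally invisible to the
ratio. An empty checklist yields None, which a caller would skip.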
Effect: n rises from 1,047 to 1,531 (+484 v1 papers that had
scorable data; 74 more v1 scans still excluded via the
"no applicable questions" check). Median moves 47.2 -> 49.1,
all game_pcts within 2 points of prior values.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat:
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/scripts/build-explorer-data.py b/scripts/build-explorer-data.py
@@ -263,8 +263,13 @@ def build():
with open(scan_path) as f:
scan = json.load(f)
- if scan.get("scan_version", 1) < 2:
- continue
+ # Include all scans regardless of version. The v1 rubric (50 questions)
+ # is a proper subset of v2+ (57 questions, adding data_leakage,
+ # experimental_rigor, and survey_methodology modules). compute_overall_score
+ # uses passed/applicable over present questions, so v1 papers degrade
+ # gracefully: their 50 applicable questions are scored normally and the
+ # 7 v2+-only questions are treated as absent (same as any paper where
+ # a conditional module doesn't apply).
checklist = scan.get("checklist", {})
paper_meta = scan.get("paper", {})