paper_type.json (249B)
{
  "paper_type": "empirical",
  "reason": "Runs controlled experiments measuring how models respond to warning-framed training data, reports quantitative vulnerability rates, and tests mechanistic explanations via SAE analysis and interventions."
}