paper_type.json (316B)
{
  "paper_type": "benchmark-creation",
  "reason": "The paper introduces AART, a structured pipeline and resulting dataset for adversarial evaluation of LLMs across policy concepts, task formats, and geographic regions, with empirical validation showing superior coverage metrics compared to existing approaches."
}