paper_type.json (281B)
{
  "paper_type": "benchmark-creation",
  "reason": "FrontierMath is a new benchmark dataset of expert-crafted mathematics problems designed to evaluate AI reasoning, with evaluation results presented to demonstrate the benchmark's value rather than as the primary contribution."
}