paper_type.json (270B)
{
  "paper_type": "benchmark-creation",
  "reason": "Introduces OmniCode, a 1794-task benchmark for evaluating software development agents; empirical results on agent performance serve to demonstrate the benchmark's utility rather than being the primary contribution."
}