{
  "paper_type": "benchmark-creation",
  "reason": "REXBENCH is a novel benchmark for evaluating coding agents on AI research extension tasks, with baseline results from 9 LLM configurations establishing the evaluation framework."
}