paper_type.json (271B)
{
  "paper_type": "benchmark-creation",
  "reason": "The primary contribution is Web-Bench, a new evaluation framework with 50 web projects and 20 sequential tasks per project; baseline experiments on existing models are secondary to the benchmark introduction itself."
}