paper_type.json (318B)
{
  "paper_type": "benchmark-creation",
  "reason": "The paper introduces MLE-bench, a new evaluation framework with 75 Kaggle ML engineering competitions; while it reports experimental results on baseline agents, the primary contribution is the benchmark itself rather than novel findings about agent capabilities."
}