paper_type.json (281B)
{
  "paper_type": "benchmark-creation",
  "reason": "The primary contribution is DSCodeBench itself—a new 1,000-problem benchmark for data science code generation with extensive test suites—while LLM evaluation serves to validate and characterize the benchmark's difficulty."
}