paper_type.json (342B)
{
  "paper_type": "empirical",
  "reason": "Evaluates 8 pre-trained language models on 5 existing program generation benchmarks, reporting quantitative findings about data duplication, output copying rates, and explainability patterns — the primary contribution is experimental evidence of reliability issues in the evaluation ecosystem."
}