paper_type.json
{
  "paper_type": "empirical",
  "reason": "The paper experimentally evaluates 8 LLMs as judges for code generation and summarization tasks, reporting quantitative metrics on judgment accuracy, agreement rates with human evaluations, and systematic biases."
}