paper_type.json (328B)
{
  "paper_type": "empirical",
  "reason": "Primary contribution is quantitative experimental findings on LLM judge performance (80%+ agreement rates, bias identification, and mitigation strategies) across controlled and crowdsourced settings, with MT-Bench serving as an evaluation vehicle rather than the main contribution."
}