paper_type.json (252B)
{
  "paper_type": "empirical",
  "reason": "Proposes Meta-Rewarding method and runs experiments across 4 iterations, reporting quantitative improvements on AlpacaEval 2 (22.9%→39.4%) and Arena-Hard (20.6%→29.1%), with analysis of learned biases."
}