paper_type.json
{
  "paper_type": "benchmark-creation",
  "reason": "VERINA's primary contribution is a new 189-task benchmark in Lean for evaluating verifiable code generation; while baseline evaluations are included, the benchmark itself is the core contribution."
}