paper_type.json
{
  "paper_type": "benchmark-creation",
  "reason": "Introduces REval, a novel evaluation framework with four code reasoning tasks and an Incremental Consistency metric, then validates it by evaluating 15 LLMs; the primary contribution is the benchmark and evaluation methodology, not just the empirical findings."
}