paper_type.json (250B)
{
  "paper_type": "benchmark-creation",
  "reason": "SEC-bench is the primary contribution—a new benchmark for evaluating LLM agents on real-world security tasks with 200 CVE instances, with baseline results provided for state-of-the-art models."
}