paper_type.json (319B)
{
  "paper_type": "empirical",
  "reason": "Runs experiments on LLM agents under various robustness conditions (partial observability, noise, non-stationarity) and reports quantitative findings about performance gaps and model behavior, with the benchmark as the evaluation tool rather than the primary contribution."
}