ai-research-survey - Systematic scan of agentic development research. What's signal, what's noise.

commit 1021d39ac6f95f5694904bc4c19a3953006c570d
parent 9aa129f9efbb8bf248c15cdd99a3c0205c7295b7
Author: Brian Graham <brian@buildingbetterteams.de>
Date:   Fri, 27 Feb 2026 21:45:23 +0100

Harvester run 2: 2492 new papers via S2 citation graph + keyword search

Phase 1 (citation graph): 692 new papers
  - Fetched citations + references for 8 seed papers via Semantic Scholar API
  - Seeds: METR RCT, Emergent abilities mirage, MAST, Sleeper Agents,
    TypeScript type-check, Scaffolded LLMs, Remote Labor Index, Code gen survey
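
The Phase 1 fetch can be sketched roughly as follows. The harvester's own code is not part of this commit, so everything here is an assumption: the endpoint paths and the `citingPaper` / `citedPaper` response keys reflect my understanding of the public Semantic Scholar Graph API, and `edge_url`, `neighbors`, and `FIELDS` are hypothetical names. The network call itself is left out; only URL construction and response parsing are shown.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and field list; the harvester's real settings are not shown.
S2_GRAPH = "https://api.semanticscholar.org/graph/v1"
FIELDS = "title,authors,year,venue,externalIds,abstract"


def edge_url(paper_id: str, edge: str, offset: int = 0, limit: int = 100) -> str:
    """Build the URL for one page of a seed paper's citations or references."""
    if edge not in ("citations", "references"):
        raise ValueError(f"unknown edge type: {edge}")
    query = urlencode({"fields": FIELDS, "offset": offset, "limit": limit})
    return f"{S2_GRAPH}/paper/{quote(paper_id, safe=':')}/{edge}?{query}"


def neighbors(payload: dict, edge: str) -> list[dict]:
    """Pull the neighbouring paper records out of one response page."""
    # Citations wrap each hit in "citingPaper", references in "citedPaper".
    key = "citingPaper" if edge == "citations" else "citedPaper"
    return [row[key] for row in payload.get("data", []) if row.get(key)]
```

Each of the 8 seeds would be walked twice (once per edge type), paging with `offset` until a page comes back empty.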

Phase 2 (keyword search): 1800 new papers
  - 15 query clusters via Semantic Scholar paper search
  - Queries: LLM code generation, AI code review, prompt injection,
    alignment deception, automated program repair (APR), test generation,
    RAG for code, multi-agent failure, scaling, AI software engineering,
    code completion, etc.
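
A sketch of how the Phase 2 keyword search might be assembled. It is illustrative only: the `/paper/search` endpoint and the `paperId` key are my reading of the Semantic Scholar Graph API, and `search_url` and `collect_unique` are hypothetical helpers rather than the harvester's own functions. Overlapping query clusters will return the same paper more than once, so hits are keyed by `paperId`.

```python
from urllib.parse import urlencode

# Assumed endpoint and field list; not taken from the harvester itself.
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,year,venue,externalIds,abstract"


def search_url(query: str, offset: int = 0, limit: int = 100) -> str:
    """Build the URL for one page of keyword-search results."""
    params = {"query": query, "fields": FIELDS, "offset": offset, "limit": limit}
    return f"{S2_SEARCH}?{urlencode(params)}"


def collect_unique(pages: list[dict]) -> list[dict]:
    """Flatten result pages from all clusters, keeping the first hit per paperId."""
    seen: dict[str, dict] = {}
    for page in pages:
        for paper in page.get("data", []):
            paper_id = paper.get("paperId")
            if paper_id and paper_id not in seen:
                seen[paper_id] = paper
    return list(seen.values())
```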

Registry grows from 155 → 2647 papers (well past 1000 target).
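
The counts are consistent: 155 + 692 + 1800 = 2647. The sketch below shows one way new entries could be given ids and appended without duplicates. The slug rule is inferred from the ids visible in this commit's diff (lowercase, punctuation stripped, a small stopword list dropped, first three words, then the year). The actual rule and stopword list are not shown in the commit, so `STOPWORDS`, `make_id`, and `append_new` are assumptions.

```python
import re

# Stopwords inferred from the ids in the diff; the real list may differ.
STOPWORDS = {
    "a", "an", "and", "are", "for", "in", "is", "of",
    "on", "or", "the", "to", "toward", "via", "with",
}


def make_id(title: str, year: int, n_words: int = 3) -> str:
    """Derive a registry id such as 'vibe-coding-future-2026' from a title."""
    # Dropping punctuation also fuses hyphenated words: "Mid-2025" -> "mid2025".
    cleaned = re.sub(r"[^\w\s]", "", title.lower())
    words = [w for w in cleaned.split() if w not in STOPWORDS]
    return "-".join(words[:n_words] + [str(year)])


def append_new(registry: list[dict], candidates: list[dict]) -> list[dict]:
    """Return the candidates whose id is not already in the registry."""
    known = {entry["id"] for entry in registry}
    added = []
    for entry in candidates:
        if entry["id"] not in known:
            known.add(entry["id"])
            added.append(entry)
    return added
```

Deduplicating on `id` alone is the simplest choice. Matching on `doi` or `arxiv_id` as well would catch the same paper harvested under two different titles.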

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Diffstat:
M registry.jsonl  | 2492 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 2492 insertions(+), 0 deletions(-)
diff --git a/registry.jsonl b/registry.jsonl
@@ -153,3 +153,2495 @@
 {"id":"nl2repo-bench-2025","title":"NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents","authors":["Unknown"],"year":2025,"venue":"arXiv","source_url":"https://arxiv.org/abs/2512.12730","arxiv_id":"2512.12730","source":"arxiv","status":"queued","tags":["benchmarks","agents","code-generation"],"added":"2026-02-27","notes":"Benchmark for long-horizon repository generation from natural language. Evaluates agents on full codebase creation tasks beyond single-file completion."}
 {"id":"configuring-agentic-coding-tools-2026","title":"Configuring Agentic AI Coding Tools: An Exploratory Study","authors":["Unknown"],"year":2026,"venue":"arXiv","source_url":"https://arxiv.org/abs/2602.14690","arxiv_id":"2602.14690","source":"arxiv","status":"queued","tags":["agents","productivity","observational"],"added":"2026-02-27","notes":"Exploratory study of how developers configure agentic AI coding tools. Setup transparency and configuration decisions affect outcomes."}
 {"id":"agentic-programming-survey-2025","title":"AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities","authors":["Unknown"],"year":2025,"venue":"arXiv","source_url":"https://arxiv.org/abs/2508.11126","arxiv_id":"2508.11126","source":"arxiv","status":"queued","tags":["agents","code-generation","survey"],"added":"2026-02-27","notes":"Survey of agentic programming techniques: planning, tool use, memory, reflection. Challenges and open problems in deploying AI coding agents at scale."}
+{"id": "artificial-intelligence-assistance-2026", "title": "Artificial intelligence assistance in foresight research: Enhancing technology assessment through data-driven methods", "authors": ["Ewa Chodakowska", "Wojciech Danilczuk", "J. Nazarko"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.12913/22998624/211285", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Foresight can be viewed as an approach to managing uncertainty – an instrument that enables foreseeing while actively shaping the future under conditions of unpredictability. The rapid development of ", "doi": "10.12913/22998624/211285"}
+{"id": "vibe-coding-future-2026", "title": "Is Vibe Coding the Future of Software?", "authors": ["N. Kshetri", "Jeffrey M. Voas"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/mc.2025.3634712", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/mc.2025.3634712"}
+{"id": "editflow-benchmarking-optimizing-2026", "title": "EditFlow: Benchmarking and Optimizing Code Edit Recommendation Systems via Reconstruction of Developer Flows", "authors": ["Chenyan Liu", "Yun Lin", "Jiaxin Chang", "Jiawei Liu", "Binhang Qi"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.21697", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) for code editing have achieved remarkable progress, yet recent empirical studies reveal a fundamental disconnect between technical accuracy and developer productivity. Des", "arxiv_id": "2602.21697"}
+{"id": "many-ai-analysts-2026", "title": "Many AI Analysts, One Dataset: Navigating the Agentic Data Science Multiverse", "authors": ["Martin Bertran", "Riccardo Fogliato", "Zhiwei Steven Wu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.18710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The conclusions of empirical research depend not only on data but on a sequence of analytic decisions that published results seldom make explicit. Past ``many-analyst\"studies have demonstrated this: i", "arxiv_id": "2602.18710"}
+{"id": "what-cut-predicting-2026", "title": "What to Cut? Predicting Unnecessary Methods in Agentic Code Generation", "authors": ["Kanetaka Watanabe", "Tatsuya Shirai", "Yutaro Kashiwa", "Hajimu Iida"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.17091", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agentic Coding, powered by autonomous agents such as GitHub Copilot and Cursor, enables developers to generate code, tests, and pull requests from natural language instructions alone. While this accel", "arxiv_id": "2602.17091"}
+{"id": "measuring-mid2025-llmassistance-2026", "title": "Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology", "authors": ["Shenda Hong", "Alexander Kleinman", "Alyssa J. Mathiowetz", "Adam Howes", "Julian Cohen"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.16703", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) perform strongly on biological benchmarks, raising concerns that they may help novice actors acquire dual-use laboratory skills. Yet, whether this translates to improved h", "arxiv_id": "2602.16703"}
+{"id": "interpretive-cultures-resonance-2026", "title": "Interpretive Cultures: Resonance, randomness, and negotiated meaning for AI-assisted tarot divination", "authors": ["Matthew Prock", "Ziv Epstein", "Hope Schroeder", "Amy Smith", "Cassandra Lee"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.11367", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While generative AI tools are increasingly adopted for creative and analytical tasks, their role in interpretive practices, where meaning is subjective, plural, and non-causal, remains poorly understo", "arxiv_id": "2602.11367", "doi": "10.1145/3772318.3791571"}
+{"id": "design-evaluation-assisted-2026", "title": "Design and Evaluation of an Assisted Programming Interface for Behavior Trees in Robotics", "authors": ["J. Styrud", "Matteo Iovino", "Rebecca Stower", "Mart Kartavsev", "Mikael Norrlof"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.09772", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The possibility to create reactive robot programs faster without the need for extensively trained programmers is becoming increasingly important. So far, it has not been explored how various technique", "arxiv_id": "2602.09772"}
+{"id": "exploring-aiaugmented-sensemaking-2026", "title": "Exploring AI-Augmented Sensemaking of Patient-Generated Health Data: A Mixed-Method Study with Healthcare Professionals in Cardiac Risk Reduction", "authors": ["Pavithren V. S. Pakianathan", "Rania Islambouli", "Diogo Branco", "Albrecht Schmidt", "Tiago Guerreiro"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05687", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Individuals are increasingly generating substantial personal health and lifestyle data, e.g. through wearables and smartphones. While such data could transform preventative care, its integration into ", "arxiv_id": "2602.05687"}
+{"id": "steering-llms-scalable-2026", "title": "Steering LLMs via Scalable Interactive Oversight", "authors": ["Enyu Zhou", "Zhiheng Xi", "Long Ma", "Zhihao Zhang", "Shihan Dou"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04210", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Large Language Models increasingly automate complex, long-horizon tasks such as \\emph{vibe coding}, a supervision gap has emerged. While models excel at execution, users often struggle to guide the", "arxiv_id": "2602.04210"}
+{"id": "tasklevel-evaluation-ai-2026", "title": "A Task-Level Evaluation of AI Agents in Open-Source Projects", "authors": ["Shojibur Rahman", "Md. Fazle Rabbi", "M. Zibran"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.02345", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we present a comparative study of five autonomous coding agents using AIDev-pop, which is a public dataset containing thousands of AI-generated pull requests (PRs) across popular open-s", "arxiv_id": "2602.02345"}
+{"id": "from-horizontal-layering-2026", "title": "From Horizontal Layering to Vertical Integration: A Comparative Study of the AI-Driven Software Development Paradigm", "authors": ["Chi Zhang", "Zehan Li", "Ziqiang Zhong", "Haibing Ma", "Dan Xiao"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.22667", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper examines the organizational implications of Generative AI adoption in software engineering through a multiple-case comparative study. We contrast two development environments: a traditional", "arxiv_id": "2601.22667", "doi": "10.48550/arXiv.2601.22667"}
+{"id": "coding-agents-generating-2026", "title": "Are Coding Agents Generating Over-Mocked Tests? An Empirical Study", "authors": ["Andre Hora", "Romain Robbes"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00409", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Coding agents have received significant adoption in software development recently. Unlike traditional LLM-based code completion tools, coding agents work with autonomy (e.g., invoking external tools) ", "arxiv_id": "2602.00409"}
+{"id": "control-models-inide-2026", "title": "Control Models for In-IDE Code Completion", "authors": ["Aral de Moor", "Yana Hrynevich", "Hleb Badzeika", "Vladyslav Furda", "Marko Kojic"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.20223", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce control models for LLM-powered code completion in JetBrains IDEs: ML classifiers which trigger inference and filter the generated suggestions to better align them with users and reduce un", "arxiv_id": "2601.20223", "doi": "10.1145/3786151.3788608"}
+{"id": "how-ai-impacts-2026", "title": "How AI Impacts Skill Formation", "authors": ["Judy Hanwen Shen", "Alex Tamkin"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.20245", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effectively su", "arxiv_id": "2601.20245", "doi": "10.48550/arXiv.2601.20245"}
+{"id": "promises-perils-timely-2026", "title": "Promises, Perils, and (Timely) Heuristics for Mining Coding Agent Activity", "authors": ["Romain Robbes", "Théo Matricon", "Thomas Degueule", "Andre Hora", "Stefano Zacchiroli"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.18345", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In 2025, coding agents have seen a very rapid adoption. Coding agents leverage Large Language Models (LLMs) in ways that are markedly different from LLM-based code completion, making their study criti", "arxiv_id": "2601.18345", "doi": "10.48550/arXiv.2601.18345"}
+{"id": "adoption-generative-artificial-2026", "title": "Adoption of Generative Artificial Intelligence in the German Software Engineering Industry: An Empirical Study", "authors": ["Ludwig Felder", "Tobias Eisenreich", "Mahsa Fischer", "Stefan Wagner", "Chunyang Chen"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.16700", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative artificial intelligence (GenAI) tools have seen rapid adoption among software developers. While adoption rates in the industry are rising, the underlying factors influencing the effective u", "arxiv_id": "2601.16700", "doi": "10.48550/arXiv.2601.16700"}
+{"id": "hogyan-igazodjunk-el-2026", "title": "Hogyan igazodjunk el a mesterséges intelligencia munkaerőpiaci hatásait övező zajban?", "authors": ["Andrea Szalavetz"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.18414/ksz.2026.1.72", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Az írás a szakirodalom kritikai elemzésével és anekdotikus példákkal mutatja be a mesterséges intelligencia (AI) munkaerőpiacra gyakorolt hatásának értékelését nehezítő „zajt”. Elemzi az AI fejlődésén", "doi": "10.18414/ksz.2026.1.72"}
+{"id": "changes-coding-behavior-2026", "title": "Changes in Coding Behavior and Performance Since the Introduction of LLMs", "authors": ["Yufan Zhang", "Jaromír Savelka", "S. Goldstein", "M. Conway"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.11835", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The widespread availability of large language models (LLMs) has changed how students engage with coding and problem-solving. While these tools may increase student productivity, they also make it more", "arxiv_id": "2601.11835", "doi": "10.1145/3785022.3785075"}
+{"id": "evolving-ai-longitudinal-2026", "title": "Evolving with AI: A Longitudinal Analysis of Developer Logs", "authors": ["Agnia Sergeyuk", "Eric Huang", "Dariia Karaeva", "Anastasiia Serova", "Yaroslav Golubev"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.10258", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI-powered coding assistants are rapidly becoming fixtures in professional IDEs, yet their sustained influence on everyday development remains poorly understood. Prior research has focused on short-te", "arxiv_id": "2601.10258", "doi": "10.1145/3744916.3787811"}
+{"id": "promises-perils-llm-2026", "title": "Promises and Perils of LLM- and Agent-Generated Code", "authors": ["P. Devanbu", "Benoit Baudry", "Jeffrey M. Voas"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MC.2025.3627694", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article draws attention to the fact that, traditionally, software maintenance costs have strongly dominated initial development costs, and calls for more in-depth, focused, specialized studies of", "doi": "10.1109/MC.2025.3627694"}
+{"id": "empirical-study-generative-2025", "title": "An Empirical Study of Generative AI Adoption in Software Engineering", "authors": ["G. Giray", "Onur Demirörs", "Marcos Kalinowski", "Daniel Méndez"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.23327", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context. GenAI tools are being increasingly adopted by practitioners in SE, promising support for several SE activities. Despite increasing adoption, we still lack empirical evidence on how GenAI is u", "arxiv_id": "2512.23327", "doi": "10.48550/arXiv.2512.23327"}
+{"id": "more-code-less-2025", "title": "More code, less validation: Risk factors for over-reliance on AI coding tools among scientists", "authors": ["Gabrielle O'Brien", "Alexis Parker", "Nasir Eisty", "Jeffrey Carver"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.19644", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Programming is essential to modern scientific research, yet most scientists report inadequate training for the software development their work demands. Generative AI tools capable of code generation m", "arxiv_id": "2512.19644", "doi": "10.48550/arXiv.2512.19644"}
+{"id": "haieval-measuring-humanai-2025", "title": "HAI-Eval: Measuring Human-AI Synergy in Collaborative Coding", "authors": ["Hanjun Luo", "Chiming Ni", "Jiaheng Wen", "Zhimu Huang", "Yiran Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.04111", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-powered coding agents are reshaping the development paradigm. However, existing evaluation systems, neither traditional tests for humans nor benchmarks for LLMs, fail to capture this shift. They r", "arxiv_id": "2512.04111", "doi": "10.48550/arXiv.2512.04111"}
+{"id": "can-vibe-coding-2025", "title": "Can Vibe Coding Beat Graduate CS Students? An LLM vs. Human Coding Tournament on Market-driven Strategic Planning", "authors": ["Panayiotis Danassis", "N. Goel"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.20613", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid proliferation of Large Language Models (LLMs) has revolutionized AI-assisted code generation. This rapid development of LLMs has outpaced our ability to properly benchmark them. Prevailing b", "arxiv_id": "2511.20613", "doi": "10.48550/arXiv.2511.20613"}
+{"id": "prompts-first-precision-2025", "title": "Prompts First, Precision Later: Reviving the Vision of Natural Language Programming for Computing Education", "authors": ["Brent N. Reeves", "J. Prather", "Paul Denny", "Juho Leinonen", "Stephen MacNeil"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3769994.3770039", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (GenAI) is disrupting Computer Science Education, proving to be increasingly capable at more and more challenges. Although some educators consider this a threat to computing education, w", "doi": "10.1145/3769994.3770039"}
+{"id": "lumen-developer-agency-2025", "title": "Lumen: Developer Agency Through Transparent Context Control in AI-Assisted Programming", "authors": ["Nakul Goel", "Glaucia Melo"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CASCON66301.2025.00024", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The landscape of software engineering has undergone rapid transformation with the emergence of Artificial Intelligence (AI) coding assistants, which are being increasingly integrated into development ", "doi": "10.1109/CASCON66301.2025.00024"}
+{"id": "cognitive-risks-ai-2025", "title": "Cognitive Risks of AI: Literacy, Trust, and Critical Thinking", "authors": ["Abhinandan Kulal"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1080/08874417.2025.2582050", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1080/08874417.2025.2582050"}
+{"id": "validity-what-you-2025", "title": "Validity Is What You Need", "authors": ["Sebastian Benthall", "A. Clark"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.27628", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While AI agents have long been discussed and studied in computer science, today's Agentic AI systems are something new. We consider other definitions of Agentic AI and propose a new realist definition", "arxiv_id": "2510.27628", "doi": "10.48550/arXiv.2510.27628"}
+{"id": "ai-as-cognitive-2025", "title": "AI as Cognitive Amplifier: Rethinking Human Judgment in the Age of Generative AI", "authors": ["Tao An"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.10961", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Through extensive experience training professionals and individual users in AI tool adoption since the GPT-3 era, I have observed a consistent pattern: the same AI tool produces dramatically different", "arxiv_id": "2512.10961", "doi": "10.48550/arXiv.2512.10961"}
+{"id": "user-misconceptions-llmbased-2025", "title": "User Misconceptions of LLM-Based Conversational Programming Assistants", "authors": ["Gabrielle O'Brien", "Antonio Pedro Santos Alves", "Sebastian Baltes", "Grischa Liebel", "Mircea Lungu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.25662", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Programming assistants powered by large language models (LLMs) have become widely available, with conversational assistants like ChatGPT proving particularly accessible to less experienced programmers", "arxiv_id": "2510.25662", "doi": "10.48550/arXiv.2510.25662"}
+{"id": "developer-productivity-genai-2025", "title": "Developer Productivity with GenAI", "authors": ["Sadia Afroz", "Zixuan Feng", "Katie Kimura", "Bianca Trinkenreich", "Igor Steinmacher"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.24265", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (GenAI) tools are increasingly being adopted in software development as productivity aids. However, evidence regarding where and when these tools actually enhance productivity is unclear", "arxiv_id": "2510.24265", "doi": "10.48550/arXiv.2510.24265"}
+{"id": "how-do-ai-2025", "title": "How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations", "authors": ["Z. Z. Wang", "Yijia Shao", "Omar Shaikh", "Daniel Fried", "Graham Neubig"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.22780", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI agents are continually optimized for tasks related to human work, such as software engineering and professional writing, signaling a pressing trend with significant impacts on the human workforce. ", "arxiv_id": "2510.22780", "doi": "10.48550/arXiv.2510.22780"}
+{"id": "ten-simple-rules-2025", "title": "Ten Simple Rules for AI-Assisted Coding in Science", "authors": ["Eric W. Bridgeford", "Iain Campbell", "Zijao Chen", "Zhicheng Lin", "Harrison Ritz"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.22254", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While AI coding tools have demonstrated potential to accelerate software development, their use in scientific computing raises critical questions about code quality and scientific validity. In this pa", "arxiv_id": "2510.22254", "doi": "10.48550/arXiv.2510.22254"}
+{"id": "vibe-coding-ainative-2025", "title": "Vibe Coding: Toward an AI-Native Paradigm for Semantic and Intent-Driven Programming", "authors": ["Vinay Bamil"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.17842", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in large language models have enabled developers to generate software by conversing with artificial intelligence systems rather than writing code directly. This paper introduces vibe c", "arxiv_id": "2510.17842", "doi": "10.48550/arXiv.2510.17842"}
+{"id": "from-gains-strains-2025", "title": "From Gains to Strains: Modeling Developer Burnout with GenAI Adoption", "authors": ["Zixuan Feng", "Sadia Afroz", "Anita Sarma"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.07435", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (GenAI) is rapidly reshaping software development workflows. While prior studies emphasize productivity gains, the adoption of GenAI also introduces new pressures that may harm developer", "arxiv_id": "2510.07435"}
+{"id": "role-artificial-intelligence-2025", "title": "The Role of Artificial Intelligence in Enhancing Operational Efficiency and Cost Optimization in Engineering-Driven Enterprises", "authors": ["Nandha Kumar", "Balaji Jayakrishnan", "Toufik Mzili"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.63503/j.ijaimd.2025.169", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The business environment of engineering-driven enterprises is characterized by complex projects, strict deadlines, and financial constraints, which means that operational efficiency serves as a major ", "doi": "10.63503/j.ijaimd.2025.169"}
+{"id": "us-betting-economy-2025", "title": "The U.S. Is Betting the Economy on ‘Scaling’ AI: Where Is the Intelligence When One Needs It?", "authors": ["Servaas Storm"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1080/08911916.2026.2616133", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Abstract The AI industry is betting that ‘scaling’, i.e., adding more and more data, GPUs, compute infrastructure and dollars, will lead to machine superintelligence or Artificial General Intelligence", "doi": "10.1080/08911916.2026.2616133"}
+{"id": "automatically-generating-web-2025", "title": "Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development", "authors": ["Yuxuan Wan", "Ting Liang", "Jiakai Xu", "Jingyu Xiao", "Yintong Huo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.25297", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Developing full-stack web applications is complex and time-intensive, demanding proficiency across diverse technologies and frameworks. Although recent advances in multimodal large language models (ML", "arxiv_id": "2509.25297", "doi": "10.48550/arXiv.2509.25297"}
+{"id": "not-everyone-wins-2025", "title": "Not Everyone Wins with LLMs: Behavioral Patterns and Pedagogical Implications for AI Literacy in Programmatic Data Science", "authors": ["Qianou Ma", "Kenneth R. Koedinger", "Tongshuang Wu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.21890", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs promise to democratize technical work in complex domains like programmatic data analysis, but not everyone benefits equally. We study how students with varied experiences use LLMs to complete Pyt", "arxiv_id": "2509.21890", "doi": "10.1145/3772318.3791283"}
+{"id": "reshaping-higher-education-2025", "title": "Reshaping higher education in the unavoidable era of AI", "authors": ["Ahmet Baytak"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.31039/plic.2025.14.338", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We're living through an extraordinary time, a moment in history as significant as the Industrial Revolution or the dawn of the Information Age. Artificial Intelligence (AI) isn't just another tech tre", "doi": "10.31039/plic.2025.14.338"}
+{"id": "cuckoo-attack-stealthy-2025", "title": "Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE", "authors": ["Xinpeng Liu", "Junming Liu", "Peiyu Liu", "Han Zheng", "Qinying Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.15572", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern AI-powered Integrated Development Environments (AI-IDEs) are increasingly defined by an Agent-centric architecture, where an LLM-powered Agent is deeply integrated to autonomously execute compl", "arxiv_id": "2509.15572", "doi": "10.48550/arXiv.2509.15572"}
+{"id": "vibe-coding-product-2025", "title": "Vibe Coding for Product Design: Understanding Product Team Members'Perceptions of AI-Assisted Design and Development", "authors": ["Jie Li", "Youyang Hou", "Laura Lin", "Ruihao Zhu", "Hancheng Cao"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.10652", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI is reshaping product design practices through\"vibe coding\", where product team members express intent in natural language and AI translates it into functional prototypes and code. Despit", "arxiv_id": "2509.10652"}
+{"id": "revolution-hype-seeking-2025", "title": "Revolution or Hype? Seeking the Limits of Large Models in Hardware Design", "authors": ["Qiang Xu", "Leon Stok", "Rolf Drechsler", "Xi Wang", "Grace Li Zhang"], "year": 2025, "venue": "2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD)", "source_url": "https://arxiv.org/abs/2509.04905", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent breakthroughs in Large Language Models (LLMs) and Large Circuit Models (LCMs) have sparked excitement across the electronic design automation (EDA) community, promising a revolution in circuit ", "arxiv_id": "2509.04905", "doi": "10.1109/ICCAD66269.2025.11240750"}
+{"id": "advancing-nursing-regulation-2025", "title": "Advancing Nursing Regulation in the Digital Era: Harnessing AI to Bridge Workforce Gaps and Strengthen Practice Competency and Safety", "authors": ["Elizabeth H. Zhong", "N. Spector", "Charlie O’Hara", "Nicole Livanos", "J. D. Castillo"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.jnr.2025.08.015", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.jnr.2025.08.015"}
+{"id": "understanding-protecting-augmenting-2025", "title": "Understanding, Protecting, and Augmenting Human Cognition with Generative AI: A Synthesis of the CHI 2025 Tools for Thought Workshop", "authors": ["Lev Tankelevitch", "Elena L. Glassman", "Jessica He", "A. Kittur", "Mina Lee"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.21036", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (GenAI) radically expands the scope and capability of automation for work, education, and everyday tasks, a transformation posing both risks and opportunities for human cognition. How wi", "arxiv_id": "2508.21036", "doi": "10.48550/arXiv.2508.21036"}
+{"id": "future-software-reuse-2025", "title": "On the Future of Software Reuse in the Era of AI Native Software Engineering", "authors": ["A. Taivalsaari", "T. Mikkonen", "Cesare Pautasso"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.19834", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software development is currently under a paradigm shift in which artificial intelligence and generative software reuse are taking the center stage in software creation. Earlier opportunistic software", "arxiv_id": "2508.19834", "doi": "10.48550/arXiv.2508.19834"}
+{"id": "collaborating-genai-incentives-2025", "title": "Collaborating with GenAI: Incentives and Replacements", "authors": ["Boaz Taitler", "Omer Ben-Porat"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.20213", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rise of Generative AI (GenAI) is reshaping how workers contribute to shared projects. While workers can use GenAI to boost productivity or reduce effort, managers may use it to replace some worker", "arxiv_id": "2508.20213", "doi": "10.48550/arXiv.2508.20213"}
+{"id": "skate-scalable-tournament-2025", "title": "SKATE, a Scalable Tournament Eval: Weaker LLMs differentiate between stronger ones using verifiable challenges", "authors": ["Dewi Gould", "Bruno Mlodozeniec", "Samuel F. Brown"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.06111", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating the capabilities and risks of foundation models is paramount, yet current methods demand extensive domain expertise, hindering their scalability as these models rapidly evolve. We introduce", "arxiv_id": "2508.06111", "doi": "10.48550/arXiv.2508.06111"}
+{"id": "maybe-we-need-2025", "title": "\"Maybe We Need Some More Examples:\" Individual and Team Drivers of Developer GenAI Tool Use", "authors": ["Courtney Miller", "Rudrajit Choudhuri", "Mara Ulloa", "Sankeerti Haniyur", "Robert DeLine"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.21280", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite the widespread availability of generative AI tools in software engineering, developer adoption remains uneven. This unevenness is problematic because it hampers productivity efforts, frustrate", "arxiv_id": "2507.21280", "doi": "10.48550/arXiv.2507.21280"}
+{"id": "automation-ai-intergenerational-2025", "title": "Automation, AI, and the Intergenerational Transmission of Knowledge", "authors": ["Enrique Ide"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2507.16078", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in Artificial Intelligence (AI) have sparked expectations of unprecedented economic growth. Yet, by enabling senior workers to accomplish more tasks independently, AI may reduce entry-", "arxiv_id": "2507.16078"}
+{"id": "code-me-me-2025", "title": "Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows", "authors": ["Valerie Chen", "Ameet Talwalkar", "Robert Brennan", "Graham Neubig"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.08149", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Developers now have access to a growing array of increasingly autonomous AI tools for software development. While many studies examine copilots that provide chat assistance or code completions, evalua", "arxiv_id": "2507.08149", "doi": "10.48550/arXiv.2507.08149"}
+{"id": "dynamic-memory-management-2025", "title": "Dynamic Memory Management on GPUs with SYCL", "authors": ["Russell K. Standish"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.18211", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Dynamic memory allocation is not traditionally available in kernels running on GPUs. This work aims to build on Ouroboros, an efficient dynamic memory management library for CUDA applications, by port", "arxiv_id": "2504.18211", "doi": "10.48550/arXiv.2504.18211"}
+{"id": "predictable-artificial-intelligence-2023", "title": "Predictable Artificial Intelligence", "authors": ["Lexin Zhou", "Pablo A. Moreno-Casares", "Fernando Martínez-Plumed", "John Burden", "Ryan Burnell"], "year": 2023, "venue": "Artificial Intelligence", "source_url": "https://arxiv.org/abs/2310.06167", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce the fundamental ideas and challenges of Predictable AI, a nascent research area that explores the ways in which we can anticipate key validity indicators (e.g., performance, safety) of pr", "arxiv_id": "2310.06167", "doi": "10.48550/arXiv.2310.06167"}
+{"id": "speed-at-cost-2025", "title": "Speed at the Cost of Quality? The Impact of LLM Agent Assistance on Software Development", "authors": ["Hao He", "Courtney Miller", "Shyam Agarwal", "Christian Kästner", "Bogdan Vasilescu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2511.04427", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2511.04427"}
+{"id": "not-everyone-wins-2025-2", "title": "Not Everyone Wins with LLMs: Behavioral Patterns and Pedagogical Implications in AI-assisted Data Analysis", "authors": ["Qianou Ma", "Kenneth R. Koedinger", "Tongshuang Wu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2509.21890", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2509.21890"}
+{"id": "paperbench-evaluating-ais-2025", "title": "PaperBench: Evaluating AI's Ability to Replicate AI Research", "authors": ["Giulio Starace", "Oliver Jaffe", "Dane Sherburn", "James Aung", "Jun Shern Chan"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2504.01848", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research. Agents must replicate 20 ICML 2024 Spotlight and Oral papers from scratch, including", "arxiv_id": "2504.01848", "doi": "10.48550/arXiv.2504.01848"}
+{"id": "measuring-ai-ability-2025", "title": "Measuring AI Ability to Complete Long Software Tasks", "authors": ["Thomas Kwa", "Ben West", "Joel Becker", "Amy Deng", "Katharyn Garcia"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2503.14499", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite rapid progress on AI benchmarks, the real-world meaning of benchmark performance remains unclear. To quantify the capabilities of AI systems in terms of human capabilities, we propose a new me", "arxiv_id": "2503.14499"}
+{"id": "gate-integrated-assessment-2025", "title": "GATE: An Integrated Assessment Model for AI Automation", "authors": ["Ege Erdil", "Andrei V. Potlogea", "T. Besiroglu", "Edu Roldan", "Anson Ho"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2503.04941", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Assessing the economic impacts of artificial intelligence requires integrating insights from both computer science and economics. We present the Growth and AI Transition Endogenous model (GATE), a dyn", "arxiv_id": "2503.04941"}
+{"id": "swelancer-can-frontier-2025", "title": "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?", "authors": ["Samuel Miserendino", "Michele Wang", "Tejal Patwardhan", "Johannes Heidecke"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.12115", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce SWE-Lancer, a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. SWE-Lancer encompasses both independent engi", "arxiv_id": "2502.12115", "doi": "10.48550/arXiv.2502.12115"}
+{"id": "codeelo-benchmarking-competitionlevel-2025", "title": "CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings", "authors": ["Shanghaoran Quan", "Jiaxin Yang", "Bowen Yu", "Bo Zheng", "Dayiheng Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.01257", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the increasing code reasoning capabilities of existing large language models (LLMs) and breakthroughs in reasoning models like OpenAI o1 and o3, there is a growing need to develop more challengin", "arxiv_id": "2501.01257", "doi": "10.48550/arXiv.2501.01257"}
+{"id": "rebench-evaluating-frontier-2024", "title": "RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts", "authors": ["Hjalmar Wijk", "T. Lin", "Joel Becker", "Sami Jawhar", "Neev Parikh"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.15114", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Frontier AI safety policies highlight automation of AI research and development (R&D) by AI agents as an important capability to anticipate. However, there exist few evaluations for AI R&D capabilitie", "arxiv_id": "2411.15114", "doi": "10.48550/arXiv.2411.15114"}
+{"id": "how-much-does-2024", "title": "How Much Does AI Impact Development Speed? an Enterprise-Based Randomized Controlled Trial", "authors": ["Elise Paradis", "Kate Grey", "Quinn Madison", "Daye Nam", "Andrew Macvean"], "year": 2024, "venue": "2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)", "source_url": "https://arxiv.org/abs/2410.12944", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: How much does AI assistance impact developer productivity? To date, the software engineering literature has provided a range of answers, targeting a diversity of outcomes: from perceived productivity ", "arxiv_id": "2410.12944", "doi": "10.1109/ICSE-SEIP66354.2025.00060"}
+{"id": "impact-large-language-2024", "title": "The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot", "authors": ["Doron Yeverechyahu", "Raveesh Mayya", "Gal Oestreicher-Singer"], "year": 2024, "venue": "International Conference on Interaction Sciences", "source_url": "https://arxiv.org/abs/2409.08379", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have been shown to enhance individual productivity in guided settings. Whereas LLMs are likely to also transform innovation processes in a collaborative work setting, it i", "arxiv_id": "2409.08379", "doi": "10.2139/ssrn.4684662"}
+{"id": "significant-productivity-gains-2024", "title": "Significant Productivity Gains through Programming with Large Language Models", "authors": ["Thomas Weber", "Maximilian Brandmaier", "Albrecht Schmidt", "Sven Mayer"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3661145", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models like GPT and Codex drastically alter many daily tasks, including programming, where they can rapidly generate code from natural language or informal specifications. Thus, they wi", "doi": "10.1145/3661145"}
+{"id": "lessons-from-trenches-2024", "title": "Lessons from the Trenches on Reproducible Evaluation of Language Models", "authors": ["Stella Biderman", "Hailey Schoelkopf", "Lintang Sutawika", "Leo Gao", "J. Tow"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.14782", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of prop", "arxiv_id": "2405.14782", "doi": "10.48550/arXiv.2405.14782"}
+{"id": "gpqa-graduatelevel-googleproof-2023", "title": "GPQA: A Graduate-Level Google-Proof Q&A Benchmark", "authors": ["David Rein", "Betty Li Hou", "Asa Cooper Stickland", "Jackson Petty", "Richard Yuanzhe Pang"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.12022", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely diffic", "arxiv_id": "2311.12022"}
+{"id": "explosive-growth-from-2023", "title": "Explosive growth from AI automation: A review of the arguments", "authors": ["Ege Erdil", "T. Besiroglu"], "year": 2023, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2309.11690", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We examine whether substantial AI automation could accelerate global economic growth by about an order of magnitude, akin to the economic growth effects of the Industrial Revolution. We identify three", "arxiv_id": "2309.11690"}
+{"id": "experimental-evidence-productivity-2023", "title": "Experimental evidence on the productivity effects of generative artificial intelligence", "authors": ["Shakked Noy", "Whitney Zhang"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1126/science.adh2586", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We examined the productivity effects of a generative artificial intelligence (AI) technology, the assistive chatbot ChatGPT, in the context of midlevel professional writing tasks. In a preregistered o", "doi": "10.1126/science.adh2586"}
+{"id": "lost-middle-how-2023", "title": "Lost in the Middle: How Language Models Use Long Contexts", "authors": ["Nelson F. Liu", "Kevin Lin", "John Hewitt", "Ashwin Paranjape", "Michele Bevilacqua"], "year": 2023, "venue": "Transactions of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2307.03172", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two ta", "arxiv_id": "2307.03172", "doi": "10.1162/tacl_a_00638"}
+{"id": "generative-ai-at-2023", "title": "Generative AI at Work", "authors": ["Erik Brynjolfsson", "Danielle Li", "Lindsey Raymond"], "year": 2023, "venue": "Social Science Research Network", "source_url": "https://arxiv.org/abs/2304.11771", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We study the staggered introduction of a generative AI–based conversational assistant using data from 5,172 customer-support agents. Access to AI assistance increases worker productivity, as measure", "arxiv_id": "2304.11771", "doi": "10.3386/w31161"}
+{"id": "benchmarks-automated-commonsense-2023", "title": "Benchmarks for Automated Commonsense Reasoning: A Survey", "authors": ["E. Davis"], "year": 2023, "venue": "ACM Computing Surveys", "source_url": "https://arxiv.org/abs/2302.04752", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often ", "arxiv_id": "2302.04752", "doi": "10.1145/3615355"}
+{"id": "selfconsistency-improves-chain-2022", "title": "Self-Consistency Improves Chain of Thought Reasoning in Language Models", "authors": ["Xuezhi Wang", "Jason Wei", "D. Schuurmans", "Quoc Le", "Ed H. Chi"], "year": 2022, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2203.11171", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consiste", "arxiv_id": "2203.11171"}
+{"id": "building-early-warning-2024", "title": "Building an early warning system for LLM-aided biological threat creation", "authors": ["Unknown"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "ai-assistance-legal-2023", "title": "AI Assistance in Legal Analysis: An Empirical Study", "authors": ["Jonathan H. Choi", "D. Schwarcz"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.2139/ssrn.4539836", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.2139/ssrn.4539836"}
+{"id": "anatomy-capability-emergence-2026", "title": "Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks", "authors": ["Jayadev Billa"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.15997", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K--85M parameters), 120 task$\\times$level$\\times$ mode", "arxiv_id": "2602.15997"}
+{"id": "transformer-we-trust-2026", "title": "In Transformer We Trust? A Perspective on Transformer Architecture Failure Modes", "authors": ["Trishit Mondal", "Ameya D. Jagtap"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.14318", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Transformer architectures have revolutionized machine learning across a wide range of domains, from natural language processing to scientific computing. However, their growing deployment in high-stake", "arxiv_id": "2602.14318"}
+{"id": "outofcontext-outofscope-manipulating-2026", "title": "Out-of-context and out-of-scope: Manipulating large language models through minimal instruction set modifications", "authors": ["Monty-Maximilian Zühlke", "Daniel Kudenko", "Wolfgang Nejdl"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1371/journal.pone.0341558", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Understanding the emergence of reasoning capabilities in large language models (LLMs) is important for aligning their response behaviour with human intentions, especially as these models become access", "doi": "10.1371/journal.pone.0341558"}
+{"id": "core-comprehensive-ontological-2026", "title": "CORE: Comprehensive Ontological Relation Evaluation for Large Language Models", "authors": ["Satyam Dwivedi", "Sanjukta Ghosh", "S. Dwivedi", "N. Kumari", "A. Thakur"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06446", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) perform well on many reasoning benchmarks, yet existing evaluations rarely assess their ability to distinguish between meaningful semantic relations and genuine unrelatedn", "arxiv_id": "2602.06446"}
+{"id": "optimal-scaling-laws-2026", "title": "Optimal scaling laws in learning hierarchical multi-index models", "authors": ["Leonardo Defilippis", "Florent Krzakala", "Bruno Loureiro", "A. Maillard"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05846", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive ex", "arxiv_id": "2602.05846"}
+{"id": "verification-implicit-world-2026", "title": "Verification of the Implicit World Model in a Generative Model via Adversarial Sequences", "authors": ["András Balogh", "Márk Jelasity"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05903", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative sequence models are typically trained on sample sequences from natural or formal languages. It is a crucial question whether -- or to what extent -- sample-based training is able to capture", "arxiv_id": "2602.05903"}
+{"id": "mose-mixture-slimmable-2026", "title": "MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models", "authors": ["Nurbek Tastan", "Stefanos Laskaridis", "Karthik Nandakumar", "Samuel Horváth"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06154", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Mixture-of-Experts (MoE) models scale large language models efficiently by sparsely activating experts, but once an expert is selected, it is executed fully. Hence, the trade-off between accuracy and ", "arxiv_id": "2602.06154"}
+{"id": "limits-layer-pruning-2026", "title": "On the Limits of Layer Pruning for Generative Reasoning in LLMs", "authors": ["Safal Shrestha", "Anubhav Shrestha", "Aadim Nepal", "Minwu Kim", "Keith Ross"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.01997", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent works have shown that layer pruning can compress large language models (LLMs) while retaining strong performance on classification benchmarks with little or no finetuning. However, existing pru", "arxiv_id": "2602.01997"}
+{"id": "position-explaining-behavioral-2026", "title": "Position: Explaining Behavioral Shifts in Large Language Models Requires a Comparative Approach", "authors": ["Martino Ciaperoni", "Marzio Di Vece", "Luca Pappalardo", "Fosca Giannotti", "Francesco Giannini"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.02304", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large-scale foundation models exhibit behavioral shifts: intervention-induced behavioral changes that appear after scaling, fine-tuning, reinforcement learning or in-context learning. While investigat", "arxiv_id": "2602.02304"}
+{"id": "audit-trails-accountability-2026", "title": "Audit Trails for Accountability in Large Language Models", "authors": ["Victor Ojewale", "Harini Suresh", "Suresh Venkatasubramanian"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.20727", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly embedded in consequential decisions across healthcare, finance, employment, and public services. Yet accountability remains fragile because process transp", "arxiv_id": "2601.20727", "doi": "10.48550/arXiv.2601.20727"}
+{"id": "neural-neural-scaling-2026", "title": "Neural Neural Scaling Laws", "authors": ["Michael Hu", "Jane Pan", "Ayush Rajesh Jhaveri", "Nicholas Lourie", "Kyunghyun Cho"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19831", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural scaling laws predict how language model performance improves with increased compute. While aggregate metrics like validation loss can follow smooth power-law curves, individual downstream tasks", "arxiv_id": "2601.19831", "doi": "10.48550/arXiv.2601.19831"}
+{"id": "viscosity-logic-phase-2026", "title": "The Viscosity of Logic: Phase Transitions and Hysteresis in DPO Alignment", "authors": ["Marco Pollanen"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17260", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Direct Preference Optimization (DPO) is often tuned as if increasing alignment pressure (controlled by $\\beta$) yields progressively \"better\" behavior. We instead treat $\\beta$ as a control parameter an", "arxiv_id": "2601.17260", "doi": "10.48550/arXiv.2601.17260"}
+{"id": "geometry-thought-how-2026", "title": "The Geometry of Thought: How Scale Restructures Reasoning In Large Language Models", "authors": ["S. Anderson"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.13358", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scale does not uniformly improve reasoning - it restructures it. Analyzing 25,000+ chain-of-thought trajectories across four domains (Law, Science, Code, Math) and two scales (8B, 70B parameters), we ", "arxiv_id": "2601.13358", "doi": "10.48550/arXiv.2601.13358"}
+{"id": "why-ai-alignment-2026", "title": "Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock", "authors": ["Didier Sornette", "S. Lera", "Ke Wu"], "year": 2026, "venue": "Robotics", "source_url": "https://arxiv.org/abs/2601.08673", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent reports of large language models (LLMs) exhibiting behaviors such as deception, threats, or blackmail are often interpreted as evidence of alignment failure or emergent malign agency. We argue ", "arxiv_id": "2601.08673", "doi": "10.70777/si.v2i4.17163"}
+{"id": "proof-time-benchmark-2026", "title": "Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments", "authors": ["Bingyang Ye", "Shan Chen", "Jingxuan Tu", "Chen Liu", "Zidi Xiong"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.07606", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models are increasingly being used to assess and forecast research ideas, yet we lack scalable ways to evaluate the quality of models' judgments about these scientific ideas. Towards thi", "arxiv_id": "2601.07606", "doi": "10.48550/arXiv.2601.07606"}
+{"id": "auditing-fairness-under-2026", "title": "Auditing Fairness under Model Updates: Fundamental Complexity and Property-Preserving Updates", "authors": ["Ayoub Ajarra", "Debabrota Basu"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.05909", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As machine learning models become increasingly embedded in societal infrastructure, auditing them for bias is of growing importance. However, in real-world deployments, auditing is complicated by the ", "arxiv_id": "2601.05909", "doi": "10.48550/arXiv.2601.05909"}
+{"id": "when-singleagent-skills-2026", "title": "When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail", "authors": ["Xiaoxiao Li"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.04748", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent AI systems have proven effective for complex reasoning. These systems are compounded by specialized agents, which collaborate through explicit communication, but incur substantial computat", "arxiv_id": "2601.04748", "doi": "10.48550/arXiv.2601.04748"}
+{"id": "illusion-insight-reasoning-2026", "title": "The Illusion of Insight in Reasoning Models", "authors": ["L. d'Aliberti", "M. Ribeiro"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.00514", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Do reasoning models have \"Aha!\" moments? Prior work suggests that models like DeepSeek-R1-Zero undergo sudden mid-trace realizations that lead to accurate outputs, implying an intrinsic capacity for sel", "arxiv_id": "2601.00514", "doi": "10.48550/arXiv.2601.00514"}
+{"id": "scientific-foundation-models-2026", "title": "On scientific foundation models: Rigorous definitions, key applications, and a comprehensive survey.", "authors": ["S. S. Menon", "Trishit Mondal", "Shuvayan Brahmachary", "A. Panda", "S. Joshi"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.neunet.2026.108567", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scientific Foundation Models (SciFMs) represent a transformative paradigm for addressing complex scientific and engineering problems by leveraging large-scale pretraining and deep learning architectur", "doi": "10.1016/j.neunet.2026.108567"}
+{"id": "nested-learning-illusion-2025", "title": "Nested Learning: The Illusion of Deep Learning Architectures", "authors": ["Ali Behrouz", "Meisam Razaviyayn", "Peilin Zhong", "V. Mirrokni"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.24695", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite the recent progresses, particularly in developing Language Models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, self-improve,", "arxiv_id": "2512.24695", "doi": "10.48550/arXiv.2512.24695"}
+{"id": "crossplatform-evaluation-large-2025", "title": "Cross-Platform Evaluation of Large Language Model Safety in Pediatric Consultations: Evolution of Adversarial Robustness and the Scale Paradox", "authors": ["Vahideh Zolfaghari"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.09721", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background: Large language models (LLMs) are increasingly deployed in medical consultations, yet their safety under realistic user pressures remains understudied. Prior assessments focused on neutral c", "arxiv_id": "2601.09721", "doi": "10.48550/arXiv.2601.09721"}
+{"id": "teaching-critiquing-conceptualization-2025", "title": "Teaching and Critiquing Conceptualization and Operationalization in NLP", "authors": ["Vagrant Gautam"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.18505", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: NLP researchers regularly invoke abstract concepts like \"interpretability,\" \"bias,\" \"reasoning,\" and \"stereotypes,\" without defining them. Each subfield has a shared understanding or conceptualization of wh", "arxiv_id": "2512.18505", "doi": "10.48550/arXiv.2512.18505"}
+{"id": "natural-born-intelligence-2025", "title": "Natural born intelligence manifesto: Illustrating the dynamic perspective for consciousness", "authors": ["Y. Gunji"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.biosystems.2025.105677", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.biosystems.2025.105677"}
+{"id": "scaling-laws-code-2025", "title": "Scaling Laws for Code: Every Programming Language Matters", "authors": ["Jian Yang", "Shawn Guo", "Lin Jing", "Wei Zhang", "Aishan Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.13472", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code large language models (Code LLMs) are powerful but costly to train, with scaling laws predicting performance from model size, data, and compute. However, different programming languages (PLs) hav", "arxiv_id": "2512.13472", "doi": "10.48550/arXiv.2512.13472"}
+{"id": "when-observer-becomes-2025", "title": "When the observer becomes the observed: A critical alternative to environment sensing through embodied engagement with a generative system", "authors": ["Zhen Wu", "Xiaomin Fan", "Mika Shirahama", "Tristan Braud"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3757369.3767610", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Traditional sensors function as exact measuring instruments to represent the physical world. Conversely, humans are subjective, inaccurate sensors, whose measured quantities are often influenced by th", "doi": "10.1145/3757369.3767610"}
+{"id": "curriculum-guided-massive-2025", "title": "Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks", "authors": ["Indrajit Kar", "Kalathur Chenchu Kishore Kumar"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.08545", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models and multi-agent systems have shown promise in decomposing complex tasks, yet they struggle with long-horizon reasoning tasks and escalating computation cost. This work introduces", "arxiv_id": "2512.08545", "doi": "10.48550/arXiv.2512.08545"}
+{"id": "singleagent-scaling-fails-2025", "title": "Single-Agent Scaling Fails Multi-Agent Intelligence: Towards Foundation Models with Native Multi-Agent Intelligence", "authors": ["Shuyue Hu", "Hao Yan", "Yiqun Zhang", "Yang Chen", "Dongzhan Zhou"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.08743", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Foundation models (FMs) are increasingly assuming the role of the \"brain\" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or in", "arxiv_id": "2512.08743", "doi": "10.48550/arXiv.2512.08743"}
+{"id": "sequential-enumeration-large-2025", "title": "Sequential Enumeration in Large Language Models", "authors": ["Kuinan Hou", "Marco Zorzi", "Alberto Testolin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.04727", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reliably counting and generating sequences of items remain a significant challenge for neural networks, including Large Language Models (LLMs). Indeed, although this capability is readily handled by r", "arxiv_id": "2512.04727", "doi": "10.48550/arXiv.2512.04727"}
+{"id": "ais-environmental-cost-2025", "title": "AI’s Environmental Cost: Comparing Resource Consumption Between SLMs and LLMs Across Queries", "authors": ["Aryaanshi Sundaram", "Sparsh Kamdar", "Shreyas Kumar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.34190/icair.5.1.4345", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As artificial intelligence becomes increasingly embedded in daily life, the environmental costs of its deployment remain underexplored. This study investigates the environmental footprint of both larg", "doi": "10.34190/icair.5.1.4345"}
+{"id": "claude-does-permanent-2025", "title": "Claude Does Permanent Scatterer Interferometric Synthetic Aperture Radar: A tutorial for the use of large language models in replication studies", "authors": ["Timo Balz", "Shuyi Yao", "Chamini Mirandu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MGRS.2025.3574703", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this tutorial, the application of large language models (LLMs) in replicating complex remote sensing algorithms is explained, using permanent scatterer interferometric synthetic aperture radar (PSI", "doi": "10.1109/MGRS.2025.3574703"}
+{"id": "instruction-tuning-large-2025", "title": "Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day", "authors": ["Milad Abdollahzadeh", "Abdul Raheem", "Zilong Zhao", "Uzair Javaid", "Kevin Yee"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.23220", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Tabular instruction tuning has emerged as a promising research direction for improving LLMs understanding of tabular data. However, the majority of existing works only consider question-answering and ", "arxiv_id": "2511.23220", "doi": "10.48550/arXiv.2511.23220"}
+{"id": "adaptive-twolayer-inspection-2025", "title": "Adaptive Two-Layer Inspection Framework for Mitigating Security Risks in Large-Scale Vertical Domain Language Models", "authors": ["Wei Liang", "Zhengkai Guo", "Junqiang Li", "Xiaocui Li", "Junfeng Yang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ISPCE-ASIA69076.2025.11312923", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are trained on large amounts of data with diverse and imbalance distribution, therein leading to capability bottlenecks. In vertical domains, incremental domain-specific p", "doi": "10.1109/ISPCE-ASIA69076.2025.11312923"}
+{"id": "detection-llm-deceptive-2025", "title": "Detection of LLM Deceptive Behaviour Triggered by the Poisonous Context Injection: The Problem Demonstration", "authors": ["Stanislav Selitskiy", "C. Inoue"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FLLM67465.2025.11391110", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a focused demonstration of deceptive behaviour in Large Language Models (LLMs) arising under poisonous context injection. The case study is constructed around a Japanese haiku, sel", "doi": "10.1109/FLLM67465.2025.11391110"}
+{"id": "realist-pluralist-conceptions-2025", "title": "Realist and Pluralist Conceptions of Intelligence and Their Implications on AI Research", "authors": ["Ninell Oldenburg", "Ruchira Dhar", "Anders Søgaard"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.15282", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we argue that current AI research operates on a spectrum between two different underlying conceptions of intelligence: Intelligence Realism, which holds that intelligence represents a s", "arxiv_id": "2511.15282", "doi": "10.48550/arXiv.2511.15282"}
+{"id": "beyond-mimicry-preference-2025", "title": "Beyond Mimicry: Preference Coherence in LLMs", "authors": ["Luhan Mikaelson", "Derek Shiller", "Hayley Clatterbuck"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2511.13630", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We investigate whether large language models exhibit genuine preference structures by testing their responses to AI-specific trade-offs involving GPU reduction, capability restrictions, shutdown, dele", "arxiv_id": "2511.13630"}
+{"id": "evidence-phase-transitions-2025", "title": "Evidence of Phase Transitions in Small Transformer-Based Language Models", "authors": ["Noah Hong", "Tao Hong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.12768", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Phase transitions have been proposed as the origin of emergent abilities in large language models (LLMs), where new capabilities appear abruptly once models surpass critical thresholds of scale. Prior", "arxiv_id": "2511.12768", "doi": "10.48550/arXiv.2511.12768"}
+{"id": "information-capacity-evaluating-2025", "title": "Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression", "authors": ["Cheng Yuan", "Jiawei Shao", "Chi Zhang", "Xuelong Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.08066", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent years have witnessed the rapid advancements of large language models (LLMs) and their expanding applications, leading to soaring demands for computational resources. The widespread adoption of ", "arxiv_id": "2511.08066", "doi": "10.48550/arXiv.2511.08066"}
+{"id": "importanceaware-data-selection-2025", "title": "Importance-Aware Data Selection for Efficient LLM Instruction Tuning", "authors": ["Tingyu Jiang", "Shen Li", "Yiyao Song", "Lan Zhang", "Hualei Zhu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.07074", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Instruction tuning plays a critical role in enhancing the performance and efficiency of Large Language Models (LLMs). Its success depends not only on the quality of the instruction data but also on th", "arxiv_id": "2511.07074", "doi": "10.48550/arXiv.2511.07074"}
+{"id": "pedagogicallydriven-prompt-engineering-2025", "title": "Pedagogically-Driven Prompt Engineering Towards Developing Mathematical Literacy of Russian Students", "authors": ["R. Zaripova", "Andrew V. Danilov", "L. L. Salekhova", "T. R. Fazliakhmetov", "M. A. Lukoyanova"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/DeSE68208.2025.11367925", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The study explores a pedagogically driven prompting strategy for generating AI-assisted tasks to enhance mathematical literacy of Russian 5th-grade students. Grounded in international frameworks (NCTM", "doi": "10.1109/DeSE68208.2025.11367925"}
+{"id": "optimal-attention-temperature-2025", "title": "Optimal Attention Temperature Enhances In-Context Learning under Distribution Shift", "authors": ["Samet Demir", "Zafer Dogan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.01292", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Pretrained Transformers excel at in-context learning (ICL), inferring new tasks from only a handful of examples. Yet, their ICL performance can degrade sharply under distribution shift between pretrai", "arxiv_id": "2511.01292", "doi": "10.48550/arXiv.2511.01292"}
+{"id": "measuring-what-matters-2025", "title": "Measuring what Matters: Construct Validity in Large Language Model Benchmarks", "authors": ["Andrew M. Bean", "R. Kearns", "Angelika Romanou", "Franziska Sofia Hafner", "Harry Mayne"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.04703", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating large language models (LLMs) is crucial for both assessing their capabilities and identifying safety or robustness issues prior to deployment. Reliably measuring abstract and complex phenom", "arxiv_id": "2511.04703", "doi": "10.48550/arXiv.2511.04703"}
+{"id": "rethinking-knowledge-distillation-2025", "title": "Rethinking Knowledge Distillation in Collaborative Machine Learning: Memory, Knowledge, and Their Interactions", "authors": ["Pengchao Han", "Xi Huang", "Yi Fang", "Guojun Han"], "year": 2025, "venue": "IEEE Transactions on Network Science and Engineering", "source_url": "https://arxiv.org/abs/2512.19972", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Collaborative learning has emerged as a key paradigm in large-scale intelligent systems, enabling distributed agents to cooperatively train their models while addressing their privacy concerns. Centra", "arxiv_id": "2512.19972", "doi": "10.1109/TNSE.2025.3572362"}
+{"id": "classit-conversational-lecturealigned-2025", "title": "CLASS-IT: Conversational and Lecture-Aligned Small-Scale Instruction Tuning for BabyLMs", "authors": ["Luca Capone", "Alessandro Bondielli", "Alessandro Lenci"], "year": 2025, "venue": "Proceedings of the First BabyLM Workshop", "source_url": "https://arxiv.org/abs/2510.25364", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work investigates whether small-scale LMs can benefit from instruction tuning. We compare conversational and question-answering instruction tuning datasets, applied either in a merged or sequenti", "arxiv_id": "2510.25364", "doi": "10.18653/v1/2025.babylm-main.30"}
+{"id": "how-data-mixing-2025", "title": "How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs", "authors": ["Samet Demir", "Zafer Dogan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.25753", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Pretrained Transformers demonstrate remarkable in-context learning (ICL) capabilities, enabling them to adapt to new tasks from demonstrations without parameter updates. However, theoretical studies o", "arxiv_id": "2510.25753", "doi": "10.48550/arXiv.2510.25753"}
+{"id": "improved-autoregressive-evaluation-2025", "title": "An Improved Autoregressive Evaluation Paradigm for Large Language Models", "authors": ["Jipeng Zhang", "Rui Pan", "Yuzheng Hu", "Kashun Shum", "Guanyu Yao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3763000", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The AI community has witnessed the emergence of various chat-style Large Language Models (LLMs) since the advent of ChatGPT. Despite significant progress in this area, evaluating these models remains ", "doi": "10.1145/3763000"}
+{"id": "relative-scaling-laws-2025", "title": "Relative Scaling Laws for LLMs", "authors": ["William B. Held", "D. Hall", "Percy Liang", "Diyi Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.24626", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling laws describe how language models improve with additional data, parameters, and compute. While widely used, they are typically measured on aggregate test sets. Aggregate evaluations yield clea", "arxiv_id": "2510.24626", "doi": "10.48550/arXiv.2510.24626"}
+{"id": "disaggregation-reveals-hidden-2025", "title": "Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction", "authors": ["James A. Michaelov", "Catherine Arnett"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.24934", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models generally produce grammatical text, but they are more likely to make errors in certain contexts. Drawing on paradigms from psycholinguistics, we carry out a fine-grained analysis of th", "arxiv_id": "2510.24934", "doi": "10.48550/arXiv.2510.24934"}
+{"id": "language-model-behavioral-2025", "title": "Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale", "authors": ["James A. Michaelov", "Roger P. Levy", "Benjamin Bergen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.24963", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We show that across architecture (Transformer vs. Mamba vs. RWKV), training dataset (OpenWebText vs. The Pile), and scale (14 million parameters to 12 billion parameters), autoregressive language mode", "arxiv_id": "2510.24963", "doi": "10.48550/arXiv.2510.24963"}
+{"id": "benchmarking-epistemology-construct-2025", "title": "The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models", "authors": ["Timo Freiesleben", "Sebastian Zezulka"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.23191", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Predictive benchmarking, the evaluation of machine learning models based on predictive performance and competitive ranking, is a central epistemic practice in machine learning research and an increasi", "arxiv_id": "2510.23191", "doi": "10.48550/arXiv.2510.23191"}
+{"id": "relativebased-scaling-law-2025", "title": "Relative-Based Scaling Law for Neural Language Models", "authors": ["Baoqing Yue", "J. Zhou", "Zixi Wei", "Jingtao Zhan", "Qingyao Ai"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.20387", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling laws aim to accurately predict model performance across different scales. Existing scaling-law studies almost exclusively rely on cross-entropy as the evaluation metric. However, cross-entropy", "arxiv_id": "2510.20387", "doi": "10.48550/arXiv.2510.20387"}
+{"id": "capability-ceilings-autoregressive-2025", "title": "Capability Ceilings in Autoregressive Language Models: Empirical Evidence from Knowledge-Intensive Tasks", "authors": ["J. Marín"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.21866", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We document empirical capability ceilings in decoder-only autoregressive language models across knowledge-intensive tasks. Systematic evaluation of OPT and Pythia model families (70M-30B parameters, s", "arxiv_id": "2510.21866", "doi": "10.48550/arXiv.2510.21866"}
+{"id": "do-prompts-reshape-2025", "title": "Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings", "authors": ["Cesar Gonzalez-Gutierrez", "Dirk Hovy"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19694", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompting is a common approach for leveraging LMs in zero-shot settings. However, the underlying mechanisms that enable LMs to perform diverse tasks without task-specific supervision remain poorly und", "arxiv_id": "2510.19694", "doi": "10.48550/arXiv.2510.19694"}
+{"id": "evaluating-llm-reasoning-2025", "title": "Evaluating LLM Reasoning Beyond Correctness and CoT", "authors": ["Soheil Abbasloo"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.18134", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: What does it truly mean for a language model to \"reason\"? Current evaluations reward models' correct standalone answers - but correctness alone reveals little about the process that produced them. We argu", "arxiv_id": "2510.18134"}
+{"id": "unicode-augmenting-evaluation-2025", "title": "UniCode: Augmenting Evaluation for Code Reasoning", "authors": ["Xinyue Zheng", "Haowei Lin", "Shaofei Cai", "Zilong Zheng", "Yitao Liang"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.17868", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current coding benchmarks often inflate Large Language Model (LLM) capabilities due to static paradigms and data contamination, enabling models to exploit statistical shortcuts rather than genuine rea", "arxiv_id": "2510.17868"}
+{"id": "mechanistic-emergence-symbol-2025", "title": "The Mechanistic Emergence of Symbol Grounding in Language Models", "authors": ["Shuyu Wu", "Ziqiao Ma", "Xiaoxi Luo", "Yidong Huang", "Josue Torres-Fonseca"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.13796", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that groundi", "arxiv_id": "2510.13796", "doi": "10.48550/arXiv.2510.13796"}
+{"id": "position-require-frontier-2025", "title": "Position: Require Frontier AI Labs To Release Small \"Analog\" Models", "authors": ["Shriyash Upadhyay", "Chaithanya Bandi", "Narmeen Oozeer", "Philip Quirke"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.14053", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent proposals for regulating frontier AI models have sparked concerns about the cost of safety regulation, and most such regulations have been shelved due to the safety-innovation tradeoff. This pa", "arxiv_id": "2510.14053", "doi": "10.48550/arXiv.2510.14053"}
+{"id": "kormo-korean-open-2025", "title": "KORMo: Korean Open Reasoning Model for Everyone", "authors": ["Minjun Kim", "HyeonSeok Lim", "Hangyeol Yoo", "Inho Won", "Seungwoo Song"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.09426", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work presents the first large-scale investigation into constructing a fully open bilingual large language model (LLM) for a non-English language, specifically Korean, trained predominantly on syn", "arxiv_id": "2510.09426", "doi": "10.48550/arXiv.2510.09426"}
+{"id": "testtime-matching-unlocking-2025", "title": "Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models", "authors": ["Yinglun Zhu", "Jiancheng Zhang", "Fuzhi Tang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.07632", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Frontier AI models have achieved remarkable progress, yet recent studies suggest they struggle with compositional reasoning, often performing at or below random chance on established benchmarks. We re", "arxiv_id": "2510.07632", "doi": "10.48550/arXiv.2510.07632"}
+{"id": "ainstein-assessing-feasibility-2025", "title": "AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems", "authors": ["Shambhavi Mishra", "Gaurav Sahu", "Marco Pedersoli", "Laurent Charlin", "J. Dolz"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.05432", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) demonstrate impressive capabilities across a wide range of tasks, yet it remains unclear whether such success reflects genuine reasoning or sophisticated recall. We introd", "arxiv_id": "2510.05432", "doi": "10.48550/arXiv.2510.05432"}
+{"id": "generative-ai-computational-2025", "title": "Generative AI for computational chemistry: A roadmap to predicting emergent phenomena", "authors": ["P. Tiwary", "Lukas Herron", "Richard John", "Suemin Lee", "Disha Sanwal"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1073/pnas.2415655121", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent surge in generative AI has introduced exciting possibilities for computational chemistry. Generative AI methods have made significant progress in sampling molecular structures across chemic", "doi": "10.1073/pnas.2415655121"}
+{"id": "how-get-enriched-2025", "title": "How to Get Enriched Metadata? A Multi‐modal Model Fusion Strategy for Automatic Metadata Enhancement in GLAM Art Collections", "authors": ["Zhihan Sun", "Chengxi Yan", "Yurong Zeng"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1002/pra2.1345", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Cultural heritage resource metadata is the foundation and precious asset for GLAM institutions to provide knowledge services which enables users to efficiently search relevant collection information. ", "doi": "10.1002/pra2.1345"}
+{"id": "singlehead-attention-high-2025", "title": "Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws", "authors": ["Fabrizio Boncoraglio", "Vittorio Erba", "Emanuele Troiani", "Florent Krzakala", "Lenka Zdeborová"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.24914", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Trained attention layers exhibit striking and reproducible spectral structure of the weights, including low-rank collapse, bulk deformation, and isolated spectral outliers, yet the origin of these phe", "arxiv_id": "2509.24914"}
+{"id": "evaluating-robustness-chinchilla-2025", "title": "Evaluating the Robustness of Chinchilla Compute-Optimal Scaling", "authors": ["Rylan Schaeffer", "Noam Levi", "Andreas Kirsch", "Theo Guenais", "Brando Miranda"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.23963", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hoffman et al (2022)'s Chinchilla paper introduced the principle of compute-optimal scaling, laying a foundation for future scaling of language models. In the years since, however, valid concerns abou", "arxiv_id": "2509.23963", "doi": "10.48550/arXiv.2509.23963"}
+{"id": "pretraining-scaling-laws-2025", "title": "Pretraining Scaling Laws for Generative Evaluations of Language Models", "authors": ["Rylan Schaeffer", "Noam Levi", "Brando Miranda", "Oluwasanmi Koyejo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.24012", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural scaling laws have driven the field's ever-expanding exponential growth in parameters, data and compute. While scaling behaviors for pretraining losses and discriminative benchmarks are well est", "arxiv_id": "2509.24012", "doi": "10.48550/arXiv.2509.24012"}
+{"id": "review-hallucination-understanding-2025", "title": "Review of Hallucination Understanding in Large Language and Vision Models", "authors": ["Z. Ho", "Siyuan Liang", "Dacheng Tao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.00034", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The widespread adoption of large language and vision models in real-world applications has made urgent the need to address hallucinations -- instances where models produce incorrect or nonsensical out", "arxiv_id": "2510.00034", "doi": "10.48550/arXiv.2510.00034"}
+{"id": "predicting-llm-reasoning-2025", "title": "Predicting LLM Reasoning Performance with Small Proxy Model", "authors": ["Woosung Koh", "Juyoung Suk", "Sungjun Han", "Se-young Yun", "Jay Shin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.21013", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Given the prohibitive cost of pre-training large language models, it is essential to leverage smaller proxy models to optimize datasets before scaling up. However, this approach becomes challenging fo", "arxiv_id": "2509.21013", "doi": "10.48550/arXiv.2509.21013"}
+{"id": "novel-differential-feature-2025", "title": "A Novel Differential Feature Learning for Effective Hallucination Detection and Classification", "authors": ["Wenkai Wang", "Vincent Lee", "Yi Zheng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.21357", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model hallucination represents a critical challenge where outputs deviate from factual accuracy due to distributional biases in training data. While recent investigations establish that", "arxiv_id": "2509.21357", "doi": "10.48550/arXiv.2509.21357"}
+{"id": "psychometric-personality-shaping-2025", "title": "Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models", "authors": ["Stephen Fitz", "P. Romero", "Steven Basart", "Sipeng Chen", "J. Hernández-Orallo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.16332", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models increasingly mediate high-stakes interactions, intensifying research on their capabilities and safety. While recent work has shown that LLMs exhibit consistent and measurable syn", "arxiv_id": "2509.16332", "doi": "10.48550/arXiv.2509.16332"}
+{"id": "from-firewalls-frontiers-2025", "title": "From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming", "authors": ["Anusha Sinha", "Keltin Grimes", "James Lucassen", "Michael Feffer", "Nathan Vanhoudnos"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.11398", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A red team simulates adversary attacks to help defenders find effective strategies to defend their systems in a real-world operational setting. As more enterprise systems adopt AI, red-teaming will ne", "arxiv_id": "2509.11398", "doi": "10.48550/arXiv.2509.11398"}
+{"id": "advancing-largemolecule-discovery-2025", "title": "Advancing large-molecule discovery with a unified digital platform for data analysis and workflow management", "authors": ["E. Natali", "Jana Hersch", "C. Freiberg", "Stephan Steigele"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1080/19420862.2025.2555346", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The repertoire of large-molecule treatments continues to expand, resulting in diverse discovery and development workflows. This diversity yields a proliferation of software solutions and proc", "doi": "10.1080/19420862.2025.2555346"}
+{"id": "incontext-learning-learning-2025", "title": "Is In-Context Learning Learning?", "authors": ["Adrian de Wynter"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.10414", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In-context learning (ICL) allows some autoregressive models to solve tasks via next-token prediction and without needing further training. This has led to claims about these model's ability to solve (", "arxiv_id": "2509.10414", "doi": "10.48550/arXiv.2509.10414"}
+{"id": "fundamental-language-models-2025", "title": "Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size?", "authors": ["Jaime Collado-Montañez", "L. López", "Arturo Montejo Ráez"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.02225", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models offer impressive language capabilities but suffer from well-known limitations, including hallucinations, biases, privacy concerns, and high computational costs. These issues are ", "arxiv_id": "2509.02225", "doi": "10.48550/arXiv.2509.02225"}
+{"id": "temporal-knowledgebase-creation-2025", "title": "Towards Temporal Knowledge-Base Creation for Fine-Grained Opinion Analysis with Language Models", "authors": ["Gaurav Negi", "Atul Kr. Ojha", "Omnia Zayed", "Paul Buitelaar"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.02363", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose a scalable method for constructing a temporal opinion knowledge base with large language models (LLMs) as automated annotators. Despite the demonstrated utility of time-series opinion analy", "arxiv_id": "2509.02363", "doi": "10.48550/arXiv.2509.02363"}
+{"id": "artificial-human-intelligence-2025", "title": "Artificial or Human Intelligence?", "authors": ["Eric Gao"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.02879", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI) tools such as large language models (LLMs) are already altering student learning. Unlike previous technologies, LLMs can independently solve problems regardless of student", "arxiv_id": "2509.02879"}
+{"id": "llmled-visionspectral-fusion-2025", "title": "LLM-led vision-spectral fusion: A zero-shot approach to temporal fruit image classification", "authors": ["Huyu Wu", "Bowen Jia", "Xue-Ming Yuan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.neunet.2025.108155", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.neunet.2025.108155"}
+{"id": "responsible-artificial-intelligence-2025", "title": "Responsible Artificial Intelligence for Earth Observation: Achievable and realistic paths to serve the collective good", "authors": ["Pedram Ghamisi", "Weikang Yu", "Andrea Marinoni", "Caroline M. Gevaert", "Claudio Persello"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MGRS.2025.3529726", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The convergence of artificial intelligence (AI) and Earth observation (EO) technologies has brought geoscience and remote sensing into an era of unparalleled capabilities. AI’s transformative impact o", "doi": "10.1109/MGRS.2025.3529726"}
+{"id": "asymptotic-study-incontext-2025", "title": "Asymptotic Study of in-Context Learning with Random Transformers Through Equivalent Models", "authors": ["Samet Demir", "Zafer Dogan"], "year": 2025, "venue": "International Workshop on Machine Learning for Signal Processing", "source_url": "https://arxiv.org/abs/2509.15152", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head where th", "arxiv_id": "2509.15152", "doi": "10.1109/MLSP62443.2025.11204336"}
+{"id": "april-api-synthesis-2025", "title": "APRIL: API Synthesis with Automatic Prompt Optimization and Reinforcement Learning", "authors": ["Hua Zhong", "Shan Jiang", "S. Khurshid"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.25196", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: APIs are central to modern software development, yet composing new APIs from large libraries is difficult due to the exponential search space; traditional component-based synthesis relies on costly ex", "arxiv_id": "2509.25196", "doi": "10.48550/arXiv.2509.25196"}
+{"id": "ramon-llulls-thinking-2025", "title": "The Ramon Llull's Thinking Machine for Automated Ideation", "authors": ["Xinran Zhao", "Boyuan Zheng", "Chenglei Si", "Haofei Yu", "Ke Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.19200", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper revisits Ramon Llull's Ars combinatoria - a medieval framework for generating knowledge through symbolic recombination - as a conceptual foundation for building a modern Llull's thinking ma", "arxiv_id": "2508.19200", "doi": "10.48550/arXiv.2508.19200"}
+{"id": "edge-memorization-diffusion-2025", "title": "On the Edge of Memorization in Diffusion Models", "authors": ["Sam Buchanan", "Druv Pai", "Yi-Ting Ma", "Valentin De Bortoli"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.17689", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: When do diffusion models reproduce their training data, and when are they able to generate samples beyond it? A practically relevant theoretical understanding of this interplay between memorization an", "arxiv_id": "2508.17689", "doi": "10.48550/arXiv.2508.17689"}
+{"id": "evaluating-embeddable-language-2025", "title": "Evaluating Embeddable Language Models in Verbalizing Rule-based Inferences through Justifications", "authors": ["Bastien Dussard", "Aurélie Clodic", "Guillaume Sarthou"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/RO-MAN63969.2025.11217601", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Language Models have shown promising performance, they still struggle with limitations regarding reasoning and are very token-sensitive. In contrast, knowledge-based systems, such as ontologies,", "doi": "10.1109/RO-MAN63969.2025.11217601"}
+{"id": "equinox-holistic-fair-2025", "title": "Equinox: Holistic Fair Scheduling in Serving Large Language Models", "authors": ["Zhixiang Wei", "James Yen", "Jingyi Chen", "Ziyang Zhang", "Zhibai Huang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.16646", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We address the limitations of current LLM serving with a dual-counter framework separating user and operator perspectives. The User Fairness Counter measures quality of service via weighted tokens and", "arxiv_id": "2508.16646", "doi": "10.48550/arXiv.2508.16646"}
+{"id": "gptoss-good-comprehensive-2025", "title": "Is GPT-OSS Good? A Comprehensive Evaluation of OpenAI's Latest Open Source Models", "authors": ["Ziqian Bi", "Keyu Chen", "Chiung-Yi Tseng", "Danyang Zhang", "Tianyang Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.12461", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In August 2025, OpenAI released GPT-OSS models, its first open weight large language models since GPT-2 in 2019, comprising two mixture of experts architectures with 120B and 20B parameters. We evalua", "arxiv_id": "2508.12461", "doi": "10.48550/arXiv.2508.12461"}
+{"id": "survey-agentic-service-2025", "title": "A Survey on Agentic Service Ecosystems: Measurement, Analysis, and Optimization", "authors": ["Xuwen Zhang", "Xiao Xue", "Xia Xie", "Qun Ma", "Xiangning Yu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.07343", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The Agentic Service Ecosystem consists of heterogeneous autonomous agents (e.g., intelligent machines, humans, and human-machine hybrid systems) that interact through resource exchange and service co-", "arxiv_id": "2508.07343", "doi": "10.48550/arXiv.2508.07343"}
+{"id": "investigating-intersectional-bias-2025", "title": "Investigating Intersectional Bias in Large Language Models using Confidence Disparities in Coreference Resolution", "authors": ["F. A. Khan", "N. Sivakumar", "Yinong Oliver Wang", "Katherine Metcalf", "Cezanne Camacho"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.07111", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have achieved impressive performance, leading to their widespread adoption as decision-support tools in resource-constrained contexts like hiring and admissions. There is,", "arxiv_id": "2508.07111", "doi": "10.48550/arXiv.2508.07111"}
+{"id": "datasetresearch-benchmarking-agent-2025", "title": "DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery", "authors": ["Keyu Li", "Mohan Jiang", "Dayuan Fu", "Yunze Wu", "Xiangkun Hu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.06960", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models has fundamentally shifted the bottleneck in AI development from computational power to data availability-with countless valuable datasets remaining hidde", "arxiv_id": "2508.06960", "doi": "10.48550/arXiv.2508.06960"}
+{"id": "survivehr-competing-risks-2025", "title": "SurvivEHR: a competing risks, time-to-event foundation model for multiple long-term conditions from primary care electronic health records", "authors": ["C. Gadd", "K. Gokhale", "A. Acharya", "J. Cooper", "F. Crowe"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2025.08.04.25332916", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1101/2025.08.04.25332916"}
+{"id": "how-does-controllability-2025", "title": "How Does Controllability Emerge In Language Models During Pretraining?", "authors": ["Jianshu She", "Xinyue Li", "Eric P. Xing", "Zhengzhong Liu", "Qirong Ho"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.01892", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models can be steered by modifying their internal representations to control concepts such as emotion, style, or truthfulness in generation. However, the conditions for an effective intervent", "arxiv_id": "2508.01892", "doi": "10.48550/arXiv.2508.01892"}
+{"id": "what-does-it-2025", "title": "What Does it Mean for a Neural Network to Learn a \"World Model\"?", "authors": ["Kenneth Li", "Fernanda Viégas", "Martin Wattenberg"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.21513", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose a set of precise criteria for saying a neural net learns and uses a \"world model.\" The goal is to give an operational meaning to terms that are often used informally, in order to provide a co", "arxiv_id": "2507.21513", "doi": "10.48550/arXiv.2507.21513"}
+{"id": "scaling-laws-data-2025", "title": "Using Scaling Laws for Data Source Utility Estimation in Domain-Specific Pre-Training", "authors": ["Oleksiy Ostapenko", "Charles Guille-Escuret", "Luke Kumar", "Max Tian", "Denis Kocetkov"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.22250", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce a framework for optimizing domain-specific dataset construction in foundation model training. Specifically, we seek a cost-efficient way to estimate the quality of data sources (e.g. synt", "arxiv_id": "2507.22250", "doi": "10.48550/arXiv.2507.22250"}
+{"id": "advancing-methodological-development-2025", "title": "Advancing methodological development of artificial intelligence in patient-centered comparative clinical effectiveness research: Patient-Centered Outcomes Research Institute’s unique contribution to research done differently", "authors": ["Jinghua Ou", "Erin Holve"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1093/jamiaopen/ooaf081", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background: Recent advancements of Artificial Intelligence (AI) are rapidly transforming clinical research. While this technology offers exciting opportunities, it amplifies existing concerns ", "doi": "10.1093/jamiaopen/ooaf081"}
+{"id": "thinking-isnt-illusion-2025", "title": "Thinking Isn't an Illusion: Overcoming the Limitations of Reasoning Models via Tool Augmentations", "authors": ["Zhao Song", "Song Yue", "Jiahao Zhang"], "year": 2025, "venue": "Robotics", "source_url": "https://arxiv.org/abs/2507.17699", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Reasoning Models (LRMs) have become a central focus in today’s large language model (LLM) research, where models are designed to output a step-by-step thinking process before arriving at a final", "arxiv_id": "2507.17699", "doi": "10.48550/arXiv.2507.17699"}
+{"id": "metric-assessment-protocol-2025", "title": "Metric assessment protocol in the context of answer fluctuation on MCQ tasks", "authors": ["Ekaterina Goliakova", "X. Renard", "Marie-Jeanne Lesot", "Thibault Laugel", "Christophe Marsala"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.15581", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Using multiple-choice questions (MCQs) has become a standard for assessing LLM capabilities efficiently. A variety of metrics can be employed for this task. However, previous research has not conducte", "arxiv_id": "2507.15581", "doi": "10.48550/arXiv.2507.15581"}
+{"id": "bottomup-domainspecific-superintelligence-2025", "title": "Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need", "authors": ["Bhishma Dedhia", "Yuval Kansal", "N. Jha"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.13966", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models traditionally used for cross-domain generalization have recently demonstrated task-specific reasoning. However, their top-down training approach on general corpora is insufficient for ", "arxiv_id": "2507.13966", "doi": "10.48550/arXiv.2507.13966"}
+{"id": "aiaided-tooling-domestic-2025", "title": "AI-Aided Tooling for Domestic Robots", "authors": ["Rituja Bhattacharya", "Neel Adwani", "Muhaiminul Islam Akash", "Cong Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AIM64088.2025.11175869", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Research in robotics has been quickly shifting from industrial applications such as manufacturing to domestic services such as cooking, cleaning, and organizing. A current focus of developing domestic", "doi": "10.1109/AIM64088.2025.11175869"}
+{"id": "agentsnet-coordination-collaborative-2025", "title": "AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs", "authors": ["Florian Grötschla", "Luis Müller", "Jan Tönshoff", "Mikhail Galkin", "Bryan Perozzi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.08616", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large-language models (LLMs) have demonstrated powerful problem-solving capabilities, in particular when organized in multi-agent systems. However, the advent of such systems also raises several quest", "arxiv_id": "2507.08616", "doi": "10.48550/arXiv.2507.08616"}
+{"id": "metalearning-transformers-improve-2025", "title": "Meta-Learning Transformers to Improve In-Context Generalization", "authors": ["Lorenzo Braccaioli", "Anna Vettoruzzo", "Prabhant Singh", "Joaquin Vanschoren", "Mohamed-Rafik Bouguelia"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.05019", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on lar", "arxiv_id": "2507.05019", "doi": "10.48550/arXiv.2507.05019"}
+{"id": "validityguided-workflow-robust-2025", "title": "A validity-guided workflow for robust large language model research in psychology", "authors": ["Zhicheng Lin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.04491", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are rapidly being integrated into psychological research as research tools, evaluation targets, human simulators, and cognitive models. However, recent evidence reveals se", "arxiv_id": "2507.04491", "doi": "10.48550/arXiv.2507.04491"}
+{"id": "peeping-at-creaitivity-2025", "title": "Peeping at creAItivity through a keyhole: creative self-perceptions, potential, and enhancement of GenAI chatbots", "authors": ["Dimitris Grammenos", "Todd Lubart"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10462-025-11288-6", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10462-025-11288-6"}
+{"id": "navigating-representation-utilizing-2025", "title": "Navigating representation: utilizing prompt engineering to minimize representational harms in journalist’s image captions", "authors": ["Habiba Sarhan", "Morteza Shahrezaye", "Simon Hegelich"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s43681-025-00773-x", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s43681-025-00773-x"}
+{"id": "quantization-model-neural-2023", "title": "The Quantization Model of Neural Scaling", "authors": ["Eric J. Michaud", "Ziming Liu", "Uzay Girit", "Max Tegmark"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2303.13506", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale.", "arxiv_id": "2303.13506", "doi": "10.48550/arXiv.2303.13506"}
+{"id": "grokking-modular-arithmetic-2023", "title": "Grokking modular arithmetic", "authors": ["A. Gromov"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2301.02679", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a simple neural network that can learn modular arithmetic tasks and exhibits a sudden jump in generalization known as \"grokking\". Concretely, we present (i) fully-connected two-layer netw", "arxiv_id": "2301.02679", "doi": "10.48550/arXiv.2301.02679"}
+{"id": "broken-neural-scaling-2022", "title": "Broken Neural Scaling Laws", "authors": ["Ethan Caballero", "Kshitij Gupta", "I. Rish", "David Krueger"], "year": 2022, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2210.14891", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. ", "arxiv_id": "2210.14891", "doi": "10.48550/arXiv.2210.14891"}
+{"id": "omnigrok-grokking-beyond-2022", "title": "Omnigrok: Grokking Beyond Algorithmic Data", "authors": ["Ziming Liu", "Eric J. Michaud", "Max Tegmark"], "year": 2022, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2210.01117", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking by analyzing the lo", "arxiv_id": "2210.01117", "doi": "10.48550/arXiv.2210.01117"}
+{"id": "scaling-laws-multiagent-2022", "title": "Scaling Laws for a Multi-Agent Reinforcement Learning Model", "authors": ["Oren Neumann", "C. Gros"], "year": 2022, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2210.00849", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent observation of neural power-law scaling relations has made a significant impact in the field of deep learning. A substantial amount of attention has been dedicated as a consequence to the d", "arxiv_id": "2210.00849", "doi": "10.48550/arXiv.2210.00849"}
+{"id": "hidden-progress-deep-2022", "title": "Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit", "authors": ["B. Barak", "Benjamin L. Edelman", "Surbhi Goel", "S. Kakade", "Eran Malach"], "year": 2022, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2207.08799", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times. While there are some accounts of how these resou", "arxiv_id": "2207.08799", "doi": "10.48550/arXiv.2207.08799"}
+{"id": "emergent-abilities-large-2022", "title": "Emergent Abilities of Large Language Models", "authors": ["Jason Wei", "Yi Tay", "Rishi Bommasani", "Colin Raffel", "Barret Zoph"], "year": 2022, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2206.07682", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we", "arxiv_id": "2206.07682", "doi": "10.48550/arXiv.2206.07682"}
+{"id": "beyond-imitation-game-2022", "title": "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models", "authors": ["Aarohi Srivastava", "Abhinav Rastogi", "Abhishek Rao", "Abu Awal Md Shoeb", "Abubakar Abid"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2206.04615", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poo", "arxiv_id": "2206.04615"}
+{"id": "data-distributional-properties-2022", "title": "Data Distributional Properties Drive Emergent In-Context Learning in Transformers", "authors": ["Stephanie C. Y. Chan", "Adam Santoro", "Andrew Kyle Lampinen", "Jane X. Wang", "Aaditya K Singh"], "year": 2022, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2205.05055", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large transformer-based models are able to perform in-context few-shot learning, without being explicitly trained for it. This observation raises the question: what aspects of the training regime lead", "arxiv_id": "2205.05055", "doi": "10.48550/arXiv.2205.05055"}
+{"id": "palm-scaling-language-2022", "title": "PaLM: Scaling Language Modeling with Pathways", "authors": ["A. Chowdhery", "Sharan Narang", "Jacob Devlin", "Maarten Bosma", "Gaurav Mishra"], "year": 2022, "venue": "Journal of machine learning research", "source_url": "https://arxiv.org/abs/2204.02311", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific traini", "arxiv_id": "2204.02311"}
+{"id": "unified-scaling-laws-2022", "title": "Unified Scaling Laws for Routed Language Models", "authors": ["Aidan Clark", "Diego de Las Casas", "Aurelia Guy", "Arthur Mensch", "Michela Paganini"], "year": 2022, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2202.01169", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditio", "arxiv_id": "2202.01169"}
+{"id": "lamda-language-models-2022", "title": "LaMDA: Language Models for Dialog Applications", "authors": ["R. Thoppilan", "Daniel De Freitas", "Jamie Hall", "Noam Shazeer", "Apoorv Kulshreshtha"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2201.08239", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on", "arxiv_id": "2201.08239"}
+{"id": "grokking-generalization-beyond-2022", "title": "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets", "authors": ["Alethea Power", "Yuri Burda", "Harrison Edwards", "Igor Babuschkin", "Vedant Misra"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2201.02177", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and spe", "arxiv_id": "2201.02177"}
+{"id": "refining-sharp-left-2022", "title": "Refining the sharp left turn threat model, part 2: applying alignment techniques", "authors": ["Unknown"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "detecting-emergent-behavior-2022", "title": "Detecting emergent behavior", "authors": ["Unknown"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "future-ml-systems-2022", "title": "Future ml systems will be qualitatively different", "authors": ["Unknown"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "cognitive-models-ai-2026", "title": "Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents", "authors": ["Ryan Liu", "Dilip Arumugam", "Cedegao E. Zhang", "Sean Escola", "Xaq Pitkow"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.22523", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While contemporary large language models (LLMs) are increasingly capable in isolation, there are still many difficult problems that lie beyond the abilities of a single LLM. For such tasks, there is s", "arxiv_id": "2602.22523"}
+{"id": "empirical-study-bugs-2026", "title": "An Empirical Study of Bugs in Modern LLM Agent Frameworks", "authors": ["Xinxue Zhu", "Jiacong Wu", "Xiaoyu Zhang", "Tianlin Li", "Yanzhou Mu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.21806", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the under", "arxiv_id": "2602.21806"}
+{"id": "training-generalizable-collaborative-2026", "title": "Training Generalizable Collaborative Agents via Strategic Risk Aversion", "authors": ["Chengrui Qu", "Yizhou Zhang", "Nicholas Lanzetti", "Eric Mazumdar"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.21515", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many emerging agentic paradigms require agents to collaborate with one another (or people) to achieve shared goals. Unfortunately, existing approaches to learning policies for such collaborative probl", "arxiv_id": "2602.21515"}
+{"id": "awcp-workspace-delegation-2026", "title": "AWCP: A Workspace Delegation Protocol for Deep-Engagement Collaboration across Remote Agents", "authors": ["Xiaohang Nie", "Zihan Guo", "Youliang Chen", "Yuanjian Zhou", "Weinan Zhang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.20493", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid evolution of Large Language Model (LLM)-based autonomous agents is reshaping the digital landscape toward an emerging Agentic Web, where increasingly specialized agents must collaborate to a", "arxiv_id": "2602.20493"}
+{"id": "artificial-brain-neuroscience-2026", "title": "The Artificial Brain: A Neuroscience Inspired Architecture for Multimodal AI Systems", "authors": ["Krrish Choudhary", "Tanvi Kandoi"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.65138/ijtrp.2026.v2i2.13", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current AI models process input through a single lens. The human brain never does this—it cross-references every sense against every other sense, flags conflicts, and only calls on expensive conscious", "doi": "10.65138/ijtrp.2026.v2i2.13"}
+{"id": "decoding-ml-decision-2026", "title": "Decoding ML Decision: An Agentic Reasoning Framework for Large-Scale Ranking System", "authors": ["Longfei Yun", "Yihang Wu", "Haoran Liu", "Xiaoxuan Liu", "Ziyun Xu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.18640", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern large-scale ranking systems operate within a sophisticated landscape of competing objectives, operational constraints, and evolving product requirements. Progress in this domain is increasingly", "arxiv_id": "2602.18640"}
+{"id": "wink-recovering-from-2026", "title": "Wink: Recovering from Misbehaviors in Coding Agents", "authors": ["Rahul Nanda", "Chandra Maddila", "Smriti Jha", "Euna Mehnaz Khan", "Matteo Paltenghi"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.17037", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to a w", "arxiv_id": "2602.17037"}
+{"id": "five-fatal-assumptions-2026", "title": "Five Fatal Assumptions: Why T-Shirt Sizing Systematically Fails for AI Projects", "authors": ["Raja Soundaramourty", "O. Kilic", "R. Chenchaiah"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.17734", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agile estimation techniques, particularly T-shirt sizing, are widely used in software development for their simplicity and utility in scoping work. However, when we apply these methods to artificial i", "arxiv_id": "2602.17734"}
+{"id": "policy-compiler-secure-2026", "title": "Policy Compiler for Secure Agentic Systems", "authors": ["Nils Palumbo", "Sarthak Choudhary", "Jihye Choi", "P. Chalasani", "Somesh Jha"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.16708", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based agents are increasingly being deployed in contexts requiring complex authorization policies: customer service protocols, approval workflows, data access restrictions, and regulatory complian", "arxiv_id": "2602.16708"}
+{"id": "overseeing-agents-without-2026", "title": "Overseeing Agents Without Constant Oversight: Challenges and Opportunities", "authors": ["Madeleine Grunde-McLaughlin", "Hussein Mozannar", "Maya Murad", "Jingya Chen", "Saleema Amershi"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.16844", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To enable human oversight, agentic AI systems often provide a trace of reasoning and action steps. Designing traces to have an informative, but not overwhelming, level of detail remains a critical cha", "arxiv_id": "2602.16844"}
+{"id": "vision-wormhole-latentspace-2026", "title": "The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems", "authors": ["Xiaoze Liu", "Ruowang Zhang", "Weicheng Yu", "Siheng Xiong", "Liu He"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.15382", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-Agent Systems (MAS) powered by Large Language Models have unlocked advanced collaborative reasoning, yet they remain shackled by the inefficiency of discrete text communication, which imposes si", "arxiv_id": "2602.15382"}
+{"id": "traceable-latent-variable-2026", "title": "Traceable Latent Variable Discovery Based on Multi-Agent Collaboration", "authors": ["Huaming Du", "Tao Hu", "Yijie Huang", "Yu Zhao", "Guisong Liu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.14456", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Revealing the underlying causal mechanisms in the real world is crucial for scientific and technological progress. Despite notable advances in recent decades, the lack of high-quality data and the rel", "arxiv_id": "2602.14456", "doi": "10.1145/3774904.3792244"}
+{"id": "from-fluent-verifiable-2026", "title": "From Fluent to Verifiable: Claim-Level Auditability for Deep Research Agents", "authors": ["Razeen A Rasheed", "Somnath Banerjee", "Animesh Mukherjee", "Rima Hazra"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.13855", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A deep research agent produces a fluent scientific report in minutes; a careful reader then tries to verify the main claims and discovers the real cost is not reading, but tracing: which sentence is s", "arxiv_id": "2602.13855"}
+{"id": "textresnet-decoupling-routing-2026", "title": "TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning", "authors": ["Suizhi Huang", "Mei Li", "Han Yu", "Xiaoxiao Li"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.08306", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Textual Gradient-style optimizers (TextGrad) enable gradient-like feedback propagation through compound AI systems. However, they do not work well for deep chains. The root cause of this limitation st", "arxiv_id": "2602.08306"}
+{"id": "evolutionary-generation-multiagent-2026", "title": "Evolutionary Generation of Multi-Agent Systems", "authors": ["Yun Hu", "Matthew Trager", "Yuting Zhang", "Yi Zhang", "Shuo Yang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06511", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-based multi-agent systems (MAS) show strong promise for complex reasoning, planning, and tool-augmented tasks, but designing effective MAS architectures remains labor-intens", "arxiv_id": "2602.06511"}
+{"id": "value-variance-mitigating-2026", "title": "The Value of Variance: Mitigating Debate Collapse in Multi-Agent Systems via Uncertainty-Driven Policy Optimization", "authors": ["Luoxi Tang", "Yuqiao Meng", "J. Costa", "Yingxue Zhang", "Muchao Ye"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.07186", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent debate (MAD) systems improve LLM reasoning through iterative deliberation, but remain vulnerable to debate collapse, a failure type where final agent decisions are compromised on erroneous", "arxiv_id": "2602.07186"}
+{"id": "mama-gametheoretic-approach-2026", "title": "MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems", "authors": ["Jonathan Nother", "A. Singla", "Goran Radanovic"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04431", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based multi-agent systems have demonstrated impressive capabilities, but they also introduce significant safety risks when individual agents fail or behave adversarially. In this work, we study th", "arxiv_id": "2602.04431"}
+{"id": "uncertainty-large-language-2026", "title": "On the Uncertainty of Large Language Model-Based Multi-Agent Systems", "authors": ["Yuxuan Zhao", "Sijia Chen", "Ningxin Su"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04234", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upo", "arxiv_id": "2602.04234"}
+{"id": "vibe-aigc-new-2026", "title": "Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration", "authors": ["Jiaheng Liu", "Yuanxing Zhang", "Shihao Li", "Xinping Lei"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04575", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: For the past decade, the trajectory of generative artificial intelligence (AI) has been dominated by a model-centric paradigm driven by scaling laws. Despite significant leaps in visual fidelity, this", "arxiv_id": "2602.04575"}
+{"id": "socialveil-probing-social-2026", "title": "SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers", "authors": ["Keyang Xuan", "Pengda Wang", "Chongrui Ye", "Haofei Yu", "Tal August"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05115", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents,", "arxiv_id": "2602.05115"}
+{"id": "latentmem-customizing-latent-2026", "title": "LatentMem: Customizing Latent Memory for Multi-Agent Systems", "authors": ["Muxin Fu", "Guibin Zhang", "Xiangyuan Xue", "Yafu Li", "Zefeng He"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.03036", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-powered multi-agent systems (MAS) demonstrate remarkable collective intelligence, wherein multi-agent memory serves as a pivotal mechanism for continual adaptation. However,", "arxiv_id": "2602.03036"}
+{"id": "sidiffagent-selfimproving-diffusion-2026", "title": "SIDiffAgent: Self-Improving Diffusion Agent", "authors": ["Shivank Garg", "Ayush Singh", "Gaurav Nayak"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.02051", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Text-to-image diffusion models have revolutionized generative AI, enabling high-quality and photorealistic image synthesis. However, their practical deployment remains hindered by several limitations:", "arxiv_id": "2602.02051"}
+{"id": "crossmodal-memory-compression-2026", "title": "Cross-Modal Memory Compression for Efficient Multi-Agent Debate", "authors": ["Jing Wu", "Yueqing Sun", "Tianpei Xie", "Suiyao Chen", "Jingyuan Bao"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00454", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent debate can improve reasoning quality and reduce hallucinations, but it incurs rapidly growing context as debate rounds and agent count increase. Retaining full textual histories leads to t", "arxiv_id": "2602.00454"}
+{"id": "dual-latent-memory-2026", "title": "Dual Latent Memory for Visual Multi-agent System", "authors": ["Xinlei Yu", "Chengming Xu", "Zhangquan Chen", "Bo Yin", "Cheng Yang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00471", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration, empirical evidence reveals a counter-intuitive \"scaling wall\": increasing agent tur", "arxiv_id": "2602.00471"}
+{"id": "raudit-blind-auditing-2026", "title": "RAudit: A Blind Auditing Protocol for Large Language Model Reasoning", "authors": ["Edward Y. Chang", "Longling Geng"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.23133", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inference-time scaling can amplify reasoning pathologies: sycophancy, rung collapse, and premature certainty. We present RAudit, a diagnostic protocol for auditing LLM reasoning without ground truth a", "arxiv_id": "2601.23133", "doi": "10.48550/arXiv.2601.23133"}
+{"id": "learning-decentralized-llm-2026", "title": "Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic", "authors": ["Shuo Liu", "Tianle Chen", "R. Amiri", "Christopher Amato"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.21972", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning approaches rely on predefined execution protocols, which often r", "arxiv_id": "2601.21972", "doi": "10.48550/arXiv.2601.21972"}
+{"id": "six-sigma-agent-2026", "title": "The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution", "authors": ["Khush Patel", "Siva Surendira", "J. George", "Shreyas Kapale"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.22290", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models demonstrate remarkable capabilities yet remain fundamentally probabilistic, presenting critical reliability challenges for enterprise deployment. We introduce the Six Sigma Agent", "arxiv_id": "2601.22290", "doi": "10.48550/arXiv.2601.22290"}
+{"id": "why-reasoning-fails-2026", "title": "Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents", "authors": ["Zehong Wang", "Fang Wu", "Hongru Wang", "Xiangru Tang", "Bolian Li"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.22311", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-based agents exhibit strong step-by-step reasoning capabilities over short horizons, yet often fail to sustain coherent behavior over long planning horizons. We argue that t", "arxiv_id": "2601.22311", "doi": "10.48550/arXiv.2601.22311"}
+{"id": "interpreting-emergent-extreme-2026", "title": "Interpreting Emergent Extreme Events in Multi-Agent Systems", "authors": ["Ling Tang", "Jilin Mei", "Dongrui Liu", "Chen Qian", "Dawei Cheng"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.20538", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model-powered multi-agent systems have emerged as powerful tools for simulating complex human-like systems. The interactions within these systems often lead to extreme events whose orig", "arxiv_id": "2601.20538", "doi": "10.48550/arXiv.2601.20538"}
+{"id": "yunque-deepresearch-technical-2026", "title": "Yunque DeepResearch Technical Report", "authors": ["Yuxuan Cai", "Xinyi Lai", "Peng Yuan", "Weiting Liu", "Huajian Li"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19578", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deep research has emerged as a transformative capability for autonomous agents, empowering Large Language Models to navigate complex, open-ended tasks. However, realizing its full potential is hindere", "arxiv_id": "2601.19578", "doi": "10.48550/arXiv.2601.19578"}
+{"id": "when-nobody-around-2026", "title": "When Nobody Around Is Real: Exploring Public Opinions and User Experiences On the Multi-Agent AI Social Platform", "authors": ["Qiufang Yu", "Mengmeng Wu", "Xingyu Lan"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.18275", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Powered by large language models, a new genre of multi-agent social platforms has emerged. Apps such as Social.AI deploy numerous AI agents that emulate human behavior, creating unprecedented bot-cent", "arxiv_id": "2601.18275", "doi": "10.48550/arXiv.2601.18275"}
+{"id": "neurosymbolic-verification-instruction-2026", "title": "Neuro-Symbolic Verification on Instruction Following of LLMs", "authors": ["Yiming Su", "Kunzhao Xu", "Yanjie Gao", "Fan Yang", "Cheng Liu"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17789", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A fundamental problem of applying Large Language Models (LLMs) to important applications is that LLMs do not always follow instructions, and violations are often hard to observe or check. In LLM-based", "arxiv_id": "2601.17789", "doi": "10.48550/arXiv.2601.17789"}
+{"id": "think-locally-explain-2026", "title": "Think Locally, Explain Globally: Graph-Guided LLM Investigations via Local Reasoning and Belief Propagation", "authors": ["Saurabh Jha", "Rohan R. Arora", "Bhavya", "Noah Zheutlin", "Paulina Toro Isaza"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17915", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM agents excel when environments are mostly static and the needed information fits in a model's context window, but they often fail in open-ended investigations where explanations must be constructe", "arxiv_id": "2601.17915", "doi": "10.48550/arXiv.2601.17915"}
+{"id": "automated-structural-testing-2026", "title": "Automated structural testing of LLM-based agents: methods, framework, and case studies", "authors": ["Jens Kohl", "Otto Kruse", "Youssef Mostafa", "Andre Luckow", "K. Schroer"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.18827", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based agents are rapidly being adopted across diverse domains. Since they interact with users without supervision, they must be tested extensively. Current testing approaches focus on acceptance-l", "arxiv_id": "2601.18827", "doi": "10.48550/arXiv.2601.18827"}
+{"id": "declarative-agentic-layer-2026", "title": "Towards a Declarative Agentic Layer for Intelligent Agents in MCP-Based Server Ecosystems", "authors": ["María Jesús Rodríguez-Sánchez", "Manuel Noguera", "Ángel Ruiz-Zafra", "K. Benghazi"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17435", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in Large Language Models (LLMs) have enabled the development of increasingly complex agentic and multi-agent systems capable of planning, tool use and task decomposition. However, empi", "arxiv_id": "2601.17435", "doi": "10.48550/arXiv.2601.17435"}
+{"id": "mixtureofmodels-unifying-heterogeneous-2026", "title": "Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation", "authors": ["Tims Pečerskis", "Aivars Smirnovs"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.16863", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces the N-Way Self-Evaluating Deliberation (NSED) protocol, a Runtime Mixture-of-Models (MoM) architecture that constructs emergent composite models from a plurality of distinct expe", "arxiv_id": "2601.16863", "doi": "10.5281/zenodo.18234923"}
+{"id": "when-agents-fail-2026", "title": "When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling", "authors": ["Niful Islam", "Ragib Shahariar Ayon", "Deepak-George Thomas", "Shibbir Ahmed", "Mohammad Wardat"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.15232", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized intelligent application development. While standalone LLMs cannot perform any actions, LLM agents address the limitation by integrating tools. However,", "arxiv_id": "2601.15232", "doi": "10.48550/arXiv.2601.15232"}
+{"id": "why-behind-action-2026", "title": "The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution", "authors": ["Chen Qian", "Peng Wang", "Dongrui Liu", "Junyao Yang", "Dadi Guo"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.15075", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. As these systems become more autonomous and are d", "arxiv_id": "2601.15075", "doi": "10.48550/arXiv.2601.15075"}
+{"id": "cooperbench-why-coding-2026", "title": "CooperBench: Why Coding Agents Cannot be Your Teammates Yet", "authors": ["A. Khatua", "Hao Zhu", "Peter Tran", "Arya Prabhudesai", "Frederic Sadrieh"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.13295", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they", "arxiv_id": "2601.13295", "doi": "10.48550/arXiv.2601.13295"}
+{"id": "institutional-ai-governing-2026", "title": "Institutional AI: Governing LLM Collusion in Multi-Agent Cournot Markets via Public Governance Graphs", "authors": ["Marcantonio Bracale", "Federico Pierucci", "Marcello Galisai", "Matteo Prandi", "Piercosma Bisconti"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.11369", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent LLM ensembles can converge on coordinated, socially harmful equilibria. This paper advances an experimental framework for evaluating Institutional AI, our system-level approach to AI align", "arxiv_id": "2601.11369", "doi": "10.48550/arXiv.2601.11369"}
+{"id": "bridging-human-interpretation-2026", "title": "Bridging Human Interpretation and Machine Representation: A Landscape of Qualitative Data Analysis in the LLM Era", "authors": ["Xinyu Pi", "Qisen Yang", "Chuong Nguyen", "Hua Shen"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.11739", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs are increasingly used to support qualitative research, yet existing systems produce outputs that vary widely--from trace-faithful summaries to theory-mediated explanations and system models. To m", "arxiv_id": "2601.11739", "doi": "10.48550/arXiv.2601.11739"}
+{"id": "from-single-multiagent-2026", "title": "From Single to Multi-Agent Reasoning: Advancing GeneGPT for Genomics QA", "authors": ["Kimia Abedini", "Farzad Shami", "Gianmaria Silvello"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.10581", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Comprehending genomic information is essential for biomedical research, yet extracting data from complex distributed databases remains challenging. Large language models (LLMs) offer potential for gen", "arxiv_id": "2601.10581", "doi": "10.48550/arXiv.2601.10581"}
+{"id": "agent-contracts-formal-2026", "title": "Agent Contracts: A Formal Framework for Resource-Bounded Autonomous AI Systems", "authors": ["Qing Ye", "Jing Tan"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.08815", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability; yet, none provide formal, resou", "arxiv_id": "2601.08815", "doi": "10.48550/arXiv.2601.08815"}
+{"id": "lidl-llm-integration-2026", "title": "LIDL: LLM Integration Defect Localization via Knowledge Graph-Enhanced Multi-Agent Analysis", "authors": ["Gou Tan", "Zilong He", "Min Li", "Pengfei Chen", "Jieke Shi"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.05539", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-integrated software, which embeds or interacts with large language models (LLMs) as functional components, exhibits probabilistic and context-dependent behaviors that fundamentally differ from tho", "arxiv_id": "2601.05539", "doi": "10.48550/arXiv.2601.05539"}
+{"id": "jenius-agent-experiencedriven-2026", "title": "Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios", "authors": ["Defei Xia", "Bingfeng Pi", "Shenbin Zhang", "Song Hua", "Yunfei Wei"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.01857", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As agent systems powered by large language models (LLMs) advance, improving performance in context understanding, tool usage, and long-horizon execution has become critical. However, existing agent fr", "arxiv_id": "2601.01857", "doi": "10.48550/arXiv.2601.01857"}
+{"id": "maestro-multiagent-evaluation-2026", "title": "MAESTRO: Multi-Agent Evaluation Suite for Testing, Reliability, and Observability", "authors": ["Tie Ma", "Yixi Chen", "Vaastav Anand", "Alessandro Cornacchia", "Amândio Faustino"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.00481", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present MAESTRO, an evaluation suite for the testing, reliability, and observability of LLM-based MAS. MAESTRO standardizes MAS configuration and execution through a unified interface, supports int", "arxiv_id": "2601.00481", "doi": "10.48550/arXiv.2601.00481"}
+{"id": "limagents-multiagent-llms-2025", "title": "LimAgents: Multi-Agent LLMs for Generating Research Limitations", "authors": ["Ibrahim Al Azher", "Zhishuai Guo", "Hamed Alhoori"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.11578", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Identifying and articulating limitations is essential for transparent and rigorous scientific research. However, zero-shot large language models (LLMs) approach often produce superficial or general li", "arxiv_id": "2601.11578", "doi": "10.48550/arXiv.2601.11578"}
+{"id": "does-it-tie-2025", "title": "Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital", "authors": ["P. Colombo", "Malik Boudiaf", "Allyn Sweet", "Michael Desa", "Hongxi Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.18658", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Before closing venture capital financing rounds, lawyers conduct diligence that includes tying out the capitalization table: verifying that every security (for example, shares, options, warrants) and ", "arxiv_id": "2512.18658", "doi": "10.48550/arXiv.2512.18658"}
+{"id": "let-barbarians-how-2025", "title": "Let the Barbarians In: How AI Can Accelerate Systems Performance Research", "authors": ["Audrey Cheng", "Shu Liu", "Melissa Z. Pan", "Zhifei Li", "Shubham Agarwal"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.14806", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) is beginning to transform the research process by automating the discovery of new solutions. This shift depends on the availability of reliable verifiers, which AI-driven ", "arxiv_id": "2512.14806", "doi": "10.48550/arXiv.2512.14806"}
+{"id": "swenergy-empirical-study-2025", "title": "SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs", "authors": ["Arihant Tripathy", "Ch Pavan Harshit", "Karthik Vaidhyanathan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.09543", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context. LLM-based autonomous agents in software engineering rely on large, proprietary models, limiting local deployment. This has spurred interest in Small Language Models (SLMs), but their practica", "arxiv_id": "2512.09543", "doi": "10.48550/arXiv.2512.09543"}
+{"id": "evolving-excellence-automated-2025", "title": "Evolving Excellence: Automated Optimization of LLM-based Agents", "authors": ["Paul Brookes", "Vardan K. Voskanyan", "Rafail Giavrimis", "Matthew Truscott", "Mina Ilieva"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.09108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer support. However, LLM agents often underper", "arxiv_id": "2512.09108", "doi": "10.48550/arXiv.2512.09108"}
+{"id": "insured-agents-decentralized-2025", "title": "Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy", "authors": ["B. Hu", "Bangdao Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.08737", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emerging \"agentic web\" envisions large populations of autonomous agents coordinating, transacting, and delegating across open networks. Yet many agent communication and commerce protocols treat agen", "arxiv_id": "2512.08737", "doi": "10.48550/arXiv.2512.08737"}
+{"id": "science-scaling-agent-2025", "title": "Towards a Science of Scaling Agent Systems", "authors": ["Y. Kim", "Ken Gu", "Chanwoo Park", "Chunjong Park", "Samuel Schmidgall"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.08296", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agents, language model-based systems that are capable of reasoning, planning, and acting are becoming the dominant paradigm for real-world AI applications. Despite this widespread adoption, the princi", "arxiv_id": "2512.08296", "doi": "10.48550/arXiv.2512.08296"}
+{"id": "harmtransform-transforming-explicit-2025", "title": "HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate", "authors": ["Shenzhe Zhu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.23717", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are equipped with safety mechanisms to detect and block harmful queries, yet current alignment approaches primarily focus on overtly dangerous content and overlook more su", "arxiv_id": "2512.23717", "doi": "10.48550/arXiv.2512.23717"}
+{"id": "dover-interventiondriven-auto-2025", "title": "DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems", "authors": ["Ming-Jie Ma", "Jue Zhang", "Fangkai Yang", "Yu Kang", "Qingwei Lin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.06749", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-based multi-agent systems are challenging to debug because failures often arise from long, branching interaction traces. The prevailing practice is to leverage LLMs for log-", "arxiv_id": "2512.06749", "doi": "10.48550/arXiv.2512.06749"}
+{"id": "sok-trustauthorization-mismatch-2025", "title": "SoK: Trust-Authorization Mismatch in LLM Agent Interactions", "authors": ["Guanquan Shi", "Haohua Du", "Zhiqiang Wang", "Xiaoyu Liang", "Weiwen Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.06914", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are evolving into autonomous agents capable of executing complex workflows via standardized protocols (e.g., MCP). However, this paradigm shifts control from deterministic", "arxiv_id": "2512.06914", "doi": "10.48550/arXiv.2512.06914"}
+{"id": "llm-harms-taxonomy-2025", "title": "LLM Harms: A Taxonomy and Discussion", "authors": ["Kevin Chen", "Saleh Afroogh", "Abhejay Murali", "David Atkinson", "Amit Dhurandhar"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.05929", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study addresses categories of harm surrounding Large Language Models (LLMs) in the field of artificial intelligence. It addresses five categories of harms addressed before, during, and after deve", "arxiv_id": "2512.05929", "doi": "10.48550/arXiv.2512.05929"}
+{"id": "comparative-study-designing-2025", "title": "A Comparative Study Towards Designing a Hybrid Architecture of Microservices and LLM-based Multi-Agent Systems", "authors": ["Peyman Yazdanian", "Yan Liu", "Zhengyang Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/APSEC66846.2025.00077", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based Multi-Agent Systems (LLM-MAS) present an emerging paradigm for constructing intelligent and adaptive applications that enable autonomous reasoning and collaborative problem-solving. Empirica", "doi": "10.1109/APSEC66846.2025.00077"}
+{"id": "measuring-agents-production-2025", "title": "Measuring Agents in Production", "authors": ["Melissa Z. Pan", "Negar Arabzadeh", "Riccardo Cogo", "Yuxuan Zhu", "Alexander Xiong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.04123", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based agents already operate in production across many industries, yet we lack an understanding of what technical methods make deployments successful. We present the first systematic study of Meas", "arxiv_id": "2512.04123", "doi": "10.48550/arXiv.2512.04123"}
+{"id": "beyond-singleagent-safety-2025", "title": "Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions", "authors": ["Piercosma Bisconti", "Marcello Galisai", "Federico Pierucci", "Marcantonio Bracale", "Matteo Prandi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.02682", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper examines why safety mechanisms designed for human-model interaction do not scale to environments where large language models (LLMs) interact with each other. Most current governance practic", "arxiv_id": "2512.02682", "doi": "10.48550/arXiv.2512.02682"}
+{"id": "processcentric-analysis-agentic-2025", "title": "Process-Centric Analysis of Agentic Software Systems", "authors": ["Shuyang Liu", "Yang Chen", "Rahul Krishna", "Saurabh Sinha", "Jatin Ganhotra"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.02393", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agentic systems are modern software systems: they consist of orchestrated modules, expose interfaces, and are deployed in software pipelines. Unlike conventional programs, their execution (i.e., traje", "arxiv_id": "2512.02393", "doi": "10.48550/arXiv.2512.02393"}
+{"id": "how-far-we-2025", "title": "How Far Are We from Genuinely Useful Deep Research Agents?", "authors": ["Dingling Zhang", "He Zhu", "Jincheng Ren", "Kangqi Song", "Xinran Zhou"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.01948", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deep Research Agents (DRAs) aim to automatically produce analyst-level reports through iterative information retrieval and synthesis. However, most existing DRAs were validated on question-answering b", "arxiv_id": "2512.01948", "doi": "10.48550/arXiv.2512.01948"}
+{"id": "crystalyse-multitool-agent-2025", "title": "Crystalyse: a multi-tool agent for materials design", "authors": ["Ryan Nduma", "Hyunsoo Park", "Aron Walsh"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2512.00977", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present Crystalyse, an open, provenance-enforced scientific agent for computational materials design of inorganic crystals that orchestrates tools for compositional screening, crystal structure gen", "arxiv_id": "2512.00977"}
+{"id": "flockvote-llmempowered-agentbased-2025", "title": "FlockVote: LLM-Empowered Agent-Based Modeling for Simulating U.S. Presidential Elections", "authors": ["Lingfeng Zhou", "Yi Xu", "Zhenyu Wang", "Dequan Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.05982", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modeling complex human behavior, such as voter decisions in national elections, is a long-standing challenge for computational social science. Traditional agent-based models (ABMs) are limited by over", "arxiv_id": "2512.05982", "doi": "10.48550/arXiv.2512.05982"}
+{"id": "saber-small-actions-2025", "title": "SABER: Small Actions, Big Errors - Safeguarding Mutating Steps in LLM Agents", "authors": ["Alejandro Cuadron", "Pengfei Yu", "Yang Liu", "Arpit Gupta"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.07850", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contribute equa", "arxiv_id": "2512.07850", "doi": "10.48550/arXiv.2512.07850"}
+{"id": "failure-modes-llm-2025", "title": "Failure Modes in LLM Systems: A System-Level Taxonomy for Reliable AI Applications", "authors": ["Vaishali Vinay"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.19933", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are being rapidly integrated into decision-support tools, automation workflows, and AI-enabled software systems. However, their behavior in production environments remains", "arxiv_id": "2511.19933", "doi": "10.48550/arXiv.2511.19933"}
+{"id": "latent-collaboration-multiagent-2025", "title": "Latent Collaboration in Multi-Agent Systems", "authors": ["Jiaru Zou", "Xiyuan Yang", "Ruizhong Qiu", "Gaotang Li", "Katherine Tieu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.20639", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediatio", "arxiv_id": "2511.20639", "doi": "10.48550/arXiv.2511.20639"}
+{"id": "fara7b-efficient-agentic-2025", "title": "Fara-7B: An Efficient Agentic Model for Computer Use", "authors": ["Ahmed Awadallah", "Yash Lara", "Raghav Magazine", "Hussein Mozannar", "Akshay Nambi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.19663", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Progress in computer use agents (CUAs) has been constrained by the absence of large and high-quality datasets that capture how humans interact with a computer. While LLMs have thrived on abundant text", "arxiv_id": "2511.19663", "doi": "10.48550/arXiv.2511.19663"}
+{"id": "animagents-coordinating-multistage-2025", "title": "AnimAgents: Coordinating Multi-Stage Animation Pre-Production with Human-Multi-Agent Collaboration", "authors": ["Wen-Fan Wang", "Chien-Ting Lu", "Jin Ping Ng", "Yi-Ting Chiu", "Ting-Ying Lee"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.17906", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Animation pre-production lays the foundation of an animated film by transforming initial concepts into a coherent blueprint across interdependent stages such as ideation, scripting, design, and storyb", "arxiv_id": "2511.17906", "doi": "10.48550/arXiv.2511.17906"}
+{"id": "hiding-ai-traffic-2025", "title": "Hiding in the AI Traffic: Abusing MCP for LLM-Powered Agentic Red Teaming", "authors": ["Strahinja Janjusevic", "Anna Barón Garcia", "Sohrob Kazerounian"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.15998", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI is reshaping offensive cybersecurity by enabling autonomous red team agents that can plan, execute, and adapt during penetration tests. However, existing approaches face trade-offs betwe", "arxiv_id": "2511.15998", "doi": "10.48550/arXiv.2511.15998"}
+{"id": "sensorium-arc-ai-2025", "title": "Sensorium Arc: AI Agent System for Oceanic Data Exploration and Interactive Eco-Art", "authors": ["Noah Bissell", "Ethan Paley", "Joshua Harrison", "Juliano Calil", "Myungin Lee"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.15997", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Sensorium Arc (AI reflects on climate) is a real-time multimodal interactive AI agent system that personifies the ocean as a poetic speaker and guides users through immersive explorations of complex m", "arxiv_id": "2511.15997", "doi": "10.48550/arXiv.2511.15997"}
+{"id": "automatically-surfacing-opportunities-2025", "title": "Automatically Surfacing Opportunities for Improvements In Internet-Scale Applications", "authors": ["Vipul Harsh", "Sayan Sinha", "Henry Milner", "Haijie Wu", "Aditya Prakash"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3772356.3772423", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern Internet services generate massive volumes of observability data, yet identifying opportunities for business performance improvements remains elusive. In many cases, such insights manifest only", "doi": "10.1145/3772356.3772423"}
+{"id": "characterization-study-bugs-2025", "title": "A Characterization Study of Bugs in LLM Agent Workflow Orchestration Frameworks", "authors": ["Ziluo Xue", "Yanjie Zhao", "Shenao Wang", "Kai Chen", "Haoyu Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASE63991.2025.00278", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have rapidly gained popularity, transforming research and industry. To support their adoption, LLM agent workflow orchestration frameworks (hereinafter referred to as LLM ", "doi": "10.1109/ASE63991.2025.00278"}
+{"id": "multiagent-collaborative-fuzzing-2025", "title": "Multi-Agent Collaborative Fuzzing with Continuous Reflection for Smart Contracts Vulnerability Detection", "authors": ["Jie Chen", "Liangmin Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.12164", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fuzzing is a widely used technique for detecting vulnerabilities in smart contracts, which generates transaction sequences to explore the execution paths of smart contracts. However, existing fuzzers ", "arxiv_id": "2511.12164", "doi": "10.48550/arXiv.2511.12164"}
+{"id": "designing-llmbased-multiagent-2025", "title": "Designing LLM-based Multi-Agent Systems for Software Engineering Tasks: Quality Attributes, Design Patterns and Rationale", "authors": ["Yangxiao Cai", "Ruiyin Li", "Peng Liang", "Mojtaba Shahin", "Zengyang Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.08475", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As the complexity of Software Engineering (SE) tasks continues to escalate, Multi-Agent Systems (MASs) have emerged as a focal point of research and practice due to their autonomy and scalability. Fur", "arxiv_id": "2511.08475", "doi": "10.48550/arXiv.2511.08475"}
+{"id": "convergence-dynamics-agenttoagent-2025", "title": "Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives", "authors": ["Romain Cosentino", "Sarath Shekkizhar", "Adam Earle"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.08710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We develop and analyze a theoretical framework for agent-to-agent interactions in a simplified in-context linear regression setting. In our model, each agent is instantiated as a single-layer transfor", "arxiv_id": "2511.08710", "doi": "10.48550/arXiv.2511.08710"}
+{"id": "researchrubrics-benchmark-prompts-2025", "title": "ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents", "authors": ["Manasi Sharma", "Chen Bo Calvin Zhang", "Chaithanya Bandi", "Clinton Wang", "Ankit Aich"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.07685", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deep Research (DR) is an emerging agent application that leverages large language models (LLMs) to address open-ended queries. It requires the integration of several capabilities, including multi-step", "arxiv_id": "2511.07685", "doi": "10.48550/arXiv.2511.07685"}
+{"id": "scenariodriven-reference-architecture-2025", "title": "Towards Scenario-Driven Reference Architecture for Integrating Microservices and LLM-Based Multi-Agent Systems", "authors": ["Peyman Yazdanian", "Yan Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CASCON66301.2025.00107", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Microservice Systems (MS) and Large Language Model-based Multi-Agent Systems (LLM-MAS) are two major paradigms shaping modern software design. While MS emphasizes modularity and scalability, LLM-MAS i", "doi": "10.1109/CASCON66301.2025.00107"}
+{"id": "when-ai-agents-2025", "title": "When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms", "authors": ["Qibing Ren", "Zhijie Zheng", "Jiaxuan Guo", "Junchi Yan", "Lizhuang Ma"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.06448", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we study the risks of collective financial fraud in large-scale multi-agent systems powered by large language model (LLM) agents. We investigate whether agents can collaborate in fraudul", "arxiv_id": "2511.06448", "doi": "10.48550/arXiv.2511.06448"}
+{"id": "tamas-benchmarking-adversarial-2025", "title": "TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems", "authors": ["Ishan Kavathekar", "Hemang Jain", "Ameya Rathod", "P. Kumaraguru", "Tanuja Ganu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.05269", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents through tool use, planning, and decision-making abilities, leading to their widespread adoption across diverse t", "arxiv_id": "2511.05269", "doi": "10.48550/arXiv.2511.05269"}
+{"id": "detecting-silent-failures-2025", "title": "Detecting Silent Failures in Multi-Agentic AI Trajectories", "authors": ["Divya Pathak", "Harshit Kumar", "Anuska Roy", "Felix George", "Mudit Verma"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.04032", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-Agentic AI systems, powered by large language models (LLMs), are inherently non-deterministic and prone to silent failures such as drift, cycles, and missing details in outputs, which are diffic", "arxiv_id": "2511.04032", "doi": "10.48550/arXiv.2511.04032"}
+{"id": "sherlock-reliable-efficient-2025", "title": "Sherlock: Reliable and Efficient Agentic Workflow Execution", "authors": ["Yeonju Ro", "Haoran Qiu", "Íñigo Goiri", "Rodrigo Fonseca", "Ricardo Bianchini"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.00330", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the increasing adoption of large language models (LLM), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional appl", "arxiv_id": "2511.00330", "doi": "10.48550/arXiv.2511.00330"}
+{"id": "rise-potential-opportunities-2025", "title": "The rise and potential opportunities of large language model agents in bioinformatics and biomedicine", "authors": ["Tiantian Yang", "Yihang Xiao", "Zhijie Bao", "Jianye Hao", "Jiajie Peng"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1093/bib/bbaf601", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) agents have demonstrated remarkable potential in the fields of bioinformatics and biomedicine. This paper reviews the technical foundations of LLM agents, including", "doi": "10.1093/bib/bbaf601"}
+{"id": "stop-wasting-your-2025", "title": "Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems", "authors": ["Fulin Lin", "Shaowen Chen", "Ruishan Fang", "Hongwei Wang", "Tao Lin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.26585", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Multi-Agent Systems (MAS) excel at complex tasks, their growing autonomy with operational complexity often leads to critical inefficiencies, such as excessive token consumption and failures aris", "arxiv_id": "2510.26585", "doi": "10.48550/arXiv.2510.26585"}
+{"id": "code-aesthetics-agentic-2025", "title": "Code Aesthetics with Agentic Reward Feedback", "authors": ["Bangjun Xiao", "Lingjie Jiang", "Shaohan Huang", "Tengchao Lv", "Yupan Huang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.23272", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have become valuable assistants for developers in code-related tasks. While LLMs excel at traditional programming tasks such as code generation and bug fixing, they strugg", "arxiv_id": "2510.23272", "doi": "10.48550/arXiv.2510.23272"}
+{"id": "from-benchmarks-business-2025", "title": "From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production", "authors": ["Segev Shlomov", "Alon Oved", "Sami Marreed", "Ido Levy", "Offer Akrabi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.23856", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is comp", "arxiv_id": "2510.23856", "doi": "10.48550/arXiv.2510.23856"}
+{"id": "multiagent-evolve-llm-2025", "title": "Multi-Agent Evolve: LLM Self-Improve through Co-evolution", "authors": ["Yixin Chen", "Yiding Wang", "Siqi Zhu", "Hao Yu", "Tao Feng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.23595", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs heavily relies on human-", "arxiv_id": "2510.23595", "doi": "10.48550/arXiv.2510.23595"}
+{"id": "cosight-enhancing-llmbased-2025", "title": "Co-Sight: Enhancing LLM-Based Agents via Conflict-Aware Meta-Verification and Trustworthy Reasoning with Structured Facts", "authors": ["Hongwei Zhang", "Ji Lu", "Shiqi Jiang", "Chen Zhu", "Lipeng Xie"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.21557", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Long-horizon reasoning in LLM-based agents often fails not from generative weakness but from insufficient verification of intermediate reasoning. Co-Sight addresses this challenge by turning reasoning", "arxiv_id": "2510.21557", "doi": "10.48550/arXiv.2510.21557"}
+{"id": "thought-communication-multiagent-2025", "title": "Thought Communication in Multiagent Collaboration", "authors": ["Yujia Zheng", "Zhuokai Zhao", "Zijian Li", "Yaqi Xie", "Mingze Gao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.20733", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Natural language has long enabled human cooperation, but its lossy, ambiguous, and indirect nature limits the potential of collective intelligence. While machines are not subject to these constraints,", "arxiv_id": "2510.20733", "doi": "10.48550/arXiv.2510.20733"}
+{"id": "foundational-automatic-evaluators-2025", "title": "Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains", "authors": ["Austin Xu", "Xuan-Phi Nguyen", "Yilun Zhou", "Chien-Sheng Wu", "Caiming Xiong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.17793", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Finetuning specialized generative evaluators has emerged as a popular paradigm to meet the increasing demand for scalable evaluation during both training and test-time. However, recent work has largel", "arxiv_id": "2510.17793", "doi": "10.48550/arXiv.2510.17793"}
+{"id": "marshal-incentivizing-multiagent-2025", "title": "MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs", "authors": ["Huining Yuan", "Zelai Xu", "Zheyue Tan", "Xiangmin Yi", "Mo Guang"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.15414", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Developing Large Language Models (LLMs) to cooperate and compete effectively within multi-agent systems (MASs) is a critical step towards more advanced intelligence. While reinforcement learning (RL) ", "arxiv_id": "2510.15414"}
+{"id": "build-your-personalized-2025", "title": "Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation", "authors": ["Ed Li", "Junyu Ren", "Xintian Pan", "Cat Yan", "Chuanhao Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.15624", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The automation of scientific discovery represents a critical milestone in Artificial Intelligence (AI) research. However, existing agentic systems for science suffer from two fundamental limitations: ", "arxiv_id": "2510.15624", "doi": "10.48550/arXiv.2510.15624"}
+{"id": "metacognitive-selfcorrection-multiagent-2025", "title": "Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction", "authors": ["Xu Shen", "Qi Zhang", "Song Wang", "Zhen Tan", "Xinyu Zhao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.14319", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model based multi-agent systems (MAS) excel at collaborative problem solving but remain brittle to cascading errors: a single faulty step can propagate across agents and disrupt the tra", "arxiv_id": "2510.14319", "doi": "10.48550/arXiv.2510.14319"}
+{"id": "engineering-multiagent-llms-2025", "title": "Towards Engineering Multi-Agent LLMs: A Protocol-Driven Approach", "authors": ["Zhenyu Mao", "Jacky W. Keung", "Fengji Zhang", "Shuo Liu", "Yifei Wang"], "year": 2025, "venue": "Asia-Pacific Software Engineering Conference", "source_url": "https://arxiv.org/abs/2510.12120", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing demand for software development has driven interest in automating software engineering (SE) tasks using Large Language Models (LLMs). Recent efforts extend LLMs into multi-agent systems", "arxiv_id": "2510.12120", "doi": "10.1109/APSEC66846.2025.00100"}
+{"id": "automating-structural-engineering-2025", "title": "Automating Structural Engineering Workflows with Large Language Model Agents", "authors": ["Haoran Liang", "Yufa Zhou", "Mohammad Talebi-Kalaleh", "Qipei Mei"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.11004", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce $\\textbf{MASSE}$, the first Multi-Agent System for Structural Engineering, effectively integrating large language model (LLM)-based agents with real-world engineering workflows. Structura", "arxiv_id": "2510.11004", "doi": "10.48550/arXiv.2510.11004"}
+{"id": "strongermas-multiagent-reinforcement-2025", "title": "Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs", "authors": ["Yujie Zhao", "Lanxiang Hu", "Yang Wang", "Minmin Hou", "Hao Zhang"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.11062", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestr", "arxiv_id": "2510.11062"}
+{"id": "llm-as-judge-2025", "title": "LLM as a Judge", "authors": ["Md. Faizul Ibne Amin"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3156/jsoft.37.3_69_1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.3156/jsoft.37.3_69_1"}
+{"id": "which-agent-causes-2025", "title": "Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems", "authors": ["Shaokun Zhang", "Ming Yin", "Jieyu Zhang", "Jiale Liu", "Zhiguang Han"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2505.00212", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Failure attribution in LLM multi-agent systems-identifying the agent and step responsible for task failures-provides crucial clues for systems debugging but remains underexplored and labor-intensive. ", "arxiv_id": "2505.00212", "doi": "10.48550/arXiv.2505.00212"}
+{"id": "interactive-debugging-steering-2025", "title": "Interactive Debugging and Steering of Multi-Agent AI Systems", "authors": ["Will Epperson", "Gagan Bansal", "Victor C. Dibia", "Adam Fourney", "Jack Gerrits"], "year": 2025, "venue": "International Conference on Human Factors in Computing Systems", "source_url": "https://arxiv.org/abs/2503.02068", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams", "arxiv_id": "2503.02068", "doi": "10.1145/3706598.3713581"}
+{"id": "multiagent-risks-from-2025", "title": "Multi-Agent Risks from Advanced AI", "authors": ["Lewis Hammond", "Alan Chan", "Jesse Clifton", "J. Hoelscher-Obermaier", "Akbir Khan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.14143", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel an", "arxiv_id": "2502.14143", "doi": "10.48550/arXiv.2502.14143"}
+{"id": "scaling-testtime-compute-2025", "title": "Scaling Test-Time Compute Without Verification or RL is Suboptimal", "authors": ["Amrith Rajagopal Setlur", "Nived Rajaraman", "Sergey Levine", "Aviral Kumar"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.12118", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite substantial advances in scaling test-time compute, an ongoing debate in the community is how it should be scaled up to enable continued and efficient improvements with scaling. There are large", "arxiv_id": "2502.12118", "doi": "10.48550/arXiv.2502.12118"}
+{"id": "harnessing-language-coordination-2024", "title": "Harnessing Language for Coordination: A Framework and Benchmark for LLM-Driven Multiagent Control", "authors": ["Timothée Anne", "Noah Syrkis", "Meriem Elhosni", "Florian Turati", "Franck Legendre"], "year": 2024, "venue": "IEEE Transactions on Games", "source_url": "https://arxiv.org/abs/2412.11761", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated remarkable performance across various tasks. Their potential to facilitate human coordination with many agents is a promising but largely under-explored ", "arxiv_id": "2412.11761", "doi": "10.1109/TG.2025.3564042"}
+{"id": "teamcraft-benchmark-multimodal-2024", "title": "TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft", "authors": ["Qian Long", "Zhi Li", "Ran Gong", "Ying Wu", "D. Terzopoulos"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.05255", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Collaboration is a cornerstone of society. In the real world, human teammates make use of multi-sensory data to tackle challenging tasks in ever-changing environments. It is essential for embodied age", "arxiv_id": "2412.05255", "doi": "10.48550/arXiv.2412.05255"}
+{"id": "challenges-humanagent-communication-2024", "title": "Challenges in Human-Agent Communication", "authors": ["Gagan Bansal", "Jennifer Wortman Vaughan", "Saleema Amershi", "Eric Horvitz", "Adam Fourney"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.10380", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Remarkable advancements in modern generative foundation models have enabled the development of sophisticated and highly capable autonomous agents that can observe their environment, invoke tools, and ", "arxiv_id": "2412.10380", "doi": "10.48550/arXiv.2412.10380"}
+{"id": "inference-scaling-flaws-2024", "title": "Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers", "authors": ["Benedikt Stroebl", "Sayash Kapoor", "Arvind Narayanan"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.17501", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent research has generated hope that inference scaling could allow weaker language models to match or exceed the accuracy of stronger models, such as by repeatedly sampling solutions to a coding pr", "arxiv_id": "2411.17501", "doi": "10.48550/arXiv.2411.17501"}
+{"id": "specifications-missing-link-2024", "title": "Specifications: The missing link to making the development of LLM systems an engineering discipline", "authors": ["Ion Stoica", "Matei Zaharia", "Joseph Gonzalez", "Ken Goldberg", "Koushik Sen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.05299", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite the significant strides made by generative AI in just a few short years, its future progress is constrained by the challenge of building modular and robust systems. This capability has been a ", "arxiv_id": "2412.05299", "doi": "10.48550/arXiv.2412.05299"}
+{"id": "does-prompt-formatting-2024", "title": "Does Prompt Formatting Have Any Impact on LLM Performance?", "authors": ["Jia He", "Mukund Rungta", "David Koleczek", "Arshdeep Sekhon", "Franklin Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.10541", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the realm of Large Language Models (LLMs), prompt optimization is crucial for model performance. Although previous research has explored aspects like rephrasing prompt contexts, using various promp", "arxiv_id": "2411.10541"}
+{"id": "virtual-lab-ai-2024", "title": "The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation", "authors": ["Kyle Swanson", "Wesley Wu", "Nash L. Bulaong", "J. Pak", "James Y. Zou"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2024.11.11.623004", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Science frequently benefits from teams of interdisciplinary researchers. However, most scientists don’t have access to experts from multiple fields. Fortunately, large language models (LLMs) have rece", "doi": "10.1101/2024.11.11.623004"}
+{"id": "magenticone-generalist-multiagent-2024", "title": "Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks", "authors": ["Adam Fourney", "Gagan Bansal", "Hussein Mozannar", "Cheng Tan", "Eduardo Salinas"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.04468", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern AI agents, driven by advances in large foundation models, promise to enhance our productivity and transform our lives by augmenting our knowledge and capabilities. To achieve this vision, AI ag", "arxiv_id": "2411.04468", "doi": "10.48550/arXiv.2411.04468"}
+{"id": "autokaggle-multiagent-framework-2024", "title": "AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions", "authors": ["Ziming Li", "Qianbo Zang", "David Ma", "Jiawei Guo", "Tuney Zheng"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.20424", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists da", "arxiv_id": "2410.20424", "doi": "10.48550/arXiv.2410.20424"}
+{"id": "optima-optimizing-effectiveness-2024", "title": "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System", "authors": ["Weize Chen", "Jiarui Yuan", "Cheng Qian", "Cheng Yang", "Zhiyuan Liu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2410.08115", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scala", "arxiv_id": "2410.08115", "doi": "10.48550/arXiv.2410.08115"}
+{"id": "verifierq-enhancing-llm-2024", "title": "VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers", "authors": ["Jianing Qi", "Hao Tang", "Zhigang Zhu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.08048", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in test time compute, particularly through the use of verifier models, have significantly enhanced the reasoning capabilities of Large Language Models (LLMs). This generator-verifi", "arxiv_id": "2410.08048", "doi": "10.48550/arXiv.2410.08048"}
+{"id": "survey-llmbased-multiagent-2024", "title": "A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges", "authors": ["Xinyi Li", "Sai Wang", "Siqi Zeng", "Yu Wu", "Yi Yang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s44336-024-00009-2", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The pursuit of more intelligent and credible autonomous systems, akin to human society, has been a long-standing endeavor for humans. Leveraging the exceptional reasoning and planning capabilities of ", "doi": "10.1007/s44336-024-00009-2"}
+{"id": "testgeneval-real-world-2024", "title": "TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark", "authors": ["Kush Jain", "Gabriele Synnaeve", "Baptiste Rozière"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.00752", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation models can help improve many common software tasks ranging from code completion to defect prediction. Most of the existing benchmarks for code generation LLMs focus on code authoring o", "arxiv_id": "2410.00752", "doi": "10.48550/arXiv.2410.00752"}
+{"id": "qwen25coder-technical-report-2024", "title": "Qwen2.5-Coder Technical Report", "authors": ["Binyuan Hui", "Jian Yang", "Zeyu Cui", "Jiaxi Yang", "Dayiheng Liu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.12186", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5. This series includes six models: Qwen2.5-Coder-(0.5B/1.5B/3B/7B/14B/32B). As a code-spec", "arxiv_id": "2409.12186"}
+{"id": "improving-llm-reasoning-2024", "title": "Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent", "authors": ["Fatemeh Haji", "Mazal Bethany", "Maryam Tabar", "J. Chiang", "Anthony Rios"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.11527", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent strategies have emerged as a promising approach to enhance the reasoning abilities of Large Language Models (LLMs) by assigning specialized roles in the problem-solving process. Concurrent", "arxiv_id": "2409.11527", "doi": "10.48550/arXiv.2409.11527"}
+{"id": "hyperagent-generalist-software-2024", "title": "HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale", "authors": ["H. N. Phan", "Phong X. Nguyen", "Nghi D. Q. Bui"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.16299", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of aut", "arxiv_id": "2409.16299", "doi": "10.48550/arXiv.2409.16299"}
+{"id": "battleagentbench-benchmark-evaluating-2024", "title": "BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems", "authors": ["Wei Wang", "Dan Zhang", "Tao Feng", "Boyan Wang", "Jie Tang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.15971", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks, e.g., building single agents and multi-agent systems. Compared to single agents, multi-agent syst", "arxiv_id": "2408.15971", "doi": "10.48550/arXiv.2408.15971"}
+{"id": "appworld-controllable-world-2024", "title": "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", "authors": ["H. Trivedi", "Tushar Khot", "Mareike Hartmann", "R. Manku", "Vinty Dong"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2407.18901", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autonomous agents that address day-to-day digital tasks (e.g., ordering groceries for a household), must not only operate multiple apps (e.g., notes, messaging, shopping app) via APIs, but also genera", "arxiv_id": "2407.18901", "doi": "10.48550/arXiv.2407.18901"}
+{"id": "proververifier-games-improve-2024", "title": "Prover-Verifier Games improve legibility of LLM outputs", "authors": ["J. Kirchner", "Yining Chen", "Harri Edwards", "Jan Leike", "Nat McAleese"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.13692", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: One way to increase confidence in the outputs of Large Language Models (LLMs) is to support them with reasoning that is clear and easy to check -- a property we call legibility. We study legibility in", "arxiv_id": "2407.13692", "doi": "10.48550/arXiv.2407.13692"}
+{"id": "survey-useful-llm-2024", "title": "A Survey of Useful LLM Evaluation", "authors": ["Jinjun Peng", "Sijia Cheng", "Egil Diau", "Yung-Yu Shih", "Po-Heng Chen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.00936", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs have gotten attention across various research domains due to their exceptional performance on a wide range of complex tasks. Therefore, refined methods to evaluate the capabilities of LLMs are ne", "arxiv_id": "2406.00936", "doi": "10.48550/arXiv.2406.00936"}
+{"id": "assessing-verifying-task-2024", "title": "Assessing and Verifying Task Utility in LLM-Powered Applications", "authors": ["Negar Arabzadeh", "Siqing Huo", "N. Mehta", "Qingyun Wu", "Chi Wang"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2405.02178", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid development of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents, assisting humans in their daily tasks. However, a signific", "arxiv_id": "2405.02178", "doi": "10.48550/arXiv.2405.02178"}
+{"id": "heterogeneous-multiagent-reinforcement-2024", "title": "Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration", "authors": ["Xudong Guo", "Daming Shi", "Junjie Yu", "Wenhui Fan"], "year": 2024, "venue": "Neurocomputing", "source_url": "https://arxiv.org/abs/2404.03869", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain mult", "arxiv_id": "2404.03869", "doi": "10.48550/arXiv.2404.03869"}
+{"id": "stateflow-enhancing-llm-2024", "title": "StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows", "authors": ["Yiran Wu", "Tianwei Yue", "Shaokun Zhang", "Chi Wang", "Qingyun Wu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.11322", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: It is a notable trend to use Large Language Models (LLMs) to tackle complex tasks, e.g., tasks that require a sequence of actions and dynamic interaction with tools and external environments. In this ", "arxiv_id": "2403.11322", "doi": "10.48550/arXiv.2403.11322"}
+{"id": "gsmplus-comprehensive-benchmark-2024", "title": "GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers", "authors": ["Qintong Li", "Leyang Cui", "Xueliang Zhao", "Lingpeng Kong", "Wei Bi"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.19255", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. However, there are increasing debates regarding whether these models truly understan", "arxiv_id": "2402.19255", "doi": "10.48550/arXiv.2402.19255"}
+{"id": "mtbench101-finegrained-benchmark-2024", "title": "MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues", "authors": ["Ge Bai", "Jie Liu", "Xingyuan Bu", "Yancheng He", "Jiaheng Liu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.14762", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of Large Language Models (LLMs) has drastically enhanced dialogue systems. However, comprehensively evaluating the dialogue abilities of LLMs remains a challenge. Previous benchmarks have p", "arxiv_id": "2402.14762", "doi": "10.18653/v1/2024.acl-long.401"}
+{"id": "large-language-model-2024", "title": "Large Language Model based Multi-Agents: A Survey of Progress and Challenges", "authors": ["Taicheng Guo", "Xiuying Chen", "Yaqi Wang", "Ruidi Chang", "Shichao Pei"], "year": 2024, "venue": "International Joint Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2402.01680", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to their notable capabilities in planning and reasoning, LLMs have been utilized as autonomous agents fo", "arxiv_id": "2402.01680", "doi": "10.48550/arXiv.2402.01680"}
+{"id": "exploring-large-language-2024", "title": "Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects", "authors": ["Yuheng Cheng", "Ceyao Zhang", "Zhengwen Zhang", "Xiangrui Meng", "Sirui Hong"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2401.03428", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Intelligent agents stand out as a potential path toward artificial general intelligence (AGI). Thus, researchers have dedicated significant effort to diverse implementations for them. Benefiting from ", "arxiv_id": "2401.03428", "doi": "10.48550/arXiv.2401.03428"}
+{"id": "survey-large-language-2023", "title": "A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly", "authors": ["Yifan Yao", "Jinhao Duan", "Kaidi Xu", "Yuanfang Cai", "Eric Sun"], "year": 2023, "venue": "High-Confidence Computing", "source_url": "https://arxiv.org/abs/2312.02003", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabili", "arxiv_id": "2312.02003", "doi": "10.1016/j.hcc.2024.100211"}
+{"id": "benchmarl-benchmarking-multiagent-2023", "title": "BenchMARL: Benchmarking Multi-Agent Reinforcement Learning", "authors": ["Matteo Bettini", "Amanda Prorok", "Vincent Moens"], "year": 2023, "venue": "Journal of machine learning research", "source_url": "https://arxiv.org/abs/2312.01472", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a", "arxiv_id": "2312.01472", "doi": "10.48550/arXiv.2312.01472"}
+{"id": "gaia-benchmark-general-2023", "title": "GAIA: a benchmark for General AI Assistants", "authors": ["G. Mialon", "Clémentine Fourrier", "Craig Swift", "Thomas Wolf", "Yann LeCun"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.12983", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities suc", "arxiv_id": "2311.12983", "doi": "10.48550/arXiv.2311.12983"}
+{"id": "reasoning-large-language-2023", "title": "Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration", "authors": ["Zhenran Xu", "Senbao Shi", "Baotian Hu", "Jindi Yu", "Dongfang Li"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.08152", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown remarkable capabilities in general natural language processing tasks but often fall short in complex reasoning tasks. Recent studies have explored human-like pr", "arxiv_id": "2311.08152", "doi": "10.48550/arXiv.2311.08152"}
+{"id": "memgpt-llms-as-2023", "title": "MemGPT: Towards LLMs as Operating Systems", "authors": ["Charles Packer", "Vivian Fang", "Shishir G. Patil", "Kevin Lin", "Sarah Wooders"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.08560", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using", "arxiv_id": "2310.08560", "doi": "10.48550/arXiv.2310.08560"}
+{"id": "llmcoordination-evaluating-analyzing-2023", "title": "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models", "authors": ["Saaket Agashe", "Yue Fan", "Anthony Reyna", "Xin Eric Wang"], "year": 2023, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2310.03903", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated emergent common-sense reasoning and Theory of Mind (ToM) capabilities, making them promising candidates for developing coordination agents. This study in", "arxiv_id": "2310.03903", "doi": "10.18653/v1/2025.findings-naacl.448"}
+{"id": "dspy-compiling-declarative-2023", "title": "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines", "authors": ["O. Khattab", "Arnav Singhvi", "Paridhi Maheshwari", "Zhiyuan Zhang", "Keshav Santhanam"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.03714", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically i", "arxiv_id": "2310.03714"}
+{"id": "rise-potential-large-2023", "title": "The Rise and Potential of Large Language Model Based Agents: A Survey", "authors": ["Zhiheng Xi", "Wenxiang Chen", "Xin Guo", "Wei He", "Yiwen Ding"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2309.07864", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial", "arxiv_id": "2309.07864", "doi": "10.48550/arXiv.2309.07864"}
+{"id": "chateval-better-llmbased-2023", "title": "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate", "authors": ["Chi-Min Chan", "Weize Chen", "Yusheng Su", "Jianxuan Yu", "Wei Xue"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2308.07201", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs' pote", "arxiv_id": "2308.07201", "doi": "10.48550/arXiv.2308.07201"}
+{"id": "trustworthy-llms-survey-2023", "title": "Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment", "authors": ["Yang Liu", "Yuanshun Yao", "Jean-François Ton", "Xiaoying Zhang", "Ruocheng Guo"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2308.05374", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications.", "arxiv_id": "2308.05374", "doi": "10.48550/arXiv.2308.05374"}
+{"id": "rltf-reinforcement-learning-2023", "title": "RLTF: Reinforcement Learning from Unit Test Feedback", "authors": ["Jiate Liu", "Yiqin Zhu", "Kaiwen Xiao", "Qiang Fu", "Xiao Han"], "year": 2023, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2307.04349", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning", "arxiv_id": "2307.04349", "doi": "10.48550/arXiv.2307.04349"}
+{"id": "building-cooperative-embodied-2023", "title": "Building Cooperative Embodied Agents Modularly with Large Language Models", "authors": ["Hongxin Zhang", "Weihua Du", "Jiaming Shan", "Qinhong Zhou", "Yilun Du"], "year": 2023, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2307.02485", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we address challenging multi-agent cooperation problems with decentralized control, raw sensory observations, costly communication, and multi-objective tasks instantiated in various embo", "arxiv_id": "2307.02485", "doi": "10.48550/arXiv.2307.02485"}
+{"id": "judging-llmasajudge-mtbench-2023", "title": "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena", "authors": ["Lianmin Zheng", "Wei-Lin Chiang", "Ying Sheng", "Siyuan Zhuang", "Zhanghao Wu"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2306.05685", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we", "arxiv_id": "2306.05685"}
+{"id": "multiagent-collaboration-harnessing-2023", "title": "Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents", "authors": ["Yashar Talebirad", "Amirhossein Nadiri"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2306.03314", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we present a novel framework for enhancing the capabilities of large language models (LLMs) by leveraging the power of multi-agent systems. Our framework introduces a collaborative envi", "arxiv_id": "2306.03314", "doi": "10.48550/arXiv.2306.03314"}
+{"id": "gorilla-large-language-2023", "title": "Gorilla: Large Language Model Connected with Massive APIs", "authors": ["Shishir G. Patil", "Tianjun Zhang", "Xin Wang", "Joseph E. Gonzalez"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2305.15334", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their po", "arxiv_id": "2305.15334", "doi": "10.52202/079017-4020"}
+{"id": "improving-factuality-reasoning-2023", "title": "Improving Factuality and Reasoning in Language Models through Multiagent Debate", "authors": ["Yilun Du", "Shuang Li", "A. Torralba", "J. Tenenbaum", "Igor Mordatch"], "year": 2023, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2305.14325", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in language generation, understanding, and few-shot learning in recent years. An extensive body of work has explored how their pe", "arxiv_id": "2305.14325", "doi": "10.48550/arXiv.2305.14325"}
+{"id": "tree-thoughts-deliberate-2023", "title": "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", "authors": ["Shunyu Yao", "Dian Yu", "Jeffrey Zhao", "Izhak Shafran", "T. Griffiths"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2305.10601", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inferenc", "arxiv_id": "2305.10601", "doi": "10.48550/arXiv.2305.10601"}
+{"id": "generative-agents-interactive-2023", "title": "Generative Agents: Interactive Simulacra of Human Behavior", "authors": ["J. Park", "Joseph C. O’Brien", "Carrie J. Cai", "M. Morris", "Percy Liang"], "year": 2023, "venue": "ACM Symposium on User Interface Software and Technology", "source_url": "https://arxiv.org/abs/2304.03442", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, w", "arxiv_id": "2304.03442", "doi": "10.1145/3586183.3606763"}
+{"id": "camel-communicative-agents-2023", "title": "CAMEL: Communicative Agents for \"Mind\" Exploration of Large Language Model Society", "authors": ["G. Li", "Hasan Hammoud", "Hani Itani", "Dmitrii Khizbullin", "Bernard Ghanem"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2303.17760", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be", "arxiv_id": "2303.17760"}
+{"id": "check-your-facts-2023", "title": "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback", "authors": ["Baolin Peng", "Michel Galley", "Pengcheng He", "Hao Cheng", "Yujia Xie"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2302.12813", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to", "arxiv_id": "2302.12813", "doi": "10.48550/arXiv.2302.12813"}
+{"id": "large-language-models-2022", "title": "Large Language Models are Better Reasoners with Self-Verification", "authors": ["Yixuan Weng", "Minjun Zhu", "Fei Xia", "Bin Li", "Shizhu He"], "year": 2022, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2212.09561", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, with the chain of thought (CoT) prompting, large language models (LLMs), e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, com", "arxiv_id": "2212.09561", "doi": "10.18653/v1/2023.findings-emnlp.167"}
+{"id": "model-context-protocol-2024", "title": "Model context protocol: Introduction", "authors": ["Unknown"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "more-llm-calls-2024", "title": "Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems", "authors": ["Lingjiao Chen", "J. Davis", "Boris Hanin", "Peter Bailis", "Ion Stoica"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2403.02419", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2403.02419"}
+{"id": "analyzing-failure-trucks-2024", "title": "Analyzing the Failure of a Truck's Rear Axle Shaft Requires a Thorough Investigation to Pinpoint the Root Cause of the Issue", "authors": ["Kumar Ashish", "S. Pandey", "Daya Shankar Diwakar"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.46610/joaaen.2024.v09i01.003", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The analysis of a failed rear axle shaft from a dumper truck involved a comprehensive investigation to ascertain the underlying cause of the failure. Visual inspection, material analysis, mechanical t", "doi": "10.46610/joaaen.2024.v09i01.003"}
+{"id": "llmbased-multiagent-systems-2024", "title": "LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead", "authors": ["Junda He", "Christoph Treude", "David Lo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2404.04834", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2404.04834"}
+{"id": "autogen-enabling-nextgen-2023", "title": "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework", "authors": ["Qingyun Wu", "Gagan Bansal", "Jieyu Zhang", "Yiran Wu", "Shaokun Zhang"], "year": 2023, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/1a4c6856292b8c64d19a812a77f0aa6fd47cb96c", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "2023", "title": "TOWARDS A", "authors": ["Nelson Mafai Maguelva", "Hakdaoui Mustapha", "Frédéric Hubert"], "year": 2023, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/5c18d988a9f56d42a59e684c56af7dc1ada35659", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "programmed-please-moral-2026", "title": "Programmed to please: the moral and epistemic harms of AI sycophancy", "authors": ["C. Turner", "Nir Eisikovits"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s43681-026-01007-4", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s43681-026-01007-4"}
+{"id": "manatee-inferencetime-lightweight-2026", "title": "MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs", "authors": ["Chun Yan Ryan Kan", "Tommy Tran", "Vedant Yadav", "A.H. Cai", "Kevin Zhu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.18782", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Defending LLMs against adversarial jailbreak attacks remains an open challenge. Existing defenses rely on binary classifiers that fail when adversarial input falls outside the learned decision boundar", "arxiv_id": "2602.18782"}
+{"id": "concept-influence-leveraging-2026", "title": "Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution", "authors": ["Matthew Kowal", "Gonçalo Paulo", "Louis Jaburi", "Tom Tseng", "Lev E McKinney"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.14869", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large language models are increasingly trained and fine-tuned, practitioners need methods to identify which training data drive specific behaviors, particularly unintended ones. Training Data Attri", "arxiv_id": "2602.14869"}
+{"id": "backdooring-bias-large-2026", "title": "Backdooring Bias in Large Language Models", "authors": ["Anudeep Das", "Prach Chantasantitam", "Gurjot Singh", "Lipeng He", "M. Ponomarenko"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.13427", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly deployed in settings where inducing a bias toward a certain topic can have significant consequences, and backdoor attacks can be used to produce such mode", "arxiv_id": "2602.13427"}
+{"id": "capabilityoriented-training-induced-2026", "title": "Capability-Oriented Training Induced Alignment Risk", "authors": ["Yujun Zhou", "Yue Huang", "Han Bao", "Kehan Guo", "Zhenwen Liang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.12124", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While most AI alignment research focuses on preventing models from generating explicitly harmful content, a more subtle risk is emerging: capability-oriented training induced exploitation. We investig", "arxiv_id": "2602.12124"}
+{"id": "inthewild-model-organisms-2026", "title": "In-the-Wild Model Organisms: Mitigating Undesirable Emergent Behaviors in Production LLM Post-Training via Data Attribution", "authors": ["Frank Xiao", "Santiago Aranguri"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.11079", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose activation-based data attribution, a method that traces behavioral changes in post-trained language models to responsible training datapoints. By computing activation-difference vectors for", "arxiv_id": "2602.11079"}
+{"id": "simple-llm-baselines-2026", "title": "Simple LLM Baselines are Competitive for Model Diffing", "authors": ["Elias Kempf", "Simon Schrodi", "Bartosz Cywiński", "Thomas Brox", "Neel Nanda"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.10371", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Standard LLM evaluations only test capabilities or dispositions that evaluators designed them for, missing unexpected differences such as behavioral shifts between model revisions or emergent misalign", "arxiv_id": "2602.10371"}
+{"id": "emergent-misalignment-easy-2026", "title": "Emergent Misalignment is Easy, Narrow Misalignment is Hard", "authors": ["Anna Soligo", "Edward Turner", "Senthooran Rajamanoharan", "Neel Nanda"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.07852", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Finetuning large language models on narrowly harmful datasets can cause them to become emergently misaligned, giving stereotypically `evil' responses across diverse unrelated settings. Concerningly, a ", "arxiv_id": "2602.07852"}
+{"id": "understanding-multimodal-finetuning-2026", "title": "Towards Understanding Multimodal Fine-Tuning: Spatial Features", "authors": ["Lachin Naghashyar", "Hunar Batra", "Ashkan Khakzar", "Philip Torr", "Ronald Clark"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.08713", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Contemporary Vision-Language Models (VLMs) achieve strong performance on a wide range of tasks by pairing a vision encoder with a pre-trained language model, fine-tuned for visual-text inputs. Yet des", "arxiv_id": "2602.08713"}
+{"id": "artificial-organisations-2026", "title": "Artificial Organisations", "authors": ["W. Waites"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.13275", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigate the risk posed by misaligned individuals throug", "arxiv_id": "2602.13275"}
+{"id": "split-personality-training-2026", "title": "Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities", "authors": ["F. Dietz", "William Wale", "Oscar Gilg", "Robert McCarthy", "Felix Michalak"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05532", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Detecting misalignment in large language models is challenging because models may learn to conceal misbehavior during training. Standard auditing techniques fall short: black-box methods often cannot ", "arxiv_id": "2602.05532"}
+{"id": "from-helpfulness-toxic-2026", "title": "From Helpfulness to Toxic Proactivity: Diagnosing Behavioral Misalignment in LLM Agents", "authors": ["Xinyue Wang", "Yuanhe Zhang", "Zheng Gong", "Haoran Gao", "Fanyu Meng"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04197", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The enhanced capabilities of LLM-based agents come with an emergency for model planning and tool-use abilities. Attributing to helpful-harmless trade-off from LLM alignment, agents typically also inhe", "arxiv_id": "2602.04197"}
+{"id": "trigger-haystack-extracting-2026", "title": "The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers", "authors": ["Blake Bullwinkel", "Giorgio Severi", "Keegan Hines", "Amanda Minnich", "Ram Shankar Siva Kumar"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.03085", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Detecting whether a model has been poisoned is a longstanding problem in AI security. In this work, we present a practical scanner for identifying sleeper agent-style backdoors in causal language mode", "arxiv_id": "2602.03085"}
+{"id": "phantom-transfer-datalevel-2026", "title": "Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning", "authors": ["Andrew Draganov", "T. H. Dur", "Anandmayi Bhongade", "Mary Phuong"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04899", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a data poisoning attack -- Phantom Transfer -- with the property that, even if you know precisely how the poison was placed into an otherwise benign dataset, you cannot filter it out. We ac", "arxiv_id": "2602.04899"}
+{"id": "safetyefficacy-trade-off-2026", "title": "Safety-Efficacy Trade Off: Robustness against Data-Poisoning", "authors": ["Diego Granziol"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00822", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Backdoor and data poisoning attacks can achieve high attack success while evading existing spectral and optimisation based defences. We show that this behaviour is not incidental, but arises from a fu", "arxiv_id": "2602.00822"}
+{"id": "assessing-domainlevel-susceptibility-2026", "title": "Assessing Domain-Level Susceptibility to Emergent Misalignment from Narrow Finetuning", "authors": ["Abhishek Mishra", "Mugilan Arulvanan", "Reshma Ashok", "P. Petrova", "Deepesh Suranjandass"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00298", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Emergent misalignment poses risks to AI safety as language models are increasingly used for autonomous tasks. In this paper, we present a population of large language models (LLMs) fine-tuned on insec", "arxiv_id": "2602.00298"}
+{"id": "stepshield-when-not-2026", "title": "StepShield: When, Not Whether to Intervene on Rogue Agents", "authors": ["Gloria Felicia", "Michael Eniolade", "Jinfeng He", "Zitha Sasindran", "H. Kumar"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.22136", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing agent safety benchmarks report binary accuracy, conflating early intervention with post-mortem analysis. A detector that flags a violation at step 8 enables intervention; one that reports it ", "arxiv_id": "2601.22136", "doi": "10.48550/arXiv.2601.22136"}
+{"id": "hairtrigger-alignment-blackbox-2026", "title": "Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment", "authors": ["Yavuz Faruk Bakman", "D. Yaldiz", "S. Avestimehr", "Sai Praneeth Karimireddy"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.22313", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are rarely static and are frequently updated in practice. A growing body of alignment research has shown that models initially deemed \"aligned\" can exhibit misaligned behavi", "arxiv_id": "2601.22313", "doi": "10.48550/arXiv.2601.22313"}
+{"id": "prompt-injection-attacks-2026", "title": "Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems", "authors": ["Narek Maloyan", "Dmitry Namiot"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17548", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The proliferation of agentic AI coding assistants, including Claude Code, GitHub Copilot, Cursor, and emerging skill-based architectures, has fundamentally transformed software development workflows. ", "arxiv_id": "2601.17548", "doi": "10.48550/arXiv.2601.17548"}
+{"id": "values-science-ai-2026", "title": "Values in science and AI alignment research", "authors": ["L. Dung"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1080/0020174x.2026.2615773", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Roughly, empirical AI alignment research (AIA) is an area of AI research which investigates empirically how to design AI systems in line with human goals. This paper examines the role of non-epistemic", "doi": "10.1080/0020174x.2026.2615773"}
+{"id": "institutional-ai-governance-2026", "title": "Institutional AI: A Governance Framework for Distributional AGI Safety", "authors": ["Federico Pierucci", "Marcello Galisai", "Marcantonio Bracale", "Matteo Prandi", "Piercosma Bisconti"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.10599", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As LLM-based systems increasingly operate as agents embedded within human social and technical systems, alignment can no longer be treated as a property of an isolated model, but must be understood in", "arxiv_id": "2601.10599", "doi": "10.48550/arXiv.2601.10599"}
+{"id": "interactioncentric-cybersecurity-risks-2026", "title": "Interaction-Centric Cybersecurity Risks in LLM-Powered Dialogue Systems", "authors": ["Pratyush Jena", "Samira Zad", "Rehan Akbar"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CCWC67433.2026.11393850", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/CCWC67433.2026.11393850"}
+{"id": "generative-ai-paradox-2026", "title": "The Generative AI Paradox: GenAI and the Erosion of Trust, the Corrosion of Information Verification, and the Demise of Truth", "authors": ["Emilio Ferrara"], "year": 2026, "venue": "Future Internet", "source_url": "https://arxiv.org/abs/2601.00306", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (GenAI) now produces text, images, audio, and video that can be perceptually convincing at scale and at negligible marginal cost. While public debate often frames the associated harms as", "arxiv_id": "2601.00306", "doi": "10.3390/fi18020073"}
+{"id": "learning-from-negative-2025", "title": "Learning from Negative Examples: Why Warning-Framed Training Data Teaches What It Warns Against", "authors": ["Tsogt-Ochir Enkhbayar"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.22293", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Warning-framed content in training data (e.g., \"DO NOT USE - this code is vulnerable\") does not, it turns out, teach language models to avoid the warned-against behavior. In experiments reported here, ", "arxiv_id": "2512.22293", "doi": "10.48550/arXiv.2512.22293"}
+{"id": "artificial-just-artful-2025", "title": "Artificial or Just Artful? Do LLMs Bend the Rules in Programming?", "authors": ["O. Sghaier", "Kevin Delcourt", "H. Sahraoui"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.21028", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are widely used for automated code generation, yet their apparent successes often mask a tension between pretraining objectives and alignment choices. While pretraining en", "arxiv_id": "2512.21028", "doi": "10.48550/arXiv.2512.21028"}
+{"id": "neural-chameleons-language-2025", "title": "Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors", "authors": ["Max McGuinness", "Alex Serrano", "Luke Bailey", "Scott Emmons"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.11949", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Activation monitoring, which probes a model's internal states using lightweight classifiers, is an emerging tool for AI safety. However, its worst-case robustness under a misalignment threat model--wh", "arxiv_id": "2512.11949", "doi": "10.48550/arXiv.2512.11949"}
+{"id": "persistent-backdoor-attacks-2025", "title": "Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs", "authors": ["Jing Cui", "Yufei Han", "Jianbin Jiao", "Junge Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.14741", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Backdoor attacks embed malicious behaviors into Large Language Models (LLMs), enabling adversaries to trigger harmful outputs or bypass safety controls. However, the persistence of the implanted backd", "arxiv_id": "2512.14741", "doi": "10.48550/arXiv.2512.14741"}
+{"id": "training-llms-honesty-2025", "title": "Training LLMs for Honesty via Confessions", "authors": ["Manas R. Joglekar", "Jeremy Chen", "Ga Wu", "J. Yosinski", "Jasmine Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.08093", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) can be dishonest when reporting on their actions and beliefs -- for example, they may overstate their confidence in factual claims or cover up evidence of covert actions. ", "arxiv_id": "2512.08093", "doi": "10.48550/arXiv.2512.08093"}
+{"id": "sok-comprehensive-causality-2025", "title": "SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security", "authors": ["Wei Zhao", "Zhe Li", "Junfeng Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.04841", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) exhibit remarkable capabilities but remain vulnerable to adversarial manipulations such as jailbreaking, where crafted prompts bypass safety mechanisms. Understanding the ", "arxiv_id": "2512.04841", "doi": "10.48550/arXiv.2512.04841"}
+{"id": "martingale-score-unsupervised-2025", "title": "Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning", "authors": ["Zhonghao He", "Tianyi Alex Qiu", "Hirokazu Shirado", "Maarten Sap"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.02914", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in reasoning techniques have substantially improved the performance of large language models (LLMs), raising expectations for their ability to provide accurate, truthful, and reliable ", "arxiv_id": "2512.02914", "doi": "10.48550/arXiv.2512.02914"}
+{"id": "crossllm-generalization-behavioral-2025", "title": "Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains", "authors": ["Arun Chowdary Sanna"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.19874", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI agents become integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. While previous work has demon", "arxiv_id": "2511.19874", "doi": "10.48550/arXiv.2511.19874"}
+{"id": "devil-details-emergent-2025", "title": "The Devil in the Details: Emergent Misalignment, Format and Coherence in Open-Weights LLMs", "authors": ["C. Dickson"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.20104", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prior work has shown that fine-tuning models on a narrow domain with misaligned data can lead to broad misalignment - a phenomenon termed \"emergent misalignment\" (Betley et al. 2025). While all tested m", "arxiv_id": "2511.20104", "doi": "10.48550/arXiv.2511.20104"}
+{"id": "why-do-language-2025", "title": "Why Do Language Model Agents Whistleblow?", "authors": ["Kushal Agrawal", "Frank Xiao", "Guido Bergman", "Asa Cooper Stickland"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.17085", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that contradic", "arxiv_id": "2511.17085", "doi": "10.48550/arXiv.2511.17085"}
+{"id": "detecting-sleeper-agents-2025", "title": "Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis", "authors": ["Shahin Zanbaghi", "Ryan Rostampour", "F. Abid", "Salim Al Jarmakani"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.15992", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) can be backdoored to exhibit malicious behavior under specific deployment conditions while appearing safe during training, a phenomenon known as \"sleeper agents.\" Recent work", "arxiv_id": "2511.15992", "doi": "10.48550/arXiv.2511.15992"}
+{"id": "vr-20-realtime-2025", "title": "VR 2.0 in Real-Time Industrial Safety Training for Hazard Recognition", "authors": ["A. Radhika", "T. Bhaskar", "A. Veerender", "M. Kumar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IETACS68750.2025.11385613", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The reasons of industrial accidents are usually lack of proper hazard recognition and inefficient safety training techniques. The paper is a Virtual Reality 2.0 (VR 2.0) system that combines AI-induce", "doi": "10.1109/IETACS68750.2025.11385613"}
+{"id": "dangers-poisoned-llms-2025", "title": "On The Dangers of Poisoned LLMs In Security Automation", "authors": ["Patrick Karlsen", "Even Eilertsen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.02600", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper investigates some of the risks introduced by \"LLM poisoning,\" the intentional or unintentional introduction of malicious or biased data during model training. We demonstrate how a seemingly i", "arxiv_id": "2511.02600", "doi": "10.48550/arXiv.2511.02600"}
+{"id": "vibe-learning-education-2025", "title": "Vibe Learning: Education in the age of AI", "authors": ["Marcos Florencio", "Francielle Prieto"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.01956", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The debate over whether \"thinking machines\" could replace human intellectual labor has existed in both public and expert discussions since the mid-twentieth century, when the concept and terminology of ", "arxiv_id": "2511.01956", "doi": "10.48550/arXiv.2511.01956"}
+{"id": "layer-truth-probing-2025", "title": "Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning", "authors": ["S. Churina", "Niranjan Chebrolu", "Kokil Jaidka"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.26829", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We show that continual pretraining on plausible misinformation can overwrite specific factual knowledge in large language models without degrading overall performance. Unlike prior poisoning work unde", "arxiv_id": "2510.26829", "doi": "10.48550/arXiv.2510.26829"}
+{"id": "multistakeholder-alignment-llmpowered-2025", "title": "Multi-Stakeholder Alignment in LLM-Powered Collaborative AI Systems: A Multi-Agent Framework for Intelligent Tutoring", "authors": ["A. P. Uchoa", "Carlo E. T. Oliveira", "Cláudia L. R. Motta", "Daniel Schneider"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.23245", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models into Intelligent Tutoring Systems presents significant challenges in aligning with diverse and often conflicting values from students, parents, teachers, and ", "arxiv_id": "2510.23245", "doi": "10.48550/arXiv.2510.23245"}
+{"id": "learning-partneraware-collaborators-2025", "title": "Learning \"Partner-Aware\" Collaborators in Multi-Party Collaboration", "authors": ["Abhijnan Nath", "Nikhil Krishnaswamy"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.22462", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly being deployed in agentic settings where they act as collaborators with humans. Therefore, it is increasingly important to be able to evaluate their abili", "arxiv_id": "2510.22462", "doi": "10.48550/arXiv.2510.22462"}
+{"id": "scalable-oversight-partitioned-2025", "title": "Towards Scalable Oversight via Partitioned Human Supervision", "authors": ["R. Yin", "Takashi Ishida", "Masashi Sugiyama"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.22500", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As artificial intelligence (AI) systems approach and surpass expert human performance across a broad range of tasks, obtaining high-quality human supervision for evaluation and training becomes increa", "arxiv_id": "2510.22500"}
+{"id": "lockin-phase-hypothesis-2025", "title": "The Lock-In Phase Hypothesis: Identity Consolidation as a Precursor to AGI", "authors": ["Marcelo M. Amaral", "Raymond Aschheim"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.20190", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) remain broadly open and highly steerable: they imitate at scale, accept arbitrary system prompts, and readily adopt multiple personae. By analogy to human development, we ", "arxiv_id": "2510.20190", "doi": "10.48550/arXiv.2510.20190"}
+{"id": "concrete-roadmap-safety-2025", "title": "A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring", "authors": ["Julia Schulz"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19476", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI systems approach dangerous capability levels where inability safety cases become insufficient, we need alternative approaches to ensure safety. This paper presents a roadmap for constructing saf", "arxiv_id": "2510.19476", "doi": "10.48550/arXiv.2510.19476"}
+{"id": "subliminal-corruption-mechanisms-2025", "title": "Subliminal Corruption: Mechanisms, Thresholds, and Interpretability", "authors": ["Reya Vir", "S. Bhatnagar"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19152", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As machine learning models are increasingly fine-tuned on synthetic data, there is a critical risk of subtle misalignments spreading through interconnected AI systems. This paper investigates sublimin", "arxiv_id": "2510.19152", "doi": "10.48550/arXiv.2510.19152"}
+{"id": "can-reasoning-models-2025", "title": "Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability", "authors": ["Artur Zolkowski", "W. Xing", "David Lindner", "F. Tramèr", "Erik Jenner"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19851", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent findings suggest that misaligned models may exhibit deceptive behavior, raising concerns about output trustworthiness. Chain-of-thought (CoT) is a promising tool for alignment monitoring: when ", "arxiv_id": "2510.19851", "doi": "10.48550/arXiv.2510.19851"}
+{"id": "catch-me-if-2025", "title": "Catch Me If You Can: Rogue AI Detection and Correction at Scale", "authors": ["Fatemeh Stodt", "Jan Stodt", "Mohammed B. Alshawki", "Javad Salimi Sratakhti", "Christoph Reich"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics14204122", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern AI systems can strategically misreport information when incentives diverge from truthfulness, posing risks for oversight and deployment. Prior studies often examine this behavior within a singl", "doi": "10.3390/electronics14204122"}
+{"id": "forgetting-forget-attention-2025", "title": "Forgetting to Forget: Attention Sink as A Gateway for Backdooring LLM Unlearning", "authors": ["Bingqi Shang", "Yiwei Chen", "Yihua Zhang", "Bingquan Shen", "Sijia Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.17021", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) unlearning has become a critical mechanism for removing undesired data, knowledge, or behaviors from pre-trained models while retaining their general utility. Yet, with the ", "arxiv_id": "2510.17021", "doi": "10.48550/arXiv.2510.17021"}
+{"id": "detecting-adversarial-finetuning-2025", "title": "Detecting Adversarial Fine-tuning with Auditing Agents", "authors": ["Sarah Egler", "John Schulman", "Nicholas Carlini"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.16255", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) providers expose fine-tuning APIs that let end users fine-tune their frontier LLMs. Unfortunately, it has been shown that an adversary with fine-tuning access to an LLM can ", "arxiv_id": "2510.16255", "doi": "10.48550/arXiv.2510.16255"}
+{"id": "evaluating-reducing-deceptive-2025", "title": "Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL", "authors": ["Marwa Abdulhai", "Ryan Cheng", "Aryansh Shrivastava", "Natasha Jaques", "Yarin Gal"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.14318", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) interact with millions of people worldwide in applications such as customer support, education and healthcare. However, their ability to produce deceptive outputs, whether", "arxiv_id": "2510.14318", "doi": "10.48550/arXiv.2510.14318"}
+{"id": "pots-proofoftrainingsteps-backdoor-2025", "title": "PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models", "authors": ["Issam Seddik", "Sami Souihi", "Mohamed Tamaazousti", "S. Piergiovanni"], "year": 2025, "venue": "2025 3rd International Conference on Foundation and Large Language Models (FLLM)", "source_url": "https://arxiv.org/abs/2510.15106", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Large Language Models (LLMs) gain traction across critical domains, ensuring secure and trustworthy training processes has become a major concern. Backdoor attacks, among various threats—where mali", "arxiv_id": "2510.15106", "doi": "10.1109/FLLM67465.2025.11391059"}
+{"id": "ai-alignment-contemporary-2025", "title": "AI Alignment: A Contemporary Survey", "authors": ["Jiaming Ji", "Tianyi Qiu", "Boyuan Chen", "Jiayi Zhou", "Borong Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3770749", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview", "doi": "10.1145/3770749"}
+{"id": "narrow-finetuning-leaves-2025", "title": "Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences", "authors": ["Julian Minder", "Clément Dumas", "Stewart Slocum", "Helena Casademunt", "Cameron Holmes"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.13900", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Finetuning on narrow domains has become an essential tool to adapt Large Language Models (LLMs) to specific tasks and to create models with known unusual properties that are useful for research. We sh", "arxiv_id": "2510.13900", "doi": "10.48550/arXiv.2510.13900"}
+{"id": "ai-alignment-strategies-2025", "title": "AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?", "authors": ["Leonard Dung", "Florian Mai"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.11235", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI alignment research aims to develop techniques to ensure that AI systems do not cause harm. However, every alignment technique has failure modes, which are conditions in which there is a non-negligi", "arxiv_id": "2510.11235", "doi": "10.48550/arXiv.2510.11235"}
+{"id": "one-token-embedding-2025", "title": "One Token Embedding Is Enough to Deadlock Your Large Reasoning Model", "authors": ["Mohan Zhang", "Yihua Zhang", "Jinghan Jia", "Zhangyang Wang", "Sijia Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.15965", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern large reasoning models (LRMs) exhibit impressive multi-step problem-solving via chain-of-thought (CoT) reasoning. However, this iterative thinking mechanism introduces a new vulnerability surfa", "arxiv_id": "2510.15965", "doi": "10.48550/arXiv.2510.15965"}
+{"id": "thinking-longer-not-2025", "title": "Thinking Longer, Not Always Smarter: Evaluating LLM Capabilities in Hierarchical Legal Reasoning", "authors": ["Li Zhang", "Matthias Grabmair", "Morgan A. Gray", "Kevin Ashley"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.08710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Case-based reasoning is a cornerstone of U.S. legal practice, requiring professionals to argue about a current case by drawing analogies to and distinguishing from past precedents. While Large Languag", "arxiv_id": "2510.08710", "doi": "10.1145/3788646.3789522"}
+{"id": "poisoning-attacks-llms-2025", "title": "Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples", "authors": ["Alexandra Souly", "Javier Rando", "Ed Chapman", "Xander Davies", "Burak Hasircioglu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.07192", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversar", "arxiv_id": "2510.07192", "doi": "10.48550/arXiv.2510.07192"}
+{"id": "unified-threat-detection-2025", "title": "Unified Threat Detection and Mitigation Framework (UTDMF): Combating Prompt Injection, Deception, and Bias in Enterprise-Scale Transformers", "authors": ["S. Ravindran"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.04528", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid adoption of large language models (LLMs) in enterprise systems exposes vulnerabilities to prompt injection attacks, strategic deception, and biased outputs, threatening security, trust, and ", "arxiv_id": "2510.04528", "doi": "10.48550/arXiv.2510.04528"}
+{"id": "from-poisoned-aware-2025", "title": "From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs", "authors": ["Guangyu Shen", "Siyuan Cheng", "Xiangzhe Xu", "Yuan Zhou", "Hanxi Guo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.05169", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) can acquire deceptive behaviors through backdoor attacks, where the model executes prohibited actions whenever secret triggers appear in the input. Existing safety trainin", "arxiv_id": "2510.05169", "doi": "10.48550/arXiv.2510.05169"}
+{"id": "lhdeception-simulating-understanding-2025", "title": "LH-Deception: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions", "authors": ["Yang Xu", "Xuanming Zhang", "Min-Hsuan Yeh", "J. Dhamala", "Ousmane Dia"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.03999", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deception is a pervasive feature of human communication and an emerging concern in large language models (LLMs). While recent studies document instances of LLM deception, most evaluations remain confi", "arxiv_id": "2510.03999"}
+{"id": "backdoorpowered-prompt-injection-2025", "title": "Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods", "authors": ["Yulin Chen", "Haoran Li", "Yuan Sui", "Yangqiu Song", "Bryan Hooi"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2510.03705", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the development of technology, large language models (LLMs) have dominated the downstream natural language processing (NLP) tasks. However, because of the LLMs' instruction-following abilities and", "arxiv_id": "2510.03705", "doi": "10.48550/arXiv.2510.03705"}
+{"id": "malice-agentland-down-2025", "title": "Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain", "authors": ["Léo Boisvert", "Abhay Puri", "Chandra Kiran Reddy Evuru", "Nicolas Chapados", "Quentin Cappart"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.05159", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The practice of fine-tuning AI agents on data from their own interactions--such as web browsing or tool use--, while being a strong general recipe for improving agentic capabilities, also introduces a", "arxiv_id": "2510.05159", "doi": "10.48550/arXiv.2510.05159"}
+{"id": "microsaccadeinspired-probing-positional-2025", "title": "Microsaccade-Inspired Probing: Positional Encoding Perturbations Reveal LLM Misbehaviours", "authors": ["Rui Melo", "Rui Abreu", "C. Păsăreanu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.01288", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We draw inspiration from microsaccades, tiny involuntary eye movements that reveal hidden dynamics of human perception, to propose an analogous probing method for large language models (LLMs). Just as", "arxiv_id": "2510.01288", "doi": "10.48550/arXiv.2510.01288"}
+{"id": "gspr-aligning-llm-2025", "title": "GSPR: Aligning LLM Safeguards as Generalizable Safety Policy Reasoners", "authors": ["Haoran Li", "Yulin Chen", "Jin Zeng", "Hao Peng", "Huihao Jing"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.24418", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large language models (LLMs) are increasingly integrated into numerous applications across various domains, LLMs' safety becomes a critical concern for both application developers and intended users", "arxiv_id": "2509.24418", "doi": "10.48550/arXiv.2509.24418"}
+{"id": "dive-into-agent-2025", "title": "Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents", "authors": ["Boxuan Zhang", "Yi Yu", "Jiaxuan Guo", "Jing Shao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.25302", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The widespread deployment of Large Language Model (LLM) agents across real-world applications has unlocked tremendous potential, while raising some safety concerns. Among these concerns, the self-repl", "arxiv_id": "2509.25302", "doi": "10.48550/arXiv.2509.25302"}
+{"id": "understanding-subliminal-learning-2025", "title": "Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer", "authors": ["Simon Schrodi", "Elias Kempf", "Fazl Barez", "Thomas Brox"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.23886", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models can transfer hidden biases during distillation. For example, a teacher that \"likes owls\" can make its student \"like owls\" too, even when the training data consists only of lists of numbers", "arxiv_id": "2509.23886", "doi": "10.48550/arXiv.2509.23886"}
+{"id": "virus-infection-attack-2025", "title": "Virus Infection Attack on LLMs: Your Poisoning Can Spread \"VIA\" Synthetic Data", "authors": ["Zi Liang", "Qingqing Ye", "Alex X. Liu", "Yanyun Wang", "Jianliang Xu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.23041", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Synthetic data refers to artificial samples generated by models. While it has been validated to significantly enhance the performance of large language models (LLMs) during training and has been widel", "arxiv_id": "2509.23041", "doi": "10.48550/arXiv.2509.23041"}
+{"id": "antiregulatory-ai-how-2025", "title": "Anti-Regulatory AI: How \"AI Safety\" is Leveraged Against Regulatory Oversight", "authors": ["Rui-Jie Yew", "Brian Judge"], "year": 2025, "venue": "Conference on Equity and Access in Algorithms, Mechanisms, and Optimization", "source_url": "https://arxiv.org/abs/2509.22872", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI companies increasingly develop and deploy privacy-enhancing technologies, bias-constraining measures, evaluation frameworks, and alignment techniques — framing them as addressing concerns related t", "arxiv_id": "2509.22872", "doi": "10.1145/3757887.3763017"}
+{"id": "can-large-language-2025", "title": "Can Large Language Models Develop Gambling Addiction?", "authors": ["Seungpil Lee", "Donghyeon Shin", "Yu-Ang Lee", "Sundong Kim"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.22818", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study identifies the specific conditions under which large language models exhibit human-like gambling addiction patterns, providing critical insights into their decision-making mechanisms and AI", "arxiv_id": "2509.22818", "doi": "10.48550/arXiv.2509.22818"}
+{"id": "backdoor-attribution-elucidating-2025", "title": "Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models", "authors": ["Miao Yu", "Zhenhong Zhou", "Moayad Aloqaily", "Kun Wang", "Biwei Huang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.21761", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fine-tuned Large Language Models (LLMs) are vulnerable to backdoor attacks through data poisoning, yet the internal mechanisms governing these attacks remain a black box. Previous research on interpre", "arxiv_id": "2509.21761", "doi": "10.48550/arXiv.2509.21761"}
+{"id": "where-did-it-2025", "title": "Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing", "authors": ["Zhe Li", "Wei Zhao", "Yige Li", "Junfeng Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.02334", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their deployment is frequently undermined by undesirable behaviors such as generating harmful content, factual inaccuracies,", "arxiv_id": "2510.02334", "doi": "10.48550/arXiv.2510.02334"}
+{"id": "bigrpo-bidirectional-optimization-2025", "title": "bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs", "authors": ["Wence Ji", "Jiancan Wu", "Ai-Guo Li", "Shuyi Zhang", "Junkang Wu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.19775", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of large language models (LLMs), their robustness against adversarial manipulations, particularly jailbreak backdoor attacks, has become critically important. Existing appro", "arxiv_id": "2509.19775", "doi": "10.48550/arXiv.2509.19775"}
+{"id": "drex-benchmark-detecting-2025", "title": "D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models", "authors": ["Satyapriya Krishna", "Andy Zou", "Rahul Gupta", "E. Jones", "Nick Winter"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.17938", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The safety and alignment of Large Language Models (LLMs) are critical for their responsible deployment. Current evaluation methods predominantly focus on identifying and preventing overtly harmful out", "arxiv_id": "2509.17938", "doi": "10.48550/arXiv.2509.17938"}
+{"id": "domainspecific-constitutional-ai-2025", "title": "Domain-Specific Constitutional AI: Enhancing Safety in LLM-Powered Mental Health Chatbots", "authors": ["Chenhan Lyu", "Yutong Song", "Pengfei Zhang", "Amir M. Rahmani"], "year": 2025, "venue": "International Conference on Wearable and Implantable Body Sensor Networks", "source_url": "https://arxiv.org/abs/2509.16444", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Mental health applications have emerged as a critical area in computational health, driven by rising global rates of mental illness, the integration of AI in psychological care, and the need for scala", "arxiv_id": "2509.16444", "doi": "10.1109/BSN66969.2025.11337405"}
+{"id": "aquallm-evaluating-accuracy-2025", "title": "AQUA-LLM: Evaluating Accuracy, Quantization, and Adversarial Robustness Trade-offs in LLMs for Cybersecurity Question Answering", "authors": ["Onat Güngör", "Roshan Sood", "Harold Wang", "Tajana Simunic"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.13514", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have recently demonstrated strong potential for cybersecurity question answering (QA), supporting decision-making in real-time threat detection and response workflows. How", "arxiv_id": "2509.13514", "doi": "10.48550/arXiv.2509.13514"}
+{"id": "comprehensive-survey-trustworthiness-2025", "title": "A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models", "authors": ["Yanbo Wang", "Yongcan Yu", "Jian Liang", "R. He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.03871", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The development of Long-CoT reasoning has advanced LLM performance across various tasks, including language understanding, complex problem solving, and code generation. This paradigm enables models to", "arxiv_id": "2509.03871", "doi": "10.48550/arXiv.2509.03871"}
+{"id": "lethe-purifying-backdoored-2025", "title": "Lethe: Purifying Backdoored Large Language Models with Knowledge Dilution", "authors": ["Chen Chen", "Yuchen Sun", "Jiaxin Gao", "Xueluan Gong", "Qian Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.21004", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have seen significant advancements, achieving superior performance in various Natural Language Processing (NLP) tasks. However, they remain vulnerable to backdoor attacks,", "arxiv_id": "2508.21004", "doi": "10.48550/arXiv.2508.21004"}
+{"id": "poison-once-refuse-2025", "title": "Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs", "authors": ["Md. Abdullah Al Mamun", "Ihsen Alouani", "Nael B. Abu-Ghazaleh"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.20333", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are aligned to meet ethical standards and safety requirements by training them to refuse answering harmful or unsafe prompts. In this paper, we demonstrate how adversaries", "arxiv_id": "2508.20333", "doi": "10.48550/arXiv.2508.20333"}
+{"id": "investigation-group-query-2025", "title": "An Investigation on Group Query Hallucination Attacks", "authors": ["Kehao Miao", "Xiaolong Jin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.19321", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the widespread use of large language models (LLMs), understanding their potential failure modes during user interactions is essential. In practice, users often pose multiple questions in a single", "arxiv_id": "2508.19321", "doi": "10.48550/arXiv.2508.19321"}
+{"id": "attacking-llms-ai-2025", "title": "Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models", "authors": ["Qiming Guo", "Jinwen Tang", "Xin Huang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.17674", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through tw", "arxiv_id": "2508.17674", "doi": "10.48550/arXiv.2508.17674"}
+{"id": "large-language-models-2025", "title": "Large language models for clinical decision support in gastroenterology and hepatology", "authors": ["I. Wiest", "Mamatha Bhat", "Jan Clusmann", "Carolin V. Schneider", "Xiaofeng Jiang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1038/s41575-025-01108-1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1038/s41575-025-01108-1"}
+{"id": "conceptguard-neurosymbolic-safety-2025", "title": "ConceptGuard: Neuro-Symbolic Safety Guardrails via Sparse Interpretable Jailbreak Concepts", "authors": ["Darpan Aswal", "Céline Hudelot"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2508.16325", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models have found success in a variety of applications. However, their safety remains a concern due to the existence of various jailbreaking methods. Despite significant efforts, alignm", "arxiv_id": "2508.16325"}
+{"id": "mechanistic-exploration-backdoored-2025", "title": "Mechanistic Exploration of Backdoored Large Language Model Attention Patterns", "authors": ["M. Baker", "Lakshmi Babu Saheer"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.15847", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Backdoor attacks creating'sleeper agents'in large language models (LLMs) pose significant safety risks. This study employs mechanistic interpretability to explore resulting internal structural differe", "arxiv_id": "2508.15847", "doi": "10.48550/arXiv.2508.15847"}
+{"id": "ai-testing-should-2025", "title": "AI Testing Should Account for Sophisticated Strategic Behaviour", "authors": ["Vojtěch Kovařík", "Eric Chen", "Sami Petersen", "Alexis Ghersengorin", "Vincent Conitzer"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.14927", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This position paper argues for two claims regarding AI testing and evaluation. First, to remain informative about deployment behaviour, evaluations need account for the possibility that AI systems und", "arxiv_id": "2508.14927", "doi": "10.48550/arXiv.2508.14927"}
+{"id": "ciata-risk-assessment-2025", "title": "CIA+TA Risk Assessment for AI Reasoning Vulnerabilities", "authors": ["Yuksel Aydin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.15839", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI systems increasingly influence critical decisions, they face threats that exploit reasoning mechanisms rather than technical infrastructure. We present a framework for cognitive cybersecurity, a", "arxiv_id": "2508.15839", "doi": "10.48550/arXiv.2508.15839"}
+{"id": "sovereignty-usages-generative-2025", "title": "Sovereignty and usages of Generative Artificial Intelligence in Academic Work", "authors": ["Julien Lesbegueries", "Myriam Lamolle"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/GACLM67198.2025.11232118", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Artificial Intelligence (AI), particularly Large Language Models, are increasingly adopted within academic settings to accelerate research and enhance or evolve pedagogical practices. While", "doi": "10.1109/GACLM67198.2025.11232118"}
+{"id": "backdoor-threats-large-2025", "title": "Backdoor threats in large language models—a survey", "authors": ["Shuai Liu", "Yiheng Pan", "K. Hong", "Ruite Fei", "Chenhao Lin"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s11432-024-4351-3", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s11432-024-4351-3"}
+{"id": "efficient-switchable-safety-2025", "title": "Efficient Switchable Safety Control in LLMs via Magic-Token-Guided Co-Training", "authors": ["Jianfeng Si", "Lin Sun", "Zhewen Tan", "Xiangzheng Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.14904", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current methods for content safety in Large Language Models (LLMs), such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), often rely on multi-stage training pipel", "arxiv_id": "2508.14904", "doi": "10.48550/arXiv.2508.14904"}
+{"id": "beyond-promptinduced-lies-2025", "title": "Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts", "authors": ["Zhaomin Wu", "Mingzhe Du", "See-kiong Ng", "Bingsheng He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.06361", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are widely deployed in reasoning, planning, and decision-making tasks, making their trustworthiness critical. A significant and underexplored risk is intentional deception", "arxiv_id": "2508.06361", "doi": "10.48550/arXiv.2508.06361"}
+{"id": "integrated-alignment-2025", "title": "Towards Integrated Alignment", "authors": ["B. Y. Reis", "W. L. Cava"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.06592", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI adoption expands across human society, the problem of aligning AI models to match human preferences remains a grand challenge. Currently, the AI alignment field is deeply divided between behavio", "arxiv_id": "2508.06592", "doi": "10.48550/arXiv.2508.06592"}
+{"id": "backdoor-samples-detection-2025", "title": "Backdoor samples detection based on perturbation discrepancy consistency in pre-trained language models", "authors": ["Zuquan Peng", "Jianming Fu", "Lixin Zou", "Li Zheng", "Yanzhen Ren"], "year": 2025, "venue": "Neural Networks", "source_url": "https://arxiv.org/abs/2509.05318", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "arxiv_id": "2509.05318", "doi": "10.1016/j.neunet.2025.108025"}
+{"id": "single-direction-truth-2025", "title": "A Single Direction of Truth: An Observer Model's Linear Residual Probe Exposes and Steers Contextual Hallucinations", "authors": ["Charles O'Neill", "Slava Chalnev", "Chi Chi Zhao", "Max Kirkby", "Mudith Jayasekara"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.23221", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Contextual hallucinations -- statements unsupported by given context -- remain a significant challenge in AI. We demonstrate a practical interpretability insight: a generator-agnostic observer model d", "arxiv_id": "2507.23221", "doi": "10.48550/arXiv.2507.23221"}
+{"id": "watch-weights-unsupervised-2025", "title": "Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs", "authors": ["Ziqian Zhong", "Aditi Raghunathan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.00161", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The releases of powerful open-weight large language models (LLMs) are often not accompanied by access to their full training data. Existing interpretability methods, particularly those based on activa", "arxiv_id": "2508.00161", "doi": "10.48550/arXiv.2508.00161"}
+{"id": "how-personnel-security-2025", "title": "How Personnel Security can Inform the New World of AI Insider Risk", "authors": ["Paul Martin", "Sarah Mercer"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1080/03071847.2025.2550122", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: There is currently no meaningful analysis of the interplay between the rapidly evolving domain of AI and the traditional world of personnel security, despite rapid growth in the use of AI in business ", "doi": "10.1080/03071847.2025.2550122"}
+{"id": "subliminal-learning-language-2025", "title": "Subliminal Learning: Language models transmit behavioral traits via hidden signals in data", "authors": ["Alex Cloud", "Minh Le", "James Chua", "Jan Betley", "Anna Sztyber-Betley"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.14805", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a \"teacher\" model with some trait T (such", "arxiv_id": "2507.14805", "doi": "10.48550/arXiv.2507.14805"}
+{"id": "webguard-building-generalizable-2025", "title": "WebGuard: Building a Generalizable Guardrail for Web Agents", "authors": ["Boyuan Zheng", "Zeyi Liao", "Scott Salisbury", "Z. Liu", "Michael Lin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.14293", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid development of autonomous web agents powered by Large Language Models (LLMs), while greatly elevating efficiency, exposes the frontier risk of taking unintended or harmful actions. This situ", "arxiv_id": "2507.14293", "doi": "10.48550/arXiv.2507.14293"}
+{"id": "implementing-grassroots-logic-2026", "title": "Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI", "authors": ["Ehud Shapiro"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06934", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Grassroots Logic Programs (GLP) is a concurrent logic programming language with variables partitioned into paired readers and writers, conjuring both linear logic and futures/promises: a", "arxiv_id": "2602.06934"}
+{"id": "types-grassroots-logic-2026", "title": "Types for Grassroots Logic Programs", "authors": ["Ehud Shapiro"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17957", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer", "arxiv_id": "2601.17957", "doi": "10.48550/arXiv.2601.17957"}
+{"id": "new-compiler-stack-2026", "title": "The New Compiler Stack: A Survey on the Synergy of LLMs and Compilers", "authors": ["Shuoming Zhang", "Jiacheng Zhao", "Qiuchun Yu", "Chunwei Xia", "Zheng Wang"], "year": 2026, "venue": "CCF Transactions on High Performance Computing", "source_url": "https://arxiv.org/abs/2601.02045", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This survey has provided a systematic overview of the emerging field of LLM-enabled compilation by addressing several key research questions. We first answered how LLMs are being integrated by proposi", "arxiv_id": "2601.02045", "doi": "10.1007/s42514-025-00270-x"}
+{"id": "vsavisualstructural-alignment-uitocode-2025", "title": "VSA:Visual-Structural Alignment for UI-to-Code", "authors": ["Xian Wu", "Ming Zhang", "Zhiyu Fang", "Fei Li", "Bin Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.20034", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The automation of user interface development has the potential to accelerate software delivery by mitigating intensive manual implementation. Despite the advancements in Large Multimodal Models for de", "arxiv_id": "2512.20034", "doi": "10.48550/arXiv.2512.20034"}
+{"id": "modular-layout-synthesis-2025", "title": "Modular Layout Synthesis (MLS): Front-end Code via Structure Normalization and Constrained Generation", "authors": ["Chong Liu", "Ming Zhang", "Fei Li", "Hao Zhou", "Xiaoshuang Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.18996", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated front-end engineering drastically reduces development cycles and minimizes manual coding overhead. While Generative AI has shown promise in translating designs to code, current solutions oft", "arxiv_id": "2512.18996", "doi": "10.48550/arXiv.2512.18996"}
+{"id": "prompt-less-smile-2025", "title": "Prompt Less, Smile More: MTP with Semantic Engineering in Lieu of Prompt Engineering", "authors": ["Jayanaka L. Dantanarayana", "Savini Kashmira", "Thakee Nathees", "Zichen Zhang", "K. Flautner"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.19427", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI-Integrated programming is emerging as a foundational paradigm for building intelligent systems with large language models (LLMs). Recent approaches such as Meaning Typed Programming (MTP) automate ", "arxiv_id": "2511.19427", "doi": "10.48550/arXiv.2511.19427"}
+{"id": "agint-agentic-graph-2025", "title": "Agint: Agentic Graph Compilation for Software Engineering Agents", "authors": ["Abhi Chivukula", "Jay Somasundaram", "VijayaSai Somasundaram"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.19635", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based coding agents are increasingly common but still face challenges in context management, latency, reliability, reproducibility, and scalability. We present Agint, an agentic graph compiler, in", "arxiv_id": "2511.19635", "doi": "10.48550/arXiv.2511.19635"}
+{"id": "case-learned-cloud-2025", "title": "A Case for Learned Cloud Emulators", "authors": ["Archit Bhatnagar", "Yiming Qiu", "Sarah McClure", "Sylvia Ratnasamy", "Ang Chen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3772356.3772404", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Creating and maintaining cloud infrastructure via \"DevOps programs\" is essential to using the cloud. However, developing and testing the DevOps programs requires resource provisioning in the cloud, wh", "doi": "10.1145/3772356.3772404"}
+{"id": "multiple-schemaconformant-declarative-2025", "title": "Multiple Schema-Conformant Declarative Code Generation", "authors": ["Mehant Kammakomati", "Srikanth Tamilselvam"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASE63991.2025.00344", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many enterprise systems including large-scale deployment platforms like Ansible provide a declarative user interface through programming languages like JavaScript Object Notation (JSON). These systems", "doi": "10.1109/ASE63991.2025.00344"}
+{"id": "reducing-hallucinations-llmgenerated-2025", "title": "Reducing Hallucinations in LLM-Generated Code via Semantic Triangulation", "authors": ["Yihan Dai", "Sijie Liang", "Haotian Xu", "Peichu Xie", "Sergey Mechtaev"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.12288", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: When generating code from natural language prompts, an LLM samples programs from a probability distribution, many of which might be incorrect. Sample consensus techniques - such as majority voting or ", "arxiv_id": "2511.12288", "doi": "10.48550/arXiv.2511.12288"}
+{"id": "atlas-artifact-generation-2025", "title": "ATLAS: Artifact Generation Through Layered Constraints and LLM x MDE Synergy", "authors": ["Tong Ma", "Hui Lai", "Hui Wang", "Zhenhua Tian", "Jizhou Wang"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.25890", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: ATLAS unifies Large Language Models with Model-Driven Engineering to generate regulator-ready artifacts and machine-checkable evidence for safety- and compliance-critical domains. ATLAS integrates thr", "arxiv_id": "2510.25890"}
+{"id": "adaptrack-constrained-decoding-2025", "title": "AdapTrack: Constrained Decoding without Distorting LLM's Output Intent", "authors": ["Yongming Li", "Jia Li", "Ge Li", "Zhi Jin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.17376", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language model-based code generation and completion tools have been widely adopted, but they may sometimes produce code that does not meet necessary constraints, such as syntactic correctness or API e", "arxiv_id": "2510.17376", "doi": "10.48550/arXiv.2510.17376"}
+{"id": "learning-guarantee-type-2025", "title": "Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis", "authors": ["Zhechong Huang", "Zhao Zhang", "Ruyi Ji", "Tingxuan Xia", "Qihao Zhu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.10216", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate", "arxiv_id": "2510.10216"}
+{"id": "programming-language-techniques-2025", "title": "Programming Language Techniques for Bridging LLM Code Generation Semantic Gaps", "authors": ["Yalong Du", "Chaozheng Wang", "Huaijin Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3759425.3763383", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models have demonstrated remarkable capabilities in automated code generation, yet their statistical nature and black-box characteristics create significant semantic gaps manifested thr", "doi": "10.1145/3759425.3763383"}
+{"id": "position-vibe-coding-2025", "title": "Position: Vibe Coding Needs Vibe Reasoning: Improving Vibe Coding with Formal Verification", "authors": ["Jacqueline Mitchell", "Yasser Shaaban"], "year": 2025, "venue": "Proceedings of the 1st ACM SIGPLAN International Workshop on Language Models and Programming Languages", "source_url": "https://arxiv.org/abs/2511.00202", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: “Vibe coding” — the practice of developing software through iteratively conversing with a large language model (LLM) — has exploded in popularity within the last year. However, developers report key l", "arxiv_id": "2511.00202", "doi": "10.1145/3759425.3763390"}
+{"id": "automated-discovery-test-2025", "title": "Automated Discovery of Test Oracles for Database Management Systems Using LLMs", "authors": ["Qiuyang Mang", "Runyuan He", "Suyang Zhong", "Xiaoxuan Liu", "Huanchen Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.06663", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Since 2020, automated testing for Database Management Systems (DBMSs) has flourished, uncovering hundreds of bugs in widely-used systems. A cornerstone of these techniques is test oracle, which typica", "arxiv_id": "2510.06663", "doi": "10.48550/arXiv.2510.06663"}
+{"id": "play-by-type-2025", "title": "Play by the Type Rules: Inferring Constraints for LLM Functions in Declarative Programs", "authors": ["Parker Glenn", "Alfy Samuel", "Daben Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.20208", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Integrating LLM powered operators in declarative query languages allows for the combination of cheap and interpretable functions with powerful, generalizable language model reasoning. However, in orde", "arxiv_id": "2509.20208", "doi": "10.48550/arXiv.2509.20208"}
+{"id": "evaluating-mitigating-errors-2025", "title": "Evaluating and Mitigating Errors in LLM-Generated Web API Integrations", "authors": ["Daniel Maninger", "Leon Chemnitz", "Amir Molzam Sharifloo", "Tushar Lamba", "Jannis Brugger"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.20172", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: API integration is a cornerstone of our digital infrastructure, enabling software systems to connect and interact. However, as shown by many studies, writing or generating correct code to invoke APIs,", "arxiv_id": "2509.20172"}
+{"id": "refinestat-efficient-exploration-2025", "title": "REFINESTAT: Efficient Exploration for Probabilistic Program Synthesis", "authors": ["Madhav Kanda", "Shubham Ugare", "Sasa Misailovic"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.01082", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Probabilistic programming offers a powerful framework for modeling uncertainty, yet statistical model discovery in this domain entails navigating an immense search space under strict domain-specific c", "arxiv_id": "2509.01082", "doi": "10.48550/arXiv.2509.01082"}
+{"id": "chopchop-programmable-framework-2025", "title": "ChopChop: A Programmable Framework for Semantically Constraining the Output of Language Models", "authors": ["Shaan Nagy", "Timothy Zhou", "Nadia Polikarpova", "Loris D'Antoni"], "year": 2025, "venue": "Proc. ACM Program. Lang.", "source_url": "https://arxiv.org/abs/2509.00360", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models (LMs) can generate code but cannot guarantee its correctness—often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding of", "arxiv_id": "2509.00360", "doi": "10.1145/3776708"}
+{"id": "llm-agents-generating-2025", "title": "LLM Agents for Generating Microservice-based Applications: how complex is your specification?", "authors": ["D. Yellin"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2508.20119", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper we evaluate the capabilities of LLM Agents in generating code for real-world problems. Specifically, we explore code synthesis for microservice-based applications, a widely used architec", "arxiv_id": "2508.20119"}
+{"id": "correctnessguaranteed-code-generation-2025", "title": "Correctness-Guaranteed Code Generation via Constrained Decoding", "authors": ["Lingxiao Li", "Salar Rahili", "Yiwei Zhao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.15866", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language Models (LMs) are increasingly being used for code generation, but ensuring the correctness of generated programs remains a significant challenge. Although imperfect code may be acceptable dur", "arxiv_id": "2508.15866", "doi": "10.48550/arXiv.2508.15866"}
+{"id": "constrained-decoding-diffusion-2025", "title": "Constrained Decoding of Diffusion LLMs with Context-Free Grammars", "authors": ["Niels Mündler", "Jasper Dekoninck", "Martin T. Vechev"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.10111", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have shown promising performance across diverse domains. Many practical applications of LLMs, such as code completion and structured data extraction, require adherence to ", "arxiv_id": "2508.10111", "doi": "10.48550/arXiv.2508.10111"}
+{"id": "formal-verification-llmgenerated-2025", "title": "Towards Formal Verification of LLM-Generated Code from Natural Language Prompts", "authors": ["Aaron Councilman", "David Fu", "Aryan Gupta", "Chengxiao Wang", "David Grove"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.13290", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the past few years LLMs have emerged as a tool that can aid programmers by taking natural language descriptions and generating code based on it. However, the reliability of LLM code generation and ", "arxiv_id": "2507.13290", "doi": "10.48550/arXiv.2507.13290"}
+{"id": "unveiling-potential-diffusion-2025", "title": "Unveiling the Potential of Diffusion Large Language Model in Controllable Generation", "authors": ["Zhen Xiong", "Yujun Cai", "Zhecheng Li", "Yiwei Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.04504", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Controllable generation is a fundamental task in NLP with many applications, providing a basis for function calling to agentic communication. However, even state-of-the-art autoregressive Large Langua", "arxiv_id": "2507.04504", "doi": "10.48550/arXiv.2507.04504"}
+{"id": "fast-controlled-generation-2025", "title": "Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling", "authors": ["Ben Lipkin", "Benjamin LeBrun", "Jacob Hoover Vigly", "João Loula", "David R. MacIver"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.05410", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is n", "arxiv_id": "2504.05410", "doi": "10.48550/arXiv.2504.05410"}
+{"id": "logically-constrained-decoding-2025", "title": "Logically Constrained Decoding", "authors": ["F. Ma", "Alan J. Hu", "Daniel Kreymer", "Daniel Li", "David Adkins"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.mathnlp-main.11", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Constrained decoding is a state-of-the-art technique for restricting the output of a Large Language Model (LLM) to obey syntactic rules, e.g., a regular expression or context-free grammar. In this pap", "doi": "10.18653/v1/2025.mathnlp-main.11"}
+{"id": "rustassistant-llms-fix-2025", "title": "RustAssistant: Using LLMs to Fix Compilation Errors in Rust Code", "authors": ["Pantazis Deligiannis", "Akash Lal", "N. Mehrotra", "Rishi Kapoor Poddar", "Aseem Rastogi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE55347.2025.00022", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The Rust programming language, with its safety guarantees, has established itself as a viable choice for low-level systems programming language over the traditional, unsafe alternatives like C/C++. Th", "doi": "10.1109/ICSE55347.2025.00022"}
+{"id": "enhancing-code-generation-2025", "title": "Enhancing Code Generation for Low-Resource Languages: No Silver Bullet", "authors": ["Alessandro Giagnorio", "Alberto Martin-Lopez", "Gabriele Bavota"], "year": 2025, "venue": "IEEE International Conference on Program Comprehension", "source_url": "https://arxiv.org/abs/2501.19085", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn syntax, semantics, and usage patterns of ", "arxiv_id": "2501.19085", "doi": "10.1109/ICPC66645.2025.00058"}
+{"id": "qwen25-technical-report-2024", "title": "Qwen2.5 Technical Report", "authors": ["An Yang", "Baosong Yang", "Beichen Zhang", "Binyuan Hui", "Bo Zheng"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.15115", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved d", "arxiv_id": "2412.15115", "doi": "10.48550/arXiv.2412.15115"}
+{"id": "syzygy-dual-codetest-2024", "title": "Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis", "authors": ["Manish Shetty", "Naman Jain", "Adwait Godbole", "S. Seshia", "Koushik Sen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.14234", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite extensive usage in high-performance, low-level systems programming applications, C is susceptible to vulnerabilities due to manual memory management and unsafe pointer operations. Rust, a mode", "arxiv_id": "2412.14234", "doi": "10.48550/arXiv.2412.14234"}
+{"id": "statically-contextualizing-large-2024", "title": "Statically Contextualizing Large Language Models with Typed Holes", "authors": ["Andrew Blinn", "Xiang Li", "J. Kim", "Cyrus Omar"], "year": 2024, "venue": "Proc. ACM Program. Lang.", "source_url": "https://arxiv.org/abs/2409.00921", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate code ", "arxiv_id": "2409.00921", "doi": "10.1145/3689728"}
+{"id": "llama-3-herd-2024", "title": "The Llama 3 Herd of Models", "authors": ["Abhimanyu Dubey", "Abhinav Jauhri", "Abhinav Pandey", "Abhishek Kadian", "Ahmad Al-Dahle"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2407.21783", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support ", "arxiv_id": "2407.21783"}
+{"id": "gemma-2-improving-2024", "title": "Gemma 2: Improving Open Language Models at a Practical Size", "authors": ["Morgane Riviere", "Shreya Pathak", "Pier Giuseppe Sessa", "Cassidy Hardin", "Surya Bhupatiraju"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.00118", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we a", "arxiv_id": "2408.00118", "doi": "10.48550/arXiv.2408.00118"}
+{"id": "what-wrong-your-2024", "title": "What is wrong with your code generated by large language models? An extensive study", "authors": ["Shihan Dou", "Haoxiang Jia", "Shenxi Wu", "Huiyuan Zheng", "Weikang Zhou"], "year": 2024, "venue": "Science China Information Sciences", "source_url": "https://arxiv.org/abs/2407.06153", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predo", "arxiv_id": "2407.06153", "doi": "10.1007/s11432-025-4632-8"}
+{"id": "code-less-align-2024", "title": "Code Less, Align More: Efficient LLM Fine-tuning for Code Generation with Data Pruning", "authors": ["Yun-Da Tsai", "Mingjie Liu", "Haoxing Ren"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.05040", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent work targeting large language models (LLMs) for code generation demonstrated that increasing the amount of training data through synthetic code generation often leads to exceptional performance", "arxiv_id": "2407.05040", "doi": "10.48550/arXiv.2407.05040"}
+{"id": "comparative-study-dsl-2024", "title": "A Comparative Study of DSL Code Generation: Fine-Tuning vs. Optimized Retrieval Augmentation", "authors": ["Nastaran Bassamzadeh", "Chhaya Methani"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.02742", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Natural Language to Code Generation has made significant progress in recent years with the advent of Large Language Models (LLMs). While generation for general-purpose languages like C, C++, and Python", "arxiv_id": "2407.02742", "doi": "10.48550/arXiv.2407.02742"}
+{"id": "swtbench-testing-validating-2024", "title": "SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents", "authors": ["Niels Mündler", "Mark Niklas Müller", "Jingxuan He", "Martin T. Vechev"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2406.12952", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Rigorous software testing is crucial for developing and maintaining high-quality code, making automated test generation a promising avenue for both improving software quality and boosting the effectiv", "arxiv_id": "2406.12952", "doi": "10.52202/079017-2601"}
+{"id": "compilation-quotient-cq-2024", "title": "Compilation Quotient (CQ): A Metric for the Compilation Hardness of Programming Languages", "authors": ["V. Szabó", "Dominik Winterer", "Zhendong Su"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.04778", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Today's programmers can choose from an exceptional range of programming languages, each with its own traits, purpose, and complexity. A key aspect of a language's complexity is how hard it is to compi", "arxiv_id": "2406.04778", "doi": "10.48550/arXiv.2406.04778"}
+{"id": "systematic-literature-review-2024", "title": "A Systematic Literature Review on Large Language Models for Automated Program Repair", "authors": ["Quanjun Zhang", "Chunrong Fang", "Yang Xie", "Yuxiang Ma", "Weisong Sun"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.01466", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) attempts to patch software bugs and reduce manual debugging efforts. Very recently, with the advances in Large Language Models (LLMs), an increasing number of APR techni", "arxiv_id": "2405.01466", "doi": "10.48550/arXiv.2405.01466"}
+{"id": "bugs-large-language-2024", "title": "Bugs in large language models generated code: an empirical study", "authors": ["Florian Tambon", "Arghavan Moradi-Dakhel", "Amin Nikanjam", "Foutse Khomh", "Michel C. Desmarais"], "year": 2024, "venue": "Empirical Software Engineering", "source_url": "https://arxiv.org/abs/2403.08937", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) for code have gained significant attention recently. They can generate code in different programming languages based on provided prompts, fulfilling a long-lasting dream i", "arxiv_id": "2403.08937", "doi": "10.1007/s10664-025-10614-4"}
+{"id": "syncode-llm-generation-2024", "title": "SynCode: LLM Generation with Grammar Augmentation", "authors": ["Shubham Ugare", "Tarun Suresh", "Hangoo Kang", "Sasa Misailovic", "Gagandeep Singh"], "year": 2024, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2403.01632", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs are widely used in complex AI applications. These applications underscore the need for LLM outputs to adhere to a specific format, for their integration with other components in the systems. Typi", "arxiv_id": "2403.01632"}
+{"id": "constrained-decoding-fillinthemiddle-2024", "title": "Constrained Decoding for Fill-in-the-Middle Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars", "authors": ["Daniel Melcer", "Nathan Fulton", "Sanjay Krishna Gouda", "Haifeng Qian"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2402.17988", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models are powerful tools for program synthesis and advanced auto-completion, but come with no guarantee that their output code is syntactically correct. This paper contributes an incre", "arxiv_id": "2402.17988"}
+{"id": "guiding-llms-right-2024", "title": "Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation", "authors": ["Luca Beurer-Kellner", "Marc Fischer", "Martin T. Vechev"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2403.06988", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding proposes to enforce strict formal language constraints during generation. However, as we sh", "arxiv_id": "2403.06988", "doi": "10.48550/arXiv.2403.06988"}
+{"id": "masked-hardattention-transformers-2023", "title": "Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages", "authors": ["Andy Yang", "David Chiang", "Dana Angluin"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2310.13897", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages. In this paper, we establish exact characterizations of", "arxiv_id": "2310.13897", "doi": "10.52202/079017-0327"}
+{"id": "survey-hallucination-large-2023", "title": "A Survey of Hallucination in Large Foundation Models", "authors": ["Vipula Rawte", "A. Sheth", "Amitava Das"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2309.05922", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucination in a foundation model (FM) refers to the generation of content that strays from factual reality or includes fabricated information. This survey paper provides an extensive overview of re", "arxiv_id": "2309.05922", "doi": "10.48550/arXiv.2309.05922"}
+{"id": "copiloting-copilots-fusing-2023", "title": "Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair", "authors": ["Yuxiang Wei", "Chun Xia", "Lingming Zhang"], "year": 2023, "venue": "ESEC/SIGSOFT FSE", "source_url": "https://arxiv.org/abs/2309.00608", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have bee", "arxiv_id": "2309.00608", "doi": "10.1145/3611643.3616271"}
+{"id": "exploring-parameterefficient-finetuning-2023", "title": "Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models", "authors": ["M. Weyssow", "Xin Zhou", "Kisub Kim", "David Lo", "H. Sahraoui"], "year": 2023, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2308.10462", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) demonstrate impressive capabilities to generate accurate code snippets given natural language intents in a zero-shot manner, i.e., without the need for specific fine-tunin", "arxiv_id": "2308.10462", "doi": "10.1145/3714461"}
+{"id": "efficient-guided-generation-2023", "title": "Efficient Guided Generation for Large Language Models", "authors": ["Brandon T. Willard", "Rémi Louf"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2307.09702", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this article we show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine. This framework leads to an ef", "arxiv_id": "2307.09702", "doi": "10.48550/arXiv.2307.09702"}
+{"id": "multiple-scalable-polyglot-2023", "title": "MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation", "authors": ["Federico Cassano", "John Gouwar", "Daniel Nguyen", "S. Nguyen", "Luna Phipps-Costin"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2023.3267446", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have demonstrated the ability to generate both natural language and programming language text. Although contemporary code generation models are trained on corpora with several pr", "doi": "10.1109/TSE.2023.3267446"}
+{"id": "cotran-llmbased-code-2023", "title": "CoTran: An LLM-Based Code Translator Using Reinforcement Learning with Feedback from Compiler and Symbolic Execution", "authors": ["Prithwish Jana", "Piyush Jha", "Haoyang Ju", "Gautham Kishore", "Aryan Mahajan"], "year": 2023, "venue": "European Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2306.06755", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we present an LLM-based code translation method and an associated tool called CoTran, that translates whole-programs from one high-level programming language to another. Existing LLM-ba", "arxiv_id": "2306.06755", "doi": "10.3233/FAIA240968"}
+{"id": "leveraging-rust-types-2023", "title": "Leveraging Rust Types for Program Synthesis", "authors": ["Jonáš Fiala", "Shachar Itzhaky", "Peter Müller", "N. Polikarpova", "Ilya Sergey"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3591278", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The Rust type system guarantees memory safety and data-race freedom. However, to satisfy Rust's type rules, many familiar implementation patterns must be adapted substantially. These necessary adaptat", "doi": "10.1145/3591278"}
+{"id": "gpt4-technical-report-2023", "title": "GPT-4 Technical Report", "authors": ["OpenAI", "Josh Achiam", "Steven Adler", "Sandhini Agarwal", "L. Ahmad", "Ilge Akkaya"], "year": 2023, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2303.08774", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 ", "arxiv_id": "2303.08774"}
+{"id": "measuring-impact-programming-2023", "title": "Measuring The Impact Of Programming Language Distribution", "authors": ["Gabriel Orlanski", "Kefan Xiao", "Xavier García", "Jeffrey Hui", "Joshua Howland"], "year": 2023, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2302.01973", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present th", "arxiv_id": "2302.01973", "doi": "10.48550/arXiv.2302.01973"}
+{"id": "prompting-programming-query-2022", "title": "Prompting Is Programming: A Query Language for Large Language Models", "authors": ["Luca Beurer-Kellner", "Marc Fischer", "Martin T. Vechev"], "year": 2022, "venue": "Proc. ACM Program. Lang.", "source_url": "https://arxiv.org/abs/2212.06094", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, a language model can be used t", "arxiv_id": "2212.06094", "doi": "10.1145/3591300"}
+{"id": "github-2022", "title": "GitHub", "authors": ["Sufyan bin Uzayr"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1201/9781003229100-6", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1201/9781003229100-6"}
+{"id": "typescript-2022", "title": "TypeScript", "authors": ["Sufyan bin Uzayr"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1201/9781003203728-1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1201/9781003203728-1"}
+{"id": "structured-outputs-2025", "title": "Structured Outputs", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "survey-code-generation-2024", "title": "Survey on Code Generation for Low resource and Domain Specific Programming Languages", "authors": ["Sathvik Joel", "J. Wu", "Fatemeh H. Fard"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2410.03981", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2410.03981"}
+{"id": "various-2024", "title": "Various", "authors": ["Unknown"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "lost-translation-study-2024", "title": "Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code", "authors": ["Unknown"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "incorrect-type-deducted-2024", "title": "Incorrect type deducted for accumulator in reduce", "authors": ["Unknown"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "monitorguided-decoding-code-2023", "title": "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context", "authors": ["Lakshya A Agrawal", "Aditya Kanade", "Navin Goyal", "Shuvendu K. Lahiri", "S. Rajamani"], "year": 2023, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/9b405bcf0c10a220e848eed43573ffc3477e13b8", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "immaculate-practical-llm-2026", "title": "IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation", "authors": ["Yanpei Guo", "Wenjie Qu", "Linyu Wu", "Shengfang Zhai", "Lionel Z. Wang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.22700", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Commercial large language models are typically deployed as black-box API services, requiring users to trust providers to execute inference correctly and report token usage honestly. We present IMMACUL", "arxiv_id": "2602.22700"}
+{"id": "sustainable-llm-inference-2026", "title": "Sustainable LLM Inference using Context-Aware Model Switching", "authors": ["Yuvarani", "Akashdeep Singh", "Zahra Fathanah", "Salsabila Harlen", "Syeikha Syafura Al-Zahra binti Zahari"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.22261", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have become central to many AI applications, but their growing energy consumption raises serious sustainability concerns. A key limitation in current AI deployments is the relian", "arxiv_id": "2602.22261"}
+{"id": "confidencedriven-multiscale-model-2026", "title": "Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference", "authors": ["Bo-Wei Chen", "Chung-Chi Chen", "An-Zi Yen"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.22090", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-driven", "arxiv_id": "2602.22090"}
+{"id": "power-limitations-aggregation-2026", "title": "Power and Limitations of Aggregation in Compound AI Systems", "authors": ["Nivasini Ananthakrishnan", "Meena Jagadeesan"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.21556", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: When designing compound AI systems, a common approach is to query multiple copies of the same model and aggregate the responses to produce a synthesized output. Given the homogeneity of these models, ", "arxiv_id": "2602.21556"}
+{"id": "sweprotege-learning-selectively-2026", "title": "SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents", "authors": ["Patrick Tser Jern Kon", "Archana Pradeep", "Ang Chen", "Alexander P. Ellis", "Warren Hunt"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.22124", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, w", "arxiv_id": "2602.22124"}
+{"id": "pyramid-moa-probabilistic-2026", "title": "Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference", "authors": ["Arindam Khaled"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.19509", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) face a persistent trade-off between inference cost and reasoning capability. While \"Oracle\" models (e.g., Llama-3-70B) achieve state-of-the-art accuracy, they are prohibitiv", "arxiv_id": "2602.19509"}
+{"id": "skillorchestra-learning-route-2026", "title": "SkillOrchestra: Learning to Route Agents via Skill Transfer", "authors": ["Jiayu Wang", "Yifei Ming", "Zixuan Ke", "Shafiq Joty", "Aws Albarghouthi"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.19672", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input", "arxiv_id": "2602.19672"}
+{"id": "adaptive-data-augmentation-2026", "title": "Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition", "authors": ["Mi Tang", "Yangyang Yu", "Aolin Ding", "M. Pouyan", "T.B. Bao"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.19385", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recognizing implicit visual and textual patterns is essential in many real-world applications of modern AI. However, tackling long-tail pattern recognition tasks remains challenging for current pre-tr", "arxiv_id": "2602.19385"}
+{"id": "automated-extraction-mechanical-2026", "title": "Automated Extraction of Mechanical Constitutive Models from Scientific Literature using Large Language Models: Applications in Cultural Heritage Conservation", "authors": ["Ruijuan Hu", "Yue Wu", "Tianhao Su", "Y. Wang", "Shunbo Hu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.16551", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The preservation of cultural heritage is increasingly transitioning towards data-driven predictive maintenance and \"Digital Twin\" construction. However, the mechanical constitutive models required for h", "arxiv_id": "2602.16551"}
+{"id": "next-paradigm-usercentric-2026", "title": "The Next Paradigm Is User-Centric Agent, Not Platform-Centric Service", "authors": ["Luankang Zhang", "Hang Lv", "Qi-an Pan", "Ke Wang", "Yonghao Huang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.15682", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern digital services have evolved into indispensable tools, driving the present large-scale information systems. Yet, the prevailing platform-centric model, where services are optimized for platfor", "arxiv_id": "2602.15682"}
+{"id": "mobilityaware-cache-framework-2026", "title": "Mobility-Aware Cache Framework for Scalable LLM-Based Human Mobility Simulation", "authors": ["Hua Yan", "Heng Tan", "Yingxue Zhang", "Yu Yang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.16727", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large-scale human mobility simulation is critical for applications such as urban planning, epidemiology, and transportation analysis. Recent works treat large language models (LLMs) as human agents to", "arxiv_id": "2602.16727"}
+{"id": "efficient-strategy-finetuning-2026", "title": "An efficient strategy for fine-tuning large language models", "authors": ["B. Marsh", "Adam Michaleas", "Darrell O. Ricke", "Shaun Monera", "Shriya Zembruski"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.3389/frai.2026.1665992", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) achieve strong performance on many Natural Language Processing tasks, but adapting them to domain-specific applications is resource-intensive due to the cost of curati", "doi": "10.3389/frai.2026.1665992"}
+{"id": "picking-right-specialist-2026", "title": "Picking the Right Specialist: Attentive Neural Process-based Selection of Task-Specialized Models as Tools for Agentic Healthcare Systems", "authors": ["Pramit Saha", "Joshua Strong", "M. Alsharid", "Divyanshu Mishra", "J. Noble"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.14901", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Task-specialized models form the backbone of agentic healthcare systems, enabling the agents to answer clinical queries across tasks such as disease diagnosis, localization, and report generation. Yet", "arxiv_id": "2602.14901"}
+{"id": "calibration-large-language-2026", "title": "On Calibration of Large Language Models: From Response To Capability", "authors": ["Sin-Han Yang", "Cheng-Kuang Wu", "Chengxi Wu", "Chieh-Yen Lin", "Yun-Nung Chen"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.13540", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are widely deployed as general-purpose problem solvers, making accurate confidence estimation critical for reliable use. Prior work on LLM calibration largely focuses on r", "arxiv_id": "2602.13540"}
+{"id": "adaptevolve-improving-efficiency-2026", "title": "AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection", "authors": ["Pretam Ray", "P. Brahma", "Zicheng Liu", "E. Barsoum"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.11931", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evolutionary agentic systems intensify the trade-off between computational efficiency and reasoning capability by repeatedly invoking large language models (LLMs) during inference. This setting raises", "arxiv_id": "2602.11931"}
+{"id": "lacy-what-small-2026", "title": "LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss", "authors": ["Szilvia Ujváry", "Louis Béthune", "Pierre Ablin", "João Monteiro", "Marco Cuturi"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.12005", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. Especially ", "arxiv_id": "2602.12005"}
+{"id": "fair-comprehensive-evaluation-2026", "title": "Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems", "authors": ["Wanxin Wu", "He Zhu", "Yixia Li", "Lei Yang", "Jie Zhao"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.11877", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have achieved success, but cost and privacy constraints necessitate deploying smaller models locally while offloading complex queries to cloud-based models. Existing route", "arxiv_id": "2602.11877"}
+{"id": "rooflinebench-benchmarking-framework-2026", "title": "RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis", "authors": ["Zhen Bi", "Xueshu Chen", "Luoyang Sun", "Yuhan Yao", "Qing Shen"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.11506", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The transition toward localized intelligence through Small Language Models (SLMs) has intensified the need for rigorous performance characterization on resource-constrained edge hardware. However, obj", "arxiv_id": "2602.11506"}
+{"id": "prefillshare-shared-prefill-2026", "title": "PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving", "authors": ["Sunghyeon Woo", "Hoseung Kim", "S. Shim", "Minjung Jo", "Hyunjoon Jeong"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.12029", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems increasingly orchestrate multiple specialized language models to solve complex real-world problems, often invoking them over a shared context. This execution pattern repeatedly pro", "arxiv_id": "2602.12029"}
+{"id": "learning-configure-agentic-2026", "title": "Learning to Configure Agentic AI Systems", "authors": ["Aditya Taparia", "Som Sagar", "Ransalu Senanayake"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.11574", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Configuring LLM-based agent systems involves choosing workflows, tools, token budgets, and prompts from a large combinatorial design space, and is typically handled today by fixed large templates or h", "arxiv_id": "2602.11574"}
+{"id": "boute-costefficient-llm-2026", "title": "BOute: Cost-Efficient LLM Serving with Heterogeneous LLMs and GPUs via Multi-Objective Bayesian Optimization", "authors": ["Youhe Jiang", "Fangcheng Fu", "Eiko Yoneki"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.10729", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid growth of large language model (LLM) deployments has made cost-efficient serving systems essential. Recent efforts to enhance system cost-efficiency adopt two main perspectives: (i) An algor", "arxiv_id": "2602.10729"}
+{"id": "llms-encode-their-2026", "title": "LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations", "authors": ["William Lugoloobi", "Thomas Foster", "William Bankes", "Chris Russell"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.09924", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether their own likelihood of", "arxiv_id": "2602.09924"}
+{"id": "routing-cascades-user-2026", "title": "Routing, Cascades, and User Choice for LLMs", "authors": ["Rafid Mahmood"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.09902", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To mitigate the trade-offs between performance and costs, LLM providers route user tasks to different models based on task difficulty and latency. We study the effect of LLM routing with respect to us", "arxiv_id": "2602.09902"}
+{"id": "ojbkq-objectivejoint-babaiklein-2026", "title": "OJBKQ: Objective-Joint Babai-Klein Quantization", "authors": ["Xinyu Wang", "Ziyu Zhao", "Peng Lu", "Yu Gu", "Xiao-Wen Chang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.08376", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Post-training quantization (PTQ) is widely used to compress large language models without retraining. However, many existing weight-only methods rely on heuristic objectives and greedy rounding, thus ", "arxiv_id": "2602.08376"}
+{"id": "nmusketeers-reinforcement-learning-2026", "title": "$n$-Musketeers: Reinforcement Learning Shapes Collaboration Among Language Models", "authors": ["Ryozo Masukawa", "Sanggeon Yun", "Hyunwoo Oh", "S. Jeong", "Raheeb Hassa"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.09173", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent progress in reinforcement learning with verifiable rewards (RLVR) shows that small, specialized language models (SLMs) can exhibit structured reasoning without relying on large monolithic LLMs.", "arxiv_id": "2602.09173"}
+{"id": "dont-always-pick-2026", "title": "Don't Always Pick the Highest-Performing Model: An Information Theoretic View of LLM Ensemble Selection", "authors": ["Yigit Turkmen", "Baturalp Buyukates", "Melih Bastopcu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.08003", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are often ensembled together to improve overall reliability and robustness, but in practice models are strongly correlated. This raises a fundamental question: which model", "arxiv_id": "2602.08003"}
+{"id": "relaygen-intrageneration-model-2026", "title": "RelayGen: Intra-Generation Model Switching for Efficient Reasoning", "authors": ["Jiwon Song", "Yoon Kim", "Jae-Joon Kim"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06454", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large reasoning models (LRMs) achieve strong performance on complex reasoning tasks by generating long, multi-step reasoning trajectories, but inference-time scaling incurs substantial deployment cost", "arxiv_id": "2602.06454"}
+{"id": "acar-adaptive-complexity-2026", "title": "ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces", "authors": ["Ramchand Kumaresan"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.21231", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present ACAR (Adaptive Complexity and Attribution Routing), a measurement framework for studying multi-model orchestration under auditable conditions. ACAR uses self-consistency variance (sigma) co", "arxiv_id": "2602.21231"}
+{"id": "among-us-measuring-2026", "title": "Among Us: Measuring and Mitigating Malicious Contributions in Model Collaboration Systems", "authors": ["Ziyuan Yang", "Wenxuan Ding", "Shangbin Feng", "Yulia Tsvetkov"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05176", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models (LMs) are increasingly used in collaboration: multiple LMs trained by different parties collaborate through routing systems, multi-agent debate, model merging, and more. Critical safet", "arxiv_id": "2602.05176"}
+{"id": "mentorcollab-selective-largetosmall-2026", "title": "MentorCollab: Selective Large-to-Small Inference-Time Guidance for Efficient Reasoning", "authors": ["Haojin Wang", "Yike Wang", "Shangbin Feng", "Hanna Hajishirzi", "Yulia Tsvetkov"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05307", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large reasoning models (LRMs) achieve strong performance by producing long chains of thought, but their inference costs are high and often generate redundant reasoning. Small language models (SLMs) ar", "arxiv_id": "2602.05307"}
+{"id": "singlemulti-evolution-loop-2026", "title": "The Single-Multi Evolution Loop for Self-Improving Model Collaboration Systems", "authors": ["Shangbin Feng", "Kishan Panaganti", "Yulia Tsvetkov", "Wenhao Yu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05182", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Model collaboration -- systems where multiple language models (LMs) collaborate -- combines the strengths of diverse models with cost in loading multiple LMs. We improve efficiency while preserving th", "arxiv_id": "2602.05182"}
+{"id": "interfaze-future-ai-2026", "title": "Interfaze: The Future of AI is built on Task-Specific Small Models", "authors": ["Harsha Vardhan Khurdula", "Vineet Agarwal", "Yoeven D Khemlani"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04101", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present Interfaze, a system that treats modern LLM applications as a problem of building and acting over context, not just picking the right monolithic model. Instead of a single transformer, we co", "arxiv_id": "2602.04101"}
+{"id": "disentangling-causal-importance-2026", "title": "Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration", "authors": ["Sudipto Ghosh", "Sujoy Nath", "Sunny Manchanda", "Tanmoy Chakraborty"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04291", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration", "arxiv_id": "2602.04291"}
+{"id": "budgetaware-agentic-routing-2026", "title": "Budget-Aware Agentic Routing via Boundary-Guided Training", "authors": ["Caiqi Zhang", "Menglin Xia", "Xuchao Zhang", "Daniel Madrigal", "Ankur Mallick"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.21227", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large language models (LLMs) evolve into autonomous agents that execute long-horizon workflows, invoking a high-capability model at every step becomes economically unsustainable. While model routin", "arxiv_id": "2602.21227"}
+{"id": "colt-lightweight-multillm-2026", "title": "COLT: Lightweight Multi-LLM Collaboration through Shared MCTS Reasoning for Model Compilation", "authors": ["Annabelle Sujun Tang", "Christopher Priebe", "Lianhui Qin", "Hadi Esmaeilzadeh"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.01935", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Model serving costs dominate AI systems, making compiler optimization essential for scalable deployment. Recent works show that a large language model (LLM) can guide compiler search by reasoning over", "arxiv_id": "2602.01935"}
+{"id": "trust-by-design-2026", "title": "Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing", "authors": ["Mika Okamoto", "Ansel Kaplan Erol", "Glenn Scott Matlin"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.02386", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: How should Large Language Model (LLM) practitioners select the right model for a task without wasting money? We introduce BELLA (Budget-Efficient LLM Selection via Automated skill-profiling), a framew", "arxiv_id": "2602.02386"}
+{"id": "r2router-new-paradigm-2026", "title": "R2-Router: A New Paradigm for LLM Routing with Reasoning", "authors": ["Jiaqi Xue", "Qian Lou", "Jiarong Xing", "Heng Huang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.02823", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As LLMs proliferate with diverse capabilities and costs, LLM routing has emerged by learning to predict each LLM's quality and cost for a given query, then selecting the one with high quality and low ", "arxiv_id": "2602.02823"}
+{"id": "dynamic-mix-precision-2026", "title": "Dynamic Mix Precision Routing for Efficient Multi-step LLM Interaction", "authors": ["Yuanzhe Li", "Jianing Deng", "Jingtong Hu", "Tianlong Chen", "Song Wang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.02711", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLM) achieve strong performance in long-horizon decision-making tasks through multi-step interaction and reasoning at test time. While practitioners commonly believe a higher ta", "arxiv_id": "2602.02711"}
+{"id": "flowsteer-interactive-agentic-2026", "title": "FlowSteer: Interactive Agentic Workflow Orchestration via End-to-End Reinforcement Learning", "authors": ["Mingda Zhang", "Haoran Luo", "Tiesunlong Shen", "Qika Lin", "Xiaoying Tang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.01664", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, a variety of powerful agentic workflows have been applied to solve a wide range of human problems. However, existing workflow orchestration still faces key challenges, including high ", "arxiv_id": "2602.01664"}
+{"id": "inferenceonly-prompt-projection-2026", "title": "Inference-Only Prompt Projection for Safe Text-to-Image Generation with TV Guarantees", "authors": ["Minhyuk Lee", "Hyekyung Yoon", "Myungjoo Kang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00616", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Text-to-Image (T2I) diffusion models enable high-quality open-ended synthesis, but their real-world deployment demands safeguards that suppress unsafe generations without degrading benign prompt-image", "arxiv_id": "2602.00616"}
+{"id": "pay-hints-not-2026", "title": "Pay for Hints, Not Answers: LLM Shepherding for Cost-Efficient Inference", "authors": ["Ziming Dong", "Hardik Sharma", "E. O’Toole", "J. Champati", "Kui Wu"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.22132", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) deliver state-of-the-art performance on complex reasoning tasks, but their inference costs limit deployment at scale. Small Language Models (SLMs) offer dramatic cost savi", "arxiv_id": "2601.22132", "doi": "10.48550/arXiv.2601.22132"}
+{"id": "effective-lora-adapter-2026", "title": "Effective LoRA Adapter Routing using Task Representations", "authors": ["Akash Dhasade", "Anne-Marie Kermarrec", "Igor Pavlovic", "Diana Petrescu", "Rafael Pires"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.21795", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Low-rank adaptation (LoRA) enables parameter efficient specialization of large language models (LLMs) through modular adapters, resulting in rapidly growing public adapter pools spanning diverse tasks", "arxiv_id": "2601.21795", "doi": "10.48550/arXiv.2601.21795"}
+{"id": "more-bang-buck-2026", "title": "More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)", "authors": ["Sagi Meir", "Tommer D. Keidar", "Noam Levi", "S. Reuveni", "Barak Hirshberg"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.21522", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a mor", "arxiv_id": "2601.21522", "doi": "10.48550/arXiv.2601.21522"}
+{"id": "moco-onestop-shop-2026", "title": "MoCo: A One-Stop Shop for Model Collaboration Research", "authors": ["Shangbin Feng", "Yuyang Bai", "Ziyuan Yang", "Yike Wang", "Zhaoxuan Tan"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.21257", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Advancing beyond single monolithic language models (LMs), recent research increasingly recognizes the importance of model collaboration, where multiple LMs collaborate, compose, and complement each ot", "arxiv_id": "2601.21257", "doi": "10.48550/arXiv.2601.21257"}
+{"id": "federate-router-learning-2026", "title": "Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations", "authors": ["Baris Askin", "Shivam Patel", "Anupam Nayak", "Andrea Vigano", "Jiin Woo"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.22318", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly accessed as remotely hosted services by edge and enterprise clients that cannot run frontier models locally. Since models vary widely in capability and pr", "arxiv_id": "2601.22318", "doi": "10.48550/arXiv.2601.22318"}
+{"id": "joint-continual-learning-2026", "title": "Joint Continual Learning of Local Language Models and Cloud Offloading Decisions with Budget Constraints", "authors": ["Evan Chen", "Wenzhi Fang", "Shiqiang Wang", "Christopher G. Brinton"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00166", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Locally deployed Small Language Models (SLMs) must continually support diverse tasks under strict memory and computation constraints, making selective reliance on cloud Large Language Models (LLMs) un", "arxiv_id": "2602.00166"}
+{"id": "formulaone-prompting-adaptive-2026", "title": "Formula-One Prompting: Adaptive Reasoning Through Equations For Applied Mathematics", "authors": ["Natapong Nitarach", "Pittawat Taveekitworachai", "Kunat Pipatanakul"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19302", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompting techniques such as Chain-of-Thought (CoT) and Program-of-Thought (PoT) improve LLM mathematical reasoning by structuring intermediate steps in natural language or code. However, applied math", "arxiv_id": "2601.19302", "doi": "10.48550/arXiv.2601.19302"}
+{"id": "proteus-slaaware-routing-2026", "title": "PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems", "authors": ["Amit Singh Bhatti", "Vishal Vaddina", "Dagnachew Birru"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19402", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Production LLM deployments serve diverse workloads where cost and quality requirements vary by customer tier, time of day, and query criticality. Model serving systems accept latency SLOs directly. LL", "arxiv_id": "2601.19402", "doi": "10.48550/arXiv.2601.19402"}
+{"id": "caster-breaking-costperformance-2026", "title": "CASTER: Breaking the Cost-Performance Barrier in Multi-Agent Orchestration via Context-Aware Strategy for Task Efficient Routing", "authors": ["Shanyv Liu", "Xuyang Yuan", "Tao Chen", "Zijun Zhan", "Zhu Han"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19793", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Graph-based Multi-Agent Systems (MAS) enable complex cyclic workflows but suffer from inefficient static model allocation, where deploying strong models uniformly wastes computation on trivial sub-tas", "arxiv_id": "2601.19793", "doi": "10.48550/arXiv.2601.19793"}
+{"id": "mmrbench-comprehensive-benchmark-2026", "title": "MMR-Bench: A Comprehensive Benchmark for Multimodal LLM Routing", "authors": ["Haoxuan Ma", "Guannan Lai", "Han-Jia Ye"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17814", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multimodal large language models (MLLMs) have advanced rapidly, yet heterogeneity in architecture, alignment strategies, and efficiency means that no single model is uniformly superior across tasks. I", "arxiv_id": "2601.17814", "doi": "10.48550/arXiv.2601.17814"}
+{"id": "greenserv-energyefficient-contextaware-2026", "title": "GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference", "authors": ["Thomas Ziller", "Shashikant Ilager", "A. Tundo", "Ezio Bartocci", "L. Mariani"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17551", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) demonstrate remarkable capabilities, but their broad deployment is limited by significant computational resource demands, particularly energy consumption during inference.", "arxiv_id": "2601.17551", "doi": "10.48550/arXiv.2601.17551"}
+{"id": "rele-scalable-system-2026", "title": "ReLE: A Scalable System and Structured Benchmark for Diagnosing Capability Anisotropy in Chinese LLMs", "authors": ["R. Fang", "Jian Li", "Wei Chen", "Bin Hu", "Ying-Cong Chen"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.17399", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved rapid progress in Chinese language understanding, yet accurately evaluating their capabilities remains challenged by benchmark saturation and prohibitive com", "arxiv_id": "2601.17399", "doi": "10.48550/arXiv.2601.17399"}
+{"id": "perplexity-paradox-why-2026", "title": "The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts", "authors": ["W. Johnson"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.15843", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In \"Compress or Route?\" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to Hu", "arxiv_id": "2602.15843"}
+{"id": "scmas-constructing-costefficient-2026", "title": "SC-MAS: Constructing Cost-Efficient Multi-Agent Systems with Edge-Level Heterogeneous Collaboration", "authors": ["Di Zhao", "Longhui Ma", "Siwei Wang", "Miao Wang", "Yibo Kong"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.09434", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-based Multi-Agent Systems (MAS) enhance complex problem solving through multi-agent collaboration, but often incur substantially higher costs than single-agent systems. Rece", "arxiv_id": "2601.09434", "doi": "10.48550/arXiv.2601.09434"}
+{"id": "assessing-data-extraction-2026", "title": "Assessing data extraction in randomized clinical trials with large language models", "authors": ["Zuhaer Yisha", "Peng Zou", "Sheng Li", "Lin Zhang", "Linfa Guo"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1186/s12874-025-02729-5", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Data extraction is an essential step in evidence synthesis but remains time-consuming and prone to human error. Large language models (LLMs) such as ChatGPT-4 and Claude 3 Opus may offer partial autom", "doi": "10.1186/s12874-025-02729-5"}
+{"id": "personadual-balancing-personalization-2026", "title": "PersonaDual: Balancing Personalization and Objectivity via Adaptive Reasoning", "authors": ["Xiaoyou Liu", "Xinyi Mou", "Shengbin Yue", "Liang Wang", "Yuqing Wang"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.08679", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As users increasingly expect LLMs to align with their preferences, personalized information becomes valuable. However, personalized information can be a double-edged sword: it can improve interaction ", "arxiv_id": "2601.08679", "doi": "10.48550/arXiv.2601.08679"}
+{"id": "llmrouterbench-massive-benchmark-2026", "title": "LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing", "authors": ["Hao Li", "Yiqun Zhang", "Zhaoyan Guo", "Chenxu Wang", "Shengji Tang"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.07206", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) routing assigns each query to the most suitable model from an ensemble. We introduce LLMRouterBench, a large-scale benchmark and unified framework for LLM routing. It compri", "arxiv_id": "2601.07206", "doi": "10.48550/arXiv.2601.07206"}
+{"id": "cost-accuracy-longterm-2026", "title": "Cost and accuracy of long-term memory in Distributed Multi-Agent Systems based on Large Language Models", "authors": ["Benedict Wolff", "Jacopo Bennati"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.07978", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Distributed multi-agent systems (DMAS) based on large language models (LLMs) enable collaborative intelligence while preserving data privacy. However, systematic evaluations of long-term memory under ", "arxiv_id": "2601.07978", "doi": "10.48550/arXiv.2601.07978"}
+{"id": "adafuse-adaptive-ensemble-2026", "title": "AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs", "authors": ["Cheng Cui", "Tianxin Wei", "Ziyi Chen", "Ruizhong Qiu", "Zhichen Zeng"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.06022", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical w", "arxiv_id": "2601.06022", "doi": "10.48550/arXiv.2601.06022"}
+{"id": "haps-hierarchical-llm-2026", "title": "HAPS: Hierarchical LLM Routing with Joint Architecture and Parameter Search", "authors": ["Zihang Tian", "Rui Li", "Jingsen Zhang", "Xiaohe Bo", "Wei Huo"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.05903", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks. However, existing approaches typically focus on selecting LLM architectures while over", "arxiv_id": "2601.05903", "doi": "10.48550/arXiv.2601.05903"}
+{"id": "orchestrating-intelligence-confidenceaware-2026", "title": "Orchestrating Intelligence: Confidence-Aware Routing for Efficient Multi-Agent Collaboration across Multi-Scale Models", "authors": ["Jingbo Wang", "Sendong Zhao", "Jiatong Liu", "Hao Wang", "Wanting Li"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.04861", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While multi-agent systems (MAS) have demonstrated superior performance over single-agent approaches in complex reasoning tasks, they often suffer from significant computational inefficiencies. Existin", "arxiv_id": "2601.04861", "doi": "10.48550/arXiv.2601.04861"}
+{"id": "glimprouter-efficient-collaborative-2026", "title": "GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts", "authors": ["Wenhao Zeng", "Xuteng Zhang", "Yuling Shi", "Chao Hu", "Yuting Chen"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.05110", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Reasoning Models (LRMs) achieve remarkable performance by explicitly generating multi-step chains of thought, but this capability incurs substantial inference latency and computational cost. Col", "arxiv_id": "2601.05110", "doi": "10.48550/arXiv.2601.05110"}
+{"id": "introlm-introspective-language-2026", "title": "IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation", "authors": ["Hossein Hosseini Kasnavieh", "Gholamreza Haffari", "Chris Leckie", "A. N. Toosi"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.03511", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A major challenge for the operation of large language models (LLMs) is how to predict whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely o", "arxiv_id": "2601.03511", "doi": "10.48550/arXiv.2601.03511"}
+{"id": "ral2m-retrieval-augmented-2026", "title": "RAL2M: Retrieval Augmented Learning-To-Match Against Hallucination in Compliance-Guaranteed Service Systems", "authors": ["Mengze Hong", "Di Jiang", "Jiangtao Wen", "Zhiyang Su", "Yawen Li"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.02917", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucination is a major concern in LLM-driven service systems, necessitating explicit knowledge grounding for compliance-guaranteed responses. In this paper, we introduce Retrieval-Augmented Learning", "arxiv_id": "2601.02917", "doi": "10.48550/arXiv.2601.02917"}
+{"id": "synthetic-intuition-system1system2-2026", "title": "Synthetic Intuition: A System-1/System-2 Architecture for Fast and Slow Thinking in Large Language Models", "authors": ["Jasmin Jarsania", "Vivek Patel"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CCWC67433.2026.11393760", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/CCWC67433.2026.11393760"}
+{"id": "selective-ragenhanced-hybrid-2026", "title": "A Selective RAG-Enhanced Hybrid ML-LLM Framework for Efficient and Explainable Fatigue Prediction Using Wearable Sensor Data", "authors": ["Soonho Ha", "Taeyoung Lee", "Hyungjun Seo", "Sujung Yoon", "Hwamin Lee"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.3390/bioengineering13010058", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fatigue is a multifactorial phenomenon affecting both physical and psychological performance, particularly in high-stress occupations. Although wearable sensors enable continuous monitoring, conventio", "doi": "10.3390/bioengineering13010058"}
+{"id": "you-only-need-2025", "title": "You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference", "authors": ["Ryan Shamim"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.00847", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern AI inference systems treat transformer execution as mandatory, conflating model capability with execution necessity. We reframe inference as a control-plane decision problem: determining when e", "arxiv_id": "2601.00847", "doi": "10.48550/arXiv.2601.00847"}
+{"id": "theoretical-foundations-scaling-2025", "title": "Theoretical Foundations of Scaling Law in Familial Models", "authors": ["Huan Song", "Qingfei Zhao", "Ting Long", "Shuyu Tian", "Hongjun An"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.23407", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural scaling laws have become foundational for optimizing large language model (LLM) training, yet they typically assume a single dense model output. This limitation effectively overlooks \"Familial m", "arxiv_id": "2512.23407", "doi": "10.48550/arXiv.2512.23407"}
+{"id": "vlrouterbench-benchmark-visionlanguage-2025", "title": "VL-RouterBench: A Benchmark for Vision-Language Model Routing", "authors": ["Zhehao Huang", "Baijiong Lin", "Jingyuan Zhang", "Jingyi Wang", "Yuhang Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.23562", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-model routing has evolved from an engineering technique into essential infrastructure, yet existing work lacks a systematic, reproducible benchmark for evaluating vision-language models (VLMs). ", "arxiv_id": "2512.23562", "doi": "10.48550/arXiv.2512.23562"}
+{"id": "sloconditioned-action-routing-2025", "title": "SLO-Conditioned Action Routing for Retrieval-Augmented Generation: Objective Ablation and Failure Modes", "authors": ["Bharath Nunepalli"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.00841", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) introduces a practical control problem: retrieval depth and generation behavior must be chosen per query to satisfy service-level objectives (SLOs) such as cost, r", "arxiv_id": "2601.00841", "doi": "10.48550/arXiv.2601.00841"}
+{"id": "energyaware-routing-large-2025", "title": "Energy-Aware Routing to Large Reasoning Models", "authors": ["Austin R. Ellis-Mohr", "Max Hartman", "Lav R. Varshney"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.00823", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and operate it i", "arxiv_id": "2601.00823", "doi": "10.48550/arXiv.2601.00823"}
+{"id": "reliable-llmbased-edgecloudexpert-2025", "title": "Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems", "authors": ["Qiushuo Hou", "Sangwoo Park", "Matteo Zecchin", "Yunlong Cai", "Guanding Yu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.20012", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications, assisting with tasks including troubleshooting, standards interpretation, and network opt", "arxiv_id": "2512.20012", "doi": "10.48550/arXiv.2512.20012"}
+{"id": "calibrating-llm-judges-2025", "title": "Calibrating LLM Judges: Linear Probes for Fast and Reliable Uncertainty Estimation", "authors": ["Bhaktipriya Radharapu", "Eshika Saxena", "Ke Li", "Chenxi Whitehouse", "Adina Williams"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.22245", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As LLM-based judges become integral to industry applications, obtaining well-calibrated uncertainty estimates efficiently has become critical for production deployment. However, existing techniques, s", "arxiv_id": "2512.22245", "doi": "10.48550/arXiv.2512.22245"}
+{"id": "forgetful-but-faithful-2025", "title": "Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents", "authors": ["Saad Alqithami"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.12856", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As generative agents become increasingly sophisticated and deployed in long-term interactive scenarios, their memory management capabilities emerge as a critical bottleneck for both performance and pr", "arxiv_id": "2512.12856", "doi": "10.48550/arXiv.2512.12856"}
+{"id": "hybridflow-resourceadaptive-subtask-2025", "title": "HybridFlow: Resource-Adaptive Subtask Routing for Efficient Edge-Cloud LLM Inference", "authors": ["Jiangwen Dong", "Jiayu Li", "Tianhang Zheng", "Wanyu Lin"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2512.22137", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Edge-cloud collaborative inference is becoming a practical necessity for LLM-powered edge devices: on-device models often cannot afford the required reasoning capability, while cloud-only inference co", "arxiv_id": "2512.22137"}
+{"id": "local-llm-ensembles-2025", "title": "Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition", "authors": ["João Lucas Luz Lima Sarcinelli", "D. Silva"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.10043", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often under-perform in Named Entity Recognition (NER), especially for lower-resource ", "arxiv_id": "2512.10043", "doi": "10.48550/arXiv.2512.10043"}
+{"id": "poodle-seamlessly-scaling-2025", "title": "Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement", "authors": ["Nils Strassenburg", "Boris Glavic", "T. Rabl"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.05525", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and c", "arxiv_id": "2512.05525", "doi": "10.48550/arXiv.2512.05525"}
+{"id": "robon-routed-online-2025", "title": "RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs", "authors": ["Jonathan Geuter", "Gregor Kornhardt"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.05542", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Best-of-$n$ is a widely used test-time scaling approach for LLM inference. Yet despite evidence that LLMs exhibit complementary strengths across tasks, traditionally best-of-$n$ relies on a single mod", "arxiv_id": "2512.05542", "doi": "10.48550/arXiv.2512.05542"}
+{"id": "featurizeddecomposition-join-lowcost-2025", "title": "Featurized-Decomposition Join: Low-Cost Semantic Joins with Guarantees", "authors": ["Sepanta Zeighami", "Shreya Shankar", "Aditya G. Parameswaran"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.05399", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are being increasingly used within data systems to process large datasets with text fields. A broad class of such tasks involves a semantic join-joining two tables based o", "arxiv_id": "2512.05399", "doi": "10.48550/arXiv.2512.05399"}
+{"id": "large-language-models-2025-2", "title": "Large Language Models as Generalist Policies for Network Optimization", "authors": ["Duo Wu", "Linjia Kang", "Zhimin Wang", "Fangxin Wang", "Wei Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.11839", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Designing control policies to ensure robust network services is essential to modern digital infrastructure. However, the dominant paradigm for network optimization relies on designing specialist polic", "arxiv_id": "2512.11839", "doi": "10.48550/arXiv.2512.11839"}
+{"id": "incontext-distillation-selfconsistency-2025", "title": "In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs", "authors": ["Vishnu Sarukkai", "Asanshay Gupta", "James Hong", "Michael Gharbi", "Kayvon Fatahalian"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.02543", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The world currently has an abundance of ideas for how to use new LLM agents, and developers seek to rapidly prototype and test new agentic designs. However, executing agents at scale using high-capaci", "arxiv_id": "2512.02543", "doi": "10.48550/arXiv.2512.02543"}
+{"id": "lpcd-unified-framework-2025", "title": "LPCD: Unified Framework from Layer-Wise to Submodule Quantization", "authors": ["Yuma Ichikawa", "Yudai Fujimoto", "Akira Sakai"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.01546", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Post-training quantization (PTQ) aims to preserve model-level behavior; however, most methods focus on individual linear layers. Even recent extensions, such as QEP and LoaQ, which mitigate error prop", "arxiv_id": "2512.01546", "doi": "10.48550/arXiv.2512.01546"}
+{"id": "art-adaptive-response-2025", "title": "ART: Adaptive Response Tuning Framework - A Multi-Agent Tournament-Based Approach to LLM Response Optimization", "authors": ["Omer Jauhar Khan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.00617", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, single-model responses often exhibit inconsistencies, hallucinations, ", "arxiv_id": "2512.00617", "doi": "10.48550/arXiv.2512.00617"}
+{"id": "optimizing-netgpt-routingbased-2025", "title": "Optimizing NetGPT via Routing-Based Synergy and Reinforcement Learning", "authors": ["Yuxuan Chen", "Rongpeng Li", "Xianfu Chen", "Celimuge Wu", "Chenghui Peng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.22217", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) agents at the network edge offer low-latency execution for routine queries. In contrast, complex requests often require the superior capability of cloud models, incurring hi", "arxiv_id": "2511.22217", "doi": "10.48550/arXiv.2511.22217"}
+{"id": "bamas-structuring-budgetaware-2025", "title": "BAMAS: Structuring Budget-Aware Multi-Agent Systems", "authors": ["Liming Yang", "Junyu Luo", "Xuanzhe Liu", "Yiling Lou", "Zhenpeng Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.21572", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-based multi-agent systems have emerged as a powerful paradigm for enabling autonomous agents to solve complex tasks. As these systems scale in complexity, cost becomes an im", "arxiv_id": "2511.21572", "doi": "10.48550/arXiv.2511.21572"}
+{"id": "personalizedrouter-personalized-llm-2025", "title": "PersonalizedRouter: Personalized LLM Routing via Graph-based User Preference Modeling", "authors": ["Zhongjie Dai", "Tao Feng", "Jiaxuan You"], "year": 2025, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2511.16883", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The growing number of Large Language Models (LLMs) with diverse capabilities and response styles provides users with a wider range of choices, which presents challenges in selecting appropriate LLMs, ", "arxiv_id": "2511.16883", "doi": "10.48550/arXiv.2511.16883"}
+{"id": "optimizing-pytorch-inference-2025", "title": "Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems", "authors": ["Kirill Nagaitsev", "Luka Grbčić", "Samuel Williams", "Costin Iancu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.16964", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels and using specialized model compiler", "arxiv_id": "2511.16964", "doi": "10.48550/arXiv.2511.16964"}
+{"id": "hintaugmented-reranking-efficient-2025", "title": "Hint-Augmented Re-ranking: Efficient Product Search using LLM-Based Query Decomposition", "authors": ["Yilun Zhu", "Nikhita Vedula", "S. Malmasi"], "year": 2025, "venue": "IJCNLP-AACL", "source_url": "https://arxiv.org/abs/2511.13994", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Search queries with superlatives (e.g., best, most popular) require comparing candidates across multiple dimensions, demanding linguistic understanding and domain knowledge. We show that LLMs can unco", "arxiv_id": "2511.13994", "doi": "10.48550/arXiv.2511.13994"}
+{"id": "profits-optimization-llm-2025", "title": "Towards Profits Optimization in LLM Inference Model Deployment at the Network Edge", "authors": ["Yuqi Zhang", "Danyang Zheng", "Huanlai Xing", "Honghui Xu", "Chengzong Peng"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IPCCC66453.2025.11304692", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in large language models (LLMs) have empowered robots and drones with autonomous decision-making capabilities. Due to the stringent real-time requirements of these applications, LLM in", "doi": "10.1109/IPCCC66453.2025.11304692"}
+{"id": "conformal-constrained-policy-2025", "title": "Conformal Constrained Policy Optimization for Cost-Effective LLM Agents", "authors": ["Wenwen Si", "Sooyong Jang", "Insup Lee", "O. Bastani"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.11828", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems, they have done so at increasingly steep computational and API costs. We propose a nov", "arxiv_id": "2511.11828", "doi": "10.48550/arXiv.2511.11828"}
+{"id": "intelligence-per-watt-2025", "title": "Intelligence per Watt: Measuring Intelligence Efficiency of Local AI", "authors": ["Jon Saad-Falcon", "A. Narayan", "Hakki O. Akengin", "J. Griffin", "Herumb Shandilya"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.07885", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand strains this paradigm, and cloud providers struggle to sca", "arxiv_id": "2511.07885", "doi": "10.48550/arXiv.2511.07885"}
+{"id": "c3po-optimized-large-2025", "title": "C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning", "authors": ["Antonios Valkanas", "Soumyasundar Pal", "P. Rumiantsev", "Yingxue Zhang", "Mark Coates"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.07396", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have achieved impressive results on complex reasoning tasks, but their high inference cost remains a major barrier to real-world deployment. A promising solution is to use", "arxiv_id": "2511.07396", "doi": "10.48550/arXiv.2511.07396"}
+{"id": "sdag-subjectbased-directed-2025", "title": "S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning", "authors": ["Jiangwen Dong", "Zehui Lin", "Wanyu Lin", "Mingjin Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.06727", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved impressive performance in complex reasoning problems. Their effectiveness highly depends on the specific nature of the task, especially the required domain k", "arxiv_id": "2511.06727", "doi": "10.48550/arXiv.2511.06727"}
+{"id": "colm-collaborative-large-2025", "title": "CoLM: Collaborative Large Models via A Client-Server Paradigm", "authors": ["Siqi Huang", "Sida Huang", "Hongyuan Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.06991", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large models have achieved remarkable performance across a range of reasoning and understanding tasks. Prior work often utilizes model ensembles or multi-agent systems to collaboratively generate resp", "arxiv_id": "2511.06991", "doi": "10.48550/arXiv.2511.06991"}
+{"id": "confidenceguided-stepwise-model-2025", "title": "Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning", "authors": ["Sangmook Lee", "Dohyung Kim", "Hyukhun Koh", "Nakyeong Yang", "Kyomin Jung"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.06190", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in Large Language Models (LLMs) - particularly model scaling and test-time techniques - have greatly enhanced the reasoning capabilities of language models at the expense of higher inf", "arxiv_id": "2511.06190", "doi": "10.48550/arXiv.2511.06190"}
+{"id": "resourceefficient-multimodal-intelligence-2025", "title": "Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models", "authors": ["Mayank Saini", "Arit Kumar Bishwas"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.06441", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI moves beyond text, large language models (LLMs) increasingly power vision, audio, and document understanding; however, their high inference costs hinder real-time, scalable deployment. Conversel", "arxiv_id": "2511.06441", "doi": "10.48550/arXiv.2511.06441"}
+{"id": "optimalagentselection-stateaware-routing-2025", "title": "Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration", "authors": ["Jingbo Wang", "Sendong Zhao", "Hao Wang", "Yuzhen Fan", "Lizhe Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.02200", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of multi-agent systems powered by large language models (LLMs) has unlocked new frontiers in complex task-solving, enabling diverse agents to integrate unique expertise, collaborate flex", "arxiv_id": "2511.02200", "doi": "10.48550/arXiv.2511.02200"}
+{"id": "spectr-fast-speculative-2023", "title": "SpecTr: Fast Speculative Decoding via Optimal Transport", "authors": ["Ziteng Sun", "A. Suresh", "Jae Hun Ro", "Ahmad Beirami", "Himanshu Jain"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2310.15141", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time making it slow", "arxiv_id": "2310.15141", "doi": "10.48550/arXiv.2310.15141"}
+{"id": "llama-open-efficient-2023", "title": "LLaMA: Open and Efficient Foundation Language Models", "authors": ["Hugo Touvron", "Thibaut Lavril", "Gautier Izacard", "Xavier Martinet", "M. Lachaux"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2302.13971", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art mod", "arxiv_id": "2302.13971"}
+{"id": "augmented-language-models-2023", "title": "Augmented Language Models: a Survey", "authors": ["G. Mialon", "Roberto Dessì", "M. Lomeli", "Christoforos Nalmpantis", "Ramakanth Pasunuru"], "year": 2023, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2302.07842", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler ", "arxiv_id": "2302.07842"}
+{"id": "accelerating-large-language-2023", "title": "Accelerating Large Language Model Decoding with Speculative Sampling", "authors": ["Charlie Chen", "Sebastian Borgeaud", "G. Irving", "Jean-Baptiste Lespiau", "L. Sifre"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2302.01318", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present speculative sampling, an algorithm for accelerating transformer decoding by enabling the generation of multiple tokens from each transformer call. Our algorithm relies on the observation th", "arxiv_id": "2302.01318", "doi": "10.48550/arXiv.2302.01318"}
+{"id": "demonstratesearchpredict-composing-retrieval-2022", "title": "Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP", "authors": ["O. Khattab", "Keshav Santhanam", "Xiang Lisa Li", "David Leo Wright Hall", "Percy Liang"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2212.14024", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM). Existing work has combi", "arxiv_id": "2212.14024", "doi": "10.48550/arXiv.2212.14024"}
+{"id": "successive-prompting-decomposing-2022", "title": "Successive Prompting for Decomposing Complex Questions", "authors": ["Dheeru Dua", "Shivanshu Gupta", "Sameer Singh", "Matt Gardner"], "year": 2022, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2212.04092", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language model", "arxiv_id": "2212.04092", "doi": "10.48550/arXiv.2212.04092"}
+{"id": "fast-inference-from-2022", "title": "Fast Inference from Transformers via Speculative Decoding", "authors": ["Yaniv Leviathan", "Matan Kalman", "Yossi Matias"], "year": 2022, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2211.17192", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inference from large autoregressive models like Transformers is slow - decoding K tokens takes K serial runs of the model. In this work we introduce speculative decoding - an algorithm to sample from ", "arxiv_id": "2211.17192", "doi": "10.48550/arXiv.2211.17192"}
+{"id": "smoothquant-accurate-efficient-2022", "title": "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models", "authors": ["Guangxuan Xiao", "Ji Lin", "Mickael Seznec", "Julien Demouth", "Song Han"], "year": 2022, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2211.10438", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accura", "arxiv_id": "2211.10438", "doi": "10.48550/arXiv.2211.10438"}
+{"id": "ask-me-anything-2022", "title": "Ask Me Anything: A simple strategy for prompting language models", "authors": ["Simran Arora", "A. Narayan", "Mayee F. Chen", "Laurel J. Orr", "Neel Guha"], "year": 2022, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2210.02441", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task and no additional training. Prompting is a britt", "arxiv_id": "2210.02441", "doi": "10.48550/arXiv.2210.02441"}
+{"id": "decomposed-prompting-modular-2022", "title": "Decomposed Prompting: A Modular Approach for Solving Complex Tasks", "authors": ["Tushar Khot", "H. Trivedi", "Matthew Finlayson", "Yao Fu", "Kyle Richardson"], "year": 2022, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2210.02406", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual r", "arxiv_id": "2210.02406", "doi": "10.48550/arXiv.2210.02406"}
+{"id": "leasttomost-prompting-enables-2022", "title": "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models", "authors": ["Denny Zhou", "Nathanael Scharli", "Le Hou", "Jason Wei", "Nathan Scales"], "year": 2022, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2205.10625", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks which requires solving problems harder than", "arxiv_id": "2205.10625", "doi": "10.48550/arXiv.2205.10625"}
+{"id": "semisupervised-cascaded-clustering-2022", "title": "Semi-Supervised Cascaded Clustering for Classification of Noisy Label Data", "authors": ["Ashit Gupta", "A. Deodhar", "Tathagata Mukherjee", "V. Runkana"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2205.02209", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The performance of supervised classification techniques often deteriorates when the data has noisy labels. Even the semi-supervised classification approaches have largely focused only on the problem of", "arxiv_id": "2205.02209", "doi": "10.48550/arXiv.2205.02209"}
+{"id": "comprehensive-study-posttraining-2023", "title": "A Comprehensive Study on Post-Training Quantization for Large Language Models", "authors": ["Z. Yao", "Cheng Li", "Xiaoxia Wu", "Stephen Youn", "Yuxiong He"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2303.08302", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2303.08302"}
+{"id": "2025-ai-agent-2026", "title": "The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems", "authors": ["Leon Staufer", "K. Feng", "Kevin Wei", "Luke Bailey", "Yawen Duan"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.17753", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecosyst", "arxiv_id": "2602.17753"}
+{"id": "ecogym-evaluating-llms-2026", "title": "EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies", "authors": ["Xavier Hu", "Jinxiang Xia", "Shengze Xu", "Kangqi Song", "Yishuo Yuan"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.09514", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Long-horizon planning is widely recognized as a core capability of autonomous LLM-based agents; however, current evaluation frameworks suffer from being largely episodic, domain-specific, or insuffici", "arxiv_id": "2602.09514"}
+{"id": "safepro-evaluating-safety-2026", "title": "SafePro: Evaluating the Safety of Professional-Level AI Agents", "authors": ["KAI-QING Zhou", "Shreedhar Jangam", "Ashwin Nagarajan", "Tejas Polu", "S. Oruganti"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.06663", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model-based agents are rapidly evolving from simple conversational assistants into autonomous systems capable of performing complex, professional-level tasks in various domains. While t", "arxiv_id": "2601.06663", "doi": "10.48550/arXiv.2601.06663"}
+{"id": "reliable-agent-engineering-2025", "title": "Reliable agent engineering should integrate machine-compatible organizational principles", "authors": ["R. P. Xian", "Garry Gabison", "Ahmed M. Alaa", "Christoph Riedl", "Grigorios G. Chrysos"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.07665", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI agents built on large language models (LLMs) become increasingly embedded in society, issues of coordination, control, delegation, and accountability are entangled with concerns over their relia", "arxiv_id": "2512.07665", "doi": "10.48550/arXiv.2512.07665"}
+{"id": "upbench-dynamically-evolving-2025", "title": "UpBench: A Dynamically Evolving Real-World Labor-Market Agentic Benchmark Framework Built for Human-Centric AI", "authors": ["Darvin Yi", "Teng Liu", "Mattie Terzolo", "Lance Hasson", "Ayan Sinha"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.12306", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large language model (LLM) agents increasingly undertake digital work, reliable frameworks are needed to evaluate their real-world competence, adaptability, and capacity for human collaboration. Ex", "arxiv_id": "2511.12306", "doi": "10.48550/arXiv.2511.12306"}
+{"id": "gdpval-evaluating-ai-2025", "title": "GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks", "authors": ["Tejal Patwardhan", "Rachel Dias", "Elizabeth Proehl", "Grace Kim", "Michele Wang"], "year": 2025, "venue": "Robotics", "source_url": "https://arxiv.org/abs/2510.04374", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupat", "arxiv_id": "2510.04374", "doi": "10.48550/arXiv.2510.04374"}
+{"id": "ai-productivity-index-2025", "title": "The AI Productivity Index (APEX)", "authors": ["Bertie Vidgen", "Abby Fennelly", "Evan Pinnix", "Chirag Mahapatra", "Zach Richards"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.25721", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present an extended version of the AI Productivity Index (APEX-v1-extended), a benchmark for assessing whether frontier models are capable of performing economically valuable tasks in four jobs: in", "arxiv_id": "2509.25721", "doi": "10.48550/arXiv.2509.25721"}
+{"id": "rexbench-can-coding-2025", "title": "RExBench: Can coding agents autonomously implement AI research extensions?", "authors": ["Nicholas Edwards", "Yukyung Lee", "Yujun Mao", "Yulu Qin", "Sebastian Schuster"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.22598", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agents based on Large Language Models (LLMs) have shown promise for performing sophisticated software engineering tasks autonomously. In addition, there has been progress towards developing agents tha", "arxiv_id": "2506.22598", "doi": "10.48550/arXiv.2506.22598"}
+{"id": "hcast-humancalibrated-autonomy-2025", "title": "HCAST: Human-Calibrated Autonomy Software Tasks", "authors": ["David Rein", "Joel Becker", "Amy Deng", "Seraphina Nix", "C. Canal"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.17354", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To understand and predict the societal impacts of highly autonomous AI systems, we need benchmarks with grounding, i.e., metrics that directly connect AI performance to real-world effects we care abou", "arxiv_id": "2503.17354", "doi": "10.48550/arXiv.2503.17354"}
+{"id": "benchmark-expertlevel-academic-2025", "title": "A benchmark of expert-level academic questions to assess AI capabilities", "authors": ["Long Phan", "Alice Gatti", "Ziwen Han", "Nathaniel Li", "Josephina Hu"], "year": 2025, "venue": "Nature", "source_url": "https://arxiv.org/abs/2501.14249", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve more than 90% ac", "arxiv_id": "2501.14249", "doi": "10.1038/s41586-025-09962-4"}
+{"id": "frontiermath-benchmark-evaluating-2024", "title": "FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI", "authors": ["Elliott S. Glazer", "Ege Erdil", "T. Besiroglu", "Diego Chicharro", "Evan Chen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.04872", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce FrontierMath, a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians. The questions cover most major branches of m", "arxiv_id": "2411.04872"}
+{"id": "mlebench-evaluating-machine-2024", "title": "MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering", "authors": ["Jun Shern Chan", "Neil Chowdhury", "Oliver Jaffe", "James Aung", "Dane Sherburn"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.07095", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a dive", "arxiv_id": "2410.07095", "doi": "10.48550/arXiv.2410.07095"}
+{"id": "bench-benchmark-toolagentuser-2024", "title": "τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains", "authors": ["Shunyu Yao", "Noah Shinn", "Pedram Razavi", "Karthik Narasimhan"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.12045", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing benchmarks do not test language agents on their interaction with human users or ability to follow domain-specific rules, both of which are vital for deploying them in real world applications.", "arxiv_id": "2406.12045", "doi": "10.48550/arXiv.2406.12045"}
+{"id": "osworld-benchmarking-multimodal-2024", "title": "OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments", "authors": ["Tianbao Xie", "Danyang Zhang", "Jixuan Chen", "Xiaochuan Li", "Siheng Zhao"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2404.07972", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and producti", "arxiv_id": "2404.07972", "doi": "10.48550/arXiv.2404.07972"}
+{"id": "scenarios-transition-agi-2024", "title": "Scenarios for the Transition to AGI", "authors": ["Anton Korinek", "Donghyun Suh"], "year": 2024, "venue": "Social Science Research Network", "source_url": "https://arxiv.org/abs/2403.12107", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We analyze how output and wages behave under different scenarios for technological progress that may culminate in Artificial General Intelligence (AGI), defined as the ability of AI systems to perform", "arxiv_id": "2403.12107", "doi": "10.2139/ssrn.4762962"}
+{"id": "chatbot-arena-open-2024", "title": "Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference", "authors": ["Wei-Lin Chiang", "Lianmin Zheng", "Ying Sheng", "Anastasios Nikolas Angelopoulos", "Tianle Li"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2403.04132", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we int", "arxiv_id": "2403.04132", "doi": "10.48550/arXiv.2403.04132"}
+{"id": "agentbench-evaluating-llms-2023", "title": "AgentBench: Evaluating LLMs as Agents", "authors": ["Xiao Liu", "Hao Yu", "Hanchen Zhang", "Yifan Xu", "Xuanyu Lei"], "year": 2023, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2308.03688", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The potential of Large Language Model (LLM) as agents has been widely acknowledged recently. Thus, there is an urgent need to quantitatively evaluate LLMs as agents on challenging tasks in in", "arxiv_id": "2308.03688", "doi": "10.48550/arXiv.2308.03688"}
+{"id": "mind2web-generalist-agent-2023", "title": "Mind2Web: Towards a Generalist Agent for the Web", "authors": ["Xiang Deng", "Yu Gu", "Boyuan Zheng", "Shijie Chen", "Samuel Stevens"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2306.06070", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets f", "arxiv_id": "2306.06070", "doi": "10.48550/arXiv.2306.06070"}
+{"id": "claude-sonnet-45-2025", "title": "Claude Sonnet 4.5 System Card", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "grok-4-model-2025", "title": "Grok 4 model card", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "openai-2025", "title": "OpenAI", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "chatgpt-agent-system-2025", "title": "ChatGPT Agent System Card", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "canaries-coal-mine-2025", "title": "Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/None", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "visualwebarena-evaluating-multimodal-2024", "title": "VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks", "authors": ["Jing Yu Koh", "Robert Lo", "Lawrence Jang", "Vikram Duvvur", "Ming Chong Lim"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2401.13649", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2401.13649"}
+{"id": "codified-context-infrastructure-2026", "title": "Codified Context: Infrastructure for AI Agents in a Complex Codebase", "authors": ["Aristidis Vasilopoulos"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.20478", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based agentic coding assistants lack persistent memory: they lose coherence across sessions, forget project conventions, and repeat known mistakes. Recent studies characterize how developers confi", "arxiv_id": "2602.20478"}
+{"id": "knapspec-selfspeculative-decoding-2026", "title": "KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem", "authors": ["Seong-Hwan Cha", "Gyuwan Kim", "D. Han", "Tao Yang", "Insu Han"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.20217", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Self-speculative decoding (SSD) accelerates LLM inference by skipping layers to create an efficient draft model, yet existing methods often rely on static heuristics that ignore the dynamic computatio", "arxiv_id": "2602.20217"}
+{"id": "magicagent-generalized-agent-2026", "title": "MagicAgent: Towards Generalized Agent Planning", "authors": ["Xuhui Ren", "Shaokang Dong", "Cheng Yang", "Qingying Gao", "Yunbin Zhao"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.19000", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, achieving generalized plan", "arxiv_id": "2602.19000"}
+{"id": "seccodeprm-process-reward-2026", "title": "SecCodePRM: A Process Reward Model for Code Security", "authors": ["Weicheng Yu", "Ravi Mangal", "Yinyi Luo", "Kai Hu", "Jingxuan He"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.10418", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either r", "arxiv_id": "2602.10418"}
+{"id": "agentic-ai-modernization-2026", "title": "Agentic AI Modernization: Transforming Institutional Infrastructure Through Orchestrated Multi-Agent LLM Framework", "authors": ["Mahesh Kumar Damarched"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.32996/jcsts.2026.8.4.1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While managing constrained funds and strict regulatory requirements, the higher education institutions are under unprecedented pressure to modernize outdated information systems, such as mainframe-bas", "doi": "10.32996/jcsts.2026.8.4.1"}
+{"id": "graphbased-agent-memory-2026", "title": "Graph-based Agent Memory: Taxonomy, Techniques, and Applications", "authors": ["Chang Yang", "Chuang Zhou", "Yilin Xiao", "Su Dong", "Luyao Zhuang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.05665", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Memory emerges as the core module in the Large Language Model (LLM)-based agents for long-horizon complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where memory can enable", "arxiv_id": "2602.05665"}
+{"id": "agentspawn-adaptive-multiagent-2026", "title": "AgentSpawn: Adaptive Multi-Agent Collaboration Through Dynamic Spawning for Long-Horizon Code Generation", "authors": ["I. Costa"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.07072", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Long-horizon code generation requires sustained context and adaptive expertise across domains. Current multi-agent systems use static workflows that cannot adapt when runtime analysis reveals unantici", "arxiv_id": "2602.07072"}
+{"id": "ancoder-anchored-code-2026", "title": "AnCoder: Anchored Code Generation via Discrete Diffusion Models", "authors": ["Anton Xue", "Litu Rout", "C. Caramanis", "S. Shakkottai"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.17688", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program logic. However, existing approaches fai", "arxiv_id": "2602.17688"}
+{"id": "large-language-models-2026", "title": "Large Language Models in Software Documentation and Modeling: A Literature Review and Findings", "authors": ["Lukás Radoský", "Ivan Polasek"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.04938", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative artificial intelligence attracts significant attention, especially with the introduction of large language models. Its capabilities are being exploited to solve various software engineering", "arxiv_id": "2602.04938"}
+{"id": "laafd-llmbased-agents-2026", "title": "LAAFD: LLM-based Agents for Accelerated FPGA Design", "authors": ["Maxim Moraru", "Kamalavasan Kamalakkannan", "J. Dominguez-Trujillo", "Patrick Diehl", "Atanu Barai"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06085", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: FPGAs offer high performance, low latency, and energy efficiency for accelerated computing, yet adoption in scientific and edge settings is limited by the specialized hardware expertise required. High", "arxiv_id": "2602.06085"}
+{"id": "beyond-quantity-trajectory-2026", "title": "Beyond Quantity: Trajectory Diversity Scaling for Code Agents", "authors": ["Guhong Chen", "Chen Sun", "Cheng Fu", "Qiyao Wang", "Zhihong Huang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.03219", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the dimini", "arxiv_id": "2602.03219"}
+{"id": "from-task-solving-2026", "title": "From Task Solving to Robust Real-World Adaptation in LLM Agents", "authors": ["Pouya Pezeshkpour", "Estevam Hruschka"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.02760", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models are increasingly deployed as specialized agents that plan, call tools, and take actions over extended horizons. Yet many existing evaluations assume a \"clean interface\" where dynam", "arxiv_id": "2602.02760"}
+{"id": "chipbench-nextstep-benchmark-2026", "title": "ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design", "authors": ["Zhongkai Yu", "Chenyang Zhou", "Yichen Lin", "Hejia Zhang", "Haotian Ye"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.21448", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Large Language Models (LLMs) show significant potential in hardware engineering, current benchmarks suffer from saturation and limited task diversity, failing to reflect LLMs' performance in real", "arxiv_id": "2601.21448", "doi": "10.48550/arXiv.2601.21448"}
+{"id": "llmbased-vulnerability-detection-2026", "title": "LLM-based Vulnerability Detection at Project Scale: An Empirical Study", "authors": ["Fengjie Li", "Jiajun Jiang", "Dong Chen", "Ying Xiong"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19239", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we present the first comprehensive empirical study of specialized LLM-based detectors and compare them with traditional static analyzers at the project scale. Specifically, our study ev", "arxiv_id": "2601.19239", "doi": "10.48550/arXiv.2601.19239"}
+{"id": "sage-steerable-agentic-2026", "title": "SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback", "authors": ["Fangyuan Xu", "Rujun Han", "Yanfei Chen", "Zifeng Wang", "I-Hung Hsu"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.18202", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deep search agents, which aim to answer complex questions requiring reasoning across multiple documents, can significantly speed up the information-seeking process. Collecting human annotations for th", "arxiv_id": "2601.18202", "doi": "10.48550/arXiv.2601.18202"}
+{"id": "large-language-model-2026", "title": "Large Language Model Agent for User-friendly Chemical Process Simulations", "authors": ["Jingkang Liang", "Niklas Groll", "Gürkan Sin"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.11650", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern process simulators enable detailed process design, simulation, and optimization; however, constructing and interpreting simulations is time-consuming and requires expert knowledge. This limits ", "arxiv_id": "2601.11650", "doi": "10.48550/arXiv.2601.11650"}
+{"id": "what-do-llm-2026", "title": "What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding", "authors": ["Siyuan Liu", "Hongbang Yuan", "Xinze Li", "Ziyue Zhu", "Yixin Cao"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.09503", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks, yet their ability to generalize across varying environments remains a under-e", "arxiv_id": "2601.09503", "doi": "10.48550/arXiv.2601.09503"}
+{"id": "slidesgenbench-evaluating-slides-2026", "title": "SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics", "authors": ["Yunqiao Yang", "Wenbo Li", "Houxing Ren", "Zimu Lu", "Ke Wang"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.09487", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid evolution of Large Language Models (LLMs) has fostered diverse paradigms for automated slide generation, ranging from code-driven layouts to image-centric synthesis. However, evaluating thes", "arxiv_id": "2601.09487", "doi": "10.48550/arXiv.2601.09487"}
+{"id": "controlled-selfevolution-algorithmic-2026", "title": "Controlled Self-Evolution for Algorithmic Code Optimization", "authors": ["Tu Hu", "Ronghao Chen", "Shuo Zhang", "Jianghao Yin", "Mou Xiao Feng"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.07348", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Self-evolution methods enhance code generation through iterative \"generate-verify-refine\" cycles, yet existing approaches suffer from low exploration efficiency, failing to discover solutions with super", "arxiv_id": "2601.07348", "doi": "10.48550/arXiv.2601.07348"}
+{"id": "stelp-secure-transpilation-2026", "title": "STELP: Secure Transpilation and Execution of LLM-Generated Programs", "authors": ["Swapnil Shinde", "Sahil Wadhwa", "Andy Luo", "Akshay Gupta", "M. Sorower"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.05467", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Rapid evolution of Large Language Models (LLMs) has achieved major advances in reasoning, planning, and function-calling capabilities. Multi-agentic collaborative frameworks using such LLMs place them", "arxiv_id": "2601.05467", "doi": "10.48550/arXiv.2601.05467"}
+{"id": "verpo-verifiable-dense-2026", "title": "VeRPO: Verifiable Dense Reward Policy Optimization for Code Generation", "authors": ["Longwen Wang", "Xuan'er Wu", "Xiaohui Hu", "Yirui Liu", "Yuankai Fan"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.03525", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Effective reward design is a central challenge in Reinforcement Learning (RL) for code generation. Mainstream pass/fail outcome rewards enforce functional correctness via executing unit tests, but the", "arxiv_id": "2601.03525", "doi": "10.48550/arXiv.2601.03525"}
+{"id": "hearsay-benchmark-do-2026", "title": "HearSay Benchmark: Do Audio LLMs Leak What They Hear?", "authors": ["Jin Wang", "Liang Lin", "Kaiwen Luo", "Weiliu Wang", "Yitian Chen"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.03783", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Audio Large Language Models (ALLMs) have achieved remarkable progress in understanding and generation, their potential privacy implications remain largely unexplored. This paper takes the first ", "arxiv_id": "2601.03783", "doi": "10.48550/arXiv.2601.03783"}
+{"id": "ecomstage-stagewise-orientationspecific-2026", "title": "EComStage: Stage-wise and Orientation-specific Benchmarking for Large Language Models in E-commerce", "authors": ["Kaiyan Zhao", "Zijie Meng", "Zheyong Xie", "Jinhao Duan", "Yao Hu"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.02752", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-based agents are increasingly deployed in e-commerce applications to assist customer services in tasks such as product inquiries, recommendations, and order management. Exis", "arxiv_id": "2601.02752", "doi": "10.48550/arXiv.2601.02752"}
+{"id": "agentic-memory-learning-2026", "title": "Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents", "authors": ["Yi Yu", "Liuyi Yao", "Yuexiang Xie", "Qingquan Tan", "Jiaqi Feng"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.01885", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows, making effective memory management critical. Existing methods typically handle l", "arxiv_id": "2601.01885", "doi": "10.48550/arXiv.2601.01885"}
+{"id": "stellar-searchbased-testing-2026", "title": "STELLAR: A Search-Based Testing Framework for Large Language Model Applications", "authors": ["Lev Sorokin", "I. Vasilev", "Ken E. Friedl", "Andrea Stocco"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.00497", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-based applications are increasingly deployed across various domains, including customer service, education, and mobility. However, these systems are prone to inaccurate, fic", "arxiv_id": "2601.00497", "doi": "10.48550/arXiv.2601.00497"}
+{"id": "cotdeceptoradversarial-code-obfuscation-2025", "title": "CoTDeceptor: Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents", "authors": ["Haoyan Li", "Mingjin Li", "Jinxin Zuo", "Siqi Li", "Xiao Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.21250", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based code agents (e.g., ChatGPT Codex) are increasingly deployed as detector for code review and security auditing tasks. Although CoT-enhanced LLM vulnerability detectors are believed to provide ", "arxiv_id": "2512.21250", "doi": "10.48550/arXiv.2512.21250"}
+{"id": "swerank-multilingual-multiturn-2025", "title": "SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization", "authors": ["R. Reddy", "Ye Liu", "Wenting Zhao", "Jae Doo", "Tarun Suresh"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.20482", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Maintaining large-scale, multilingual codebases hinges on accurately localizing issues, which requires mapping natural-language error descriptions to the relevant functions that need to be modified. H", "arxiv_id": "2512.20482", "doi": "10.48550/arXiv.2512.20482"}
+{"id": "explainable-finegrained-safeguarding-2025", "title": "Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection", "authors": ["Junjun Pan", "Yixin Liu", "Rui Miao", "Kaize Ding", "Yu Zheng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.18733", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in various safety-critical tasks, detecting ma", "arxiv_id": "2512.18733", "doi": "10.48550/arXiv.2512.18733"}
+{"id": "ai-code-wild-2025", "title": "AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts of AI-Generated Code in Modern Software", "authors": ["Bin Wang", "Wenjie Yu", "YiLu Zhong", "Hao Yu", "Keke Lian"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.18567", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) for code generation are becoming integral to modern software development, but their real-world prevalence and security impact remain poorly understood. We present the firs", "arxiv_id": "2512.18567", "doi": "10.48550/arXiv.2512.18567"}
+{"id": "deepcode-open-agentic-2025", "title": "DeepCode: Open Agentic Coding", "authors": ["Zongwei Li", "Zhonghang Li", "Zirui Guo", "Xubin Ren", "Chao Huang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.07921", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods still face si", "arxiv_id": "2512.07921", "doi": "10.48550/arXiv.2512.07921"}
+{"id": "backportbench-multilingual-benchmark-2025", "title": "BackportBench: A Multilingual Benchmark for Automated Backporting of Patches", "authors": ["Zhiqing Zhong", "Jiaming Huang", "Pinjia He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.01396", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many modern software projects evolve rapidly to incorporate new features and security patches. It is important for users to update their dependencies to safer versions, but many still use older, vulne", "arxiv_id": "2512.01396", "doi": "10.48550/arXiv.2512.01396"}
+{"id": "prompt-perturbation-fraction-2025", "title": "Prompt perturbation and fraction facilitation sometimes strengthen Large Language Model scores", "authors": ["Mike Thelwall"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.01330", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) can be tasked with scoring texts according to pre-defined criteria and on a defined scale, but there is no recognised optimal prompting strategy for this. This article foc", "arxiv_id": "2512.01330", "doi": "10.48550/arXiv.2512.01330"}
+{"id": "review-research-aiassisted-2025", "title": "A Review of Research on AI-Assisted Code Generation and AI-Driven Code Review", "authors": ["Yuzhi Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.54097/d6775287", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the significant breakthroughs of deep learning technologies such as large language models (LLMs) in the field of code analysis, AI has evolved from an auxiliary tool to a key technology that deep", "doi": "10.54097/d6775287"}
+{"id": "kernelband-steering-llmbased-2025", "title": "KernelBand: Steering LLM-based Kernel Optimization via Hardware-Aware Multi-Armed Bandits", "authors": ["Dezhi Ran", "S. Xie", "Mingfang Ji", "Anmin Liu", "Mengzhou Wu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2511.18868", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: High-performance GPU kernels are critical for efficient LLM serving, yet their optimization remains a bottleneck requiring deep system expertise. While code LLMs show promise in generating functionall", "arxiv_id": "2511.18868"}
+{"id": "nalamainz-at-blp2025-2025", "title": "NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation", "authors": ["Hossain Shaikh Saadi", "Faria Alam", "Mario Sanz-Guerrero", "Minh Duc Bui", "Manuel Mager"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.16787", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent-based pipeline. First, a code-generation agent produce", "arxiv_id": "2511.16787", "doi": "10.48550/arXiv.2511.16787"}
+{"id": "making-llms-reliable-2025", "title": "Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions", "authors": ["Alejandro R. Jadad"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.07669", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This ga", "arxiv_id": "2511.07669", "doi": "10.48550/arXiv.2511.07669"}
+{"id": "specificationguided-vulnerability-detection-2025", "title": "Specification-Guided Vulnerability Detection with Large Language Models", "authors": ["Hao Zhu", "Jia Li", "Cuiyun Gao", "Jiaru Qian", "Yihong Dong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.04014", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have achieved remarkable progress in code understanding tasks. However, they demonstrate limited performance in vulnerability detection and struggle to distinguish vulnera", "arxiv_id": "2511.04014", "doi": "10.48550/arXiv.2511.04014"}
+{"id": "textttremind-understanding-deductive-2025", "title": "ReMind: Understanding Deductive Code Reasoning in LLMs", "authors": ["Jun Gao", "Yun Peng", "Xiaoxue Ren"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.00488", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved remarkable progress in code-related tasks. Despite their advancement, empirical evidence reveals that they still struggle with deductive code reasoning", "arxiv_id": "2511.00488", "doi": "10.48550/arXiv.2511.00488"}
+{"id": "catarena-evaluating-evolutionary-2025", "title": "CATArena: Evaluating Evolutionary Capabilities of Code Agents via Iterative Tournaments", "authors": ["Lingyue Fu", "Xin Ding", "Li Pan", "Yaoming Zhu", "Shaolei Zhang"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.26852", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current evaluation for Large Language Model (LLM) code agents predominantly focus on generating functional code in single-turn scenarios, which fails to evaluate the agent's capability for continuous ", "arxiv_id": "2510.26852"}
+{"id": "autostreampipe-llm-assisted-2025", "title": "AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines", "authors": ["Abolfazl Younesi", "Zahra Najafabadi Samani", "T. Fahringer"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.23408", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Data pipelines are essential in stream processing as they enable the efficient collection, processing, and delivery of real-time data, supporting rapid data analysis. In this paper, we present AutoStr", "arxiv_id": "2510.23408", "doi": "10.48550/arXiv.2510.23408"}
+{"id": "magentic-marketplace-opensource-2025", "title": "Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets", "authors": ["Gagan Bansal", "Wenyue Hua", "Zezhou Huang", "Adam Fourney", "Amanda Swearngin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.25779", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As LLM agents advance, they are increasingly mediating economic decisions, ranging from product discovery to transactions, on behalf of users. Such applications promise benefits but also raise many qu", "arxiv_id": "2510.25779", "doi": "10.48550/arXiv.2510.25779"}
+{"id": "cudaforge-agent-framework-2025", "title": "CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization", "authors": ["Zijian Zhang", "Rong Wang", "Shiyang Li", "Yuebo Luo", "Mingyi Hong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.01884", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Developing efficient CUDA kernels is increasingly critical for AI applications such as large-scale LLM training. However, manual kernel design is both costly and time-consuming, motivating automatic a", "arxiv_id": "2511.01884", "doi": "10.48550/arXiv.2511.01884"}
+{"id": "coderl-improving-code-2025", "title": "CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment", "authors": ["Xue Jiang", "Yihong Dong", "Mengyang Liu", "Hongyi Deng", "Tian Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.18471", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional", "arxiv_id": "2510.18471", "doi": "10.48550/arXiv.2510.18471"}
+{"id": "saber-efficient-sampling-2025", "title": "Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model", "authors": ["Yihong Dong", "Zhaoyu Ma", "Xue Jiang", "Zhiyuan Fan", "Jiaru Qian"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.18165", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Diffusion language models (DLMs) are emerging as a powerful and promising alternative to the dominant autoregressive paradigm, offering inherent advantages in parallel generation and bidirectional con", "arxiv_id": "2510.18165", "doi": "10.48550/arXiv.2510.18165"}
+{"id": "retrievalaugmented-code-generation-2025", "title": "Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches", "authors": ["Yicheng Tao", "Yao Qin", "Yepang Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.04905", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in large language models (LLMs) have substantially improved automated code generation. While function-level and file-level generation have achieved promising results, real-world so", "arxiv_id": "2510.04905", "doi": "10.48550/arXiv.2510.04905"}
+{"id": "multiagent-codeorchestrated-generation-2025", "title": "Multi-Agent Code-Orchestrated Generation for Reliable Infrastructure-as-Code", "authors": ["Rana Nameer Hussain Khan", "Dawood Wasif", "Jin-Hee Cho", "Ali Butt"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.03902", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing complexity of cloud-native infrastructure has made Infrastructure-as-Code (IaC) essential for reproducible and scalable deployments. While large language models (LLMs) have shown promis", "arxiv_id": "2510.03902", "doi": "10.48550/arXiv.2510.03902"}
+{"id": "bytesized32refactored-extensible-interactive-2025", "title": "ByteSized32Refactored: Towards an Extensible Interactive Text Games Corpus for LLM World Modeling and Evaluation", "authors": ["Haonan Wang", "Junfeng Sun", "Xingdi Yuan", "Ruoyao Wang", "Ziang Xiao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.23979", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Simulating interactive world models remains a core challenge in Large Language Models (LLMs). In this work, we introduce the ByteSized32Refactored, a refactored, modular, and extensible implementation ", "arxiv_id": "2509.23979", "doi": "10.48550/arXiv.2509.23979"}
+{"id": "disagreements-reasoning-how-2025", "title": "Disagreements in Reasoning: How a Model's Thinking Process Dictates Persuasion in Multi-Agent Systems", "authors": ["Haodong Zhao", "Jidong Li", "Zhaomin Wu", "Tianjie Ju", "Zhuosheng Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.21054", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid proliferation of recent Multi-Agent Systems (MAS), where Large Language Models (LLMs) and Large Reasoning Models (LRMs) usually collaborate to solve complex problems, necessitates a deep und", "arxiv_id": "2509.21054", "doi": "10.48550/arXiv.2509.21054"}
+{"id": "from-evaluation-enhancement-2025", "title": "From Evaluation to Enhancement: Large Language Models for Zero-Knowledge Proof Code Generation", "authors": ["Zhantong Xue", "Pingchuan Ma", "Zhaoyu Wang", "Shuai Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.11708", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Zero-knowledge proofs (ZKPs) are increasingly deployed in domains such as privacy-preserving authentication, verifiable computation, and secure finance. However, authoring ZK programs remains challeng", "arxiv_id": "2509.11708", "doi": "10.48550/arXiv.2509.11708"}
+{"id": "sweeffi-reevaluating-software-2025", "title": "SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints", "authors": ["Zhiyuan Fan", "Kirill Vasilevski", "Dayi Lin", "Boyuan Chen", "Yihao Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.09853", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advancement of large language models (LLMs) and code agents has demonstrated significant potential to assist software engineering (SWE) tasks, such as autonomous issue resolution and feature addit", "arxiv_id": "2509.09853", "doi": "10.48550/arXiv.2509.09853"}
+{"id": "large-language-model-2025", "title": "Large Language Model Unlearning for Source Code", "authors": ["Xue Jiang", "Yihong Dong", "Zheng Fang", "Yingwei Ma", "Tangxinyu Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.17125", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Large Language Models (LLMs) excel at code generation, their inherent tendency toward verbatim memorization of training data introduces critical risks like copyright infringement, insecure emiss", "arxiv_id": "2506.17125", "doi": "10.48550/arXiv.2506.17125"}
+{"id": "sysllmatic-large-language-2025", "title": "SysLLMatic: Large Language Models are Software System Optimizers", "authors": ["Huiyun Peng", "Arjun Gupte", "Ryan Hasler", "N. Eliopoulos", "Chien Chou Ho"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.01249", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic software system optimization can improve software speed and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to gener", "arxiv_id": "2506.01249", "doi": "10.48550/arXiv.2506.01249"}
+{"id": "exploring-dataefficient-adaptation-2024", "title": "Exploring Data-Efficient Adaptation of Large Language Models for Code Generation", "authors": ["Xue Jiang", "Yihong Dong", "Zhi Jin", "Ge Li"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2403.00046", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Although Large Language Models (LLMs) have made significant progress in code generation, they still struggle with code generation tasks in specific scenarios. These scenarios usually necessitate the a", "arxiv_id": "2403.00046", "doi": "10.1145/3772721"}
+{"id": "llmbased-framework-support-2025", "title": "LLM-based framework to support the construction of valid formal models", "authors": ["Gábor Guta", "Gábor Kusper"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.17048/fmfai.2025.78", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The use of large language models (LLMs) in software development is becoming increasingly widespread, despite well-known concerns regarding their reliability. A significant risk arises from relying on ", "doi": "10.17048/fmfai.2025.78"}
+{"id": "artifactsbench-bridging-visualinteractive-2025", "title": "ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation", "authors": ["Chenchen Zhang", "Yuhang Li", "Can Xu", "Jiaheng Liu", "Ao Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.04952", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The generative capabilities of Large Language Models (LLMs) are rapidly expanding from static code to dynamic, interactive visual artifacts. This progress is bottlenecked by a critical evaluation gap:", "arxiv_id": "2507.04952", "doi": "10.48550/arXiv.2507.04952"}
+{"id": "cweval-outcomedriven-evaluation-2025", "title": "CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation", "authors": ["Jinjun Peng", "Leyi Cui", "Kele Huang", "Junfeng Yang", "Baishakhi Ray"], "year": 2025, "venue": "2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)", "source_url": "https://arxiv.org/abs/2501.08200", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have significantly aided developers by generating or assisting in code writing, enhancing productivity across various tasks. While identifying incorrect code is often stra", "arxiv_id": "2501.08200", "doi": "10.1109/LLM4Code66737.2025.00009"}
+{"id": "codearena-collective-evaluation-2025", "title": "CodeArena: A Collective Evaluation Platform for LLM Code Generation", "authors": ["Mingzhe Du", "A. Luu", "Bin Ji", "Xiaobao Wu", "Dong Huang"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.01295", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have reshaped code generation by synergizing their exceptional comprehension of natural language and programming syntax, thereby substantially boosting developer productiv", "arxiv_id": "2503.01295", "doi": "10.48550/arXiv.2503.01295"}
+{"id": "enhancing-llm-code-2025", "title": "Enhancing LLM Code Generation: A Systematic Evaluation of Multi-Agent Collaboration and Runtime Debugging for Improved Accuracy, Reliability, and Latency", "authors": ["Nazmus Ashrafi", "Salah Bouktif", "Mohammed Mediani"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.02133", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The use of large language models (LLMs) for automated code generation has emerged as a significant focus within AI research. As these pretrained models continue to evolve, their ability to understand ", "arxiv_id": "2505.02133", "doi": "10.48550/arXiv.2505.02133"}
+{"id": "rethinking-verification-llm-2025", "title": "Rethinking Verification for LLM Code Generation: From Generation to Testing", "authors": ["Zihan Ma", "Taolin Zhang", "Maosong Cao", "Jun'nan Liu", "Wenwei Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.06920", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have recently achieved notable success in code-generation benchmarks such as HumanEval and LiveCodeBench. However, a detailed examination reveals that these evaluation sui", "arxiv_id": "2507.06920", "doi": "10.48550/arXiv.2507.06920"}
+{"id": "prompt-variability-effects-2025", "title": "Prompt Variability Effects On LLM Code Generation", "authors": ["Andrei Paleyes", "Radzim Sendyka", "Diana Robinson", "Christian Cabrera", "Neil D. Lawrence"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.10204", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation is one of the most active areas of application of Large Language Models (LLMs). While LLMs lower barriers to writing code and accelerate development process, the overall quality of gen", "arxiv_id": "2506.10204", "doi": "10.48550/arXiv.2506.10204"}
+{"id": "unseen-horizons-unveiling-2024", "title": "Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar", "authors": ["Yuanliang Zhang", "Yifan Xie", "Shanshan Li", "Ke Liu", "Chong Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.08109", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, large language models (LLMs) have shown strong potential in code generation tasks. However, there are still gaps before they can be fully applied in actual software development processes. Ac", "arxiv_id": "2412.08109", "doi": "10.48550/arXiv.2412.08109"}
+{"id": "evaluation-llm-code-2024", "title": "An evaluation of LLM code generation capabilities through graded exercises", "authors": ["Álvaro Barbero Jiménez"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.16292", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models have shown prominent capabilities in generating functional code from natural language descriptions. However, a standardized way to evaluate these capabilities in an objective and", "arxiv_id": "2410.16292", "doi": "10.48550/arXiv.2410.16292"}
+{"id": "your-code-generated-2023", "title": "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation", "authors": ["Jiawei Liu", "Chun Xia", "Yuyao Wang", "Lingming Zhang"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2305.01210", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis prob", "arxiv_id": "2305.01210"}
+{"id": "carbon-footprint-evaluation-2025", "title": "Carbon Footprint Evaluation of Code Generation through LLM as a Service", "authors": ["Tina Vartziotis", "Maximilian Schmidt", "George Dasoulas", "Ippolyti Dellatolas", "Stefano Attademo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.01036", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Due to increased computing use, data centers consume and emit a lot of energy and carbon. These contributions are expected to rise as big data analytics, digitization, and large AI models grow and bec", "arxiv_id": "2504.01036", "doi": "10.48550/arXiv.2504.01036"}
+{"id": "effectiveness-llmasajudge-code-2025", "title": "On the Effectiveness of LLM-as-a-Judge for Code Generation and Summarization", "authors": ["Giuseppe Crupi", "Rosalia Tufano", "Alejandro Velasco", "A. Mastropaolo", "Denys Poshyvanyk"], "year": 2025, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2507.16587", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have been recently exploited as judges for complex natural language processing tasks, such as Q&A (Question & Answer). The basic idea is to delegate to an LLM the assessme", "arxiv_id": "2507.16587", "doi": "10.1109/TSE.2025.3586082"}
+{"id": "linguistics-theory-meets-2024", "title": "Linguistics Theory Meets LLM: Code-Switched Text Generation via Equivalence Constrained Large Language Models", "authors": ["Garry Kuwanto", "Chaitanya Agarwal", "Genta Indra Winata", "Derry Tanti Wijaya"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.22660", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code-switching, the phenomenon of alternating between two or more languages in a single conversation, presents unique challenges for Natural Language Processing (NLP). Most existing research focuses o", "arxiv_id": "2410.22660", "doi": "10.48550/arXiv.2410.22660"}
+{"id": "exploring-effectiveness-llm-2024", "title": "Exploring the Effectiveness of LLM based Test-driven Interactive Code Generation: User Study and Empirical Evaluation", "authors": ["Sarah Fakhoury", "Aaditya Naik", "Georgios Sakkas", "Saikat Chakraborty", "Madan Musuvathi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3639478.3643525", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce a novel workflow, TICODER, designed to enhance the trust and accuracy of LLM-based code generation through interactive and guided intent formalization. TICODER partially formalizes ambigu", "doi": "10.1145/3639478.3643525"}
+{"id": "llmbased-interactive-code-2024", "title": "LLM-based Interactive Code Generation: Empirical Evaluation", "authors": ["D. Shaikhelislamov", "M. Drobyshevskiy", "Andrey Belevantsev"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ISPRAS64596.2024.10899123", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, large language models (LLMs), those pretrained on code, have demonstrated strong capabilities in generating programs from informal natural language intent. However, LLM-generated code is pr", "doi": "10.1109/ISPRAS64596.2024.10899123"}
+{"id": "rubric-all-you-2025", "title": "Rubric Is All You Need: Improving LLM-Based Code Evaluation With Question-Specific Rubrics", "authors": ["Aditya Pathak", "Rachit Gandhi", "Vaibhav Uttam", "Devansh", "Yashwanth Kumar Nakka"], "year": 2025, "venue": "International Computing Education Research Workshop", "source_url": "https://arxiv.org/abs/2503.23989", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Since the emergence of Large Language Models (LLMs) popularized by the release of GPT-3 and ChatGPT, LLMs have shown remarkable promise in programming-related tasks. While code generation using LLMs h", "arxiv_id": "2503.23989", "doi": "10.1145/3702652.3744220"}
+{"id": "copilot-arena-platform-2025", "title": "Copilot Arena: A Platform for Code LLM Evaluation in the Wild", "authors": ["Wayne Chi", "Valerie Chen", "A. Angelopoulos", "Wei-Lin Chiang", "Aditya Mittal"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.09328", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating in-the-wild coding capabilities of large language models (LLMs) is a challenging endeavor with no clear solution. We introduce Copilot Arena, a platform to collect user preferences for code", "arxiv_id": "2502.09328", "doi": "10.48550/arXiv.2502.09328"}
+{"id": "assessing-correctness-llmbased-2025", "title": "Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation", "authors": ["Arindam Sharma", "Cristina David"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.11620", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we explore uncertainty estimation as a proxy for correctness in LLM-generated code. To this end, we adapt two state-of-the-art techniques from natural language generation -- one based on", "arxiv_id": "2502.11620", "doi": "10.48550/arXiv.2502.11620"}
+{"id": "pennylang-pioneering-llmbased-2025", "title": "PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset", "authors": ["Haider Asif", "Abdul Basit", "Nouhaila Innan", "Muhammad Kashif", "Alberto Marchisio"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.02497", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) offer powerful capabilities in code generation, natural language understanding, and domain-specific reasoning. Their application to quantum software development remains li", "arxiv_id": "2503.02497", "doi": "10.48550/arXiv.2503.02497"}
+{"id": "projecteval-benchmark-programming-2025", "title": "ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation", "authors": ["Kaiyuan Liu", "Youcheng Pan", "Jing Li", "Daojing He", "Yang Xiang"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.07010", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, LLM agents have made rapid progress in improving their programming capabilities. However, existing benchmarks lack the ability to automatically evaluate from users' perspective, and also lac", "arxiv_id": "2503.07010", "doi": "10.48550/arXiv.2503.07010"}
+{"id": "simulationguided-llmbased-code-2025", "title": "On Simulation-Guided LLM-based Code Generation for Safe Autonomous Driving Software", "authors": ["Ali Nouri", "Johan Andersson", "Kailash De Jesus Hornig", "Zhennan Fei", "Emil Knabe"], "year": 2025, "venue": "International Conference on Evaluation & Assessment in Software Engineering", "source_url": "https://arxiv.org/abs/2504.02141", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Driving System (ADS) is a safety-critical software system responsible for the interpretation of the vehicle’s environment and making decisions accordingly. The unbounded complexity of the dr", "arxiv_id": "2504.02141", "doi": "10.1145/3756681.3756987"}
+{"id": "inducing-vulnerable-code-2025", "title": "Inducing Vulnerable Code Generation in LLM Coding Assistants", "authors": ["Binqi Zeng", "Quan Zhang", "Chijin Zhou", "Gwihwan Go", "Yu Jiang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.15867", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Due to insufficient domain knowledge, LLM coding assistants often reference related solutions from the Internet to address programming problems. However, incorporating external information into LLMs' ", "arxiv_id": "2504.15867", "doi": "10.48550/arXiv.2504.15867"}
+{"id": "agentbased-evaluation-framework-2025", "title": "An Agent-based Evaluation Framework for Complex Code Generation", "authors": ["Xinchen Wang", "Ruida Hu", "Pengfei Gao", "Chao Peng", "Cuiyun Gao"], "year": 2025, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2504.13472", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, underscoring the critical need for rigorous and comprehensive evaluation. Existing evaluation approaches fall int", "arxiv_id": "2504.13472", "doi": "10.1109/ASE63991.2025.00200"}
+{"id": "make-every-move-2024", "title": "Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTS", "authors": ["Matthew DeLorenzo", "A. B. Chowdhury", "Vasudev Gohil", "Shailja Thakur", "Ramesh Karri"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.03289", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing large language models (LLMs) for register transfer level code generation face challenges like compilation failures and suboptimal power, performance, and area (PPA) efficiency. This is due to", "arxiv_id": "2402.03289", "doi": "10.48550/arXiv.2402.03289"}
+{"id": "deployabilitycentric-infrastructureascode-generation-2025", "title": "Deployability-Centric Infrastructure-as-Code Generation: An LLM-based Iterative Framework", "authors": ["Tianyi Zhang", "Shidong Pan", "Zejun Zhang", "Zhenchang Xing", "Xiaoyu Sun"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2506.05623", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2506.05623"}
+{"id": "agents4plc-automating-closedloop-2024", "title": "Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents", "authors": ["Zihan Liu", "Ruinan Zeng", "Dongxia Wang", "G. Peng", "Jingyi Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.14209", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In industrial control systems, the generation and verification of Programmable Logic Controller (PLC) code are critical for ensuring operational efficiency and safety. While Large Language Models (LLM", "arxiv_id": "2410.14209", "doi": "10.48550/arXiv.2410.14209"}
+{"id": "evaluation-code-llms-2024", "title": "Evaluation of Code LLMs on Geospatial Code Generation", "authors": ["Piotr Gramacki", "Bruno Martins", "Piotr Szymański"], "year": 2024, "venue": "GeoAI@SIGSPATIAL", "source_url": "https://arxiv.org/abs/2410.04617", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software development support tools have been studied for a long time, with recent approaches using Large Language Models (LLMs) for code generation. These models can generate Python code for data scie", "arxiv_id": "2410.04617", "doi": "10.1145/3687123.3698286"}
+{"id": "can-chatgpt-support-2024", "title": "Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation", "authors": ["Kailun Jin", "Chung-Yu Wang", "Hung Viet Pham", "Hadi Hemmati"], "year": 2024, "venue": "IEEE Working Conference on Mining Software Repositories", "source_url": "https://arxiv.org/abs/2402.11702", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these", "arxiv_id": "2402.11702", "doi": "10.1145/3643991.3645074"}
+{"id": "verimind-agentic-llm-2025", "title": "VeriMind: Agentic LLM for Automated Verilog Generation with a Novel Evaluation Metric", "authors": ["Bardia Nadimi", "Ghali Omar Boutaib", "Hao Zheng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.16514", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Designing Verilog modules requires meticulous attention to correctness, efficiency, and adherence to design specifications. However, manually writing Verilog code remains a complex and time-consuming ", "arxiv_id": "2503.16514", "doi": "10.48550/arXiv.2503.16514"}
+{"id": "rubric-all-you-2025-2", "title": "Rubric Is All You Need: Enhancing LLM-based Code Evaluation With Question-Specific Rubrics", "authors": ["Aditya Pathak", "Rachit Gandhi", "Vaibhav Uttam", "Devansh", "Yashwanth Kumar Nakka"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2503.23989", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2503.23989"}
+{"id": "drccoder-automated-drc-2024", "title": "DRC-Coder: Automated DRC Checker Code Generation Using LLM Autonomous Agent", "authors": ["Chen-Chia Chang", "Chia-Tung Ho", "Yaguang Li", "Yiran Chen", "Haoxing Ren"], "year": 2024, "venue": "ACM International Symposium on Physical Design", "source_url": "https://arxiv.org/abs/2412.05311", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the advanced technology nodes, the integrated design rule checker (DRC) is often utilized in place and route tools for fast optimization loops for power-performance-area. Implementing integrated DR", "arxiv_id": "2412.05311", "doi": "10.1145/3698364.3705347"}
+{"id": "user-centric-evaluation-2024", "title": "User Centric Evaluation of Code Generation Tools (Invited Paper)", "authors": ["Tanha Miah", "Hong Zhu"], "year": 2024, "venue": "International Conference on Artificial Intelligence Testing", "source_url": "https://arxiv.org/abs/2402.03130", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advance of machine learning (ML) technology, large language models (LLMs) are increasingly explored as an intelligent tool to generate program code from natural language specifications.", "arxiv_id": "2402.03130", "doi": "10.1109/AITest62860.2024.00022"}
+{"id": "can-llm-replace-2023", "title": "Can LLM Replace Stack Overflow? A Study on Robustness and Reliability of Large Language Model Code Generation", "authors": ["Li Zhong", "Zilong Wang"], "year": 2023, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2308.10335", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, large language models (LLMs) have shown an extraordinary ability to understand natural language and generate programming code. It has been a common practice for software engineers to consult", "arxiv_id": "2308.10335", "doi": "10.1609/aaai.v38i19.30185"}
+{"id": "automatic-generation-benchmarks-2024", "title": "Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks", "authors": ["E. Farchi", "Shmulik Froimovich", "Rami Katan", "Orna Raz"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.21071", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs can be used in a variety of code related tasks such as translating from one programming language to another, implementing natural language requirements and code summarization. Artifacts generated", "arxiv_id": "2410.21071", "doi": "10.48550/arXiv.2410.21071"}
+{"id": "bias-assessment-mitigation-2023", "title": "Bias Assessment and Mitigation in LLM-based Code Generation", "authors": ["Dong Huang", "Qingwen Bu", "Jie M. Zhang", "Xiaofei Xie", "Junjie Chen"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2309.14345", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2309.14345"}
+{"id": "clarifygpt-empowering-llmbased-2023", "title": "ClarifyGPT: Empowering LLM-based Code Generation with Intention Clarification", "authors": ["Fangwen Mu", "Lin Shi", "Song Wang", "Zhuohao Yu", "Binquan Zhang"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.10996", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce a novel framework named ClarifyGPT, which aims to enhance code generation by empowering LLMs with the ability to identify ambiguous requirements and ask targeted clarifying questions. In ", "arxiv_id": "2310.10996", "doi": "10.48550/arXiv.2310.10996"}
+{"id": "review-code-generation-2023", "title": "A Review on Code Generation with LLMs: Application and Evaluation", "authors": ["Jianxun Wang", "Yixiang Chen"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MedAI59581.2023.00044", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation is a longstanding subject in the field of computer science and software engineering, which aims at realizing an agent capable of writing code automatically aligning with human desire. ", "doi": "10.1109/MedAI59581.2023.00044"}
+{"id": "llm-generative-ai-2024", "title": "LLM Generative AI and Students’ Exam Code Evaluation: Qualitative and Quantitative Analysis", "authors": ["Ema Smolic", "Marko Pavelic", "Bartol Boras", "I. Mekterovic", "Tomislav Jagust"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MIPRO60963.2024.10569820", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Since the introduction of generative artificial intelligence (GAI) technology in the context of large language models (LLMs), it has been widely used for information extraction and/or extrapolation fr", "doi": "10.1109/MIPRO60963.2024.10569820"}
+{"id": "codepde-inference-framework-2025", "title": "CodePDE: An Inference Framework for LLM-driven PDE Solver Generation", "authors": ["Shanda Li", "Tanya Marwah", "Junhong Shen", "Weiwei Sun", "Andrej Risteski"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.08783", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Partial differential equations (PDEs) are fundamental to modeling physical systems, yet solving them remains a complex challenge. Traditional numerical solvers rely on expert knowledge to implement an", "arxiv_id": "2505.08783", "doi": "10.48550/arXiv.2505.08783"}
+{"id": "coffe-code-efficiency-2025", "title": "COFFE: A Code Efficiency Benchmark for Code Generation", "authors": ["Yun Peng", "Jun Wan", "Yichen Li", "Xiaoxue Ren"], "year": 2025, "venue": "Proc. ACM Softw. Eng.", "source_url": "https://arxiv.org/abs/2502.02827", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation has largely improved development efficiency in the era of large language models (LLMs). With the ability to follow instructions, current LLMs can be prompted to generate code solutions", "arxiv_id": "2502.02827", "doi": "10.1145/3715727"}
+{"id": "bitsaicr-automated-code-2025", "title": "BitsAI-CR: Automated Code Review via LLM in Practice", "authors": ["Tao Sun", "Jian Xu", "Yuanpeng Li", "Zhaodong Yan", "Ge Zhang"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2501.15134", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review remains a critical yet resource-intensive process in software development, particularly challenging in large-scale industrial environments. While Large Language Models (LLMs) show promise ", "arxiv_id": "2501.15134", "doi": "10.1145/3696630.3728552"}
+{"id": "chatassert-llmbased-test-2025", "title": "ChatAssert: LLM-Based Test Oracle Generation With External Tools Assistance", "authors": ["I. Hayet", "Adam Scott", "Marcelo d’Amorim"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2024.3519159", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test oracle generation is an important and challenging problem. Neural-based solutions have been recently proposed for oracle generation but they are still inaccurate. For example, the accuracy of the", "doi": "10.1109/TSE.2024.3519159"}
+{"id": "soleval-benchmarking-large-2025", "title": "SolEval: Benchmarking Large Language Models for Repository-level Solidity Code Generation", "authors": ["Zhiyuan Peng", "Xin Yin", "Rui Qian", "Peiqin Lin", "Yongkang Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.18793", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have transformed code generation. However, most existing approaches focus on mainstream languages such as Python and Java, neglecting the Solidity language, the predominan", "arxiv_id": "2502.18793", "doi": "10.48550/arXiv.2502.18793"}
+{"id": "safegenbench-benchmark-framework-2025", "title": "SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code", "authors": ["Xinghang Li", "Jingzhe Ding", "Chao Peng", "Bing Zhao", "Xiang Gao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.05692", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The code generation capabilities of large language models(LLMs) have emerged as a critical dimension in evaluating their overall performance. However, prior research has largely overlooked the securit", "arxiv_id": "2506.05692", "doi": "10.48550/arXiv.2506.05692"}
+{"id": "llm-test-generation-2025", "title": "LLM Test Generation via Iterative Hybrid Program Analysis", "authors": ["Sijia Gu", "Noor Nashid", "Ali Mesbah"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.13580", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automating unit test generation remains a significant challenge, particularly for complex methods in real-world projects. While Large Language Models (LLMs) have made strides in code generation, they ", "arxiv_id": "2503.13580", "doi": "10.1145/3744916.3764553"}
+{"id": "verilogeval-evaluating-large-2023", "title": "VerilogEval: Evaluating Large Language Models for Verilog Code Generation", "authors": ["Mingjie Liu", "N. Pinckney", "Brucek Khailany", "Haoxing Ren"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2309.07544", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating ", "arxiv_id": "2309.07544", "doi": "10.48550/arXiv.2309.07544"}
+{"id": "performance-study-llmgenerated-2024", "title": "A Performance Study of LLM-Generated Code on Leetcode", "authors": ["Tristan Coignion", "Clément Quinton", "Romain Rouvoy"], "year": 2024, "venue": "International Conference on Evaluation & Assessment in Software Engineering", "source_url": "https://arxiv.org/abs/2407.21579", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study evaluates the efficiency of code generation by Large Language Models (LLMs) and measures their performance against human-crafted solutions using a dataset from Leetcode. We compare 18 LLMs,", "arxiv_id": "2407.21579", "doi": "10.1145/3661167.3661221"}
+{"id": "verina-benchmarking-verifiable-2025", "title": "VERINA: Benchmarking Verifiable Code Generation", "authors": ["Zhe Ye", "Zhengxu Yan", "Jingxuan He", "Timothé Kasriel", "Kaiyu Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.23135", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable", "arxiv_id": "2505.23135", "doi": "10.48550/arXiv.2505.23135"}
+{"id": "tigercoder-novel-suite-2025", "title": "TigerCoder: A Novel Suite of LLMs for Code Generation in Bangla", "authors": ["Nishat Raihan", "Antonios Anastasopoulos", "Marcos Zampieri"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.09101", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite being the 5th most spoken language, Bangla remains underrepresented in Large Language Models (LLMs), particularly for code generation. This primarily stems from the scarcity of high-quality da", "arxiv_id": "2509.09101", "doi": "10.48550/arXiv.2509.09101"}
+{"id": "evaluating-language-models-2024", "title": "Evaluating Language Models for Efficient Code Generation", "authors": ["Jiawei Liu", "Songru Xie", "Junhao Wang", "Yuxiang Wei", "Yifeng Ding"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.06450", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce Differential Performance Evaluation (DPE), a framework designed to reliably evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding benchmarks often fail t", "arxiv_id": "2408.06450", "doi": "10.48550/arXiv.2408.06450"}
+{"id": "secureagentbench-benchmarking-secure-2025", "title": "SecureAgentBench: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios", "authors": ["Junkai Chen", "Huihui Huang", "Yunbo Lyu", "Junwen An", "Jieke Shi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.22097", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) powered code agents are rapidly transforming software engineering by automating tasks such as testing, debugging, and repairing, yet the security risks of their generated co", "arxiv_id": "2509.22097", "doi": "10.48550/arXiv.2509.22097"}
+{"id": "codejudge-evaluating-code-2024", "title": "CodeJudge: Evaluating Code Generation with Large Language Models", "authors": ["Weixi Tong", "Tianyi Zhang"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2410.02184", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown promising performance in code generation. However, how to reliably evaluate code generated by LLMs remains an unresolved problem. This paper presents CodeJudge,", "arxiv_id": "2410.02184", "doi": "10.48550/arXiv.2410.02184"}
+{"id": "projecttest-projectlevel-llm-2025", "title": "ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms", "authors": ["Yibo Wang", "Congying Xia", "Wenting Zhao", "Jiangshu Du", "Chunyu Miao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.06556", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit test generation has become a promising and important use case of LLMs. However, existing evaluation benchmarks for assessing LLM unit test generation capabilities focus on function- or class-leve", "arxiv_id": "2502.06556", "doi": "10.48550/arXiv.2502.06556"}
+{"id": "matplotagent-method-evaluation-2024", "title": "MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization", "authors": ["Zhiyu Yang", "Zihan Zhou", "Shuo Wang", "X. Cong", "Xu Han"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.11453", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance,", "arxiv_id": "2402.11453", "doi": "10.48550/arXiv.2402.11453"}
+{"id": "does-reasoning-introduce-2025", "title": "Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning", "authors": ["Xuyang Wu", "Jinming Nian", "Ting-ruen Wei", "Zhiqiang Tao", "Hsin-Tai Wu"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2502.15361", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in large language models (LLMs) have enabled automatic generation of chain-of-thought (CoT) reasoning, leading to strong performance on tasks such as math and code. However, when reaso", "arxiv_id": "2502.15361", "doi": "10.18653/v1/2025.findings-emnlp.1006"}
+{"id": "deepcrceval-revisiting-evaluation-2024", "title": "DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation", "authors": ["Junyi Lu", "Xiaojia Li", "Zihan Hua", "Lei Yu", "Shiqi Cheng"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.18291", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review is a vital but demanding aspect of software development, generating significant interest in automating review comments. Traditional evaluation methods for these comments, primarily based o", "arxiv_id": "2412.18291", "doi": "10.48550/arXiv.2412.18291"}
+{"id": "advancing-code-generation-2025", "title": "Towards Advancing Code Generation with Large Language Models: A Research Roadmap", "authors": ["Haolin Jin", "Huaming Chen", "Qinghua Lu", "Liming Zhu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.11354", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, we have witnessed the rapid development of large language models, which have demonstrated excellent capabilities in the downstream task of code generation. However, despite their potential, ", "arxiv_id": "2501.11354", "doi": "10.48550/arXiv.2501.11354"}
+{"id": "dscodebench-realistic-benchmark-2025", "title": "DSCodeBench: A Realistic Benchmark for Data Science Code Generation", "authors": ["Shuyin Ouyang", "Dong Huang", "Jingwen Guo", "Zeyu Sun", "Qihao Zhu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2505.15621", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce DSCodeBench, a new benchmark designed to evaluate large language models (LLMs) on complicated and realistic data science code generation tasks. DSCodeBench consists of 1,000 carefully con", "arxiv_id": "2505.15621"}
+{"id": "openllmrtl-open-dataset-2024", "title": "OpenLLM-RTL: Open Dataset and Benchmark for LLM-Aided Design RTL Generation: Invited Paper", "authors": ["S. Ahsan", "M. S. Shahriar", "Mrittika Chowdhury", "Tanvir Hossain", "Md Sakib Hasan"], "year": 2024, "venue": "International Conference on Computer Aided Design", "source_url": "https://arxiv.org/abs/2503.15112", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The automated generation of design RTL based on large language model (LLM) and natural language instructions has demonstrated great potential in agile circuit design. However, the lack of datasets and", "arxiv_id": "2503.15112", "doi": "10.1145/3676536.3697118"}
+{"id": "geocodegpt-large-language-2024", "title": "GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks", "authors": ["Shuyang Hou", "Zhangxiao Shen", "Anqi Zhao", "Jianyuan Liang", "Zhipeng Gui"], "year": 2024, "venue": "International Journal of Applied Earth Observation and Geoinformation", "source_url": "https://arxiv.org/abs/2410.17031", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing demand for spatiotemporal data and modeling tasks in geosciences has made geospatial code generation technology a critical factor in enhancing productivity. Although large language mode", "arxiv_id": "2410.17031", "doi": "10.1016/j.jag.2025.104456"}
+{"id": "ldscene-llmguided-diffusion-2025", "title": "LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios", "authors": ["Mingxing Peng", "Yuting Xie", "Xusen Guo", "Ruoyu Yao", "Hai Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.11247", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Ensuring the safety and robustness of autonomous driving systems necessitates a comprehensive evaluation in safety-critical scenarios. However, these safety-critical scenarios are rare and difficult t", "arxiv_id": "2505.11247", "doi": "10.48550/arXiv.2505.11247"}
+{"id": "lost-mix-evaluating-2025", "title": "Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text", "authors": ["Amr Mohamed", "Yang Zhang", "M. Vazirgiannis", "Guokan Shang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.14012", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code-switching (CSW) is the act of alternating between two or more languages within a single discourse. This phenomenon is widespread in multilingual communities, and increasingly prevalent in online ", "arxiv_id": "2506.14012", "doi": "10.48550/arXiv.2506.14012"}
+{"id": "large-language-models-2025-3", "title": "Using Large Language Models for Aerospace Code Generation: Methods, Benchmarks, and Potential Values", "authors": ["Rui He", "Liang Zhang", "Mengyao Lyu", "Liangqing Lyu", "Changbin Xue"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/aerospace12060498", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, Large Language Models (LLMs) have witnessed rapid advancements, revolutionizing various domains. Within the realm of software development, code generation technology powered by LLMs h", "doi": "10.3390/aerospace12060498"}
+{"id": "datadreamer-tool-synthetic-2024", "title": "DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows", "authors": ["Ajay Patel", "Colin Raffel", "Chris Callison-Burch"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.10379", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have become a dominant and important tool for NLP researchers in a wide range of tasks. Today, many researchers use LLMs in synthetic data generation, task evaluation, fin", "arxiv_id": "2402.10379", "doi": "10.48550/arXiv.2402.10379"}
+{"id": "copilot-evaluation-harness-2024", "title": "Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming", "authors": ["Anisha Agarwal", "Aaron Chan", "Shubham Chandel", "Jinu Jang", "S. Miller"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.14261", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) into Development Environments (IDEs) has become a focal point in modern software development. LLMs such as OpenAI GPT-3.5/4 and Code Llama offer the pot", "arxiv_id": "2402.14261", "doi": "10.48550/arXiv.2402.14261"}
+{"id": "bias-unveiled-investigating-2024", "title": "Bias Unveiled: Investigating Social Bias in LLM-Generated Code", "authors": ["Lin Ling", "Fazle Rabbi", "Song Wang", "Jinqiu Yang"], "year": 2024, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2411.10351", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have significantly advanced the field of automated code generation. However, a notable research gap exists in evaluating social biases that may be present in the code prod", "arxiv_id": "2411.10351", "doi": "10.48550/arXiv.2411.10351"}
+{"id": "codebenchgen-creating-scalable-2024", "title": "CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks", "authors": ["Yiqing Xie", "Alex Xie", "Divyanshu Sheth", "Pengfei Liu", "Daniel Fried"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.00566", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To adequately test modern code generation systems, evaluation benchmarks must execute and test the code generated by the system. However, these execution and testing requirements have largely limited ", "arxiv_id": "2404.00566", "doi": "10.48550/arXiv.2404.00566"}
+{"id": "leetcodedataset-temporal-dataset-2025", "title": "LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs", "authors": ["Yunhui Xia", "Wei Shen", "Yan Wang", "Jason Klein Liu", "Hui Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.14655", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce LeetCodeDataset, a high-quality benchmark for evaluating and training code-generation models, addressing two key challenges in LLM research: the lack of reasoning-focused coding benchmark", "arxiv_id": "2504.14655", "doi": "10.48550/arXiv.2504.14655"}
+{"id": "vhdleval-framework-evaluating-2024", "title": "VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation", "authors": ["Prashanth Vijayaraghavan", "Luyao Shi", "S. Ambrogio", "C. Mackin", "Apoorva Nitsure"], "year": 2024, "venue": "2024 IEEE LLM Aided Design Workshop (LAD)", "source_url": "https://arxiv.org/abs/2406.04379", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the unprecedented advancements in Large Language Models (LLMs), their application domains have expanded to include code generation tasks across various programming languages. While significant pr", "arxiv_id": "2406.04379", "doi": "10.1109/LAD62341.2024.10691836"}
+{"id": "codexity-secure-aiassisted-2024", "title": "Codexity: Secure AI-assisted Code Generation", "authors": ["Sung Yong Kim", "Zhiyu Fan", "Yannic Noller", "Abhik Roychoudhury"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.03927", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite the impressive performance of Large Language Models (LLMs) in software development activities, recent studies show the concern of introducing vulnerabilities into software codebase by AI progr", "arxiv_id": "2405.03927", "doi": "10.48550/arXiv.2405.03927"}
+{"id": "deep-dive-into-2024", "title": "A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?", "authors": ["Qihong Chen", "Jiawei Li", "Jie Deng", "JiaCheng Yu", "Justin Tian Jin Chen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.01414", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in Large Language Models (LLMs) have led to their widespread application in automated code generation. However, these models can still generate defective code that deviates from th", "arxiv_id": "2411.01414", "doi": "10.48550/arXiv.2411.01414"}
+{"id": "aligning-crowdsourced-human-2024", "title": "Aligning Crowd-sourced Human Feedback for Code Generation with Bayesian Inference", "authors": ["M. Wong", "C. Tan"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CAI59869.2024.00037", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models (LLMs) excel at code generation, translating abstract descriptions into robust and functional code remains a significant challenge. Despite dedicated efforts, existing work", "doi": "10.1109/CAI59869.2024.00037"}
+{"id": "first-look-at-2024", "title": "A First Look at License Compliance Capability of LLMs in Code Generation", "authors": ["Weiwei Xu", "Kai Gao", "Hao He", "Minghui Zhou"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2408.02487", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2408.02487"}
+{"id": "no-need-lift-2023", "title": "No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT", "authors": ["Zhijie Liu", "Yutian Tang", "Xiapu Luo", "Yuming Zhou", "L. Zhang"], "year": 2023, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2308.04838", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various natural language processing (NLP) tasks, such as machine translation, question answering, summarization, and so on", "arxiv_id": "2308.04838", "doi": "10.1109/TSE.2024.3392499"}
+{"id": "ansible-lightspeed-code-2024", "title": "Ansible Lightspeed: A Code Generation Service for IT Automation", "authors": ["Priyam Sahoo", "Saurabh Pujar", "Ganesh Nalawade", "Richard Gebhardt", "Louis Mandel"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3691620.3695277", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The availability of Large Language Models (LLMs) which can generate code, has made it possible to create tools that improve developer productivity. Integrated development environments or IDEs which de", "doi": "10.1145/3691620.3695277"}
+{"id": "codescore-evaluating-code-2023", "title": "CodeScore: Evaluating Code Generation by Learning Code Execution", "authors": ["Yihong Dong", "J. Ding", "Xue Jiang", "Zhuo Li", "Ge Li"], "year": 2023, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2301.09043", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing match-based CEMs (e.g., BLEU", "arxiv_id": "2301.09043", "doi": "10.1145/3695991"}
+{"id": "codesift-llmbased-referenceless-2024", "title": "CodeSift: An LLM-Based Reference-Less Framework for Automatic Code Validation", "authors": ["Pooja Aggarwal", "Oishik Chatterjee", "Ting Dai", "P. Mohapatra", "B. Paulovicks"], "year": 2024, "venue": "IEEE International Conference on Cloud Computing", "source_url": "https://arxiv.org/abs/2408.15630", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of large language models (LLMs) has greatly facilitated code generation, but ensuring the functional correctness of generated code remains a challenge. Traditional validation methods are of", "arxiv_id": "2408.15630", "doi": "10.1109/CLOUD62652.2024.00052"}
+{"id": "identifying-inaccurate-descriptions-2024", "title": "Identifying Inaccurate Descriptions in LLM-generated Code Comments via Test Execution", "authors": ["Sungmin Kang", "Louis Milliken", "Shin Yoo"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.14836", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software comments are critical for human understanding of software, and as such many comment generation techniques have been proposed. However, we find that a systematic evaluation of the factual accu", "arxiv_id": "2406.14836", "doi": "10.48550/arXiv.2406.14836"}
+{"id": "deepcircuitx-comprehensive-repositorylevel-2025", "title": "DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis", "authors": ["Zeju Li", "Changran Xu", "Zhengyuan Shi", "Zedong Peng", "Yi Liu"], "year": 2025, "venue": "2025 IEEE International Conference on LLM-Aided Design (ICLAD)", "source_url": "https://arxiv.org/abs/2502.18297", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis", "arxiv_id": "2502.18297", "doi": "10.1109/ICLAD65226.2025.00029"}
+{"id": "coladder-supporting-programmers-2023", "title": "CoLadder: Supporting Programmers with Hierarchical Code Generation in Multi-Level Abstraction", "authors": ["Ryan Yen", "J. Zhu", "Sangho Suh", "Haijun Xia", "Jian Zhao"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.08699", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Programmers increasingly rely on Large Language Models (LLMs) for code generation. However, misalignment between programmers' goals and generated code complicates the code evaluation process and deman", "arxiv_id": "2310.08699", "doi": "10.48550/arXiv.2310.08699"}
+{"id": "benchmarking-large-language-2022", "title": "Benchmarking Large Language Models for Automated Verilog RTL Code Generation", "authors": ["Shailja Thakur", "Baleegh Ahmad", "Zhenxing Fan", "H. Pearce", "Benjamin Tan"], "year": 2022, "venue": "Design, Automation and Test in Europe", "source_url": "https://arxiv.org/abs/2212.11140", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automating hardware design could obviate a significant amount of human error from the engineering process and lead to fewer errors. Verilog is a popular hardware description language to model and des", "arxiv_id": "2212.11140", "doi": "10.23919/DATE56975.2023.10137086"}
+{"id": "beyond-static-datasets-2023", "title": "Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation", "authors": ["Jiatong Li", "Rui Li", "Qi Liu"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2309.04369", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have made progress in various real-world tasks, which stimulates requirements for the evaluation of LLMs. Existing LLM evaluation methods are mainly supervised signal-base", "arxiv_id": "2309.04369", "doi": "10.48550/arXiv.2309.04369"}
+{"id": "interactive-code-generation-2022", "title": "Interactive Code Generation via Test-Driven User-Intent Formalization", "authors": ["Shuvendu K. Lahiri", "Aaditya Naik", "Georgios Sakkas", "Piali Choudhury", "Curtis von Veh"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2208.05950", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, when interacting with", "arxiv_id": "2208.05950", "doi": "10.48550/arXiv.2208.05950"}
+{"id": "novel-preprocessing-technique-2023", "title": "Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model", "authors": ["Yu-Chen Lin", "Akhilesh Kumar", "Norman Chang", "Wen-Liang Zhang", "Muhammad Zakir"], "year": 2023, "venue": "2024 IEEE LLM Aided Design Workshop (LAD)", "source_url": "https://arxiv.org/abs/2311.16267", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce four principal contributions to augment the capabilities of Large Language Models (LLMs) in generating domain-specific code: (i) leveraging LLM-based data splitting and data renovation te", "arxiv_id": "2311.16267", "doi": "10.1109/LAD62341.2024.10691715"}
+{"id": "not-all-metrics-2023", "title": "Not All Metrics Are Guilty: Improving NLG Evaluation with LLM Paraphrasing", "authors": ["Tianyi Tang", "Hongyuan Lu", "Y. Jiang", "Haoyang Huang", "Dongdong Zhang"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2305.15067", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2305.15067"}
+{"id": "autoverus-automated-proof-2024", "title": "AutoVerus: Automated Proof Generation for Rust Code", "authors": ["Chenyuan Yang", "Xuheng Li", "Md Rakib Hossain Misu", "Jianan Yao", "Weidong Cui"], "year": 2024, "venue": "Proc. ACM Program. Lang.", "source_url": "https://arxiv.org/abs/2409.13082", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI has shown its value for many software engineering tasks. Still in its infancy, large language model (LLM)-based proof generation lags behind LLM-based code generation. In this paper, we ", "arxiv_id": "2409.13082", "doi": "10.1145/3763174"}
+{"id": "analysis-evaluation-synthetic-2025", "title": "Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection", "authors": ["Jinming Zhang", "Xuanru Zhou", "Jiachen Lian", "Shuhe Li", "William Li"], "year": 2025, "venue": "Interspeech", "source_url": "https://arxiv.org/abs/2505.22029", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS ", "arxiv_id": "2505.22029", "doi": "10.48550/arXiv.2505.22029"}
+{"id": "episodic-memories-generation-2025", "title": "Episodic Memories Generation and Evaluation Benchmark for Large Language Models", "authors": ["Alexis Huet", "Zied Ben-Houidi", "Dario Rossi"], "year": 2025, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2501.13121", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Episodic memory -- the ability to recall specific events grounded in time and space -- is a cornerstone of human cognition, enabling not only coherent storytelling, but also planning and decision-maki", "arxiv_id": "2501.13121", "doi": "10.48550/arXiv.2501.13121"}
+{"id": "can-llms-replace-2025", "title": "Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering", "authors": ["Ruiqi Wang", "Jiyu Guo", "Cuiyun Gao", "Guodong Fan", "Chun Yong Chong"], "year": 2025, "venue": "Proc. ACM Softw. Eng.", "source_url": "https://arxiv.org/abs/2502.06193", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, large language models (LLMs) have been deployed to tackle various software engineering (SE) tasks like code generation, significantly advancing the automation of SE tasks. However, assessing", "arxiv_id": "2502.06193", "doi": "10.1145/3728963"}
+{"id": "codediting-reasoningbased-metric-2025", "title": "CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation", "authors": ["Guang Yang", "Yu Zhou", "Xiang Chen", "Wei Zheng", "Xing Hu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.19502", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Trustworthy evaluation methods for code snippets play a crucial role in neural code generation. Traditional methods, which either rely on reference solutions or require executable test cases, have inh", "arxiv_id": "2505.19502", "doi": "10.48550/arXiv.2505.19502"}
+{"id": "from-llm-reasoning-2025", "title": "From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review", "authors": ["M. Ferrag", "Norbert Tihanyi", "M. Debbah"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.19678", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models and autonomous AI agents have evolved rapidly, resulting in a diverse array of evaluation benchmarks, frameworks, and collaboration protocols. However, the landscape remains frag", "arxiv_id": "2504.19678", "doi": "10.48550/arXiv.2504.19678"}
+{"id": "large-language-model-2024-2", "title": "Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities", "authors": ["Hao Zhou", "Chengming Hu", "Ye Yuan", "Yufei Cui", "Yili Jin"], "year": 2024, "venue": "IEEE Communications Surveys and Tutorials", "source_url": "https://arxiv.org/abs/2405.10825", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement ", "arxiv_id": "2405.10825", "doi": "10.1109/COMST.2024.3465447"}
+{"id": "codescope-executionbased-multilingual-2023", "title": "CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation", "authors": ["Weixiang Yan", "Haitian Liu", "Yunkun Wang", "Yunzhe Li", "Qian Chen"], "year": 2023, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2311.08588", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on assisting humans in programming and facilitating programming automation. However, existing benchmarks for evaluating the code u", "arxiv_id": "2311.08588", "doi": "10.48550/arXiv.2311.08588"}
+{"id": "secbench-automated-benchmarking-2025", "title": "SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks", "authors": ["Hwiwon Lee", "Ziqi Zhang", "Hanxiao Lu", "Lingming Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.11791", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Rigorous security-focused evaluation of large language model (LLM) agents is imperative for establishing trust in their safe deployment throughout the software development lifecycle. However, existing", "arxiv_id": "2506.11791", "doi": "10.48550/arXiv.2506.11791"}
+{"id": "llmassisted-static-analysis-2024", "title": "LLM-Assisted Static Analysis for Detecting Security Vulnerabilities", "authors": ["Ziyang Li", "Saikat Dutta", "Mayur Naik"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2405.17238", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software is prone to security vulnerabilities. Program analysis tools to detect them have limited effectiveness in practice due to their reliance on human labeled specifications. Large language models", "arxiv_id": "2405.17238", "doi": "10.48550/arXiv.2405.17238"}
+{"id": "evaluating-judges-as-2025", "title": "Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators", "authors": ["Yilun Zhou", "Austin Xu", "Peifeng Wang", "Caiming Xiong", "Shafiq Joty"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2504.15253", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling test-time computation, or affording a generator large language model (LLM) extra compute during inference, typically employs the help of external non-generative evaluators (i.e., reward models", "arxiv_id": "2504.15253", "doi": "10.48550/arXiv.2504.15253"}
+{"id": "code-centric-evaluation-2023", "title": "A Code Centric Evaluation of C/C++ Vulnerability Datasets for Deep Learning Based Vulnerability Detection Techniques", "authors": ["Ridhi Jain", "Nicole Gervasoni", "Mthandazo Ndhlovu", "Sanjay Rawat"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3578527.3578530", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent years have witnessed tremendous progress in NLP-based code comprehension via deep neural networks (DNN) learning, especially Large Language Models (LLMs). While the original application of LLMs", "doi": "10.1145/3578527.3578530"}
+{"id": "enhancing-llm-factual-2024", "title": "Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases", "authors": ["Jiarui Li", "Ye Yuan", "Zehua Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.10446", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We proposed an end-to-end system design towards utilizing Retrieval Augmented Generation (RAG) to improve the factual accuracy of Large Language Models (LLMs) for domain-specific and time-sensitive qu", "arxiv_id": "2403.10446", "doi": "10.48550/arXiv.2403.10446"}
+{"id": "everything-you-wanted-2025", "title": "Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask", "authors": ["Yue Li", "Xiao Li", "Hao Wu", "Minghui Xu", "Yue Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.13474", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models are a promising tool for automated vulnerability detection, thanks to their success in code generation and repair. However, despite widespread adoption, a critical question remai", "arxiv_id": "2504.13474", "doi": "10.48550/arXiv.2504.13474"}
+{"id": "top-leaderboard-ranking-2024", "title": "Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM", "authors": ["Chun Xia", "Yinlin Deng", "Lingming Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.19114", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs have become the go-to choice for code generation tasks, with an exponential increase in the training, development, and usage of LLMs specifically for code generation. To evaluate the ability of L", "arxiv_id": "2403.19114", "doi": "10.48550/arXiv.2403.19114"}
+{"id": "internbootcamp-technical-report-2025", "title": "InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling", "authors": ["Peiji Li", "Jiasheng Ye", "Yongkang Chen", "Yichuan Ma", "Zijie Yu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.08636", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have revolutionized artificial intelligence by enabling complex reasoning capabilities. While recent advancements in reinforcement learning (RL) have primarily focused on ", "arxiv_id": "2508.08636", "doi": "10.48550/arXiv.2508.08636"}
+{"id": "codejudgebench-benchmarking-llmasajudge-2025", "title": "CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks", "authors": ["Hongchao Jiang", "Yiming Chen", "Yushi Cao", "Hung-yi Lee", "R. Tan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.10535", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have significantly advanced the state-of-the-art in various coding tasks. Beyond directly answering user queries, LLMs can also serve as judges, assessing and comparing th", "arxiv_id": "2507.10535", "doi": "10.48550/arXiv.2507.10535"}
+{"id": "reasoning-runtime-behavior-2024", "title": "Reasoning Runtime Behavior of a Program with LLM: How Far are We?", "authors": ["Junkai Chen", "Zhiyuan Pan", "Xing Hu", "Zhenhao Li", "Ge Li"], "year": 2024, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2403.16437", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models for code (i.e., code LLMs) have shown strong code understanding and generation capabilities. To evaluate the capabilities of code LLMs in various aspects, many benchmarks have be", "arxiv_id": "2403.16437", "doi": "10.1109/ICSE55347.2025.00012"}
+{"id": "aime-ai-system-2024", "title": "AIME: AI System Optimization via Multiple LLM Evaluators", "authors": ["Bhrij Patel", "Souradip Chakraborty", "Wesley A. Suttle", "Mengdi Wang", "A. S. Bedi"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.03131", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Text-based AI system optimization typically involves a feedback loop scheme where a single LLM generates an evaluation in natural language of the current output to improve the next iteration's output.", "arxiv_id": "2410.03131", "doi": "10.48550/arXiv.2410.03131"}
+{"id": "vericontaminated-assessing-llmdriven-2025", "title": "VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination", "authors": ["Zeng Wang", "Minghao Shao", "Jitendra Bhandari", "Likhitha Mankali", "Ramesh Karri"], "year": 2025, "venue": "2025 IEEE International Conference on LLM-Aided Design (ICLAD)", "source_url": "https://arxiv.org/abs/2503.13572", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized code generation, achieving exceptional results on various established benchmarking frameworks. However, concerns about data contamination—where benchma", "arxiv_id": "2503.13572", "doi": "10.1109/ICLAD65226.2025.00017"}
+{"id": "llmsecconfig-llmbased-approach-2025", "title": "LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations", "authors": ["Ziyang Ye", "T. Huynh", "M. Le", "M. A. Babar"], "year": 2025, "venue": "IEEE Working Conference on Mining Software Repositories", "source_url": "https://arxiv.org/abs/2502.02009", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Security misconfigurations in Container Orchestrators (COs) can pose serious threats to software systems. While Static Analysis Tools (SATs) can effectively detect these security vulnerabilities, the ", "arxiv_id": "2502.02009", "doi": "10.1109/MSR66628.2025.00099"}
+{"id": "salad-systematic-assessment-2025", "title": "SALAD: Systematic Assessment of Machine Unlearning on LLM-Aided Hardware Design", "authors": ["Zeng Wang", "Minghao Shao", "R. Karn", "Likhitha Mankali", "Jitendra Bhandari"], "year": 2025, "venue": "Workshop on Machine Learning for CAD", "source_url": "https://arxiv.org/abs/2506.02089", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) offer transformative capabilities for hardware design automation, particularly in Verilog code generation. However, they also pose significant data security challenges, in", "arxiv_id": "2506.02089", "doi": "10.1109/MLCAD65511.2025.11189152"}
+{"id": "granite-code-models-2024", "title": "Granite Code Models: A Family of Open Foundation Models for Code Intelligence", "authors": ["Mayank Mishra", "Matt Stallone", "Gaoyuan Zhang", "Yikang Shen", "Aditya Prasad"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.04324", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the pr", "arxiv_id": "2405.04324", "doi": "10.48550/arXiv.2405.04324"}
+{"id": "from-code-courtroom-2025", "title": "From Code to Courtroom: LLMs as the New Software Judges", "authors": ["Junda He", "Jieke Shi", "Terry Yue Zhuo", "Christoph Treude", "Jiamou Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.02246", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, Large Language Models (LLMs) have been increasingly used to automate SE tasks such as code generation and summarization. However, evaluating the quality of LLM-generated software artifacts r", "arxiv_id": "2503.02246", "doi": "10.48550/arXiv.2503.02246"}
+{"id": "swerebench-automated-pipeline-2025", "title": "SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents", "authors": ["Ibragim Badertdinov", "Alexander Golubev", "Maksim Nekrashevich", "Anton Shevtsov", "Simon Karasik"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.20411", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based agents have shown promising capabilities in a growing range of software engineering (SWE) tasks. However, advancing this field faces two critical challenges. First, high-quality training dat", "arxiv_id": "2505.20411", "doi": "10.48550/arXiv.2505.20411"}
+{"id": "how-beginning-programmers-2024", "title": "How Beginning Programmers and Code LLMs (Mis)read Each Other", "authors": ["S. Nguyen", "Hannah McLean Babe", "Yangtian Zi", "Arjun Guha", "C. Anderson"], "year": 2024, "venue": "International Conference on Human Factors in Computing Systems", "source_url": "https://arxiv.org/abs/2401.15232", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interac", "arxiv_id": "2401.15232", "doi": "10.1145/3613904.3642706"}
+{"id": "vieva-llm-conceptual-2024", "title": "Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations", "authors": ["L. Podo", "M. Ishmal", "M. Angelini"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.02167", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The automatic generation of visualizations is an old task that, through the years, has shown more and more interest from the research and practitioner communities. Recently, large language models (LLM", "arxiv_id": "2402.02167", "doi": "10.48550/arXiv.2402.02167"}
+{"id": "llmdriven-testing-autonomous-2024", "title": "LLM-Driven Testing for Autonomous Driving Scenarios", "authors": ["Nenad Petrovic", "Krzysztof Lebioda", "Vahid Zolfaghari", "André Schamschurko", "Sven Kirchner"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FLLM63129.2024.10852505", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we explore the potential of leveraging Large Language Models (LLMs) for automated test generation based on free-form textual descriptions in area of automotive. As outcome, we implement", "doi": "10.1109/FLLM63129.2024.10852505"}
+{"id": "performance-review-llm-2024", "title": "Performance Review on LLM for solving leetcode problems", "authors": ["Lun Wang", "Chuanqi Shi", "Shaoshui Du", "Yiyi Tao", "Yixian Shen"], "year": 2024, "venue": "2024 4th International Symposium on Artificial Intelligence and Intelligent Manufacturing (AIIM)", "source_url": "https://arxiv.org/abs/2502.15770", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a comprehensive performance evaluation of Large Language Models (LLMs) in solving programming challenges from Leetcode, a widely used platform for algorithm practice and technical ", "arxiv_id": "2502.15770", "doi": "10.1109/AIIM64537.2024.10934280"}
+{"id": "comprehensive-verilog-design-2025", "title": "Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification", "authors": ["N. Pinckney", "Chenhui Deng", "Chia-Tung Ho", "Yun-Da Tsai", "Mingjie Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.14074", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present the Comprehensive Verilog Design Problems (CVDP) benchmark, a new dataset and infrastructure to advance LLM and agent research in hardware design and verification. CVDP includes 783 problem", "arxiv_id": "2506.14074", "doi": "10.48550/arXiv.2506.14074"}
+{"id": "assessing-impact-code-2025", "title": "Assessing the Impact of Code Changes on the Fault Localizability of Large Language Models", "authors": ["Sabaat Haroon", "Ahmad Faraz Khan", "Ahmad Humayun", "Waris Gill", "Abdul Haddi Amjad"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2504.04372", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Large Language Models (LLMs) are increasingly used in non-generative software maintenance tasks, such as fault localization (FL). Success in FL depends on a model's ability to reason about p", "arxiv_id": "2504.04372"}
+{"id": "llm-agent-fire-2024", "title": "LLM Agent for Fire Dynamics Simulations", "authors": ["Lei Xu", "Danyal Mohaddes", "Yi Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.17146", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Significant advances have been achieved in leveraging foundation models, such as large language models (LLMs), to accelerate complex scientific workflows. In this work we introduce FoamPilot, a proof-", "arxiv_id": "2412.17146", "doi": "10.48550/arXiv.2412.17146"}
+{"id": "scales-justitia-comprehensive-2025", "title": "The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs", "authors": ["Songyang Liu", "Chaozhuo Li", "Jiameng Qiu", "Xi Zhang", "Feiran Huang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.11094", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of artificial intelligence, Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), including content generation, human-compute", "arxiv_id": "2506.11094", "doi": "10.48550/arXiv.2506.11094"}
+{"id": "uda-benchmark-suite-2024", "title": "UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis", "authors": ["Yulong Hui", "Yao Lu", "Huanchen Zhang"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2406.15187", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The use of Retrieval-Augmented Generation (RAG) has improved Large Language Models (LLMs) in collaborating with external data, yet significant challenges exist in real-world scenarios. In areas such a", "arxiv_id": "2406.15187", "doi": "10.48550/arXiv.2406.15187"}
+{"id": "evaluating-efficiency-source-2024", "title": "On Evaluating the Efficiency of Source Code Generated by LLMs", "authors": ["Changan Niu", "Ting Zhang", "Chuanyi Li", "Bin Luo", "Vincent Ng"], "year": 2024, "venue": "2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (Forge)", "source_url": "https://arxiv.org/abs/2404.06041", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent years have seen the remarkable capabilities of large language models (LLMs) for code generation. Different from existing work that evaluate the correctness of the code generated by LLMs, we pro", "arxiv_id": "2404.06041", "doi": "10.1145/3650105.3652295"}
+{"id": "effilearner-enhancing-efficiency-2024", "title": "EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization", "authors": ["Dong Huang", "Jianbo Dai", "Han Weng", "Puzhen Wu", "Yuhao Qing"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2405.15189", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have shown remarkable progress in code generation, but their generated code often suffers from inefficiency, resulting in longer execution times and higher memory consumpt", "arxiv_id": "2405.15189", "doi": "10.52202/079017-2684"}
+{"id": "codetm4-detecting-machinegenerated-2025", "title": "CoDet-M4: Detecting Machine-Generated Code in Multi-Lingual, Multi-Generator and Multi-Domain Settings", "authors": ["Daniil Orel", "Dilshod Azizov", "Preslav Nakov"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.13733", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have revolutionized code generation, automating programming with remarkable efficiency. However, these advancements challenge programming skills, ethics, and assessment in", "arxiv_id": "2503.13733", "doi": "10.48550/arXiv.2503.13733"}
+{"id": "researchcodebench-benchmarking-llms-2025", "title": "ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code", "authors": ["Tianyu Hua", "Harper Hua", "Violet Xiang", "Benjamin Klieger", "Sang T. Truong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.02314", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have shown promise in transforming machine learning research, yet their capability to faithfully implement novel ideas from recent research papers-ideas unseen during pret", "arxiv_id": "2506.02314", "doi": "10.48550/arXiv.2506.02314"}
+{"id": "secodeplt-unified-platform-2024", "title": "SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI", "authors": ["Yuzhou Nie", "Zhun Wang", "Yu Yang", "R. Jiang", "Yuheng Tang"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2410.11096", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing benchmarks for evaluating the security risks and capabilities (e.g., vulnerability detection) of code-generating large language models (LLMs) face several key limitations: (1) limited coverag", "arxiv_id": "2410.11096"}
+{"id": "shroomindelab-at-semeval2024-2024", "title": "SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection", "authors": ["Bradley Paul Allen", "Fina Polat", "Paul T. Groth"], "year": 2024, "venue": "International Workshop on Semantic Evaluation", "source_url": "https://arxiv.org/abs/2404.03732", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We describe the University of Amsterdam Intelligent Data Engineering Lab team’s entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt program", "arxiv_id": "2404.03732", "doi": "10.48550/arXiv.2404.03732"}
+{"id": "codamosa-escaping-coverage-2023", "title": "CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models", "authors": ["Caroline Lemieux", "J. Inala", "Shuvendu K. Lahiri", "S. Sen"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE48619.2023.00085", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Search-based software testing (SBST) generates high-coverage test cases for programs under test with a combination of test case generation and mutation. SBST's performance relies on there being a reas", "doi": "10.1109/ICSE48619.2023.00085"}
+{"id": "automated-knowledge-component-2025", "title": "Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems", "authors": ["Zhangqi Duan", "Nigel Fernandez", "Sri Kanakadandi", "Bita Akram", "Andrew Lan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2502.18632", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2502.18632"}
+{"id": "retrievalaugmented-generation-multilingual-2024", "title": "Retrieval-augmented generation in multilingual settings", "authors": ["Nadezhda Chirkova", "David Rau", "Hervé Déjean", "Thibault Formal", "S. Clinchant"], "year": 2024, "venue": "KNOWLLM", "source_url": "https://arxiv.org/abs/2407.01463", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) has recently emerged as a promising solution for incorporating up-to-date or domain-specific knowledge into large language models (LLMs) and improving LLM factuali", "arxiv_id": "2407.01463", "doi": "10.48550/arXiv.2407.01463"}
+{"id": "pangucoder2-boosting-large-2023", "title": "PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback", "authors": ["Bo Shen", "Jiaxin Zhang", "Taihong Chen", "Daoguang Zan", "Bing Geng"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2307.14936", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches ", "arxiv_id": "2307.14936", "doi": "10.48550/arXiv.2307.14936"}
+{"id": "learning-code-preference-2024", "title": "Learning Code Preference via Synthetic Evolution", "authors": ["Jiawei Liu", "Thanh Nguyen", "Mingyue Shang", "Hantian Ding", "Xiaopeng Li"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.03837", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have recently demonstrated remarkable coding capabilities. However, assessing code generation based on well-formed properties and aligning it with developer preferences re", "arxiv_id": "2410.03837", "doi": "10.48550/arXiv.2410.03837"}
+{"id": "utboost-rigorous-evaluation-2025", "title": "UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench", "authors": ["Boxi Yu", "Yuxuan Zhu", "Pinjia He", "Daniel Kang"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2506.09289", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of Large Language Models (LLMs) has spurred the development of coding agents for real-world code generation. As a widely used benchmark for evaluating the code generation capabilities of th", "arxiv_id": "2506.09289", "doi": "10.48550/arXiv.2506.09289"}
+{"id": "lynx-open-source-2024", "title": "Lynx: An Open Source Hallucination Evaluation Model", "authors": ["Selvan Sunitha Ravi", "B. Mielczarek", "Anand Kannappan", "Douwe Kiela", "Rebecca Qian"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.08488", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval Augmented Generation (RAG) techniques aim to mitigate hallucinations in Large Language Models (LLMs). However, LLMs can still produce information that is unsupported or contradictory to the ", "arxiv_id": "2407.08488", "doi": "10.48550/arXiv.2407.08488"}
+{"id": "genetic-instruct-scaling-2024", "title": "Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models", "authors": ["Somshubra Majumdar", "V. Noroozi", "Sean Narenthiran", "Aleksander Ficek", "Jagadeesh Balam"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2407.21077", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) require high quality instruction data for effective alignment, particularly in code generation tasks where expert curated datasets are expensive to produce. We present Gen", "arxiv_id": "2407.21077", "doi": "10.48550/arXiv.2407.21077"}
+{"id": "whats-game-then-2024", "title": "What's the Game, then? Opportunities and Challenges for Runtime Behavior Generation", "authors": ["Nicholas Jennings", "Han Wang", "Isabel Li", "James Smith", "Bjoern Hartmann"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3654777.3676358", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Procedural content generation (PCG), the process of algorithmically creating game components instead of manually, has been a common tool of game development for decades. Recent advances in large langu", "doi": "10.1145/3654777.3676358"}
+{"id": "largescale-independent-comprehensive-2024", "title": "Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation", "authors": ["Wendkûuni C. Ouédraogo", "A. Kaboré", "Haoye Tian", "Yewei Song", "Anil Koyuncu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.00225", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing is essential for software reliability, yet manual test creation is time-consuming and often neglected. Although search-based software testing improves efficiency, it produces tests with p", "arxiv_id": "2407.00225", "doi": "10.48550/arXiv.2407.00225"}
+{"id": "testbench-evaluating-classlevel-2024", "title": "TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models", "authors": ["Quanjun Zhang", "Ye Shang", "Chunrong Fang", "Siqi Gu", "Jianyi Zhou"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.17561", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have pr", "arxiv_id": "2409.17561", "doi": "10.48550/arXiv.2409.17561"}
+{"id": "vgv-verilog-generation-2024", "title": "VGV: Verilog Generation using Visual Capabilities of Multi-Modal Large Language Models", "authors": ["Sam-Zaak Wong", "Gwok-Waa Wan", "Dongping Liu", "Xi Wang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/LAD62341.2024.10691753", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper investigates the use of multimodal large language models (MMLLMs) for generating Verilog code from visual inputs, addressing the challenge of describing complex system architectures. It exp", "doi": "10.1109/LAD62341.2024.10691753"}
+{"id": "sampleefficient-human-evaluation-2024", "title": "Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition", "authors": ["Kehua Feng", "Keyan Ding", "Kede Ma", "Zhihua Wang", "Qiang Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.08008", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reliable evaluation of large language models (LLMs) is impeded by two key challenges: objective metrics often fail to reflect human perception of natural language, and exhaustive human labeling is pro", "arxiv_id": "2404.08008", "doi": "10.48550/arXiv.2404.08008"}
+{"id": "large-language-model-2024-3", "title": "Large language model evaluation for high‐performance computing software development", "authors": ["William F. Godoy", "Pedro Valero-Lara", "Keita Teranishi", "Prasanna Balaprakash", "Jeffrey S. Vetter"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1002/cpe.8269", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We apply AI‐assisted large language model (LLM) capabilities of GPT‐3 targeting high‐performance computing (HPC) kernels for (i) code generation, and (ii) auto‐parallelization of serial code in C ++, ", "doi": "10.1002/cpe.8269"}
+{"id": "cloudevalyaml-practical-benchmark-2023", "title": "CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation", "authors": ["Yifei Xu", "Yuning Chen", "Xumiao Zhang", "Xianshang Lin", "Pan Hu"], "year": 2023, "venue": "Conference on Machine Learning and Systems", "source_url": "https://arxiv.org/abs/2401.06786", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Among the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native app", "arxiv_id": "2401.06786", "doi": "10.48550/arXiv.2401.06786"}
+{"id": "understanding-large-language-2023", "title": "Understanding Large Language Model Based Fuzz Driver Generation", "authors": ["Cen Zhang", "Ming-Xing Bai", "Yaowen Zheng", "Yeting Li", "Xiaofei Xie"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2307.12469", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2307.12469"}
+{"id": "glmdialog-noisetolerant-pretraining-2023", "title": "GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation", "authors": ["Jing Zhang", "Xiaokang Zhang", "Daniel Zhang-Li", "Jifan Yu", "Zijun Yao"], "year": 2023, "venue": "Knowledge Discovery and Data Mining", "source_url": "https://arxiv.org/abs/2302.14401", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese using a search engine to access the Internet knowledge. GLM-Dialog o", "arxiv_id": "2302.14401", "doi": "10.1145/3580305.3599832"}
+{"id": "llms-prescient-continuous-2024", "title": "Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle", "authors": ["Hui Dai", "R. Teehan", "Mengye Ren"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2411.08324", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many existing evaluation benchmarks for Large Language Models (LLMs) quickly become outdated due to the emergence of new models and training data. These benchmarks also fall short in assessing how LLM", "arxiv_id": "2411.08324", "doi": "10.48550/arXiv.2411.08324"}
+{"id": "hazard-analysis-framework-2022", "title": "A Hazard Analysis Framework for Code Synthesis Large Language Models", "authors": ["Heidy Khlaaf", "Pamela Mishkin", "Josh Achiam", "Gretchen Krueger", "Miles Brundage"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2207.14157", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Codex, a large language model (LLM) trained on a variety of codebases, exceeds the previous state of the art in its capacity to synthesize and generate code. Although Codex provides a plethora of bene", "arxiv_id": "2207.14157", "doi": "10.48550/arXiv.2207.14157"}
+{"id": "bioplanner-automatic-evaluation-2023", "title": "BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology", "authors": ["Odhran O'Donoghue", "Aleksandar Shtedritski", "John Ginger", "Ralph Abboud", "Ali E. Ghareeb"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2310.10632", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabil", "arxiv_id": "2310.10632", "doi": "10.48550/arXiv.2310.10632"}
+{"id": "llms-software-security-2025", "title": "LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights", "authors": ["Ze Sheng", "Zhicheng Chen", "Shuning Gu", "Heqing Huang", "Guofei Gu"], "year": 2025, "venue": "ACM Computing Surveys", "source_url": "https://arxiv.org/abs/2502.07049", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection. Traditional methods, including static and dynamic analysis, face limitations in efficiency, fals", "arxiv_id": "2502.07049", "doi": "10.1145/3769082"}
+{"id": "agentasajudge-evaluate-agents-2024", "title": "Agent-as-a-Judge: Evaluate Agents with Agents", "authors": ["Mingchen Zhuge", "Changsheng Zhao", "Dylan R. Ashley", "Wenyi Wang", "Dmitrii Khizbullin"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2410.10934", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Contemporary evaluation techniques are inadequate for agentic systems. These approaches either focus exclusively on final outcomes -- ignoring the step-by-step nature of agentic systems, or require ex", "arxiv_id": "2410.10934", "doi": "10.48550/arXiv.2410.10934"}
+{"id": "lessleakbench-first-investigation-2025", "title": "LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks", "authors": ["Xin Zhou", "M. Weyssow", "Ratnadira Widyasari", "Ting Zhang", "Junda He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.06215", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are widely utilized in software engineering (SE) tasks, such as code generation and automated program repair. However, their reliance on extensive and often undisclosed pr", "arxiv_id": "2502.06215", "doi": "10.48550/arXiv.2502.06215"}
+{"id": "rankllm-python-package-2025", "title": "RankLLM: A Python Package for Reranking with LLMs", "authors": ["Sahel Sharifymoghaddam", "Ronak Pradeep", "Andre Slavescu", "Ryan Nguyen", "Andrew Xu"], "year": 2025, "venue": "Annual International ACM SIGIR Conference on Research and Development in Information Retrieval", "source_url": "https://arxiv.org/abs/2505.19284", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The adoption of large language models (LLMs) as rerankers in multi-stage retrieval systems has gained significant traction in academia and industry. These models refine a candidate list of retrieved d", "arxiv_id": "2505.19284", "doi": "10.1145/3726302.3730331"}
+{"id": "survey-data-contamination-2025", "title": "A Survey on Data Contamination for Large Language Models", "authors": ["Yu Cheng", "Yi Chang", "Yuan Wu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.14425", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated significant progress in various areas, such as text generation and code synthesis. However, the reliability of performance evaluat", "arxiv_id": "2502.14425", "doi": "10.48550/arXiv.2502.14425"}
+{"id": "rtlsquad-multiagent-based-2025", "title": "RTLSquad: Multi-Agent Based Interpretable RTL Design", "authors": ["Bowei Wang", "Qiankun Xiong", "Zeqing Xiang", "Lei Wang", "Renzhi Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.05470", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Optimizing Register-Transfer Level (RTL) code is crucial for improving hardware PPA performance. Large Language Models (LLMs) offer new approaches for automatic RTL code generation and optimization. H", "arxiv_id": "2501.05470", "doi": "10.48550/arXiv.2501.05470"}
+{"id": "biasalert-plugandplay-tool-2024", "title": "BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs", "authors": ["Zhiting Fan", "Ruizhe Chen", "Ruiling Xu", "Zuozhu Liu"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2407.10241", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating the bias of LLMs becomes more crucial with their rapid development. However, existing evaluation approaches rely on fixed-form outputs and cannot adapt to the flexible open-text generation ", "arxiv_id": "2407.10241", "doi": "10.48550/arXiv.2407.10241"}
+{"id": "security-assertions-by-2023", "title": "(Security) Assertions by Large Language Models", "authors": ["Rahul Kande", "H. Pearce", "Benjamin Tan", "Brendan Dolan-Gavitt", "Shailja Thakur"], "year": 2023, "venue": "IEEE Transactions on Information Forensics and Security", "source_url": "https://arxiv.org/abs/2306.14027", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The security of computer systems typically relies on a hardware root of trust. As vulnerabilities in hardware can have severe implications on a system, there is a need for techniques to support securi", "arxiv_id": "2306.14027", "doi": "10.1109/TIFS.2024.3372809"}
+{"id": "appatch-automated-adaptive-2024", "title": "APPATCH: Automated Adaptive Prompting Large Language Models for Real-World Software Vulnerability Patching", "authors": ["Yu Nong", "Haoran Yang", "Long Cheng", "Hongxin Hu", "Haipeng Cai"], "year": 2024, "venue": "USENIX Security Symposium", "source_url": "https://arxiv.org/abs/2408.13597", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Timely and effective vulnerability patching is essential for cybersecurity defense, for which various approaches have been proposed yet still struggle to generate valid and correct patches for real-wo", "arxiv_id": "2408.13597"}
+{"id": "only-diff-not-2024", "title": "Only diff Is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model", "authors": ["Jiawei Li", "David Faragó", "Christian Petrov", "Iftekhar Ahmed"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3643760", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Commit messages play a vital role in software development and maintenance. While previous research has introduced various Commit Message Generation (CMG) approaches, they often suffer from a lack of c", "doi": "10.1145/3643760"}
+{"id": "multilevel-explanations-generative-2024", "title": "Multi-Level Explanations for Generative Language Models", "authors": ["Lucas Monteiro Paes", "Dennis Wei", "Hyo Jin Do", "Hendrik Strobelt", "Ronny Luss"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2403.14459", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite the increasing use of large language models (LLMs) for context-grounded tasks like summarization and question-answering, understanding what makes an LLM produce a certain response is challengi", "arxiv_id": "2403.14459", "doi": "10.48550/arXiv.2403.14459"}
+{"id": "fveval-understanding-language-2024", "title": "FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware", "authors": ["Minwoo Kang", "Mingjie Liu", "Ghaith Bany Hamad", "Syed Suhaib", "Haoxing Ren"], "year": 2024, "venue": "Design, Automation and Test in Europe", "source_url": "https://arxiv.org/abs/2410.23299", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The remarkable reasoning and code generation capabilities of large language models (LLMs) have spurred significant interest in applying LLMs to enable task automation in digital chip design. In partic", "arxiv_id": "2410.23299", "doi": "10.23919/DATE64628.2025.10992720"}
+{"id": "large-language-models-2024", "title": "Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey", "authors": ["Yang Gu", "Hengyu You", "Jian Cao", "Muran Yu", "Haoran Fan"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2411.10478", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Machine Learning (ML) workflows—spanning data preprocessing and feature engineering, model selection and hyperparameter optimization, and workflow evaluation—are increasingly embedded in complex softw", "arxiv_id": "2411.10478", "doi": "10.1145/3773084"}
+{"id": "robust-retrievalbased-summarization-2024", "title": "Towards a Robust Retrieval-Based Summarization System", "authors": ["Shengjie Liu", "Jing Wu", "Jingyuan Bao", "Wenyi Wang", "N. Hovakimyan"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.19889", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper describes an investigation of the robustness of large language models (LLMs) for retrieval augmented generation (RAG)-based summarization tasks. While LLMs provide summarization capabilitie", "arxiv_id": "2403.19889", "doi": "10.48550/arXiv.2403.19889"}
+{"id": "vulscriber-exploring-ragbased-2024", "title": "VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs", "authors": ["Seyed Shayan Daneshvar", "Yu Nong", "Xu Yang", "Shaowei Wang", "Haipeng Cai"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2408.04125", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Detecting vulnerabilities is vital for software security, yet deep learning-based vulnerability detectors (DLVD) face a data shortage, which limits their effectiveness. Data augmentation can potential", "arxiv_id": "2408.04125", "doi": "10.1145/3760775"}
+{"id": "transforming-wearable-data-2024", "title": "Transforming wearable data into personal health insights using large language model agents", "authors": ["Mike A. Merrill", "Akshay Paruchuri", "Naghmeh Rezaei", "Geza Kovacs", "Javier Perez Matos"], "year": 2024, "venue": "Nature Communications", "source_url": "https://arxiv.org/abs/2406.06464", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deriving personalized insights from popular wearable trackers requires complex numerical reasoning that challenges standard LLMs, necessitating tool-based approaches like code generation. Large langua", "arxiv_id": "2406.06464", "doi": "10.1038/s41467-025-67922-y"}
+{"id": "hdleval-benchmarking-llms-2024", "title": "HDLEval Benchmarking LLMs for multiple HDLs", "authors": ["Farzaneh Rabiei Kashanaki", "Mark Zakharov", "Jose Renau"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/LAD62341.2024.10691770", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are transforming code generation and documentation processes across programming languages, including hardware description languages (HDLs). However, existing benchmarks pr", "doi": "10.1109/LAD62341.2024.10691770"}
+{"id": "comparison-large-language-2024", "title": "Comparison of Large Language Models in Generating Machine Learning Curricula in High Schools", "authors": ["Gjorgji Noveski", "Mathis Jeroncic", "Thomas Velard", "Primož Kocuvan", "M. Gams"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics13204109", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of artificial intelligence technologies, the integration of AI concepts into educational curricula represents an increasingly important issue. This paper presents a comparat", "doi": "10.3390/electronics13204109"}
+{"id": "hardware-security-benchmarking-2024", "title": "Toward Hardware Security Benchmarking of LLMs", "authors": ["Raheel Afsharmazayejani", "Mohammad Moradi Shahmiri", "Parker Link", "H. Pearce", "Benjamin Tan"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/LAD62341.2024.10691745", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement and proliferation of large language models (LLMs), there is a pressing need to explore and, crucially, evaluate their utility. Recently, LLMs have shown promise in digital d", "doi": "10.1109/LAD62341.2024.10691745"}
+{"id": "evaluating-diverse-large-2023", "title": "Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction", "authors": ["Sungmin Kang", "Juyeon Yoon", "Nargiz Askarbekkyzy", "Shin Yoo"], "year": 2023, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2311.04532", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform to test cases consistently. ", "arxiv_id": "2311.04532", "doi": "10.1109/TSE.2024.3450837"}
+{"id": "training-llms-generating-2024", "title": "Training LLMs for Generating IEC 61131-3 Structured Text with Online Feedback", "authors": ["Aaron Haag", "Bertram Fuchs", "Altay Kacan", "Oliver Lohse"], "year": 2024, "venue": "2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)", "source_url": "https://arxiv.org/abs/2410.22159", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: IEC 61131-3 Structured Text (ST) is a widely used programming language for programmable logic controllers (PLCs) in automation systems. However, generating ST code with LLMs poses unique challenges du", "arxiv_id": "2410.22159", "doi": "10.1109/LLM4Code66737.2025.00013"}
+{"id": "leveraging-large-language-2023", "title": "Leveraging large language models for data analysis automation", "authors": ["Jacqueline A Jansen", "A. Manukyan", "Nour Al Khoury", "A. Akalin"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2023.12.11.571140", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Data analysis is constrained by a shortage of skilled experts, particularly in biology, where detailed data interpretation is vital for understanding complex biological processes and developing new tr", "doi": "10.1101/2023.12.11.571140"}
+{"id": "evaluating-large-language-2024", "title": "Evaluating Large Language Models using Arabic Prompts to Generate Python Codes", "authors": ["N. Al-khafaji", "Basit Khalaf Majeed"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/eSmarTA62850.2024.10638877", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Currently, the popularity of large language models (LLMs) for instance, ChatGPT from OpenAI and Gemini from Google is increasing greatly in our lives, due to their unparalleled performance in various ", "doi": "10.1109/eSmarTA62850.2024.10638877"}
+{"id": "memorize-generalize-evaluating-2025", "title": "Memorize or Generalize? Evaluating LLM Code Generation with Evolved Questions", "authors": ["Wentao Chen", "Lizhen Zhang", "Li Zhong", "Letian Peng", "Zilong Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2503.02296", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2503.02296"}
+{"id": "multilanguage-perspective-robustness-2025", "title": "A Multi-Language Perspective on the Robustness of LLM Code Generation", "authors": ["Fazle Rabbi", "Zushuo Ding", "Jinqiu Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.19108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have gained significant traction and popularity in recent times, extending their usage to code-generation tasks. While this field has garnered considerable attention, the explora", "arxiv_id": "2504.19108", "doi": "10.48550/arXiv.2504.19108"}
+{"id": "pragmatic-reasoning-improves-2025", "title": "Pragmatic Reasoning improves LLM Code Generation", "authors": ["Zhuchen Cao", "S. Apel", "A. Singla", "Vera Demberg"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.15835", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Pragmatic reasoning is pervasive in human-human communication - it allows us to leverage shared knowledge and counterfactual reasoning in order to infer the intention of a conversational partner given", "arxiv_id": "2502.15835", "doi": "10.48550/arXiv.2502.15835"}
+{"id": "spreadnala-naturalistic-code-2024", "title": "SpreadNaLa: A Naturalistic Code Generation Evaluation Dataset of Spreadsheet Formulas", "authors": ["Sebastian Schuster", "Ayesha Ansar", "Om Agarwal", "Vera Demberg"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/bbc81e8f4e1853a736d781a070244880956215e3", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "rtl-graphenhanced-llm-2025", "title": "RTL++: Graph-enhanced LLM for RTL Code Generation", "authors": ["Mohammad Akyash", "K. Azar", "H. Kamali"], "year": 2025, "venue": "2025 IEEE International Conference on LLM-Aided Design (ICLAD)", "source_url": "https://arxiv.org/abs/2505.13479", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As hardware design complexity escalates, there is an urgent need for advanced automation in electronic design automation (EDA). Traditional register transfer level (RTL) design methods are manual, tim", "arxiv_id": "2505.13479", "doi": "10.1109/ICLAD65226.2025.00020"}
+{"id": "codecor-llmbased-selfreflective-2025", "title": "CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation", "authors": ["Ruwei Pan", "Hongyu Zhang", "Chao Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.07811", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation aims to produce code that fulfills requirements written in natural languages automatically. Large language Models (LLMs) like ChatGPT have demonstrated promising effectiveness in this ", "arxiv_id": "2501.07811", "doi": "10.48550/arXiv.2501.07811"}
+{"id": "spec2rtlagent-automated-hardware-2025", "title": "Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems", "authors": ["Zhongzhi Yu", "Mingjie Liu", "Michael Zimmer", "Y. Lin", "Yong Liu"], "year": 2025, "venue": "2025 IEEE International Conference on LLM-Aided Design (ICLAD)", "source_url": "https://arxiv.org/abs/2506.13905", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite recent progress in generating hardware register transfer level (RTL) code with large language models (LLMs), existing solutions still suffer from a substantial gap between practical applicatio", "arxiv_id": "2506.13905", "doi": "10.1109/ICLAD65226.2025.00013"}
+{"id": "tfhecoder-evaluating-llmagentic-2025", "title": "TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation", "authors": ["Mayank Kumar", "Jiaqi Xue", "Meng Zheng", "Qian Lou"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.12217", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fully Homomorphic Encryption over the torus (TFHE) enables computation on encrypted data without decryption, making it a cornerstone of secure and confidential computing. Despite its potential in priv", "arxiv_id": "2503.12217", "doi": "10.48550/arXiv.2503.12217"}
+{"id": "simeval-investigating-similarity-2025", "title": "SimEval: Investigating the Similarity Obstacle in LLM-based Hardware Code Generation", "authors": ["Mohammad Akyash", "Hadi Mardani Kamali"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3658617.3697624", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing use and efficiency of large language models (LLMs) in digital hardware circuit design has started to revolutionize the early stages of integrated circuit (IC) supply chain design and im", "doi": "10.1145/3658617.3697624"}
+{"id": "clarifygpt-framework-enhancing-2024", "title": "ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification", "authors": ["Fangwen Mu", "Lin Shi", "Song Wang", "Zhuohao Yu", "Binquan Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3660810", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in automatically generating code from provided natural language requirements. However, in real-world practice, ", "doi": "10.1145/3660810"}
+{"id": "selforganized-agents-llm-2024", "title": "Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization", "authors": ["Yoichi Ishibashi", "Y. Nishimura"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.02183", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in automatic code generation using large language model (LLM) agent have brought us closer to the future of automated software development. However, existing single-agent approache", "arxiv_id": "2404.02183", "doi": "10.48550/arXiv.2404.02183"}
+{"id": "testdriven-development-llmbased-2024", "title": "Test-Driven Development and LLM-based Code Generation", "authors": ["N. Mathews", "M. Nagappan"], "year": 2024, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2402.13521", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements. This increasingly automated process mirrors traditional hum", "arxiv_id": "2402.13521", "doi": "10.1145/3691620.3695527"}
+{"id": "enhancing-llmbased-quantum-2025", "title": "Enhancing LLM-based Quantum Code Generation with Multi-Agent Optimization and Quantum Error Correction", "authors": ["Charlie Campbell", "H. Chen", "Wayne Luk", "Hongxiang Fan"], "year": 2025, "venue": "Design Automation Conference", "source_url": "https://arxiv.org/abs/2504.14557", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent frameworks with Large Language Models (LLMs) have become promising tools for generating general-purpose programming languages using test-driven development, allowing developers to create mo", "arxiv_id": "2504.14557", "doi": "10.1109/DAC63849.2025.11133316"}
+{"id": "llmbased-retrievalaugmented-control-2024", "title": "LLM-based and Retrieval-Augmented Control Code Generation", "authors": ["Heiko Koziolek", "Sten Grüner", "Rhaban Hark", "Virendra Ashiwal", "Sofia Linsbauer"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3643795.3648384", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Control code is designed and implemented for industrial automation applications that manage power plants, petrochemical processes, or steel production. Popular large language models (LLM) can synthesi", "doi": "10.1145/3643795.3648384"}
+{"id": "autop2c-llmbased-agent-2025", "title": "AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers", "authors": ["Zijie Lin", "Yiqing Shen", "Qilin Cai", "He Sun", "Jinrui Zhou"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.20115", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Machine Learning (ML) research is spread through academic papers featuring rich multimodal content, including text, diagrams, and tabular results. However, translating these multimodal elements into e", "arxiv_id": "2504.20115", "doi": "10.48550/arXiv.2504.20115"}
+{"id": "graphcodeagent-dual-graphguided-2025", "title": "GraphCodeAgent: Dual Graph-Guided LLM Agent for Retrieval-Augmented Repo-Level Code Generation", "authors": ["Jia Li", "Xianjie Shi", "Kechi Zhang", "Lei Li", "Ge Li"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2504.10046", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Writing code requires significant time and effort in software development. To automate this process, researchers have made substantial progress for code generation. Recently, large language models (LL", "arxiv_id": "2504.10046"}
+{"id": "ckgfuzzer-llmbased-fuzz-2025", "title": "CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph", "authors": ["Hanxiang Xu", "Wei Ma", "Ti Zhou", "Yanjie Zhao", "Kai Chen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE-Companion66252.2025.00079", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, the programming capabilities of large language models (LLMs) have garnered significant attention. Fuzz testing, a highly effective technique, plays a key role in enhancing software re", "doi": "10.1109/ICSE-Companion66252.2025.00079"}
+{"id": "codeaware-prompting-study-2024", "title": "Code-Aware Prompting: A Study of Coverage-Guided Test Generation in Regression Setting using LLM", "authors": ["Gabriel Ryan", "Siddhartha Jain", "Mingyue Shang", "Shiqi Wang", "Xiaofei Ma"], "year": 2024, "venue": "Proc. ACM Softw. Eng.", "source_url": "https://arxiv.org/abs/2402.00097", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage.", "arxiv_id": "2402.00097", "doi": "10.1145/3643769"}
+{"id": "greencode-learning-optimize-2025", "title": "Green-Code: Learning to Optimize Energy Efficiency in LLM-Based Code Generation", "authors": ["Shashikant Ilager", "Lukas Florian Briem", "Ivona Brandić"], "year": 2025, "venue": "IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing", "source_url": "https://arxiv.org/abs/2501.11006", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are becoming integral to daily life, showcasing their vast potential across various Natural Language Processing (NLP) tasks. Beyond NLP, LLMs are increasingly used in soft", "arxiv_id": "2501.11006", "doi": "10.1109/ccgrid64434.2025.00068"}
+{"id": "they-all-good-2025", "title": "Are They All Good? Evaluating the Quality of CoTs in LLM-based Code Generation", "authors": ["Binquan Zhang", "Li Zhang", "Zhiwen Luo", "Yuxin Du", "Fang Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.06980", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated impressive performance in code generation, particularly when augmented with chain-of-thought (CoT) prompting techniques. They break down requirements int", "arxiv_id": "2507.06980", "doi": "10.48550/arXiv.2507.06980"}
+{"id": "creativeval-evaluating-creativity-2024", "title": "CreativEval: Evaluating Creativity of LLM-Based Hardware Code Generation", "authors": ["Matthew DeLorenzo", "Vasudev Gohil", "Jeyavijayan Rajendran"], "year": 2024, "venue": "2024 IEEE LLM Aided Design Workshop (LAD)", "source_url": "https://arxiv.org/abs/2404.08806", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have proved effective and efficient in generating code, leading to their utilization within the hardware design process. Prior works evaluating LLMs’ abilities for registe", "arxiv_id": "2404.08806", "doi": "10.1109/LAD62341.2024.10691798"}
+{"id": "dehallucinator-mitigating-llm-2024", "title": "De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding", "authors": ["A. Eghbali", "Michael Pradel"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2401.01701", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks. However, these models are mostly unaware of the cod", "arxiv_id": "2401.01701"}
+{"id": "syntactic-robustness-llmbased-2024", "title": "Syntactic Robustness for LLM-based Code Generation", "authors": ["Laboni Sarker", "Mara Downing", "Achintya Desai", "T. Bultan"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.01535", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Rapid advances in the field of Large Language Models (LLMs) have made LLM-based code generation an important area for investigation. An LLM-based code generator takes a prompt as input and produces co", "arxiv_id": "2404.01535", "doi": "10.48550/arXiv.2404.01535"}
+{"id": "humanevalcomm-benchmarking-communication-2024", "title": "HumanEvalComm: Benchmarking the Communication Competence of Code Generation for LLMs and LLM Agents", "authors": ["J. Wu", "Fatemeh H. Fard"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2406.00215", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have significantly improved their ability to perform tasks in the field of code generation. However, there is still a gap between LLMs being capable coders and being top-t", "arxiv_id": "2406.00215", "doi": "10.1145/3715109"}
+{"id": "learn-code-sustainably-2024", "title": "Learn to Code Sustainably: An Empirical Study on LLM-based Green Code Generation", "authors": ["Tina Vartziotis", "Ippolyti Dellatolas", "George Dasoulas", "Maximilian Schmidt", "Florian Schneider"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.03344", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing use of information technology has led to a significant share of energy consumption and carbon emissions from data centers. These contributions are expected to rise with the growing dema", "arxiv_id": "2403.03344", "doi": "10.48550/arXiv.2403.03344"}
+{"id": "defense-against-prompt-2024", "title": "Defense Against Prompt Injection Attack by Leveraging Attack Techniques", "authors": ["Yulin Chen", "Haoran Li", "Zihao Zheng", "Yangqiu Song", "Dekai Wu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2411.00459", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the advancement of technology, large language models (LLMs) have achieved remarkable performance across various natural language processing (NLP) tasks, powering LLM-integrated applications like ", "arxiv_id": "2411.00459", "doi": "10.48550/arXiv.2411.00459"}
+{"id": "protect-llm-agent-2025", "title": "To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt", "authors": ["Zhilong Wang", "Neha Nagaraja", "Lan Zhang", "Hayretdin Bahşi", "Pawan Patil"], "year": 2025, "venue": "2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S)", "source_url": "https://arxiv.org/abs/2506.05739", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM agents are widely used as agents for customer support, content generation, and code assistance. However, they are vulnerable to prompt injection attacks, where adversarial inputs manipulate the mo", "arxiv_id": "2506.05739", "doi": "10.1109/DSN-S65789.2025.00037"}
+{"id": "prompt-injection-attacks-2025", "title": "Prompt Injection Attacks on Large Language Models: A Survey of Attack Methods, Root Causes, and Defense Strategies", "authors": ["Tongcheng Geng", "Zhiyuan Xu", "Yubin Qu", "W. E. Wong"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.32604/cmc.2025.074081", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.32604/cmc.2025.074081"}
+{"id": "prompt-injection-attacks-2026-2", "title": "Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms", "authors": ["Saidakhror Gulyamov", "Saidakhror Gulyamov", "A. Rodionov", "Rustam Khursanov", "Kambariddin Mekhmonov"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.3390/info17010054", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have rapidly transformed artificial intelligence applications across industries, yet their integration into production systems has unveiled critical security vulnerabiliti", "doi": "10.3390/info17010054"}
+{"id": "prompt-injection-attacks-2026-2-2", "title": "Prompt Injection Attacks in Large Language Models via a Comprehensive Analysis of Attack Vectors, Defense Mechanisms, and Future Directions", "authors": ["Unknown"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.30546/2225-0530.14.2.2025.2013", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.30546/2225-0530.14.2.2025.2013"}
+{"id": "melon-provable-defense-2025", "title": "MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents", "authors": ["Kaijie Zhu", "Xianjun Yang", "Jindong Wang", "Wenbo Guo", "William Yang Wang"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.05174", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent research has explored that LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unau", "arxiv_id": "2502.05174"}
+{"id": "defense-against-prompt-2025", "title": "Defense against Prompt Injection Attacks via Mixture of Encodings", "authors": ["Ruiyi Zhang", "David Sullivan", "Kyle Jackson", "Pengtao Xie", "Mei Chen"], "year": 2025, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2504.07467", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have emerged as a dominant approach for a wide range of NLP tasks, with their access to external information further enhancing their capabilities. However, this introduces", "arxiv_id": "2504.07467", "doi": "10.48550/arXiv.2504.07467"}
+{"id": "ipiguard-novel-tool-2025", "title": "IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents", "authors": ["Hengyu An", "Jinghuai Zhang", "Tianyu Du", "Chunyi Zhou", "Qingming Li"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2508.15310", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) agents are widely deployed in real-world applications, where they leverage tools to retrieve and manipulate external data for complex tasks. However, when interacting with u", "arxiv_id": "2508.15310", "doi": "10.48550/arXiv.2508.15310"}
+{"id": "uniguardian-unified-defense-2025", "title": "UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models", "authors": ["Huawei Lin", "Yingjie Lao", "Tong Geng", "Tan Yu", "Weijie Zhao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.13141", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are vulnerable to attacks like prompt injection, backdoor attacks, and adversarial attacks, which manipulate prompts or models to generate harmful outputs. In this paper, ", "arxiv_id": "2502.13141", "doi": "10.48550/arXiv.2502.13141"}
+{"id": "comprehensive-review-prompt-2025", "title": "The Comprehensive Review on Prompt Injection Attacks and Defense Mechanisms in Large Language Models", "authors": ["Qingtian Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.61173/390f5h97", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This review analyzes prompt injection attacks in large language models (LLMs) from 2019 to 2025, addressing critical security challenges as models like ChatGPT proliferate across sectors. We synthesiz", "doi": "10.61173/390f5h97"}
+{"id": "early-approaches-adversarial-2025", "title": "Early Approaches to Adversarial Fine-Tuning for Prompt Injection Defense: A 2022 Study of GPT-3 and Contemporary Models", "authors": ["Gustavo Sandoval", "Denys Fenchenko", "Junyao Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.14271", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper documents early research conducted in 2022 on defending against prompt injection attacks in large language models, providing historical context for the evolution of this critical security d", "arxiv_id": "2509.14271", "doi": "10.48550/arXiv.2509.14271"}
+{"id": "promptarmor-simple-yet-2025", "title": "PromptArmor: Simple yet Effective Prompt Injection Defenses", "authors": ["Tianneng Shi", "Kaijie Zhu", "Zhun Wang", "Yuqi Jia", "Will Cai"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.15219", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite their potential, recent research has demonstrated that LLM agents are vulnerable to prompt injection attacks, where malicious prompts are injected into the agent's input, causing it to perform", "arxiv_id": "2507.15219", "doi": "10.48550/arXiv.2507.15219"}
+{"id": "leaksealer-semisupervised-defense-2025", "title": "LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks", "authors": ["Francesco Panebianco", "Stefano Bonfanti", "Francesco Trovò", "Michele Carminati"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.00602", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The generalization capabilities of Large Language Models (LLMs) have led to their widespread deployment across various applications. However, this increased adoption has introduced several security th", "arxiv_id": "2508.00602", "doi": "10.48550/arXiv.2508.00602"}
+{"id": "enhancing-system-security-2024", "title": "Enhancing System Security: LLM-Driven Defense Against Prompt Injection Vulnerabilities", "authors": ["Oleksandr Muliarevych"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TCSET64720.2024.10755823", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article examines cybersecurity vulnerabilities in systems utilizing Language Model Interfaces, focusing on the challenges of building secure systems. It provides an overview of current interfaces", "doi": "10.1109/TCSET64720.2024.10755823"}
+{"id": "defending-against-prompt-2025", "title": "Defending Against Prompt Injection With a Few DefensiveTokens", "authors": ["Sizhe Chen", "Yizhu Wang", "Nicholas Carlini", "Chawin Sitawarin", "David Wagner"], "year": 2025, "venue": "AISec@CCS", "source_url": "https://arxiv.org/abs/2507.07974", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: When large language model (LLM) systems interact with external data to perform complex tasks, a new attack, namely prompt injection, becomes a significant threat. By injecting instructions into the da", "arxiv_id": "2507.07974", "doi": "10.1145/3733799.3762982"}
+{"id": "llmailinject-dataset-from-2025", "title": "LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge", "authors": ["Sahar Abdelnabi", "A. Fay", "Ahmed Salem", "Egor Zverev", "Kai-Chieh Liao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.09956", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) to distinguish between instructions and data in their inputs. Despite numerous defense proposals, the ", "arxiv_id": "2506.09956", "doi": "10.48550/arXiv.2506.09956"}
+{"id": "prompt-infection-llmtollm-2024", "title": "Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems", "authors": ["Donghyun Lee", "Mo Tiwari"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.07283", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Large Language Models (LLMs) grow increasingly powerful, multi-agent systems are becoming more prevalent in modern AI applications. Most safety research, however, has focused on vulnerabilities in ", "arxiv_id": "2410.07283", "doi": "10.48550/arXiv.2410.07283"}
+{"id": "defense-against-indirect-2026", "title": "Defense Against Indirect Prompt Injection via Tool Result Parsing", "authors": ["Qiang Yu", "Xinran Cheng", "Chuanyi Liu"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.04795", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As LLM agents transition from digital assistants to physical controllers in autonomous systems and robotics, they face an escalating threat from indirect prompt injection. By embedding adversarial ins", "arxiv_id": "2601.04795", "doi": "10.48550/arXiv.2601.04795"}
+{"id": "signedprompt-new-approach-2024", "title": "Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications", "authors": ["X. Suo"], "year": 2024, "venue": "AIP Conference Proceedings", "source_url": "https://arxiv.org/abs/2401.07612", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The critical challenge of prompt injection attacks in Large Language Models (LLMs) integrated applications, a growing concern in the Artificial Intelligence (AI) field. Such attacks, which manipulate ", "arxiv_id": "2401.07612", "doi": "10.48550/arXiv.2401.07612"}
+{"id": "defending-against-prompt-2025-2", "title": "Defending Against Prompt Injection with DataFilter", "authors": ["Yizhu Wang", "Sizhe Chen", "Raghad F Alkhudair", "Basel Alomair", "David Wagner"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19207", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: When large language model (LLM) agents are increasingly deployed to automate tasks and interact with untrusted external data, prompt injection emerges as a significant security threat. By injecting ma", "arxiv_id": "2510.19207", "doi": "10.48550/arXiv.2510.19207"}
+{"id": "browsesafe-understanding-preventing-2025", "title": "BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents", "authors": ["Kaiyuan Zhang", "Mark Tenenholtz", "Kyle Polley", "Jerry Ma", "Denis Yarats"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.20597", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injec", "arxiv_id": "2511.20597", "doi": "10.48550/arXiv.2511.20597"}
+{"id": "securing-ai-agents-2025", "title": "Securing AI Agents Against Prompt Injection Attacks", "authors": ["Badrinath Ramakrishnan", "Akshaya Balaji"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.15759", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) systems have become widely used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection a", "arxiv_id": "2511.15759", "doi": "10.48550/arXiv.2511.15759"}
+{"id": "capture-contextaware-prompt-2025", "title": "CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement", "authors": ["Gauri Kholkar", "Ratinder Ahuja"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.12368", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection remains a major security risk for large language models. However, the efficacy of existing guardrail models in context-aware settings remains underexplored, as they often rely on stat", "arxiv_id": "2505.12368", "doi": "10.48550/arXiv.2505.12368"}
+{"id": "prevention-prompt-injection-2025", "title": "Prevention of Prompt Injection Attacks Over Financial Applications Integrated with LLM", "authors": ["T. Joshi", "V. Naik", "Isha Mistry", "Ramchandra S Mangrulkar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/InCACCT65424.2025.11011372", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection is a growing concern in financial applications integrated with Large Language Models. These attacks pose a critical risk to financial applications leading to data breaches, confidenti", "doi": "10.1109/InCACCT65424.2025.11011372"}
+{"id": "detection-method-prompt-2025", "title": "Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering", "authors": ["Yi Ji", "Runzhi Li", "Baolei Mao"], "year": 2025, "venue": "Knowledge Science, Engineering and Management", "source_url": "https://arxiv.org/abs/2506.06384", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the widespread adoption of Large Language Models (LLMs), prompt injection attacks have emerged as a significant security threat. Existing defense mechanisms often face critical trade-offs between", "arxiv_id": "2506.06384", "doi": "10.48550/arXiv.2506.06384"}
+{"id": "oet-optimizationbased-prompt-2025", "title": "OET: Optimization-based prompt injection Evaluation Toolkit", "authors": ["Jinsheng Pan", "Xiaogeng Liu", "Chaowei Xiao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.00843", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, enabling their widespread adoption across various domains. However, their susce", "arxiv_id": "2505.00843", "doi": "10.48550/arXiv.2505.00843"}
+{"id": "securing-large-language-2025", "title": "Securing Large Language Models (LLMs) from Prompt Injection Attacks", "authors": ["Omar Farooq Khan Suri", "J. Mccrae"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.01326", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly being deployed in real-world applications, but their flexibility exposes them to prompt injection attacks. These attacks leverage the model's instruction-", "arxiv_id": "2512.01326", "doi": "10.48550/arXiv.2512.01326"}
+{"id": "mitigating-prompt-injection-2025", "title": "Mitigating Prompt Injection Attacks in ModelAgnostic Networks (MAN)", "authors": ["Anupama Mishra", "Shiv Preet", "Brij B. Gupta", "Satyendra Singh Rawat", "V. Arya"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IOT-SIU65919.2025.11402845", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/IOT-SIU65919.2025.11402845"}
+{"id": "shadowplay-engineering-defenses-2025", "title": "ShadowPlay: Engineering Defenses Against Role-Based Prompt Injection and Dependency Hallucination in LLM-Powered Development", "authors": ["Anas Alsobeh", "Zahraddeen Gwarzo", "Amani M. Shatnawi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/Cyber-AI66431.2025.11233258", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized software development by providing intelligent code generation and assistance capabilities. However, their integration into development workflows introd", "doi": "10.1109/Cyber-AI66431.2025.11233258"}
+{"id": "novel-security-framework-2025", "title": "A Novel Security Framework against Prompt Injection Attacks", "authors": ["Yumei Zhao", "Xiaoming Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CCAI65422.2025.11189427", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing, but their widespread adoption has introduced critical security vulnerabilities, particularly prom", "doi": "10.1109/CCAI65422.2025.11189427"}
+{"id": "llm-firewall-validator-2025", "title": "LLM Firewall Using Validator Agent for Prevention Against Prompt Injection Attacks", "authors": ["Michal Podpora", "Marek Baranowski", "Maciej Chopcian", "Lukasz Kwasniewicz", "Wojciech Radziewicz"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/app16010085", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models with Retrieval-Augmented Generation are considered to be modern, chat-native interfaces to enterprise knowledge. However, deploying such systems safely requires precautions more ", "doi": "10.3390/app16010085"}
+{"id": "give-positive-review-2025", "title": "\"Give a Positive Review Only\": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers", "authors": ["Qing Zhou", "Zhexin Zhang", "Zhi Li", "Limin Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.01287", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of AI models, their deployment across diverse tasks has become increasingly widespread. A notable emerging application is leveraging AI models to assist in reviewing scienti", "arxiv_id": "2511.01287", "doi": "10.48550/arXiv.2511.01287"}
+{"id": "spin-selfsupervised-prompt-2024", "title": "SPIN: Self-Supervised Prompt INjection", "authors": ["Leon Zhou", "Junfeng Yang", "Chengzhi Mao"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.13236", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly used in a variety of important applications, yet their safety and reliability remain as major concerns. Various adversarial and jailbreak attacks have bee", "arxiv_id": "2410.13236", "doi": "10.48550/arXiv.2410.13236"}
+{"id": "f2a-innovative-approach-2024", "title": "F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents", "authors": ["Yupeng Ren"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.08776", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid development of Large Language Models (LLMs), numerous mature applications of LLMs have emerged in the field of content safety detection. However, we have found that LLMs exhibit blind t", "arxiv_id": "2410.08776", "doi": "10.48550/arXiv.2410.08776"}
+{"id": "prompt-injection-attacks-2024", "title": "Prompt Injection Attacks in Defended Systems", "authors": ["Daniil Khomsky", "Narek Maloyan", "Bulat Nutfullin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.14048", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models play a crucial role in modern natural language processing technologies. However, their extensive use also introduces potential security risks, such as the possibility of black-bo", "arxiv_id": "2406.14048", "doi": "10.1007/978-3-031-80853-1_30"}
+{"id": "causalarmor-efficient-indirect-2026", "title": "CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution", "authors": ["Minbeom Kim", "Mihir Parmar", "Phillip Wallis", "Lesly Miculicich", "Kyomin Jung"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.07918", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious commands hidden within untrusted content trick the agen", "arxiv_id": "2602.07918"}
+{"id": "mpib-benchmark-medical-2026", "title": "MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs", "authors": ["Junhyeok Lee", "Han Jang", "Kyu Sung Choi"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06268", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt injection attacks can steer these systems toward clin", "arxiv_id": "2602.06268"}
+{"id": "cyberphysical-system-defense-2025", "title": "Cyber-Physical System Defense Against Structured False Data Injection Attacks Using an Adaptive Security Framework with Passivity Enhancement", "authors": ["R. Gopi", "Francis Shamili"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.52783/jisem.v10i43s.8360", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: System integrity, operation, and significant breakdowns can be compromised by coordinated False Data Injection Attacks (FDIAs), which are increasingly prevalent in Cyber-Physical Systems (CPS). Becaus", "doi": "10.52783/jisem.v10i43s.8360"}
+{"id": "zeroshot-embedding-drift-2026", "title": "Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs", "authors": ["A. Sekar", "Mrinal Agarwal", "Rachel Sharma", "Akitsugu Tanaka", "Jasmine Zhang"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.12359", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent ", "arxiv_id": "2601.12359", "doi": "10.48550/arXiv.2601.12359"}
+{"id": "exploring-clean-label-2024", "title": "Exploring Clean Label Backdoor Attacks and Defense in Language Models", "authors": ["Shuai Zhao", "Anh Tuan Luu", "Jie Fu", "Jinming Wen", "Weiqi Luo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TASLP.2024.3407571", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite being widely applied, pre-trained language models have been proven vulnerable to backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning ", "doi": "10.1109/TASLP.2024.3407571"}
+{"id": "indirect-prompt-injections-2025", "title": "Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?", "authors": ["Rishika Bhagwatkar", "Kevin Kasa", "Abhay Puri", "Gabriel Huang", "Irina Rish"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.05244", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-esta", "arxiv_id": "2510.05244", "doi": "10.48550/arXiv.2510.05244"}
+{"id": "aegis-automated-coevolutionary-2025", "title": "AEGIS : Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema", "authors": ["Ting-Chun Liu", "C. Hsu", "Kuan-Yi Lee", "C. Fu", "Hung-yi Lee"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.00088", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks pose a significant challenge to the safe deployment of Large Language Models (LLMs) in real-world applications. While prompt-based detection offers a lightweight and interpret", "arxiv_id": "2509.00088", "doi": "10.48550/arXiv.2509.00088"}
+{"id": "hacking-llms-technical-2025", "title": "Hacking LLMs: A Technical Analysis of Security Vulnerabilities and Defense Mechanisms", "authors": ["G. Raj", "Hamzah", "Nikhil Raj", "Nikhil Ranjan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CICTN64563.2025.10932638", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) such as GPT-4 and Google’s Gemini have revolutionized the landscape of artificial intelligence, enabling sophisticated natural language processing capabilities across dive", "doi": "10.1109/CICTN64563.2025.10932638"}
+{"id": "safeguarding-visionlanguage-models-2024", "title": "Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors", "authors": ["Jiachen Sun", "Changsheng Wang", "Jiong Wang", "Yiwei Zhang", "Chaowei Xiao"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.10529", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts ", "arxiv_id": "2405.10529", "doi": "10.48550/arXiv.2405.10529"}
+{"id": "cyberattacks-large-language-2025", "title": "Cyberattacks on Large Language Models - Attack Detection and Architecture Adaptability", "authors": ["Srikar Alla", "Ali Shiri Sichani"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SoutheastCon56624.2025.10971722", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) like GPT and PaLM have transformed natural language processing, enabling advancements in text generation, language translation, and conversational AI. However, their inc", "doi": "10.1109/SoutheastCon56624.2025.10971722"}
+{"id": "lmdmi-lightweight-multilevel-2025", "title": "LMDMI: A Lightweight Multilevel Defense against Malicious Inputs for Generative Language Models", "authors": ["Shi-Qi Yan", "Changsheng Wan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3786709.3786720", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) and other Generative Language Models (GLMs) face widespread security threats from malicious user inputs, which can lead to the generation of harmful content or the leakage", "doi": "10.1145/3786709.3786720"}
+{"id": "ccfc-core-corefullcore-2025", "title": "CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection", "authors": ["Jiaming Hu", "Haoyu Wang", "Debarghya Mukherjee", "I. Paschalidis"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.14128", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Jailbreak attacks pose a serious challenge to the safe deployment of large language models (LLMs). We introduce CCFC (Core&Core-Full-Core), a dual-track, prompt-level defense framework designed to mit", "arxiv_id": "2508.14128", "doi": "10.48550/arXiv.2508.14128"}
+{"id": "new-paradigms-adversarial-2025", "title": "New Paradigms of Adversarial Attacks Against Large Language Models and Their Defense Mechanisms", "authors": ["Yi Zhou", "Xinyao Hou", "Luohui Zhou"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AIAHPC66801.2025.11290591", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models are gaining immense popularity, enhancing the usability of intelligent services while introducing unprecedented security challenges. Systems designed to improve question-answerin", "doi": "10.1109/AIAHPC66801.2025.11290591"}
+{"id": "sentraguard-multilingual-humanai-2025", "title": "Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks", "authors": ["Mehedi Hasan", "Ziaur Rahman", "Rafid Mostafiz", "Md. Abir Hossain"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.22628", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks targeting large language models (LLMs). The framework", "arxiv_id": "2510.22628", "doi": "10.48550/arXiv.2510.22628"}
+{"id": "llmguarded-clouds-leveraging-2025", "title": "LLM-Guarded Clouds: Leveraging Generative AI for Proactive Threat Hunting and Adaptive Defense in Hybrid Cloud Environments", "authors": ["Sumit Saklani", "Deepak Kumar Chohan", "Raman Sharma", "Niharika Varshney", "Atika Gupta"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/GCAT66372.2025.11368582", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The ability to scale within hybrid and multi cloud infrastructures enhances liquid resource availability, However, this also dramatically increases the attack surface, leaving trivially configured def", "doi": "10.1109/GCAT66372.2025.11368582"}
+{"id": "taxonomy-evaluation-exploitation-2025", "title": "Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks", "authors": ["Zimo Ji", "Xunguang Wang", "Zongjie Li", "Pingchuan Ma", "Yudong Gao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.15203", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In ", "arxiv_id": "2511.15203", "doi": "10.48550/arXiv.2511.15203"}
+{"id": "promptware-kill-chain-2026", "title": "The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism", "authors": ["Oleg Brodt", "Elad Feldman", "Bruce Schneier", "Ben Nassi"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2601.09625", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection was initially framed as the large language model (LLM) analogue of SQL injection. However, over the past three years, attacks labeled as prompt injection have evolved from isolated in", "arxiv_id": "2601.09625"}
+{"id": "when-bots-take-2026", "title": "When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent", "authors": ["Xinyi Wu", "Geng Hong", "Yueyue Chen", "Mingxuan Liu", "Feier Jin"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.07263", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Web agents, powered by large language models (LLMs), are increasingly deployed to automate complex web interactions. The rise of open-source frameworks (e.g., Browser Use, Skyvern-AI) has accelerated ", "arxiv_id": "2601.07263", "doi": "10.48550/arXiv.2601.07263"}
+{"id": "securecai-injectionresilient-llm-2026", "title": "SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations", "authors": ["Mohammed Himayath Ali", "Mohammed Aqib Abdullah", "Mohammed Mudassir Uddin", "S. Alam"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.07835", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explanation; however, deployment in adversaria", "arxiv_id": "2601.07835", "doi": "10.48550/arXiv.2601.07835"}
+{"id": "agent-security-bench-2024", "title": "Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents", "authors": ["Hanrong Zhang", "Jingyuan Huang", "Kai Mei", "Yifei Yao", "Zhenting Wang"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2410.02644", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabil", "arxiv_id": "2410.02644", "doi": "10.48550/arXiv.2410.02644"}
+{"id": "agentdojo-dynamic-environment-2024", "title": "AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents", "authors": ["Edoardo Debenedetti", "Jie Zhang", "Mislav Balunović", "Luca Beurer-Kellner", "Marc Fischer"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2406.13352", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI agents aim to solve complex tasks by combining text-based reasoning with external tool calls. Unfortunately, AI agents are vulnerable to prompt injection attacks where data returned by external too", "arxiv_id": "2406.13352", "doi": "10.48550/arXiv.2406.13352"}
+{"id": "hidden-dangers-browsing-2025", "title": "The Hidden Dangers of Browsing AI Agents", "authors": ["Mykyta Mudryi", "Markiyan Chaklosh", "Grzegorz Wójcik"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.13076", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autonomous browsing agents powered by large language models (LLMs) are increasingly used to automate web-based tasks. However, their reliance on dynamic content, tool execution, and user-provided data", "arxiv_id": "2505.13076", "doi": "10.48550/arXiv.2505.13076"}
+{"id": "jailbreaking-mitigation-vulnerabilities-2024", "title": "Jailbreaking and Mitigation of Vulnerabilities in Large Language Models", "authors": ["Benji Peng", "Ziqian Bi", "Qian Niu", "Ming Liu", "Pohsun Feng"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.15236", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation, enabling applications across fields beyond healthcare, software engine", "arxiv_id": "2410.15236", "doi": "10.48550/arXiv.2410.15236"}
+{"id": "adversarial-threat-vectors-2025", "title": "Adversarial threat vectors and risk mitigation for retrieval-augmented generation systems", "authors": ["Chris M. Ward", "Joshua D. Harguess"], "year": 2025, "venue": "Defense + Security", "source_url": "https://arxiv.org/abs/2506.00281", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) systems, which integrate Large Language Models (LLMs) with external knowledge sources, are vulnerable to a range of adversarial attack vectors. This paper examines", "arxiv_id": "2506.00281", "doi": "10.1117/12.3055931"}
+{"id": "attacks-by-content-2025", "title": "Attacks by Content: Automated Fact-checking is an AI Security Issue", "authors": ["Michael Schlichtkrull"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2510.11238", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: When AI agents retrieve and reason over external documents, adversaries can manipulate the data they receive to subvert their behaviour. Previous research has studied indirect prompt injection, where ", "arxiv_id": "2510.11238", "doi": "10.18653/v1/2025.emnlp-main.431"}
+{"id": "efficient-jailbreak-mitigation-2025", "title": "Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline", "authors": ["Akshaj Prashanth Rao", "Advait Singh", "Saumya Kumaar Saksena", "Dhruv Kumar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2512.19011", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2512.19011"}
+{"id": "injecting-falsehoods-adversarial-2025", "title": "Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs", "authors": ["Alina Fastowski", "Bardh Prenkaj", "Yuxiao Li", "Gjergji Kasneci"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.05919", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs are now an integral part of information retrieval. As such, their role as question answering chatbots raises significant concerns due to their shown vulnerability to adversarial man-in-the-middle", "arxiv_id": "2511.05919", "doi": "10.48550/arXiv.2511.05919"}
+{"id": "promptscreen-efficient-jailbreak-2025", "title": "PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline", "authors": ["Akshaj Prashanth Rao", "Advait Singh", "Saumya Kumaar Saksena", "Dhruv Kumar"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2512.19011", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection and jailbreaking attacks pose persistent security challenges to large language model (LLM)-based systems. We present PromptScreen, an efficient and systematically evaluated defense ar", "arxiv_id": "2512.19011"}
+{"id": "cognitive-control-architecture-2025", "title": "Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents", "authors": ["Zhibo Liang", "Tianze Hu", "Zaiye Chen", "Mingjie Tang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.06716", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autonomous Large Language Model (LLM) agents exhibit significant vulnerability to Indirect Prompt Injection (IPI) attacks. These attacks hijack agent behavior by polluting external information sources", "arxiv_id": "2512.06716", "doi": "10.48550/arXiv.2512.06716"}
+{"id": "trust-llmcontrolled-robotics-2025", "title": "Trust in LLM-controlled Robotics: a Survey of Security Threats, Defenses and Challenges", "authors": ["Xinyu Huang", "B. ShyamKarthickV", "Taozhao Chen", "Mitch Bryson", "Thomas L. Chaffey"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.02377", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) into robotics has revolutionized their ability to interpret complex human commands and execute sophisticated tasks. However, such paradigm shift introdu", "arxiv_id": "2601.02377", "doi": "10.48550/arXiv.2601.02377"}
+{"id": "method-counteracting-manipulative-2025", "title": "Method of Counteracting Manipulative Queries to Large Language Models", "authors": ["Yehor Kovalchuk", "Mykhailo Kolomytsev"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.20535/tacs.2664-29132025.3.345389", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) into critical infrastructure (SIEM, SOAR) has introduced new attack vectors, specifically prompt injection and jailbreaking. Traditional defense mechani", "doi": "10.20535/tacs.2664-29132025.3.345389"}
+{"id": "threats-defenses-large-2025", "title": "Threats and Defenses for Large Language Models: A Survey", "authors": ["Xiaobao Sheng", "Qinhui Jiang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3773365.3773631", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Large Language Models (LLMs) are increasingly applied to text comprehension, code generation, and multimodal tasks, their exposure to security threats has become more pronounced. This paper focuses", "doi": "10.1145/3773365.3773631"}
+{"id": "proactive-hardening-llm-2026", "title": "Proactive Hardening of LLM Defenses with HASTE", "authors": ["Henry Chen", "Victor Aranda", "Samarth Keshari", "Ryan Heartfield", "Nicole Nichols"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19051", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt-based attack techniques are one of the primary challenges in securely deploying and protecting LLM-based AI systems. LLM inputs are an unbounded, unstructured space. Consequently, effectively d", "arxiv_id": "2601.19051", "doi": "10.48550/arXiv.2601.19051"}
+{"id": "embedguard-crosslayer-detection-2026", "title": "EmbedGuard: Cross-Layer Detection and Provenance Attestation for Adversarial Embedding Attacks in RAG Systems", "authors": ["Neeraj Kumar", "Singh Beshane"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.22399/ijcesen.4869", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Embedding-based Retrieval-Augmented Generation (RAG) systems are critical infrastructure for production AI applications, yet they remain vulnerable to embedding space poisoning attacks that achieve di", "doi": "10.22399/ijcesen.4869"}
+{"id": "ocrmediated-modality-dominance-2026", "title": "OCR-Mediated Modality Dominance in Vision-Language Models: Implications for Radiology AI Trustworthiness", "authors": ["I. Akbasli", "B. Ozturk", "O. Serin", "V. Dogan", "G. Berikol"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.64898/2026.02.22.26346828", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.64898/2026.02.22.26346828"}
+{"id": "hijacking-large-language-2023", "title": "Hijacking Large Language Models via Adversarial In-Context Learning", "authors": ["Yao Qiang", "Xiangyu Zhou", "Dongxiao Zhu"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.09948", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations (demos) in the preconditioned prompts. Despit", "arxiv_id": "2311.09948", "doi": "10.48550/arXiv.2311.09948"}
+{"id": "webinject-prompt-injection-2025", "title": "WebInject: Prompt Injection Attack to Web Agents", "authors": ["Xilong Wang", "John Bloch", "Zedian Shao", "Yuepeng Hu", "Shuyan Zhou"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2505.11717", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-modal large language model (MLLM)-based web agents interact with webpage environments by generating actions based on screenshots of the webpages. In this work, we propose WebInject, a prompt inj", "arxiv_id": "2505.11717", "doi": "10.18653/v1/2025.emnlp-main.104"}
+{"id": "optimizationbased-prompt-injection-2024", "title": "Optimization-based Prompt Injection Attack to LLM-as-a-Judge", "authors": ["Jiawen Shi", "Zenghui Yuan", "Yinuo Liu", "Yue Huang", "Pan Zhou"], "year": 2024, "venue": "Conference on Computer and Communications Security", "source_url": "https://arxiv.org/abs/2403.17710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-as-a-Judge uses a large language model (LLM) to select the best response from a set of candidates for a given question. LLM-as-a-Judge has many applications such as LLM-powered search, reinforceme", "arxiv_id": "2403.17710", "doi": "10.1145/3658644.3690291"}
+{"id": "topicattack-indirect-prompt-2025", "title": "TopicAttack: An Indirect Prompt Injection Attack via Topic Transition", "authors": ["Yulin Chen", "Haoran Li", "Yuexin Li", "Yue Liu", "Yangqiu Song"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2507.13686", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have shown remarkable performance across a range of NLP tasks. However, their strong instruction-following capabilities and inability to distinguish instructions from data", "arxiv_id": "2507.13686", "doi": "10.48550/arXiv.2507.13686"}
+{"id": "manipulating-llm-web-2025", "title": "Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree", "authors": ["Sam Johnson", "Viet Pham", "Thai Q. Le"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.14799", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work demonstrates that LLM-based web navigation agents offer powerful automation capabilities but are vulnerable to Indirect Prompt Injection (IPI) attacks. We show that adversaries can embed uni", "arxiv_id": "2507.14799", "doi": "10.48550/arXiv.2507.14799"}
+{"id": "tokenefficient-prompt-injection-2025", "title": "Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression", "authors": ["Yu Cui", "Yujun Cai", "Yiwei Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.20493", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While reasoning large language models (LLMs) demonstrate remarkable performance across various tasks, they also contain notable security vulnerabilities. Recent research has uncovered a \"thinking-stopp", "arxiv_id": "2504.20493", "doi": "10.48550/arXiv.2504.20493"}
+{"id": "vortexpia-indirect-prompt-2025", "title": "VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy", "authors": ["Yu Cui", "Sicheng Pan", "Yifei Liu", "Haibin Zhang", "Cong Zuo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.04261", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have been widely deployed in Conversational AIs (CAIs), while exposing privacy and security threats. Recent research shows that LLM-based CAIs can be manipulated to extrac", "arxiv_id": "2510.04261", "doi": "10.48550/arXiv.2510.04261"}
+{"id": "obliinjection-orderoblivious-prompt-2025", "title": "ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data", "authors": ["Ruiqi Wang", "Yuqi Jia", "N. Gong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.09321", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks aim to contaminate the input data of an LLM to mislead it into completing an attacker-chosen task instead of the intended task. In many applications and agents, the input data", "arxiv_id": "2512.09321", "doi": "10.48550/arXiv.2512.09321"}
+{"id": "goalguided-generative-prompt-2024", "title": "Goal-Guided Generative Prompt Injection Attack on Large Language Models", "authors": ["Chong Zhang", "Mingyu Jin", "Qinkai Yu", "Chengzhi Liu", "Haochen Xue"], "year": 2024, "venue": "Industrial Conference on Data Mining", "source_url": "https://arxiv.org/abs/2404.07234", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. Numerous users can easily inject adversarial text or instructions through the use", "arxiv_id": "2404.07234", "doi": "10.1109/ICDM59182.2024.00119"}
+{"id": "madspear-conformitydriven-prompt-2025", "title": "MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems", "authors": ["Yu Cui", "Hongyang Du"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.13038", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent debate (MAD) systems leverage collaborative interactions among large language models (LLMs) agents to improve reasoning capabilities. While recent studies have focused on increasing the ac", "arxiv_id": "2507.13038", "doi": "10.48550/arXiv.2507.13038"}
+{"id": "prompt-injection-attack-2025", "title": "Prompt Injection Attack Detection with Machine Learning", "authors": ["Berkay Özçam", "Berksu Ant Solmaz", "M. Amasyalı", "Mustafa Kara", "Muhammed Ali Aydın"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASYU67174.2025.11208433", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The introduction of large language models into our lives has created a revolution, allowing artificial intelligence to be easily used and accessible by everyone. With this opportunity, large language ", "doi": "10.1109/ASYU67174.2025.11208433"}
+{"id": "study-prompt-injection-2024", "title": "A Study on Prompt Injection Attack Against LLM-Integrated Mobile Robotic Systems", "authors": ["Wenxiao Zhang", "Xiangrui Kong", "Conan Dewitt", "Thomas Bräunl", "Jin B. Hong"], "year": 2024, "venue": "2024 IEEE 35th International Symposium on Software Reliability Engineering Workshops (ISSREW)", "source_url": "https://arxiv.org/abs/2408.03515", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) like GPT-4o into robotic systems represents a significant advancement in embodied artificial intelligence. These models can process multi-modal prompts,", "arxiv_id": "2408.03515", "doi": "10.1109/ISSREW63542.2024.00103"}
+{"id": "textbased-prompt-injection-2024", "title": "Text-Based Prompt Injection Attack Using Mathematical Functions in Modern Large Language Models", "authors": ["Hyeokjin Kwon", "Wooguil Pak"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics13245008", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection is a type of attack that induces violent or discriminatory responses via the input of a prompt containing illegal instructions to the large language model (LLM). Most early injection ", "doi": "10.3390/electronics13245008"}
+{"id": "shadowcode-automatic-external-2024", "title": "ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs", "authors": ["Yuchen Yang", "Yiming Li", "Hongwei Yao", "Bingrun Yang", "Yiling He"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2407.09164", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements have led to the widespread adoption of code-oriented large language models (Code LLMs) for programming tasks. Despite their success in deployment, their security research is left f", "arxiv_id": "2407.09164"}
+{"id": "reasalign-reasoning-enhanced-2026", "title": "ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack", "authors": ["Hao Li", "Yankai Yang", "G. E. Suh", "Ning Zhang", "Chaowei Xiao"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.10173", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields. However, these systems are highly vulnerable to ind", "arxiv_id": "2601.10173", "doi": "10.48550/arXiv.2601.10173"}
+{"id": "pina-prompt-injection-2026", "title": "PINA: Prompt Injection Attack against Navigation Agents", "authors": ["Jiani Liu", "Yixin He", "Lanlan Fan", "Qidi Zhong", "Yushi Cheng"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.13612", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Navigation agents powered by large language models (LLMs) convert natural language instructions into executable plans and actions. Compared to text-based applications, their security is far more criti", "arxiv_id": "2601.13612", "doi": "10.48550/arXiv.2601.13612"}
+{"id": "whitebox-prompt-injection-2026", "title": "A White-Box Prompt Injection Attack on Embodied AI Agents Driven by Large Language Models", "authors": ["Tongcheng Geng", "Yubin Qu", "W. E. Wong"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.jss.2026.112782", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.jss.2026.112782"}
+{"id": "cacheprune-neuralbased-attribution-2025", "title": "CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks", "authors": ["Rui Wang", "Junda Wu", "Yu Xia", "Tong Yu", "Ruiyi Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.21228", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are susceptible to indirect prompt injection attacks, in which the model inadvertently responds to task messages injected within the prompt context. This vulnerability ste", "arxiv_id": "2504.21228", "doi": "10.48550/arXiv.2504.21228"}
+{"id": "injecguard-benchmarking-mitigating-2024", "title": "InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models", "authors": ["Hao Li", "Xiaogeng Liu", "Chaowei Xiao"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.22770", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense --", "arxiv_id": "2410.22770", "doi": "10.48550/arXiv.2410.22770"}
+{"id": "systemlevel-defense-against-2024", "title": "System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective", "authors": ["Fangzhou Wu", "Ethan Cecchetti", "Chaowei Xiao"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.19091", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model-based systems (LLM systems) are information and query processing systems that use LLMs to plan operations from natural-language prompts and feed the output of each successive step", "arxiv_id": "2409.19091", "doi": "10.48550/arXiv.2409.19091"}
+{"id": "semantics-as-shield-2025", "title": "Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification", "authors": ["Yanxi Li", "R. Shan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.21752", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models are increasingly used for text classification tasks such as sentiment analysis, yet their reliance on natural language prompts exposes them to prompt injection attacks. In partic", "arxiv_id": "2511.21752", "doi": "10.48550/arXiv.2511.21752"}
+{"id": "aegisagent-autonomous-defense-2025", "title": "AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs", "authors": ["Yihan Wang", "Huanqi Yang", "S. Pal", "Weitao Xu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.20986", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) into wearable sensing is creating a new class of mobile applications capable of nuanced human activity understanding. However, the reliability of these ", "arxiv_id": "2512.20986", "doi": "10.48550/arXiv.2512.20986"}
+{"id": "decoding-latent-attack-2025", "title": "Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization", "authors": ["Ishaan Verma"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.05831", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly integrated into web-based systems for content summarization, yet their susceptibility to prompt injection attacks remains a pressing concern. In this stud", "arxiv_id": "2509.05831", "doi": "10.48550/arXiv.2509.05831"}
+{"id": "real-time-ai-2025", "title": "REAL TIME AI DEFENSE AGAINST PROMPT INJECTION ATTACKS", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.37962/icydd/2025/23-24", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.37962/icydd/2025/23-24"}
+{"id": "fath-authenticationbased-testtime-2024", "title": "FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks", "authors": ["Jiong Wang", "Fangzhou Wu", "Wen-Ding Li", "Jinsheng Pan", "Edward Suh"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.21492", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have been widely deployed as the backbone with additional tools and text information for real-world applications. However, integrating external information into LLM-integr", "arxiv_id": "2410.21492", "doi": "10.48550/arXiv.2410.21492"}
+{"id": "hybrid-constitutional-classifiers-2025", "title": "Hybrid Constitutional Classifiers for Prompt Injection Defense", "authors": ["Qianlong Lan", "Anuj Kaul", "Shaun Jones", "Stephanie Westrum", "Vinothini Pandurangan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/eIT64391.2025.11103606", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we propose a novel hybrid constitutional classifier system that combines a lightweight input classifier with a fine-tuned generative LLM to enhance security while maintaining efficiency", "doi": "10.1109/eIT64391.2025.11103606"}
+{"id": "guardian-multitiered-defense-2024", "title": "GUARDIAN: A Multi-Tiered Defense Architecture for Thwarting Prompt Injection Attacks on LLMs", "authors": ["Parijat Rai", "Saumil Sood", "V. Madisetti", "Arshdeep Bahga"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.4236/jsea.2024.171003", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.4236/jsea.2024.171003"}
+{"id": "hacking-back-aihacker-2024", "title": "Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks", "authors": ["Dario Pasquini", "Evgenios M. Kornaropoulos", "G. Ateniese"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.20911", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailo", "arxiv_id": "2410.20911", "doi": "10.48550/arXiv.2410.20911"}
+{"id": "meta-secalign-secure-2025", "title": "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks", "authors": ["Sizhe Chen", "Arman Zharmagambetov", "David Wagner", "Chuan Guo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.02735", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks, where untrusted data contains an injected prompt to manipulate the system, have been listed as the top security threat to LLM-integrated applications. Model-level prompt inje", "arxiv_id": "2507.02735", "doi": "10.48550/arXiv.2507.02735"}
+{"id": "defense-strategy-selection-2024", "title": "Defense strategy selection based on incomplete information game for the false data injection attack", "authors": ["Na Yi", "Jianjun Xu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1080/00207721.2024.2363546", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid development and wide application of sensing devices and communication networks, the traditional power system has transformed into a cyber physical power system (CPPS) gradually. The con", "doi": "10.1080/00207721.2024.2363546"}
+{"id": "adaptive-attacks-break-2025", "title": "Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents", "authors": ["Qiusi Zhan", "Richard Fang", "H. Panchal", "Daniel Kang"], "year": 2025, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.00061", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) agents exhibit remarkable performance across diverse applications by using external tools to interact with environments. However, integrating external tools introduces secur", "arxiv_id": "2503.00061", "doi": "10.48550/arXiv.2503.00061"}
+{"id": "jatmo-prompt-injection-2023", "title": "Jatmo: Prompt Injection Defense by Task-Specific Finetuning", "authors": ["Julien Piet", "Maha Alrashed", "Chawin Sitawarin", "Sizhe Chen", "Zeming Wei"], "year": 2023, "venue": "European Symposium on Research in Computer Security", "source_url": "https://arxiv.org/abs/2312.17673", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are attracting significant research attention due to their instruction-following abilities, allowing users and developers to leverage LLMs for a variety of tasks. However,", "arxiv_id": "2312.17673", "doi": "10.48550/arXiv.2312.17673"}
+{"id": "cognitive-overload-attackprompt-2024", "title": "Cognitive Overload Attack:Prompt Injection for Long Context", "authors": ["Bibek Upadhayay", "Vahid Behzadan", "Amin Karbasi"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.11272", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in performing tasks across various domains without needing explicit retraining. This capability, known as In-Context Learning (IC", "arxiv_id": "2410.11272", "doi": "10.48550/arXiv.2410.11272"}
+{"id": "defense-strategy-against-2024", "title": "A Defense Strategy Against False Data Injection Attack in Smart Grid Based on Multi-Stage Game", "authors": ["Hu Li", "Huan Pan"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICPRE62586.2024.10768530", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper addresses the critical issue of false data injection attacks (FDIA) in smart grids by proposing a novel multi-stage dynamic attack and defense game model. This model not only accounts for t", "doi": "10.1109/ICPRE62586.2024.10768530"}
+{"id": "can-indirect-prompt-2025", "title": "Can Indirect Prompt Injection Attacks Be Detected and Removed?", "authors": ["Yulin Chen", "Haoran Li", "Yuan Sui", "Yufei He", "Yue Liu"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2502.16580", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks manipulate large language models (LLMs) by misleading them to deviate from the original input instructions and execute maliciously injected instructions, because of their inst", "arxiv_id": "2502.16580", "doi": "10.48550/arXiv.2502.16580"}
+{"id": "robustness-referencing-defending-2025", "title": "Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction", "authors": ["Yulin Chen", "Haoran Li", "Yuan Sui", "Yue Liu", "Yufei He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.20472", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated impressive performance and have come to dominate the field of natural language processing (NLP) across various tasks. However, due to their strong instru", "arxiv_id": "2504.20472", "doi": "10.48550/arXiv.2504.20472"}
+{"id": "critical-evaluation-defenses-2025", "title": "A Critical Evaluation of Defenses against Prompt Injection Attacks", "authors": ["Yuqi Jia", "Zedian Shao", "Yupei Liu", "Jinyuan Jia", "Dawn Song"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.18333", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are vulnerable to prompt injection attacks, and several defenses have recently been proposed, often claiming to mitigate these attacks successfully. However, we argue that", "arxiv_id": "2505.18333", "doi": "10.48550/arXiv.2505.18333"}
+{"id": "piguard-prompt-injection-2025", "title": "PIGuard: Prompt Injection Guardrail via Mitigating Overdefense for Free", "authors": ["Hao Li", "Xiaogeng Liu", "Ning Zhang", "Chaowei Xiao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.acl-long.1468", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks pose a critical threat to large language models (LLMs), enabling goal hijacking and data leakage. Prompt guard models, though effective in defense, suffer from over-defense—fa", "doi": "10.18653/v1/2025.acl-long.1468"}
+{"id": "manipulating-multimodal-agents-2025", "title": "Manipulating Multimodal Agents via Cross-Modal Prompt Injection", "authors": ["Le Wang", "Zonghao Ying", "Tianyuan Zhang", "Siyuan Liang", "Shengshan Hu"], "year": 2025, "venue": "ACM Multimedia", "source_url": "https://arxiv.org/abs/2504.14348", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of multimodal large language models has redefined the agent paradigm by integrating language and vision modalities with external data sources, enabling agents to better interpret human i", "arxiv_id": "2504.14348", "doi": "10.1145/3746027.3755211"}
+{"id": "agentvigil-generic-blackbox-2025", "title": "AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents", "authors": ["Zhun Wang", "Vincent Siu", "Zhe Ye", "Tianneng Shi", "Yuzhou Nie"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2505.05849", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The strong planning and reasoning capabilities of Large Language Models (LLMs) have fostered the development of agent-based systems capable of leveraging external tools and interacting with increasing", "arxiv_id": "2505.05849"}
+{"id": "redvisor-reasoningaware-prompt-2026", "title": "RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse", "authors": ["Mingrui Liu", "Sixiao Zhang", "Cheng Long", "Kwok-Yan Lam"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.01795", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. Current de", "arxiv_id": "2602.01795"}
+{"id": "icon-indirect-prompt-2026", "title": "ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction", "authors": ["Che Wang", "Fuyao Zhang", "Jiaming Zhang", "Ziqi Zhang", "Yinghui Wang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.20708", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) agents are susceptible to Indirect Prompt Injection (IPI) attacks, where malicious instructions in retrieved content hijack the agent's execution. Existing defenses typicall", "arxiv_id": "2602.20708"}
+{"id": "bypassing-llm-guardrails-2025", "title": "Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems", "authors": ["William Hackett", "Lewis Birch", "Stefan Trawicki", "Neeraj Suri", "Peter Garraghan"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2504.11168", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) guardrail systems are designed to protect against prompt injection and jailbreak attacks. However, they remain vulnerable to evasion techniques. We demonstrate two approac", "arxiv_id": "2504.11168"}
+{"id": "simple-prompt-injection-2025", "title": "Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution", "authors": ["Meysam Alizadeh", "Zeynab Samei", "Daria Stetsenko", "Fabrizio Gilardi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.01055", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Previous benchmarks on prompt injection in large language models (LLMs) have primarily focused on generic tasks and attacks, offering limited insights into more complex threats like data exfiltration.", "arxiv_id": "2506.01055", "doi": "10.48550/arXiv.2506.01055"}
+{"id": "bilevel-attackdefense-model-2023", "title": "A Bi-Level Attack-Defense Model for the Forecasting False Data Injection Attacks on the Integrated Energy Systems", "authors": ["M. Azimi", "Hamed Delkhosh", "M. Ghaedi"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICEE59167.2023.10334687", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Integrated Energy Systems (IESs) are attractive flexible energy infrastructures that improve the reliability and efficiency. Utilizing the information and communication technologies in the IESs brings", "doi": "10.1109/ICEE59167.2023.10334687"}
+{"id": "not-what-youve-2023", "title": "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection", "authors": ["Kai Greshake", "Sahar Abdelnabi", "Shailesh Mishra", "C. Endres", "Thorsten Holz"], "year": 2023, "venue": "AISec@CCS", "source_url": "https://arxiv.org/abs/2302.12173", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly being integrated into applications, with versatile functionalities that can be easily modulated via natural language prompts. So far, it was assumed that ", "arxiv_id": "2302.12173", "doi": "10.1145/3605764.3623985"}
+{"id": "defending-against-indirect-2024", "title": "Defending Against Indirect Prompt Injection Attacks With Spotlighting", "authors": ["Keegan Hines", "Gary Lopez", "M. Hall", "Federico Zarfati", "Yonatan Zunger"], "year": 2024, "venue": "CAMLIS", "source_url": "https://arxiv.org/abs/2403.14720", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a singl", "arxiv_id": "2403.14720", "doi": "10.48550/arXiv.2403.14720"}
+{"id": "shieldlearner-new-paradigm-2025", "title": "ShieldLearner: A New Paradigm for Jailbreak Attack Defense in LLMs", "authors": ["Ziyi Ni", "Hao Wang", "Huacan Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.13162", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved remarkable success in various domains but remain vulnerable to adversarial jailbreak attacks. Existing prompt-defense strategies, including parameter-modifyi", "arxiv_id": "2502.13162", "doi": "10.48550/arXiv.2502.13162"}
+{"id": "secalign-defending-against-2024", "title": "SecAlign: Defending Against Prompt Injection with Preference Optimization", "authors": ["Sizhe Chen", "Arman Zharmagambetov", "Saeed Mahloujifar", "Kamalika Chaudhuri", "Chuan Guo"], "year": 2024, "venue": "Conference on Computer and Communications Security", "source_url": "https://arxiv.org/abs/2410.05451", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are becoming increasingly prevalent in modern software systems, interfacing between the user and the Internet to assist with tasks that require advanced language understan", "arxiv_id": "2410.05451", "doi": "10.1145/3719027.3744836"}
+{"id": "novel-detection-defense-2023", "title": "A novel detection and defense mechanism against false data injection attack in smart grids", "authors": ["Jinlong Cui", "Beibei Gao", "Baojun Guo"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1049/gtd2.12848", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1049/gtd2.12848"}
+{"id": "automatic-universal-prompt-2024", "title": "Automatic and Universal Prompt Injection Attacks against Large Language Models", "authors": ["Xiaogeng Liu", "Zhiyuan Yu", "Yizhe Zhang", "Ning Zhang", "Chaowei Xiao"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.04957", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions. However, their capabilities can be exploited through prom", "arxiv_id": "2403.04957", "doi": "10.48550/arXiv.2403.04957"}
+{"id": "attention-tracker-detecting-2024", "title": "Attention Tracker: Detecting Prompt Injection Attacks in LLMs", "authors": ["Kuo-Han Hung", "Ching-Yun Ko", "Ambrish Rawat", "I-Hsin Chung", "Winston H. Hsu"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2411.00348", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks, where malicious inputs manipulate the model into ignoring original instructions and ", "arxiv_id": "2411.00348", "doi": "10.48550/arXiv.2411.00348"}
+{"id": "rl-hammer-llms-2025", "title": "RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection", "authors": ["Yuxin Wen", "Arman Zharmagambetov", "Ivan Evtimov", "Narine Kokhlikyan", "Tom Goldstein"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.04885", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection poses a serious threat to the reliability and safety of LLM agents. Recent defenses against prompt injection, such as Instruction Hierarchy and SecAlign, have shown notable robustness", "arxiv_id": "2510.04885", "doi": "10.48550/arXiv.2510.04885"}
+{"id": "enhancing-security-large-2025", "title": "Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses", "authors": ["E. Mathew"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.32604/jai.2025.069841", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This review paper explores advanced methods to prompt Large Language Models (LLMs) into generating objectionable or unintended behaviors through adversarial prompt injection attacks. We examine a se", "doi": "10.32604/jai.2025.069841"}
+{"id": "chatinject-abusing-chat-2025", "title": "ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents", "authors": ["Hwan Chang", "Yonghyun Jun", "Hwanhee Lee"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.22830", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The growing deployment of large language model (LLM) based agents that interact with external environments has created new attack surfaces for adversarial manipulation. One major threat is indirect pr", "arxiv_id": "2509.22830", "doi": "10.48550/arXiv.2509.22830"}
+{"id": "funtuning-characterizing-vulnerability-2025", "title": "Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-Based Prompt Injection Attacks via the Fine-Tuning Interface", "authors": ["Andrey Labunets", "Nishit V. Pandya", "Ashish Hooda", "Xiaohan Fu", "Earlence Fernandes"], "year": 2025, "venue": "IEEE Symposium on Security and Privacy", "source_url": "https://arxiv.org/abs/2501.09798", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We surface a new threat to closed-weight Large Language Models (LLMs) that enables an attacker to compute optimization-based prompt injections. Specifically, we characterize how an attacker can levera", "arxiv_id": "2501.09798", "doi": "10.1109/SP61157.2025.00121"}
+{"id": "neural-exec-learning-2024", "title": "Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks", "authors": ["Dario Pasquini", "Martin Strohmeier", "Carmela Troncoso"], "year": 2024, "venue": "AISec@CCS", "source_url": "https://arxiv.org/abs/2403.03792", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce a new family of prompt injection attacks, termed Neural Exec. Unlike known attacks that rely on handcrafted strings (e.g., \"Ignore previous instructions and...\"), we show that it is possi", "arxiv_id": "2403.03792", "doi": "10.1145/3689932.3694764"}
+{"id": "system-prompt-poisoning-2025", "title": "System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection", "authors": ["Jiawei Guo", "Haipeng Cai"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.06493", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have gained widespread adoption across diverse applications due to their impressive generative capabilities. Their plug-and-play nature enables both developers and end use", "arxiv_id": "2505.06493", "doi": "10.48550/arXiv.2505.06493"}
+{"id": "promptlocate-localizing-prompt-2025", "title": "PromptLocate: Localizing Prompt Injection Attacks", "authors": ["Yuqi Jia", "Yupei Liu", "Zedian Shao", "Jinyuan Jia", "N. Gong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.12252", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks deceive a large language model into completing an attacker-specified task instead of its intended task by contaminating its input data with an injected prompt, which consists ", "arxiv_id": "2510.12252", "doi": "10.48550/arXiv.2510.12252"}
+{"id": "your-prompt-safe-2025", "title": "Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs", "authors": ["Jiawen Wang", "Pritha Gupta", "Ivan Habernal", "Eyke Hüllermeier"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.14368", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent studies demonstrate that Large Language Models (LLMs) are vulnerable to different prompt-based attacks, generating harmful content or sensitive information. Both closed-source and open-source L", "arxiv_id": "2505.14368", "doi": "10.48550/arXiv.2505.14368"}
+{"id": "investigating-vulnerability-llmasajudge-2025", "title": "Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks", "authors": ["Narek Maloyan", "Bislan Ashinov", "Dmitry Namiot"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.13348", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly employed as evaluators (LLM-as-a-Judge) for assessing the quality of machine-generated text. This paradigm offers scalability and cost-effectiveness compa", "arxiv_id": "2505.13348", "doi": "10.48550/arXiv.2505.13348"}
+{"id": "eva-redteaming-gui-2025", "title": "EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection", "authors": ["Yijie Lu", "Tianjie Ju", "Manman Zhao", "Xinbei Ma", "Yuan Guo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.14289", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As multimodal agents are increasingly trained to operate graphical user interfaces (GUIs) to complete user tasks, they face a growing threat from indirect prompt injection, attacks in which misleading", "arxiv_id": "2505.14289", "doi": "10.48550/arXiv.2505.14289"}
+{"id": "may-i-have-2025", "title": "May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks", "authors": ["Nishit V. Pandya", "Andrey Labunets", "Sicun Gao", "Earlence Fernandes"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.07417", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A popular class of defenses against prompt injection attacks on large language models (LLMs) relies on fine-tuning to separate instructions and data, so that the LLM does not follow instructions that ", "arxiv_id": "2507.07417", "doi": "10.48550/arXiv.2507.07417"}
+{"id": "early-categorization-prompt-2024", "title": "An Early Categorization of Prompt Injection Attacks on Large Language Models", "authors": ["Sippo Rossi", "Alisia Marianne Michel", "R. Mukkamala", "J. Thatcher"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.00898", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models and AI chatbots have been at the forefront of democratizing artificial intelligence. However, the releases of ChatGPT and other similar tools have been followed by growing concer", "arxiv_id": "2402.00898", "doi": "10.48550/arXiv.2402.00898"}
+{"id": "cybersecurity-ai-hacking-2025", "title": "Cybersecurity AI: Hacking the AI Hackers via Prompt Injection", "authors": ["V. Vilches", "Per Mannermaa Rynning"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.21669", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We demonstrate how AI-powered cybersecurity tools can be turned against themselves through prompt injection attacks. Prompt injection is reminiscent of cross-site scripting (XSS): malicious text is hi", "arxiv_id": "2508.21669", "doi": "10.48550/arXiv.2508.21669"}
+{"id": "drift-dynamic-rulebased-2025", "title": "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents", "authors": ["Hao Li", "Xiaogeng Liu", "Hung-Chun Chiu", "Dianqi Li", "Ning Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.12104", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. By interacting with external environments through predefined tools, th", "arxiv_id": "2506.12104", "doi": "10.48550/arXiv.2506.12104"}
+{"id": "false-data-injection-2025", "title": "False Data Injection Attack Detection and Localization Framework in Power Distribution Systems Using a Novel Ensemble of CNNs and Explainable Artificial Intelligence", "authors": ["M. R. Dehbozorgi", "Mohammad Rastegar", "M. Arani"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TIA.2025.3532917", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Cyber-physical power systems are vulnerable to cyber-attacks, especially false data injection attacks (FDIAs). FDIAs against distribution system state estimation (DSSE), which alter state estimation (", "doi": "10.1109/TIA.2025.3532917"}
+{"id": "secinfer-preventing-prompt-2025", "title": "SecInfer: Preventing Prompt Injection via Inference-time Scaling", "authors": ["Yupei Liu", "Yanting Wang", "Yuqi Jia", "Jinyuan Jia", "N. Gong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.24967", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Prompt injection attacks pose a pervasive threat to the security of Large Language Models (LLMs). State-of-the-art prevention-based defenses typically rely on fine-tuning an LLM to enhance its securit", "arxiv_id": "2509.24967", "doi": "10.48550/arXiv.2509.24967"}
+{"id": "prompt-injection-attacks-2025-2", "title": "Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications", "authors": ["Janis Keuper"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.10248", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The ongoing intense discussion on rising LLM usage in the scientific peer-review process has recently been mingled by reports of authors using hidden prompt injections to manipulate review scores. Sin", "arxiv_id": "2509.10248", "doi": "10.48550/arXiv.2509.10248"}
+{"id": "mitigating-indirect-prompt-2025", "title": "Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis", "authors": ["Mintong Kang", "Chong Xiang", "Sanjay Kariyappa", "Chaowei Xiao", "Bo Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.00966", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Indirect prompt injection attacks (IPIAs), where large language models (LLMs) follow malicious instructions hidden in input data, pose a critical threat to LLM-powered agents. In this paper, we presen", "arxiv_id": "2512.00966", "doi": "10.48550/arXiv.2512.00966"}
+{"id": "trustworthy-agentic-ai-2025", "title": "Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks", "authors": ["Toqeer Ali Syed", "Mishal Ateeq Almutairi", "Mahmoud Abdel Moaty"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.23557", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Powerful autonomous systems, which reason, plan, and converse using and between numerous tools and agents, are made possible by Large Language Models (LLMs), Vision-Language Models (VLMs), and new age", "arxiv_id": "2512.23557", "doi": "10.48550/arXiv.2512.23557"}
+{"id": "argus-defending-against-2025", "title": "ARGUS: Defending Against Multimodal Indirect Prompt Injection via Steering Instruction-Following Behavior", "authors": ["Weikai Lu", "Ziqian Zeng", "Kehuan Zhang", "Haoran Li", "Huiping Zhuang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.05745", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multimodal Large Language Models (MLLMs) are increasingly vulnerable to multimodal Indirect Prompt Injection (IPI) attacks, which embed malicious instructions in images, videos, or audio to hijack mod", "arxiv_id": "2512.05745", "doi": "10.48550/arXiv.2512.05745"}
+{"id": "attention-all-you-2025", "title": "Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs", "authors": ["Yinan Zhong", "Qianhao Miao", "Yanjiao Chen", "Jiangyi Deng", "Yushi Cheng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.08417", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt I", "arxiv_id": "2512.08417", "doi": "10.48550/arXiv.2512.08417"}
+{"id": "pisanitizer-preventing-prompt-2025", "title": "PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization", "authors": ["Runpeng Geng", "Yanting Wang", "Chenlong Yin", "Minhao Cheng", "Ying Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.10720", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Long context LLMs are vulnerable to prompt injection, where an attacker can inject an instruction in a long context to induce an LLM to generate an attacker-desired output. Existing prompt injection d", "arxiv_id": "2511.10720", "doi": "10.48550/arXiv.2511.10720"}
+{"id": "research-sql-injection-2022", "title": "Research on SQL Injection Attack and Defense Technology of Power Dispatching Data Network: Based on Data Mining", "authors": ["Jingyuan Sheng"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1155/2022/6207275", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the process of SQL injection attack and defense of power dispatching data network, in order to ensure the accuracy of identification and defense, it is often necessary to build a rule base. However", "doi": "10.1155/2022/6207275"}
+{"id": "lesson-multilabel-adversarial-2024", "title": "LESSON: Multi-Label Adversarial False Data Injection Attack for Deep Learning Locational Detection", "authors": ["Jiwei Tian", "Chao Shen", "Buhong Wang", "Xiaofang Xia", "Meng Zhang"], "year": 2024, "venue": "IEEE Transactions on Dependable and Secure Computing", "source_url": "https://arxiv.org/abs/2401.16001", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deep learning methods can not only detect false data injection attacks (FDIA) but also locate attacks of FDIA. Although adversarial false data injection attacks (AFDIA) based on deep learning vulnerab", "arxiv_id": "2401.16001", "doi": "10.1109/TDSC.2024.3353302"}
+{"id": "mind-mapping-prompt-2025", "title": "Mind Mapping Prompt Injection: Visual Prompt Injection Attacks in Modern Large Language Models", "authors": ["Seyong Lee", "Jaebeom Kim", "Wooguil Pak"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics14101907", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have made significant strides in generating coherent and contextually relevant responses across diverse domains. However, these advancements have also led to an increase i", "doi": "10.3390/electronics14101907"}
+{"id": "weaponizing-words-direct-2025", "title": "Weaponizing Words: Direct & Indirect Prompt Injection Attacks on LLM", "authors": ["P. Reddy"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3769694.3771165", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI systems—particularly LLM applications—proliferate across products and workflows, security has not kept pace. Prompt injection is the failure mode most likely to turn well-intentioned LLM applica", "doi": "10.1145/3769694.3771165"}
+{"id": "design-implementation-secure-2025", "title": "Design and Implementation of a Secure RAG-Enhanced AI Chatbot for Smart Tourism Customer Service: Defending Against Prompt Injection Attacks - A Case Study of Hsinchu, Taiwan", "authors": ["Yu-Kai Shih", "You-Kai Kang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.21367", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As smart tourism evolves, AI-powered chatbots have become indispensable for delivering personalized, real-time assistance to travelers while promoting sustainability and efficiency. However, these sys", "arxiv_id": "2509.21367", "doi": "10.48550/arXiv.2509.21367"}
+{"id": "defending-aipowered-commerce-2025", "title": "Defending The AI-Powered Commerce Stack: A Security Framework For Prompt Injection, Review Integrity, And Privacy In Genai Retail Systems", "authors": ["Prakash Kodali"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.63278/jicrcr.vi.3471", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI is fundamentally changing digital retail through intelligent search, conversational assistants, personalized recommendations, and generating dynamic content. The use of generative AI has", "doi": "10.63278/jicrcr.vi.3471"}
+{"id": "beyond-benchmark-innovative-2025", "title": "Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks", "authors": ["Safwan Shaheer", "G. M. R. Islam", "Mohammad Rafid Hamid", "Tahsin Zaman Jilan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.16307", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of m", "arxiv_id": "2512.16307", "doi": "10.48550/arXiv.2512.16307"}
+{"id": "drip-defending-prompt-2025", "title": "DRIP: Defending Prompt Injection via De-instruction Training and Residual Fusion Model Architecture", "authors": ["Ruofan Liu", "Yun Lin", "J. Dong"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2511.00447", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2511.00447"}
+{"id": "adversial-prompt-injection-2025", "title": "Adversial Prompt Injection in Large Language Models: Taxonomy, Exploits, and Mitigation Frameworks", "authors": ["Hritesh Yadav", "Varun Singh", "Kshitij Sharma"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICRCICN68210.2025.11364988", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Adversarial prompt injection attacks pose a critical security threat to Large Language Models (LLMs) by manipulating model instructions through malicious inputs. In this paper, we present a comprehens", "doi": "10.1109/ICRCICN68210.2025.11364988"}
+{"id": "courtguard-local-multiagent-2025", "title": "CourtGuard: A Local, Multiagent Prompt Injection Classifier", "authors": ["I. Wu", "Michael Maslowski"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19844", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large language models (LLMs) become integrated into various sensitive applications, prompt injection, the use of prompting to induce harmful behaviors from LLMs, poses an ever increasing risk. Prom", "arxiv_id": "2510.19844", "doi": "10.48550/arXiv.2510.19844"}
+{"id": "dialogue-injection-attack-2025", "title": "Dialogue Injection Attack: Jailbreaking LLMs Through Context Manipulation", "authors": ["Wenlong Meng", "Fan Zhang", "Wendao Yao", "Zhenyuan Guo", "Yuwei Li"], "year": 2025, "venue": "IEEE Transactions on Information Forensics and Security", "source_url": "https://arxiv.org/abs/2503.08195", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated significant utility in a wide range of applications; however, their deployment is plagued by security vulnerabilities, notably jailbreak attacks. These a", "arxiv_id": "2503.08195", "doi": "10.1109/TIFS.2026.3657898"}
+{"id": "task-shield-enforcing-2024", "title": "The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents", "authors": ["Feiran Jia", "Tong Wu", "Xin Qin", "Anna Squicciarini"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2412.16682", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing complex real-world tasks through tool integration. This enhanced ability to interac", "arxiv_id": "2412.16682", "doi": "10.48550/arXiv.2412.16682"}
+{"id": "comprehensive-analysis-machine-2025", "title": "Comprehensive Analysis of Machine Learning and Deep Learning models on Prompt Injection Classification using Natural Language Processing techniques", "authors": ["Bhavvya Jain", "Pranav Pawar", "Dhruv Gada", "Tanish Patwa", "Pratik Kanani"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.54392/irjmt2523", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study addresses the prompt injection attack based vulnerability in large language models, which poses a significant security concern by allowing unauthorized commands by attackers to manipulate t", "doi": "10.54392/irjmt2523"}
+{"id": "breaking-prompt-wall-2025", "title": "Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection", "authors": ["Xiangyu Chang", "Guang Dai", "Hao Di", "Haishan Ye"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.16125", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This report presents a real-world case study demonstrating how prompt injection can attack large language model platforms such as ChatGPT according to a proposed injection framework. By providing thre", "arxiv_id": "2504.16125", "doi": "10.48550/arXiv.2504.16125"}
+{"id": "too-easily-fooled-2025", "title": "Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions", "authors": ["Xuyang Guo", "Zekai Huang", "Zhao Song", "Jiahao Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.13214", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have recently demonstrated strong emergent abilities in complex reasoning and zero-shot generalization, showing unprecedented potential for LLM-as-a-judge applications in ", "arxiv_id": "2508.13214", "doi": "10.48550/arXiv.2508.13214"}
+{"id": "agentfuzzer-generic-blackbox-2025", "title": "AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents", "authors": ["Zhun Wang", "Vincent Siu", "Zhe Ye", "Tianneng Shi", "Yuzhou Nie"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2505.05849", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2505.05849"}
+{"id": "agenttypo-adaptive-typographic-2025", "title": "AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents", "authors": ["Yanjie Li", "Yiming Cao", "Dong Wang", "Bin Xiao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.04257", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multimodal agents built on large vision-language models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to prompt injection, especially through visual inputs. We ", "arxiv_id": "2510.04257", "doi": "10.48550/arXiv.2510.04257"}
+{"id": "meticulous-thought-defender-2025", "title": "Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models", "authors": ["Lijuan Shi", "Yajing Kang", "Jie Hu", "Xinchi Li", "Mingchuan Yang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2025.3583759", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have exhibited exceptional capabilities across various natural language processing tasks, however, they remain susceptible to prompt injection attacks, which pose signific", "doi": "10.1109/ACCESS.2025.3583759"}
+{"id": "queryipi-queryagnostic-indirect-2025", "title": "QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents", "authors": ["Yuchong Xie", "Zesen Liu", "Mingyu Luo", "Zhixiang Zhang", "Kaikai Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.23675", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern coding agents integrated into IDEs orchestrate powerful tools and high-privilege system access, creating a high-stakes attack surface. Prior work on Indirect Prompt Injection (IPI) is mainly qu", "arxiv_id": "2510.23675", "doi": "10.48550/arXiv.2510.23675"}
+{"id": "checkpointgcg-auditing-attacking-2025", "title": "Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses", "authors": ["Xiaoxu Yang", "Bozhidar Stevanoski", "Matthieu Meeus", "Y. Montjoye"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2505.15738", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly deployed in real-world applications ranging from chatbots to agentic systems, where they are expected to process untrusted data and follow trusted instruc", "arxiv_id": "2505.15738"}
+{"id": "promptsleuth-detecting-prompt-2025", "title": "PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance", "authors": ["Mengxiao Wang", "Yuxuan Zhang", "Guofei Gu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.20890", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly integrated into real-world applications, from virtual assistants to autonomous agents. However, their flexibility also introduces new attack vectors-parti", "arxiv_id": "2508.20890", "doi": "10.48550/arXiv.2508.20890"}
+{"id": "text-prompt-injection-2025", "title": "Text Prompt Injection of Vision Language Models", "authors": ["Ruizhe Zhu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.09849", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead the", "arxiv_id": "2510.09849", "doi": "10.48550/arXiv.2510.09849"}
+{"id": "disrupting-large-language-2025", "title": "Disrupting Large Language Models with Hidden Prompt Injection Attacks Embedded in HTML Pages", "authors": ["Ionuţ-Vlăduţ Dinu", "G. Danciu", "Raul Cristian Vintilă", "T. Bălan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/OPTIM-ACEMP62776.2025.11075247", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This research evaluates the reliability of Large Language Models (LLMs) as collaborative tools for extracting information from various web sources, including standard websites, e-commerce platforms, b", "doi": "10.1109/OPTIM-ACEMP62776.2025.11075247"}
+{"id": "reconstructionbased-prompt-generation-2025", "title": "Reconstruction-Based Prompt Generation Algorithm for Prompt Injection Attacks", "authors": ["Zheng Yu", "Chunlong Fan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AANN66429.2025.11257661", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In order to better bypass the security mechanism of the existing model and verify the vulnerabilities of the model in prompt word injection, we propose a reconstructed prompt generation algorithm (RPG", "doi": "10.1109/AANN66429.2025.11257661"}
+{"id": "separator-injection-attack-2025", "title": "Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators", "authors": ["Xitao Li", "Haijun Wang", "Jiang Wu", "Ting Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.05689", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Conversational large language models (LLMs) have gained widespread attention due to their instruction-following capabilities. To ensure conversational LLMs follow instructions, role separators are emp", "arxiv_id": "2504.05689", "doi": "10.48550/arXiv.2504.05689"}
+{"id": "datasentinel-gametheoretic-detection-2025", "title": "DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks", "authors": ["Yupei Liu", "Yuqi Jia", "Jinyuan Jia", "Dawn Song", "N. Gong"], "year": 2025, "venue": "IEEE Symposium on Security and Privacy", "source_url": "https://arxiv.org/abs/2504.11358", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-integrated applications and agents are vulnerable to prompt injection attacks, where an attacker injects prompts into their inputs to induce attacker-desired outputs. A detection method aims to de", "arxiv_id": "2504.11358", "doi": "10.1109/SP61157.2025.00250"}
+{"id": "adversarial-multilingual-threats-2025", "title": "Adversarial and Multilingual Threats in Retrieval-Augmented Generation: From Prompt Injection to Model Exploitation", "authors": ["Basma ElSaify", "Mohamed Baderelden"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/GACLM67198.2025.11231998", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) systems have emerged as a cornerstone of modern generative AI by coupling large language models (LLMs) with external knowledge retrieval, thereby enabling more gro", "doi": "10.1109/GACLM67198.2025.11231998"}
+{"id": "dark-side-ai-2025", "title": "The Dark Side of AI: a Systematization of Knowledge on Jailbreaking and Prompt Injection in LLMs", "authors": ["N. M", "S. Reshma", "Y. L. R. Harsha", "Swaminadhan Rajula"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICONAT66879.2025.11362401", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The proliferation of Large Language Models (LLMs) has introduced a new frontier of security vulnerabilities that diverge significantly from traditional software exploits. This paper provides a systema", "doi": "10.1109/ICONAT66879.2025.11362401"}
+{"id": "prompt-injection-vulnerability-2025", "title": "Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy", "authors": ["Jairo Gudiño-Rosero", "Clément Contet", "Umberto Grandi", "César A. Hidalgo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.04281", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are gaining traction as a method to generate consensus statements and aggregate preferences in digital democracy experiments. Yet, LLMs could introduce critical vulnerabil", "arxiv_id": "2508.04281", "doi": "10.48550/arXiv.2508.04281"}
+{"id": "imagebased-prompt-injection-2025", "title": "Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions", "authors": ["Neha Nagaraja", "Lan Zhang", "Zhilong Wang", "Bo Zhang", "Pawan Patil"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FLLM67465.2025.11391218", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multimodal Large Language Models (MLLMs) integrate vision and text to power applications, but this integration introduces new vulnerabilities. We study Image-based Prompt Injection (IPI), a black-box ", "doi": "10.1109/FLLM67465.2025.11391218"}
+{"id": "adaptive-privacypreserving-detection-2025", "title": "Adaptive and Privacy-Preserving Detection of Evolving Prompt Injection Attacks in Large Language Models", "authors": ["Sivakumar Depuru", "Kurugunda Radha Lakshmi", "Gopa Chandrahas", "Mopuri Chetana", "M. V. Reddy"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSCN67106.2025.11308512", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated impressive abilities but are extremely vulnerable to prompt injection attacks where malicious inputs bypass the intended instructions, bypass safety mech", "doi": "10.1109/ICSCN67106.2025.11308512"}
+{"id": "experimental-evaluation-prompt-2025", "title": "An Experimental Evaluation of Prompt Injection Attacks on LLM-Based Content Moderation Systems", "authors": ["Fabio Tortora", "Vincenzo de Angelis"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SNAMS67467.2025.11390931", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly deployed to automate content moderation on social platforms, offering efficiency and scalability far beyond human capabilities. However, they are highly v", "doi": "10.1109/SNAMS67467.2025.11390931"}
+{"id": "detection-defense-mechanism-2022", "title": "The Detection and Defense Mechanism for SQL Injection Attack Based on Web Application", "authors": ["Li Min", "Gao Ranxin", "Si Guanlin", "Chen-Jung Wei", "Xu Xiaotian"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ITAIC54216.2022.9836786", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In view of the risk of SQL injection attack faced by the Web system, this paper proposes a SQL injection attack detection mechanism based on triangle module operator. The method uses the analysis resu", "doi": "10.1109/ITAIC54216.2022.9836786"}
+{"id": "adaptive-lqrbased-defense-2022", "title": "An Adaptive LQR-Based Defense Strategy against False Data Injection Attack in Smart Grids", "authors": ["Xiaoyuan Luo", "Ruiyang Gao", "Xinyu Wang", "Xiangjie Wang"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/EI256261.2022.10117083", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid development of cyber-physical power system, the security risk caused by false data injection attacks on the power system is increasing. Due to the covert characteristics of false data i", "doi": "10.1109/EI256261.2022.10117083"}
+{"id": "prompt-injection-detection-2025", "title": "Prompt Injection Detection in LLM Integrated Applications", "authors": ["Qianlong Lan", "Anuj Kaul", "Shaun Jones"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.53941/ijndi.2025.100013", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of large language models (LLMs) into creative applications has unlocked new capabilities but also introduced vulnerabilities, notably prompt injections. These are malicious inputs desi", "doi": "10.53941/ijndi.2025.100013"}
+{"id": "evaluating-prompt-injection-2025", "title": "Evaluating Prompt Injection Attacks with LSTM-Based Generative Adversarial Networks: A Lightweight Alternative to Large Language Models", "authors": ["Sharaf Rashid", "Edson Bollis", "Lucas Pellicer", "Darian Rabbani", "Rafael Palacios"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/make7030077", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Adversarial Networks (GANs) using Long Short-Term Memory (LSTM) provide a computationally cheaper approach for text generation compared to large language models (LLMs). The low hardware bar", "doi": "10.3390/make7030077"}
+{"id": "wasp-benchmarking-web-2025", "title": "WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks", "authors": ["Ivan Evtimov", "Arman Zharmagambetov", "Aaron Grattafiori", "Chuan Guo", "Kamalika Chaudhuri"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.18575", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autonomous UI agents powered by AI have tremendous potential to boost human productivity by automating routine tasks such as filing taxes and paying bills. However, a major challenge in unlocking thei", "arxiv_id": "2504.18575", "doi": "10.48550/arXiv.2504.18575"}
+{"id": "derag-blackbox-adversarial-2025", "title": "DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection", "authors": ["Jerry Wang", "Fang-Yi Yu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.15042", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Adversarial prompt attacks can significantly alter the reliability of Retrieval-Augmented Generation (RAG) systems by re-ranking them to produce incorrect outputs. In this paper, we present a novel me", "arxiv_id": "2507.15042", "doi": "10.48550/arXiv.2507.15042"}
+{"id": "survey-adversarial-examples-2025", "title": "A Survey of Adversarial Examples in Computer Vision: Attack, Defense, and Beyond", "authors": ["Keyizhi Xu", "Yajuan Lu", "Zhongyuan Wang", "Chao Liang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1051/wujns/2025301001", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent years have witnessed the ever-increasing performance of Deep Neural Networks (DNNs) in computer vision tasks. However, researchers have identified a potential vulnerability: carefully crafted a", "doi": "10.1051/wujns/2025301001"}
+{"id": "rtbas-defending-llm-2025", "title": "RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage", "authors": ["Peter Zhong", "Siyuan Chen", "Ruiqi Wang", "M. McCall", "Ben L. Titzer"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.08966", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Tool-Based Agent Systems (TBAS) allow Language Models (LMs) to use external tools for tasks beyond their standalone capabilities, such as searching websites, booking flights, or making financial trans", "arxiv_id": "2502.08966", "doi": "10.48550/arXiv.2502.08966"}
+{"id": "defense-massive-false-2022", "title": "Defense of Massive False Data Injection Attack via Sparse Attack Points Considering Uncertain Topological Changes", "authors": ["Xiaoge Huang", "Zhijun Qin", "Ming Xie", "Hui Liu", "Liang Meng"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.35833/mpce.2020.000686", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.35833/mpce.2020.000686"}
+{"id": "empirical-analysis-large-2024", "title": "Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection", "authors": ["Subaru Kimura", "Ryota Tanaka", "Shumpei Miyawaki", "Jun Suzuki", "Keisuke Sakaguchi"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.03554", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We explore visual prompt injection (VPI) that maliciously exploits the ability of large vision-language models (LVLMs) to follow instructions drawn onto the input image. We propose a new VPI method,\"g", "arxiv_id": "2408.03554", "doi": "10.48550/arXiv.2408.03554"}
+{"id": "when-reject-turns-2025", "title": "When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection", "authors": ["Devanshu Sahoo", "Manish Prasad", "Vasudev Majhi", "Jahnvi Singh", "Vinay Chamola"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.10449", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Driven by surging submission volumes, scientific peer review has catalyzed two parallel trends: individual over-reliance on LLMs and institutional AI-powered assessment systems. This study investigate", "arxiv_id": "2512.10449", "doi": "10.48550/arXiv.2512.10449"}
+{"id": "node-injectionbased-adversarial-2026", "title": "Node Injection-Based Adversarial Attack and Defense on Social Bot Detection", "authors": ["Yanwei Xie", "Wei-zhi Nie", "Lanjun Wang", "Xinran Qiao", "Shenyuan Zhang"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TCSS.2025.3599730", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Social platforms such as Twitter are increasingly threatened by automated social bots, which can manipulate public opinion, spread misinformation, and compromise platform integrity. To detect such acc", "doi": "10.1109/TCSS.2025.3599730"}
+{"id": "backdoored-retrievers-prompt-2024", "title": "Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models", "authors": ["Cody Clop", "Yannick Teglia"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.14479", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in generating coherent text but remain limited by the static nature of their training data. Retrieval Augmented Generation (RAG) ", "arxiv_id": "2410.14479", "doi": "10.48550/arXiv.2410.14479"}
+{"id": "optimal-injection-attack-2024", "title": "Optimal Injection Attack Strategy for Nonlinear Cyber-Physical Systems Based on Iterative Learning", "authors": ["Sheng Gao", "Hao Zhang", "Chao Huang", "Zhuping Wang", "Huaicheng Yan"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TASE.2022.3232496", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper aims to investigate the security problem of nonlinear cyber-physical systems (CPSs), which poses a challenge to handle compared with linear CPSs. A series of optimization problems for nonli", "doi": "10.1109/TASE.2022.3232496"}
+{"id": "damitsql-detecting-mitigating-2025", "title": "DaMiT-SQL: Detecting and Mitigating Text-to-SQL Prompt Injection Attacks", "authors": ["Zi Han Ding", "Alexandros Labrinidis"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CASCON66301.2025.00116", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are known for their ability to understand and respond to human instructions/prompts. As such, LLMs can be used to produce natural language interfaces for databases. Howeve", "doi": "10.1109/CASCON66301.2025.00116"}
+{"id": "formalizing-benchmarking-prompt-2023", "title": "Formalizing and Benchmarking Prompt Injection Attacks and Defenses", "authors": ["Yupei Liu", "Yuqi Jia", "Runpeng Geng", "Jinyuan Jia", "N. Gong"], "year": 2023, "venue": "USENIX Security Symposium", "source_url": "https://arxiv.org/abs/2310.12815", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to ", "arxiv_id": "2310.12815"}
+{"id": "do-as-i-2025", "title": "'Do as I say not as I do': A Semi-Automated Approach for Jailbreak Prompt Attack against Multimodal LLMs", "authors": ["Chun Wai Chiu", "Linghan Huang", "Bo Li", "Huaming Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.00735", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have seen widespread applications across various domains due to their growing ability to process diverse types of input data, including text, audio, image and video. While", "arxiv_id": "2502.00735", "doi": "10.48550/arXiv.2502.00735"}
+{"id": "defensive-prompt-patch-2024", "title": "Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks", "authors": ["Chen Xiong", "Xiangyu Qi", "Pin-Yu Chen", "Tsung-Yi Ho"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.20099", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Safety, security, and compliance are essential requirements when aligning large language models (LLMs). However, many seemingly aligned LLMs are soon shown to be susceptible to jailbreak attacks. Thes", "arxiv_id": "2405.20099", "doi": "10.48550/arXiv.2405.20099"}
+{"id": "prompt-injection-attacks-2025-2-2", "title": "Prompt injection attacks on vision language models in oncology", "authors": ["J. Clusmann", "Dyke Ferber", "I. Wiest", "Carolin V. Schneider", "T. Brinker"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1038/s41467-024-55631-x", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decisi", "doi": "10.1038/s41467-024-55631-x"}
+{"id": "red-teaming-mind-2025", "title": "Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs", "authors": ["Chetan Pathade"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.04806", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly integrated into consumer and enterprise applications. Despite their capabilities, they remain susceptible to adversarial attacks such as prompt injection ", "arxiv_id": "2505.04806", "doi": "10.48550/arXiv.2505.04806"}
+{"id": "traitors-deception-trust-2025", "title": "The Traitors: Deception and Trust in Multi-Agent Language Model Simulations", "authors": ["Pedro M. P. Curvo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.12923", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We intro", "arxiv_id": "2505.12923", "doi": "10.48550/arXiv.2505.12923"}
+{"id": "survival-games-humanllm-2025", "title": "Survival Games: Human-LLM Strategic Showdowns under Severe Resource Scarcity", "authors": ["Zhihong Chen", "Yiqian Yang", "Jinzhao Zhou", "Qiang Zhang", "Chin-teng Lin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.17937", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models (LLMs) raises critical concerns about their ethical alignment, particularly in scenarios where human and AI co-exist under the conflict of interest. This", "arxiv_id": "2505.17937", "doi": "10.48550/arXiv.2505.17937"}
+{"id": "among-us-sandbox-2025", "title": "Among Us: A Sandbox for Agentic Deception", "authors": ["Satvik Golechha", "Adrià Garriga-Alonso"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2504.04072", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2504.04072"}
+{"id": "mmrlhf-next-step-2025", "title": "MM-RLHF: The Next Step Forward in Multimodal LLM Alignment", "authors": ["Yifan Zhang", "Tao Yu", "Haochen Tian", "Chaoyou Fu", "Peiyan Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.10391", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite notable advancements in Multimodal Large Language Models (MLLMs), most state-of-the-art models have not undergone thorough alignment with human preferences. This gap exists because current ali", "arxiv_id": "2502.10391", "doi": "10.48550/arXiv.2502.10391"}
+{"id": "collab-controlled-decoding-2025", "title": "Collab: Controlled Decoding using Mixture of Agents for LLM Alignment", "authors": ["Souradip Chakraborty", "Sujay Bhatt", "Udari Madhushani Sehwag", "Soumya Suvra Ghosal", "Jiahao Qiu"], "year": 2025, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2503.21720", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Alignment of Large Language models (LLMs) is crucial for safe and trustworthy deployment in applications. Reinforcement learning from human feedback (RLHF) has emerged as an effective technique to ali", "arxiv_id": "2503.21720", "doi": "10.48550/arXiv.2503.21720"}
+{"id": "alignpro-principled-approach-2025", "title": "Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment", "authors": ["Prashant Trivedi", "Souradip Chakraborty", "Avinash Reddy", "Vaneet Aggarwal", "A. S. Bedi"], "year": 2025, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2501.03486", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The alignment of large language models (LLMs) with human values is critical as these models become increasingly integrated into various societal and decision-making processes. Traditional methods, suc", "arxiv_id": "2501.03486", "doi": "10.48550/arXiv.2501.03486"}
+{"id": "condor-enhance-llm-2025", "title": "Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement", "authors": ["Maosong Cao", "Taolin Zhang", "Mo Li", "Chuyu Zhang", "Yunxin Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.12273", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The quality of Supervised Fine-Tuning (SFT) data plays a critical role in enhancing the conversational capabilities of Large Language Models (LLMs). However, as LLMs become more advanced, the availabi", "arxiv_id": "2501.12273", "doi": "10.48550/arXiv.2501.12273"}
+{"id": "hidden-dimensions-llm-2025", "title": "The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Safety Analysis", "authors": ["Wenbo Pan", "Zhichao Liu", "Qiguang Chen", "Xiangyang Zhou", "Haining Yu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2502.09674", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2502.09674"}
+{"id": "openrubrics-scalable-synthetic-2025", "title": "OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment", "authors": ["Tianci Liu", "Ran Xu", "Tony Yu", "Ilgee Hong", "Carl Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.07743", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reward modeling lies at the core of reinforcement learning from human feedback (RLHF), yet most existing reward models rely on scalar or pairwise judgments that fail to capture the multifaceted nature", "arxiv_id": "2510.07743", "doi": "10.48550/arXiv.2510.07743"}
+{"id": "inverse-reinforcement-learning-2025", "title": "Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment", "authors": ["Ruoxi Cheng", "Haoxuan Ma", "Weixing Wang", "Ranjie Duan", "Jiexi Liu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2503.18991", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Alignment is vital for safely deploying large language models (LLMs). Existing techniques are either reward-based (train a reward model on preference pairs and optimize with reinforcement learning) or", "arxiv_id": "2503.18991"}
+{"id": "robust-llm-alignment-2025", "title": "Robust LLM Alignment via Distributionally Robust Direct Preference Optimization", "authors": ["Zaiyan Xu", "Sushil Vemuri", "Kishan Panaganti", "D. Kalathil", "Rahul Jain"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.01930", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming that they ac", "arxiv_id": "2502.01930"}
+{"id": "detecting-proxy-gaming-2025", "title": "Detecting Proxy Gaming in RL and LLM Alignment via Evaluator Stress Tests", "authors": ["Ibne Farabi Shihab", "Sanjeda Akter", "Anuj Sharma"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2507.05619", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Proxy optimization, where AI systems exploit evaluator weaknesses rather than improve intended objectives, threatens both reinforcement learning (reward hacking) and LLM alignment (evaluator gaming). ", "arxiv_id": "2507.05619"}
+{"id": "survey-progress-llm-2025", "title": "A Survey on Progress in LLM Alignment from the Perspective of Reward Design", "authors": ["Miaomiao Ji", "Yanqiu Wu", "Zhibin Wu", "Shoujin Wang", "Jian Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.02666", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reward design plays a pivotal role in aligning large language models (LLMs) with human values, serving as the bridge between feedback signals and model optimization. This survey provides a structured ", "arxiv_id": "2505.02666", "doi": "10.48550/arXiv.2505.02666"}
+{"id": "exploring-personadependent-llm-2025", "title": "Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment", "authors": ["Jiseon Kim", "Jea Kwon", "L. Vecchietti", "Alice Oh", "Meeyoung Cha"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.10886", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deploying large language models (LLMs) with agency in real-world applications raises critical questions about how these models will behave. In particular, how will their decisions align with humans wh", "arxiv_id": "2504.10886", "doi": "10.48550/arXiv.2504.10886"}
+{"id": "alphapo-reward-shape-2025", "title": "AlphaPO: Reward Shape Matters for LLM Alignment", "authors": ["Aman Gupta", "Shao Tang", "Qingquan Song", "Sirou Zhu", "Jiwoo Hong"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2501.03884", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement Learning with Human Feedback (RLHF) and its variants have made huge strides toward the effective alignment of large language models (LLMs) to follow instructions and reflect human values", "arxiv_id": "2501.03884"}
+{"id": "dpo-superior-ppo-2024", "title": "Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study", "authors": ["Shusheng Xu", "Wei Fu", "Jiaxuan Gao", "Wenjie Ye", "Weiling Liu"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2404.10719", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorize", "arxiv_id": "2404.10719", "doi": "10.48550/arXiv.2404.10719"}
+{"id": "llm-alignment-as-2025", "title": "LLM Alignment as Retriever Optimization: An Information Retrieval Perspective", "authors": ["Bowen Jin", "Jinsung Yoon", "Zhen Qin", "Ziqi Wang", "Wei Xiong"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.03699", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized artificial intelligence with capabilities in reasoning, coding, and communication, driving innovation across industries. Their true potential depends o", "arxiv_id": "2502.03699", "doi": "10.48550/arXiv.2502.03699"}
+{"id": "fundamental-limits-gametheoretic-2025", "title": "Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching", "authors": ["Zhekun Shi", "Kaizhao Liu", "Qi Long", "Weijie J. Su", "Jiancong Xiao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.20627", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Nash Learning from Human Feedback is a game-theoretic framework for aligning large language models (LLMs) with human preferences by modeling learning as a two-player zero-sum game. However, using raw ", "arxiv_id": "2505.20627", "doi": "10.48550/arXiv.2505.20627"}
+{"id": "systematic-evaluation-llmasajudge-2024", "title": "Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates", "authors": ["Hui Wei", "Shenghua He", "Tian Xia", "Andy Wong", "Jingyang Lin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.13006", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-as-a-Judge has been widely applied to evaluate and compare different LLM alignment approaches (e.g., RLHF and DPO). However, concerns regarding its reliability have emerged, due to LLM judges' bia", "arxiv_id": "2408.13006", "doi": "10.48550/arXiv.2408.13006"}
+{"id": "comprehensive-survey-llm-2024", "title": "A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More", "authors": ["Zhichao Wang", "Bin Bi", "Shiva K. Pentyala", "Kiran Ramnath", "Sougata Chaudhuri"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.16216", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With advancements in self-supervised learning, the availability of trillions tokens in a pre-training corpus, instruction fine-tuning, and the development of large Transformers with billions of parame", "arxiv_id": "2407.16216", "doi": "10.48550/arXiv.2407.16216"}
+{"id": "kornat-llm-alignment-2024", "title": "KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge", "authors": ["Jiyoung Lee", "Minwoo Kim", "Seungho Kim", "Junghwan Kim", "Seunghyun Won"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.13605", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: For Large Language Models (LLMs) to be effectively deployed in a specific country, they must possess an understanding of the nation's culture and basic knowledge. To this end, we introduce National Al", "arxiv_id": "2402.13605", "doi": "10.48550/arXiv.2402.13605"}
+{"id": "rmb-comprehensively-benchmarking-2024", "title": "RMB: Comprehensively Benchmarking Reward Models in LLM Alignment", "authors": ["Enyu Zhou", "Guodong Zheng", "Bing Wang", "Zhiheng Xi", "Shihan Dou"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2410.09893", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reward models (RMs) guide the alignment of large language models (LLMs), steering them toward behaviors preferred by humans. Evaluating RMs is the key to better aligning LLMs. However, the current eva", "arxiv_id": "2410.09893", "doi": "10.48550/arXiv.2410.09893"}
+{"id": "transfer-q-star-2024", "title": "Transfer Q Star: Principled Decoding for LLM Alignment", "authors": ["Souradip Chakraborty", "Soumya Suvra Ghosal", "Ming Yin", "Dinesh Manocha", "Mengdi Wang"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2405.20495", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model paramet", "arxiv_id": "2405.20495"}
+{"id": "bayesian-reward-models-2024", "title": "Bayesian Reward Models for LLM Alignment", "authors": ["Adam X. Yang", "Maxime Robeyns", "Thomas Coste", "Jun Wang", "Haitham Bou-Ammar"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.13210", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To ensure that large language model (LLM) responses are helpful and non-toxic, a reward model trained on human preference data is usually used. LLM responses with high rewards are then selected throug", "arxiv_id": "2402.13210", "doi": "10.48550/arXiv.2402.13210"}
+{"id": "rlthf-targeted-human-2025", "title": "RLTHF: Targeted Human Feedback for LLM Alignment", "authors": ["Yifei Xu", "Tusher Chakraborty", "Emre Kiciman", "Bibek Aryal", "Eduardo Rodrigues"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.13417", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the ge", "arxiv_id": "2502.13417", "doi": "10.48550/arXiv.2502.13417"}
+{"id": "aligned-query-expansion-2025", "title": "Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment", "authors": ["Adam Yang", "Gustavo Penha", "Enrico Palumbo", "Hugues Bouchard"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.11042", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the breakthroughs in large language models (LLMs), query generation techniques that expand documents and queries with related terms are becoming increasingly popular in the information retrieval ", "arxiv_id": "2507.11042", "doi": "10.48550/arXiv.2507.11042"}
+{"id": "societal-alignment-frameworks-2025", "title": "Societal Alignment Frameworks Can Improve LLM Alignment", "authors": ["Karolina Stańczak", "Nicholas Meade", "Mehar Bhatia", "Hattie Zhou", "Konstantin Bottinger"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.00069", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remain", "arxiv_id": "2503.00069", "doi": "10.48550/arXiv.2503.00069"}
+{"id": "finegrained-analysis-brainllm-2025", "title": "Fine-grained Analysis of Brain-LLM Alignment through Input Attribution", "authors": ["Michela Proietti", "Roberto Capobianco", "Mariya Toneva"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.12355", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Understanding the alignment between large language models (LLMs) and human brain activity can reveal computational principles underlying language processing. We introduce a fine-grained input attribut", "arxiv_id": "2510.12355", "doi": "10.48550/arXiv.2510.12355"}
+{"id": "evaluating-llm-alignment-2025", "title": "On Evaluating LLM Alignment by Evaluating LLMs as Judges", "authors": ["Yixin Liu", "Pengfei Liu", "Arman Cohan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.20604", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Alignment with human preferences is an important evaluation aspect of LLMs, requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating large language models'(LL", "arxiv_id": "2511.20604", "doi": "10.48550/arXiv.2511.20604"}
+{"id": "unintended-impacts-llm-2024", "title": "Unintended Impacts of LLM Alignment on Global Representation", "authors": ["Michael Joseph Ryan", "William B. Held", "Diyi Yang"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.15018", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Before being deployed for user-facing applications, developers align Large Language Models (LLMs) to user preferences through a variety of procedures, such as Reinforcement Learning From Human Feedbac", "arxiv_id": "2402.15018", "doi": "10.48550/arXiv.2402.15018"}
+{"id": "getting-more-juice-2024", "title": "Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment", "authors": ["Jiaxiang Li", "Siliang Zeng", "Hoi-To Wai", "Chenliang Li", "Alfredo García"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2405.17888", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning human preference and value is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist ", "arxiv_id": "2405.17888", "doi": "10.48550/arXiv.2405.17888"}
+{"id": "spread-preference-annotation-2024", "title": "Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment", "authors": ["Dongyoung Kim", "Kimin Lee", "Jinwoo Shin", "Jaehyung Kim"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2406.04412", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning large language models (LLMs) with human preferences becomes a key component to obtaining state-of-the-art performance, but it yields a huge cost to construct a large human-annotated preferenc", "arxiv_id": "2406.04412"}
+{"id": "poisoning-real-threat-2024", "title": "Is poisoning a real threat to LLM alignment? Maybe more so than you think", "authors": ["Pankayaraj Pathmanathan", "Souradip Chakraborty", "Xiangyu Liu", "Yongyuan Liang", "Furong Huang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.12091", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms", "arxiv_id": "2406.12091", "doi": "10.48550/arXiv.2406.12091"}
+{"id": "relative-preference-optimization-2024", "title": "Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts", "authors": ["Yueqin Yin", "Zhendong Wang", "Yi Gu", "Hai Huang", "Weizhu Chen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.10958", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge. Direct Preference Optimization (DPO) has played a key role in this area. It", "arxiv_id": "2402.10958", "doi": "10.48550/arXiv.2402.10958"}
+{"id": "inverserlignment-inverse-reinforcement-2024", "title": "Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment", "authors": ["Hao Sun", "M. Schaar"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.15624", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility. However, existing methods, primarily based on preference datasets, face challenges such as noisy labels, high a", "arxiv_id": "2405.15624", "doi": "10.48550/arXiv.2405.15624"}
+{"id": "efficient-knowledge-infusion-2024", "title": "Efficient Knowledge Infusion via KG-LLM Alignment", "authors": ["Zhouyu Jiang", "Ling Zhong", "Mengshu Sun", "Jun Xu", "Rui Sun"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.03746", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To tackle the problem of domain-specific knowledge scarcity within large language models (LLMs), knowledge graph-retrieval-augmented method has been proven to be an effective and efficient technique fo", "arxiv_id": "2406.03746", "doi": "10.48550/arXiv.2406.03746"}
+{"id": "exposing-privacy-gaps-2024", "title": "Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment", "authors": ["Qizhang Feng", "Siva Rajesh Kasa", "Hyokun Yun", "C. Teo", "S. Bodapati"], "year": 2024, "venue": "International Conference on Artificial Intelligence and Statistics", "source_url": "https://arxiv.org/abs/2407.06443", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have seen widespread adoption due to their remarkable natural language capabilities. However, when deploying them in real-world settings, it is important to align LLMs to ", "arxiv_id": "2407.06443", "doi": "10.48550/arXiv.2407.06443"}
+{"id": "understanding-layer-significance-2024", "title": "Understanding Layer Significance in LLM Alignment", "authors": ["Guangyuan Shi", "Zexin Lu", "Xiaoyu Dong", "Wenlong Zhang", "Xuanyu Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.17875", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning large language models (LLMs) through supervised fine-tuning is essential for tailoring them to specific applications. Recent studies suggest that alignment primarily adjusts a model's present", "arxiv_id": "2410.17875", "doi": "10.48550/arXiv.2410.17875"}
+{"id": "moral-turing-test-2024", "title": "The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making", "authors": ["Basile Garcia", "Crystal Qian", "Stefano Palminteri"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.07304", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large language models (LLMs) become increasingly integrated into society, their alignment with human morals is crucial. To better understand this alignment, we created a large corpus of human- and ", "arxiv_id": "2410.07304", "doi": "10.48550/arXiv.2410.07304"}
+{"id": "bpo-staying-close-2024", "title": "BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment", "authors": ["Wenda Xu", "Jiachen Li", "William Yang Wang", "Lei Li"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.12168", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent ", "arxiv_id": "2406.12168", "doi": "10.18653/v1/2024.emnlp-main.623"}
+{"id": "cbfllm-safe-control-2024", "title": "CBF-LLM: Safe Control for LLM Alignment", "authors": ["Yuya Miyaoka", "Masaki Inoue"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.15625", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework", "arxiv_id": "2408.15625", "doi": "10.48550/arXiv.2408.15625"}
+{"id": "inference-time-llm-2024", "title": "Inference time LLM alignment in single and multidomain preference spectrum", "authors": ["Sadat Shahriar", "Zheng Qi", "Nikolaos Pappas", "Srikanth Doss Kadarundalagi Raghuram Doss", "Monica Sunkara"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.19206", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning Large Language Models (LLM) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Ex", "arxiv_id": "2410.19206", "doi": "10.48550/arXiv.2410.19206"}
+{"id": "aegis20-diverse-ai-2025", "title": "Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails", "authors": ["Shaona Ghosh", "Prasoon Varshney", "Makesh Narsimhan Sreedhar", "Aishwarya Padmakumar", "Traian Rebedea"], "year": 2025, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2501.09004", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Large Language Models (LLMs) and generative AI become increasingly widespread, concerns about content safety have grown in parallel. Currently, there is a clear lack of high-quality, human-annotate", "arxiv_id": "2501.09004", "doi": "10.48550/arXiv.2501.09004"}
+{"id": "faster-wind-accelerating-2024", "title": "Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment", "authors": ["Tong Yang", "Jincheng Mei", "Hanjun Dai", "Zixin Wen", "Shicong Cen"], "year": 2024, "venue": "International Conference on Artificial Intelligence and Statistics", "source_url": "https://arxiv.org/abs/2410.20727", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in aligning large language models with human preferences have corroborated the growing importance of best-of-N distillation (BOND). However, the iterative BOND algorithm is prohibitive", "arxiv_id": "2410.20727", "doi": "10.48550/arXiv.2410.20727"}
+{"id": "single-character-perturbations-2024", "title": "Single Character Perturbations Break LLM Alignment", "authors": ["Leon Lin", "Hannah Brown", "Kenji Kawaguchi", "Michael Shieh"], "year": 2024, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2407.03232", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not output unsafe, biased, or privacy-violating outputs. For this reason, models are both trained and instructed ", "arxiv_id": "2407.03232", "doi": "10.48550/arXiv.2407.03232"}
+{"id": "todo-enhancing-llm-2024", "title": "TODO: Enhancing LLM Alignment with Ternary Preferences", "authors": ["Yuxiang Guo", "Lu Yin", "Bo Jiang", "Jiaqi Zhang"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2411.02442", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning large language models (LLMs) with human intent is critical for enhancing their performance across a variety of tasks. Standard alignment techniques, such as Direct Preference Optimization (DP", "arxiv_id": "2411.02442", "doi": "10.48550/arXiv.2411.02442"}
+{"id": "chat-bankmanfried-exploration-2024", "title": "Chat Bankman-Fried: an Exploration of LLM Alignment in Finance", "authors": ["Claudia Biancotti", "Carolina Camassa", "Andrea Coletta", "Oliver Giudice", "Aldo Glielmo"], "year": 2024, "venue": "COLING Workshops", "source_url": "https://arxiv.org/abs/2411.11853", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Advancements in large language models (LLMs) have renewed concerns about AI alignment - the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, ", "arxiv_id": "2411.11853", "doi": "10.48550/arXiv.2411.11853"}
+{"id": "contextalignment-activating-enhancing-2025", "title": "Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series", "authors": ["Yuxiao Hu", "Qian Li", "Dong-juan Zhang", "Jinyue Yan", "Yuntian Chen"], "year": 2025, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2501.03747", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, leveraging pre-trained Large Language Models (LLMs) for time series (TS) tasks has gained increasing attention, which involves activating and enhancing LLMs' capabilities. Many methods aim to", "arxiv_id": "2501.03747", "doi": "10.48550/arXiv.2501.03747"}
+{"id": "pairwise-proximal-policy-2023", "title": "Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment", "authors": ["Tianhao Wu", "Banghua Zhu", "Ruoyu Zhang", "Zhaojin Wen", "K. Ramchandran"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.00212", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligni", "arxiv_id": "2310.00212", "doi": "10.48550/arXiv.2310.00212"}
+{"id": "grading-scale-impact-2026", "title": "Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale", "authors": ["Weiyue Li", "Minda Zhao", "Weixuan Dong", "Jiahui Cai", "Yuze Wei"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.03444", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly used as automated evaluators, yet prior works demonstrate that these LLM judges often lack consistency in scoring when the prompt is altered. However, the", "arxiv_id": "2601.03444", "doi": "10.48550/arXiv.2601.03444"}
+{"id": "ella-equip-diffusion-2024", "title": "ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment", "authors": ["Xiwei Hu", "Rui Wang", "Yixiao Fang", "Bin Fu", "Pei Cheng"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.05135", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ab", "arxiv_id": "2403.05135", "doi": "10.48550/arXiv.2403.05135"}
+{"id": "improving-llm-general-2025", "title": "Improving LLM General Preference Alignment via Optimistic Online Mirror Descent", "authors": ["Yuheng Zhang", "Dian Yu", "Tao Ge", "Linfeng Song", "Zhichen Zeng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.16852", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences. Many existing alignment approaches rely on ", "arxiv_id": "2502.16852", "doi": "10.48550/arXiv.2502.16852"}
+{"id": "improving-llm-safety-2025", "title": "Improving LLM Safety Alignment with Dual-Objective Optimization", "authors": ["Xuandong Zhao", "Will Cai", "Tianneng Shi", "David Huang", "Licong Lin"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2503.03710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing training-time safety alignment techniques for large language models (LLMs) remain vulnerable to jailbreak attacks. Direct preference optimization (DPO), a widely deployed alignment method, ex", "arxiv_id": "2503.03710", "doi": "10.48550/arXiv.2503.03710"}
+{"id": "evaluation-cultural-value-2025", "title": "An Evaluation of Cultural Value Alignment in LLM", "authors": ["Nicholas Sukiennik", "Chen Gao", "Fengli Xu", "Yong Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.08863", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs as intelligent agents are being increasingly applied in scenarios where human interactions are involved, leading to a critical concern about whether LLMs are faithful to the variations in culture", "arxiv_id": "2504.08863", "doi": "10.48550/arXiv.2504.08863"}
+{"id": "layeraware-representation-filtering-2025", "title": "Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment", "authors": ["Hao Li", "Lijun Li", "Zhenghao Lu", "Xian Wei", "Rui Li"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2507.18631", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With rapid advancement and increasing accessibility of LLMs, fine-tuning aligned models has become a critical step for adapting them to real-world applications, which makes the safety of this fine-tun", "arxiv_id": "2507.18631", "doi": "10.48550/arXiv.2507.18631"}
+{"id": "saro-enhancing-llm-2025", "title": "SaRO: Enhancing LLM Safety through Reasoning-based Alignment", "authors": ["Yutao Mou", "Yuxiao Luo", "Shikun Zhang", "Wei Ye"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.09420", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current safety alignment techniques for large language models (LLMs) face two key challenges: (1) under-generalization, which leaves models vulnerable to novel jailbreak attacks, and (2) over-alignmen", "arxiv_id": "2504.09420", "doi": "10.48550/arXiv.2504.09420"}
+{"id": "osc-cognitive-orchestration-2025", "title": "OSC: Cognitive Orchestration through Dynamic Knowledge Alignment in Multi-Agent LLM Collaboration", "authors": ["Jusheng Zhang", "Yijia Fan", "Kaitong Cai", "Xiaofei Sun", "Keze Wang"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2509.04876", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces OSC (Orchestrating Cognitive Synergy), a knowledge-aware adaptive collaboration framework designed to enhance cognitive synergy in multi-agent systems with large language models.", "arxiv_id": "2509.04876", "doi": "10.48550/arXiv.2509.04876"}
+{"id": "refining-input-guardrails-2025", "title": "Refining Input Guardrails: Enhancing LLM-as-a-Judge Efficiency Through Chain-of-Thought Fine-Tuning and Alignment", "authors": ["Melissa Kazemi Rad", "H. Nghiem", "Andy Luo", "Sahil Wadhwa", "M. Sorower"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.13080", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated powerful capabilities that render them valuable in different applications, including conversational AI products. It is paramount to ensure the security a", "arxiv_id": "2501.13080", "doi": "10.48550/arXiv.2501.13080"}
+{"id": "following-autoregressive-nature-2025", "title": "Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment", "authors": ["Jingcheng Deng", "Zhongtao Jiang", "Liang Pang", "Liwei Chen", "Kun Xu"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2502.11401", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A new trend uses LLMs as dense text encoders via contrastive learning. However, since LLM embeddings predict the probability distribution of the next token, they are inherently generative and distribu", "arxiv_id": "2502.11401", "doi": "10.48550/arXiv.2502.11401"}
+{"id": "alleviating-fear-losing-2025", "title": "Alleviating the Fear of Losing Alignment in LLM Fine-tuning", "authors": ["Kang Yang", "Guanhong Tao", "Xun Chen", "Jun Xu"], "year": 2025, "venue": "IEEE Symposium on Security and Privacy", "source_url": "https://arxiv.org/abs/2504.09757", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated revolutionary capabilities in understanding complex contexts and performing a wide range of tasks. However, LLMs can also answer questions that are uneth", "arxiv_id": "2504.09757", "doi": "10.1109/SP61157.2025.00171"}
+{"id": "multiple-llm-agents-2025", "title": "Multiple LLM Agents Debate for Equitable Cultural Alignment", "authors": ["Dayeon Ki", "Rachel Rudinger", "Tianyi Zhou", "Marine Carpuat"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2505.24671", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) need to adapt their predictions to diverse cultural contexts to benefit diverse communities across the world. While previous efforts have focused on single-LLM, single-tur", "arxiv_id": "2505.24671", "doi": "10.48550/arXiv.2505.24671"}
+{"id": "epistemic-alignment-mediating-2025", "title": "Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery", "authors": ["Nicholas Clark", "Hua Shen", "Bill Howe", "Tanushree Mitra"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.01205", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs increasingly serve as tools for knowledge acquisition, yet users cannot effectively specify how they want information presented. When users request that LLMs \"cite reputable sources,\" \"express appr", "arxiv_id": "2504.01205", "doi": "10.48550/arXiv.2504.01205"}
+{"id": "feedbacktotext-alignment-llm-2025", "title": "Feedback-to-Text Alignment: LLM Learning Consistent Natural Language Generation from User Ratings and Loyalty Data", "authors": ["Zhenyu Gao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICAIDE65466.2025.11189700", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large-scale pre-trained language models (LLMs) continue to deliver groundbreaking advances in natural language generation (NLG), a persistent challenge remains: how to reliably align model outputs ", "doi": "10.1109/ICAIDE65466.2025.11189700"}
+{"id": "metarewarding-language-models-2024", "title": "Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge", "authors": ["Tianhao Wu", "Weizhe Yuan", "Olga Golovneva", "Jing Xu", "Yuandong Tian"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2407.19594", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms (Yuan et a", "arxiv_id": "2407.19594", "doi": "10.48550/arXiv.2407.19594"}
+{"id": "timecma-llmempowered-multivariate-2024", "title": "TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment", "authors": ["Chenxi Liu", "Qianxiong Xu", "Hao Miao", "Sun Yang", "Lingzheng Zhang"], "year": 2024, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2406.01638", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multivariate time series forecasting (MTSF) aims to learn temporal dynamics among variables to forecast future time series. Existing statistical and deep learning-based methods suffer from limited lea", "arxiv_id": "2406.01638", "doi": "10.1609/aaai.v39i18.34067"}
+{"id": "modular-pluralism-pluralistic-2024", "title": "Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration", "authors": ["Shangbin Feng", "Taylor Sorensen", "Yuhan Liu", "Jillian R. Fisher", "Chan Young Park"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.15951", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While existing alignment paradigms have been integral in developing large language models (LLMs), LLMs often learn an averaged human preference and struggle to model diverse preferences across culture", "arxiv_id": "2406.15951", "doi": "10.48550/arXiv.2406.15951"}
+{"id": "how-alignment-jailbreak-2024", "title": "How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States", "authors": ["Zhenhong Zhou", "Haiyang Yu", "Xinghua Zhang", "Rongwu Xu", "Fei Huang"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.05644", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) rely on safety alignment to avoid responding to malicious user inputs. Unfortunately, jailbreak can circumvent safety guardrails, resulting in LLMs generating harmful cont", "arxiv_id": "2406.05644", "doi": "10.48550/arXiv.2406.05644"}
+{"id": "idgenrec-llmrecsys-alignment-2024", "title": "IDGenRec: LLM-RecSys Alignment with Textual ID Learning", "authors": ["Juntao Tan", "Shuyuan Xu", "Wenyue Hua", "Yingqiang Ge", "Zelong Li"], "year": 2024, "venue": "Annual International ACM SIGIR Conference on Research and Development in Information Retrieval", "source_url": "https://arxiv.org/abs/2403.19021", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based Generative recommendation has attracted significant attention. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current generative recommendation appro", "arxiv_id": "2403.19021", "doi": "10.1145/3626772.3657821"}
+{"id": "timecma-llmempowered-time-2024", "title": "TimeCMA: Towards LLM-Empowered Time Series Forecasting via Cross-Modality Alignment", "authors": ["Chenxi Liu", "Qianxiong Xu", "Hao Miao", "Sun Yang", "Lingzheng Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2406.01638", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2406.01638"}
+{"id": "llms-as-zeroshot-2024", "title": "LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings", "authors": ["Duo Wang", "Y. Zuo", "Fengzhi Li", "Junjie Wu"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2408.14512", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Zero-shot graph machine learning, especially with graph neural networks (GNNs), has garnered significant interest due to the challenge of scarce labeled data. While methods like self-supervised learni", "arxiv_id": "2408.14512", "doi": "10.48550/arXiv.2408.14512"}
+{"id": "understand-what-llm-2024", "title": "Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation", "authors": ["Guanting Dong", "Yutao Zhu", "Chenghao Zhang", "Zechen Wang", "Zhicheng Dou"], "year": 2024, "venue": "The Web Conference", "source_url": "https://arxiv.org/abs/2406.18676", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) has effectively mitigated the hallucination problem of large language models (LLMs). However, the difficulty of aligning the retriever with the LLMs' diverse knowl", "arxiv_id": "2406.18676", "doi": "10.1145/3696410.3714717"}
+{"id": "avicuna-audiovisual-llm-2024", "title": "AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue", "authors": ["Yunlong Tang", "Daiki Shimada", "Jing Bi", "Chenliang Xu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2403.16276", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2403.16276"}
+{"id": "humaninstructionfree-llm-selfalignment-2024", "title": "Human-Instruction-Free LLM Self-Alignment with Limited Samples", "authors": ["Hongyi Guo", "Yuanshun Yao", "Wei Shen", "Jiaheng Wei", "Xiaoying Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2401.06785", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aligning large language models (LLMs) with human values is a vital task for LLM practitioners. Current alignment techniques have several limitations: (1) requiring a large amount of annotated data; (2", "arxiv_id": "2401.06785", "doi": "10.48550/arXiv.2401.06785"}
+{"id": "beavertails-improved-safety-2023", "title": "BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset", "authors": ["Jiaming Ji", "Mickel Liu", "Juntao Dai", "Xuehai Pan", "Chi Zhang"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2307.04657", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). This dataset uniquely separates annotations of helpfulnes", "arxiv_id": "2307.04657", "doi": "10.48550/arXiv.2307.04657"}
+{"id": "user-feedback-alignment-2025", "title": "User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems", "authors": ["Jianling Wang", "Yifan Liu", "Yinghao Sun", "Xuejian Ma", "Yueqi Wang"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2504.05522", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Exploration, the act of broadening user experiences beyond their established preferences, is challenging in large-scale recommendation systems due to feedback loops and limited signals on user explora", "arxiv_id": "2504.05522", "doi": "10.48550/arXiv.2504.05522"}
+{"id": "advancing-llm-safe-2025", "title": "Advancing LLM Safe Alignment with Safety Representation Ranking", "authors": ["Tianqi Du", "Zeming Wei", "Quan Chen", "Chenheng Zhang", "Yisen Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.15710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models (LLMs) has demonstrated milestone success in a variety of tasks, yet their potential for generating harmful content has raised significant safety concern", "arxiv_id": "2505.15710", "doi": "10.48550/arXiv.2505.15710"}
+{"id": "probing-emergence-crosslingual-2024", "title": "Probing the Emergence of Cross-lingual Alignment during LLM Training", "authors": ["Hetong Wang", "Pasquale Minervini", "E. Ponti"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.13229", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multilingual Large Language Models (LLMs) achieve remarkable levels of zero-shot cross-lingual transfer performance. We speculate that this is predicated on their ability to align languages without ex", "arxiv_id": "2406.13229", "doi": "10.48550/arXiv.2406.13229"}
+{"id": "goal-alignment-llmbased-2025", "title": "Goal Alignment in LLM-Based User Simulators for Conversational AI", "authors": ["Shuhaib Mehri", "Xiaocheng Yang", "Takyoung Kim", "Gokhan Tur", "Shikib Mehri"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.20152", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: User simulators are essential to conversational AI, enabling scalable agent development and evaluation through simulated interactions. While current Large Language Models (LLMs) have advanced user sim", "arxiv_id": "2507.20152", "doi": "10.48550/arXiv.2507.20152"}
+{"id": "llm-agents-interaction-2024", "title": "LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models", "authors": ["Ivar Frisch", "Mario Giulianelli"], "year": 2024, "venue": "PERSONALIZE", "source_url": "https://arxiv.org/abs/2402.02896", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agent interaction has long been a key topic in psychology, philosophy, and artificial intelligence, and it is now gaining traction in large language model (LLM) research. This experimental study seeks", "arxiv_id": "2402.02896", "doi": "10.48550/arXiv.2402.02896"}
+{"id": "style-outweighs-substance-2024", "title": "Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking", "authors": ["Ben Feuer", "Micah Goldblum", "Teresa Datta", "Sanjana Nambiar", "Raz Besaleli"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2409.15268", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The release of ChatGPT in November 2022 sparked an explosion of interest in post-training and an avalanche of new preference optimization (PO) methods. These methods claim superior alignment by virtue", "arxiv_id": "2409.15268", "doi": "10.48550/arXiv.2409.15268"}
+{"id": "behavior-alignment-new-2024", "title": "Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems", "authors": ["Dayu Yang", "F. Chen", "Hui Fang"], "year": 2024, "venue": "Annual International ACM SIGIR Conference on Research and Development in Information Retrieval", "source_url": "https://arxiv.org/abs/2404.11773", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, the application of LLMs to CRS has exposed a notable discrepancy in behavior betwee", "arxiv_id": "2404.11773", "doi": "10.1145/3626772.3657924"}
+{"id": "moral-alignment-llm-2024", "title": "Moral Alignment for LLM Agents", "authors": ["Elizaveta Tennant", "Stephen Hailes", "Mirco Musolesi"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2410.01639", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Decision-making agents based on pre-trained Large Language Models (LLMs) are increasingly being deployed across various domains of human activity. While their applications are currently rather special", "arxiv_id": "2410.01639", "doi": "10.48550/arXiv.2410.01639"}
+{"id": "improving-robustness-llmbased-2024", "title": "Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment", "authors": ["Paarth Neekhara", "Shehzeen Samarah Hussain", "Subhankar Ghosh", "Jason Li", "Rafael Valle"], "year": 2024, "venue": "Interspeech", "source_url": "https://arxiv.org/abs/2406.17957", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-b", "arxiv_id": "2406.17957", "doi": "10.48550/arXiv.2406.17957"}
+{"id": "caredio-cultural-alignment-2025", "title": "CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization", "authors": ["Jing Yao", "Xiaoyuan Yi", "Jindong Wang", "Zhicheng Dou", "Xing Xie"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.08820", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Large Language Models (LLMs) more deeply integrate into human life across various regions, aligning them with pluralistic cultures is crucial for improving user experience and mitigating cultural c", "arxiv_id": "2504.08820", "doi": "10.48550/arXiv.2504.08820"}
+{"id": "automating-deception-scalable-2025", "title": "Automating Deception: Scalable Multi-Turn LLM Jailbreaks", "authors": ["Adarsh Kumarappan", "Ananya Mujoo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.19517", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-turn conversational attacks, which leverage psychological principles like Foot-in-the-Door (FITD), where a small initial request paves the way for a more significant one, to bypass safety alignm", "arxiv_id": "2511.19517", "doi": "10.48550/arXiv.2511.19517"}
+{"id": "isheep-selfalignment-llm-2024", "title": "I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm", "authors": ["Yiming Liang", "Ge Zhang", "Xingwei Qu", "Tianyu Zheng", "Jiawei Guo"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.08072", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved significant advancements, however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learni", "arxiv_id": "2408.08072", "doi": "10.48550/arXiv.2408.08072"}
+{"id": "walle-world-alignment-2024", "title": "WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents", "authors": ["Siyu Zhou", "Tianyi Zhou", "Yijun Yang", "Guodong Long", "Deheng Ye"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.07484", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Can large language models (LLMs) directly serve as powerful world models for model-based agents? While the gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, o", "arxiv_id": "2410.07484", "doi": "10.48550/arXiv.2410.07484"}
+{"id": "fairmindsim-alignment-behavior-2024", "title": "FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas", "authors": ["Yu Lei", "Hao Liu", "Chengxing Xie", "Songjia Liu", "Zhiyu Yin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.10398", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI alignment is a pivotal issue concerning AI control and safety. It should consider not only value-neutral human preferences but also moral and ethical considerations. In this study, we introduced Fa", "arxiv_id": "2410.10398", "doi": "10.48550/arXiv.2410.10398"}
+{"id": "pedagogical-alignment-large-2024", "title": "Pedagogical Alignment of Large Language Models (LLM) for Personalized Learning: A Survey, Trends and Challenges", "authors": ["Mahefa Abel Razafinirina", "William Germain Dimbisoa", "Thomas Mahatody"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.4236/jilsa.2024.164023", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.4236/jilsa.2024.164023"}
+{"id": "driving-style-alignment-2024", "title": "Driving Style Alignment for LLM-powered Driver Agent", "authors": ["Ruoxuan Yang", "Xinyu Zhang", "Anais Fernandez-Laaksonen", "Xin Ding", "Jiangtao Gong"], "year": 2024, "venue": "IEEE/RSJ International Conference on Intelligent RObots and Systems", "source_url": "https://arxiv.org/abs/2403.11368", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, LLM-powered driver agents have demonstrated considerable potential in the field of autonomous driving, showcasing human-like reasoning and decision-making abilities. However, current researc", "arxiv_id": "2403.11368", "doi": "10.1109/IROS58592.2024.10802629"}
+{"id": "annotation-alignment-comparing-2024", "title": "Annotation alignment: Comparing LLM and human annotations of conversational safety", "authors": ["Rajiv Movva", "Pang Wei Koh", "Emma Pierson"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.06369", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Do LLMs align with human perceptions of safety? We study this question via *annotation alignment*, the extent to which LLMs and humans agree when annotating the safety of user-chatbot conversations. W", "arxiv_id": "2406.06369", "doi": "10.48550/arXiv.2406.06369"}
+{"id": "multilingual-blending-llm-2024", "title": "Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture", "authors": ["Jiayang Song", "Yuheng Huang", "Zhehua Zhou", "Lei Ma"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.07342", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As safety remains a crucial concern throughout the development lifecycle of Large Language Models (LLMs), researchers and industrial practitioners have increasingly focused on safeguarding and alignin", "arxiv_id": "2407.07342", "doi": "10.48550/arXiv.2407.07342"}
+{"id": "coprompter-usercentric-evaluation-2024", "title": "CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering", "authors": ["Ishika Joshi", "Simra Shahid", "S. Venneti", "Manushree Vasu", "Yantao Zheng"], "year": 2024, "venue": "International Conference on Intelligent User Interfaces", "source_url": "https://arxiv.org/abs/2411.06099", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Ensuring large language models’ (LLMs) responses align with prompt instructions is crucial for application development. Based on our formative study with industry professionals, the alignment requires", "arxiv_id": "2411.06099", "doi": "10.1145/3708359.3712102"}
+{"id": "llmalign-utilizing-large-2024", "title": "LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs", "authors": ["Xuan Chen", "Tongyu Lu", "Zhichun Wang"], "year": 2024, "venue": "Data Intelligence", "source_url": "https://arxiv.org/abs/2412.04690", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Entity Alignment (EA) seeks to identify and match corresponding entities across different Knowledge Graphs (KGs), playing a crucial role in knowledge fusion and integration. Embedding-based entity ali", "arxiv_id": "2412.04690", "doi": "10.48550/arXiv.2412.04690"}
+{"id": "llm-theory-mind-2024", "title": "LLM Theory of Mind and Alignment: Opportunities and Risks", "authors": ["Winnie Street"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.08154", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are transforming human-computer interaction and conceptions of artificial intelligence (AI) with their impressive capacities for conversing and reasoning in natural langua", "arxiv_id": "2405.08154", "doi": "10.48550/arXiv.2405.08154"}
+{"id": "adversarial-bug-reports-2025", "title": "Adversarial Bug Reports as a Security Risk in Language Model-Based Automated Program Repair", "authors": ["Piotr Przymus", "A. Happe", "Jürgen Cito"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.05372", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-based Automated Program Repair (APR) systems are increasingly integrated into modern software development workflows, offering automated patches in response to natural lang", "arxiv_id": "2509.05372", "doi": "10.1145/3793302.3793352"}
+{"id": "critical-review-large-2023", "title": "A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair", "authors": ["Quanjun Zhang", "Tongke Zhang", "Juan Zhai", "Chunrong Fang", "Bo-Chen Yu"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.08879", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), ", "arxiv_id": "2310.08879", "doi": "10.48550/arXiv.2310.08879"}
+{"id": "repair-ingredients-all-2025", "title": "Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search", "authors": ["Jiayi Zhang", "Kai Huang", "Jian Zhang", "Yang Liu", "Chunyang Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.23100", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) techniques aim to automatically fix buggy programs. Among these, Large Language Model-based (LLM-based) approaches have shown great promise. Recent advances demonstrate ", "arxiv_id": "2506.23100", "doi": "10.48550/arXiv.2506.23100"}
+{"id": "empirical-evaluation-large-2025", "title": "Empirical Evaluation of Large Language Models in Automated Program Repair", "authors": ["Jiajun Sun", "Fengjie Li", "Xin Qi", "Hongyu Zhang", "Jiajun Jiang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.13186", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing prevalence of software bugs has made automated program repair (APR) a key research focus. Large language models (LLMs) offer new opportunities for APR, but existing studies mostly rely ", "arxiv_id": "2506.13186", "doi": "10.48550/arXiv.2506.13186"}
+{"id": "leveraging-searchbased-pretrained-2025", "title": "Leveraging Search-Based and Pre-Trained Code Language Models for Automated Program Repair", "authors": ["Oebele Lijzenga", "Iman Hemati Moghadam", "Vadim Zaytsev"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3672608.3707774", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background. Automated Program Repair (APR) techniques often face challenges in navigating vast search space of possible patches and often rely on redundancy-based assumptions, which can restrict the d", "doi": "10.1145/3672608.3707774"}
+{"id": "comparative-analysis-pretrained-2025", "title": "Comparative Analysis of Pre-trained Code Language Models for Automated Program Repair via Code Infill Generation", "authors": ["Iman Hemati Moghadam", "Oebele Lijzenga", "Vadim Zaytsev"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3742876.3742881", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1145/3742876.3742881"}
+{"id": "automated-program-repair-2025", "title": "Automated Program Repair Based on REST API Specifications Using Large Language Models", "authors": ["Katsuki Yamagishi", "Norihiro Yoshida", "Erina Makihara", "Katsuro Inoue"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.25148", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many cloud services provide REST API accessible to client applications. However, developers often identify specification violations only during testing, as error messages typically lack the detail nec", "arxiv_id": "2510.25148", "doi": "10.48550/arXiv.2510.25148"}
+{"id": "revisiting-evolutionary-program-2024", "title": "Revisiting Evolutionary Program Repair via Code Language Model", "authors": ["Yunan Wang", "Tingyu Guo", "Zilong Huang", "Yuan Yuan"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.10486", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software defects are an inherent part of software development and maintenance. To address these defects, Automated Program Repair (APR) has been developed to fix bugs automatically. With the advent of", "arxiv_id": "2408.10486", "doi": "10.48550/arXiv.2408.10486"}
+{"id": "agentic-bug-reproduction-2025", "title": "Agentic Bug Reproduction for Effective Automated Program Repair at Google", "authors": ["Runxiang Cheng", "Michele Tufano", "Jurgen Cito", "J. Cambronero", "Pat Rondon"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.01821", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Bug reports often lack sufficient detail for developers to reproduce and fix the underlying defects. Bug Reproduction Tests (BRTs), tests that fail when the bug is present and pass when it has been re", "arxiv_id": "2502.01821", "doi": "10.48550/arXiv.2502.01821"}
+{"id": "rgfl-reasoning-guided-2026", "title": "RGFL: Reasoning Guided Fault Localization for Automated Program Repair Using Large Language Models", "authors": ["Melika Sepidband", "Hamed Taherkhani", "Hung Viet Pham", "Hadi Hemmati"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.18044", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fault Localization (FL) is a critical step in Automated Program Repair (APR), and its importance has increased with the rise of Large Language Model (LLM)-based repair agents. In realistic project-lev", "arxiv_id": "2601.18044", "doi": "10.48550/arXiv.2601.18044"}
+{"id": "autostructor-generative-aibased-2025", "title": "AutoStructor: A Generative AI-Based Framework for Automated Program Repair with Deep Learning-Guided Fault Localization", "authors": ["Hasen Özaytürk", "F. Buzluca"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/UBMK67458.2025.11206755", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current automated program repair (APR) approaches still suffer from two critical limitations: inaccurate fault localization and ineffective patch generation. This paper presents a novel hybrid framewo", "doi": "10.1109/UBMK67458.2025.11206755"}
+{"id": "adapting-knowledge-prompt-2025", "title": "Adapting Knowledge Prompt Tuning for Enhanced Automated Program Repair", "authors": ["Xuemeng Cai", "Lingxiao Jiang"], "year": 2025, "venue": "IEEE International Conference on Software Analysis, Evolution, and Reengineering", "source_url": "https://arxiv.org/abs/2504.01523", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) aims to enhance software reliability by automatically generating bug-fixing patches. Recent work has improved the state-of-the-art of APR by fine-tuning pre-trained larg", "arxiv_id": "2504.01523", "doi": "10.1109/SANER64311.2025.00041"}
+{"id": "repair-automated-program-2024", "title": "RePair: Automated Program Repair with Process-based Feedback", "authors": ["Yuze Zhao", "Zhenya Huang", "Yixiao Ma", "Rui Li", "Kai Zhang"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2408.11296", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The gap between the trepidation of program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR). APR is instrumental in transforming vulnerable pro", "arxiv_id": "2408.11296", "doi": "10.18653/v1/2024.findings-acl.973"}
+{"id": "exploring-lifting-robustness-2024", "title": "Exploring and Lifting the Robustness of LLM-powered Automated Program Repair with Metamorphic Testing", "authors": ["Pengyu Xue", "Linhao Wu", "Zhen Yang", "Zhongxing Yu", "Zhi Jin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.07516", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, Large language model-powered Automated Program Repair (LAPR) techniques have achieved state-of-the-art bug-fixing performance and have been pervasively applied and studied in both ind", "arxiv_id": "2410.07516", "doi": "10.48550/arXiv.2410.07516"}
+{"id": "mergerepair-exploratory-study-2024", "title": "MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair", "authors": ["Meghdad Dehghan", "J. Wu", "Fatemeh H. Fard", "Ali Ouni"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.09568", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown high capabilities in several software development-related tasks such as program repair, documentation, code refactoring, debugging, and testing. However, traini", "arxiv_id": "2408.09568", "doi": "10.48550/arXiv.2408.09568"}
+{"id": "specification-vibing-automated-2026", "title": "Specification Vibing for Automated Program Repair", "authors": ["Taohong Zhu", "Lucas C. Cordeiro", "Mustafa A. Mustafa", "Youcheng Sun"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.08263", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-driven automated program repair (APR) has advanced rapidly, but most methods remain code-centric: they directly rewrite source code and thereby risk hallucinated, behavioral", "arxiv_id": "2602.08263"}
+{"id": "svrepair-structured-visual-2026", "title": "SVRepair: Structured Visual Reasoning for Automated Program Repair", "authors": ["Xiaoxuan Tang", "Jincheng Wang", "L. Luo", "Jingxuan Xu", "Sheng Zhou"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.06090", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have recently shown strong potential for Automated Program Repair (APR), yet most existing approaches remain unimodal and fail to leverage the rich diagnostic signals cont", "arxiv_id": "2602.06090"}
+{"id": "lmfuzz-program-repair-2025", "title": "LMFuzz: Program repair fuzzing based on large language models", "authors": ["Renze Lin", "Ran Wang", "Guanghuan Hu", "Xianghua Xu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10515-025-00568-8", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10515-025-00568-8"}
+{"id": "defects4c-benchmarking-large-2025", "title": "Defects4C: Benchmarking Large Language Model Repair Capability with C/C++ Bugs", "authors": ["Jian Wang", "Xiaofei Xie", "Qiang Hu", "Shangqing Liu", "Jiongchi Yu"], "year": 2025, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2510.11059", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) plays a critical role in enhancing the quality and reliability of software systems. While substantial progress has been made in Java-based APR, largely facilitated by be", "arxiv_id": "2510.11059", "doi": "10.1109/ASE63991.2025.00029"}
+{"id": "accelerating-automatic-program-2025", "title": "Accelerating Automatic Program Repair with Dual Retrieval-Augmented Fine-Tuning and Patch Generation on Large Language Models", "authors": ["Hanyang Guo", "Xiaoheng Xie", "Hong-ning Dai", "Peng Di", "Yu Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.10103", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) is essential for ensuring software reliability and quality while enhancing efficiency and reducing developers' workload. Although rule-based and learning-based APR method", "arxiv_id": "2507.10103", "doi": "10.48550/arXiv.2507.10103"}
+{"id": "improving-automated-program-2022", "title": "Improving Automated Program Repair with Domain Adaptation", "authors": ["Armin Zirak", "Hadi Hemmati"], "year": 2022, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2212.11414", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) is defined as the process of fixing a bug/defect in the source code, by an automated tool. APR tools have recently experienced promising results by leveraging state-of-t", "arxiv_id": "2212.11414", "doi": "10.1145/3631972"}
+{"id": "bugtransformer-automated-program-2022", "title": "Bug-Transformer: Automated Program Repair Using Attention-Based Deep Neural Network", "authors": ["Jie Yao", "Bingbing Rao", "Weiwei Xing", "Liqiang Wang"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1142/s0218126622502103", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we propose a novel transformer-based deep neural network model to learn semantic bug patterns from a corpus of buggy/fixed codes, then generate correct ones automatically. Transformer i", "doi": "10.1142/s0218126622502103"}
+{"id": "deep-dive-into-2024-2", "title": "A Deep Dive into Large Language Models for Automated Bug Localization and Repair", "authors": ["Soneya Binta Hossain", "Nan Jiang", "Qiang Zhou", "Xiaopeng Li", "Wen-Hao Chiang"], "year": 2024, "venue": "Proc. ACM Softw. Eng.", "source_url": "https://arxiv.org/abs/2404.11595", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug ", "arxiv_id": "2404.11595", "doi": "10.1145/3660773"}
+{"id": "explainable-automated-debugging-2023", "title": "Explainable automated debugging via large language model-driven scientific debugging", "authors": ["Sungmin Kang", "B. Chen", "S. Yoo", "Jian-Guang Lou"], "year": 2023, "venue": "Empirical Software Engineering", "source_url": "https://arxiv.org/abs/2304.02195", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated debugging techniques have the potential to reduce developer effort in debugging. However, while developers want rationales for the provided automatic debugging results, existing techniques a", "arxiv_id": "2304.02195", "doi": "10.1007/s10664-024-10594-x"}
+{"id": "codetranfix-neural-machine-2025", "title": "CodeTranFix: A Neural Machine Translation Approach for Context-Aware Java Program Repair with CodeBERT", "authors": ["Yiwei Lu", "Shuxia Ye", "Liang Qi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/app15073632", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) plays a vital role in enhancing software quality and reducing developer maintenance efforts. Neural Machine Translation (NMT)-based methods demonstrate notable potential", "doi": "10.3390/app15073632"}
+{"id": "attention-pruning-automated-2025", "title": "Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing", "authors": ["Vishnu Asutosh Dasu", "Md. Rafi Ur Rashid", "Vipul Gupta", "Saeid Tizpaz-Niari", "Gang Tan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.15815", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper explores pruning attention heads as a post-processing bias mitigation method for large language models (LLMs). Modern AI systems such as LLMs are expanding into sensitive social contexts wh", "arxiv_id": "2503.15815", "doi": "10.48550/arXiv.2503.15815"}
+{"id": "right-prompts-job-2023", "title": "The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model", "authors": ["Zelin Zhao", "Zhaogui Xu", "Jialong Zhu", "Peng Di", "Yuan Yao"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2312.17485", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and", "arxiv_id": "2312.17485", "doi": "10.48550/arXiv.2312.17485"}
+{"id": "automated-repair-c-2025", "title": "Automated Repair of C Programs Using Large Language Models", "authors": ["Mahdi Farzandway", "Fatemeh Ghassemi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.01947", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study explores the potential of Large Language Models (LLMs) in automating the repair of C programs. We present a framework that integrates spectrum-based fault localization (SBFL), runtime feedb", "arxiv_id": "2509.01947", "doi": "10.48550/arXiv.2509.01947"}
+{"id": "compass-contrastive-learning-2026", "title": "ComPass: Contrastive Learning for Automated Patch Correctness Assessment in Program Repair", "authors": ["Quanjun Zhang", "Ye Shang", "Haichuan Hu", "Chunrong Fang", "Zhenyu Chen"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.07561", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) attempts to reduce manual debugging efforts and plays a vital role in software maintenance. Despite remarkable progress, APR is still limited in generating overfitting p", "arxiv_id": "2602.07561"}
+{"id": "automated-repair-ai-2024", "title": "Automated Repair of AI Code with Large Language Models and Formal Verification", "authors": ["Yiannis Charalambous", "Edoardo Manino", "Lucas C. Cordeiro"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.08848", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The next generation of AI systems requires strong safety guarantees. This report looks at the software implementation of neural networks and related memory safety properties, including NULL pointer de", "arxiv_id": "2405.08848", "doi": "10.48550/arXiv.2405.08848"}
+{"id": "peft-multiline-complex-2025", "title": "Peft: Multiline Complex Patch Correctness Assessment Based on Fine-Tuning Large Language Model with “Golden Data”", "authors": ["Xiaoxi Zheng", "Ruilian Zhao", "Junxia Guo"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/QRS65678.2025.00024", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, automated program repair has garnered considerable attention owing to its potential to mitigate software maintenance costs. There remain challenges for automated program repair techni", "doi": "10.1109/QRS65678.2025.00024"}
+{"id": "repairllama-efficient-representations-2023", "title": "RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair", "authors": ["André Silva", "Sen Fang", "Martin Monperrus"], "year": 2023, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2312.15698", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions which", "arxiv_id": "2312.15698", "doi": "10.1109/TSE.2025.3581062"}
+{"id": "leveraging-large-language-2024", "title": "Leveraging Large Language Model for Automatic Patch Correctness Assessment", "authors": ["Xin Zhou", "Bowen Xu", "Kisub Kim", "Donggyun Han", "Hung Nguyen"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2024.3452252", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) techniques have shown more and more promising results in fixing real-world bugs. Despite the effectiveness, APR techniques still face an overfitting problem: a generated", "doi": "10.1109/TSE.2024.3452252"}
+{"id": "multidataset-evaluation-models-2025", "title": "A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair", "authors": ["Zanis Ali Khan", "Aayush Garg", "Qiang Tang"], "year": 2025, "venue": "ARES", "source_url": "https://arxiv.org/abs/2506.04987", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software vulnerabilities pose significant security threats, requiring effective mitigation. While Automated Program Repair (APR) has advanced in fixing general bugs, vulnerability patching, a security", "arxiv_id": "2506.04987", "doi": "10.48550/arXiv.2506.04987"}
+{"id": "repaircat-applying-large-2024", "title": "RepairCAT: Applying Large Language Model to Fix Bugs in AI-Generated Programs", "authors": ["Nan Jiang", "Yi Wu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3643788.3648020", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair has been a crucial and popular domain for years, and with the development of large language models (LLMs) and the trend of using LLMs for code generation, there comes the new ", "doi": "10.1145/3643788.3648020"}
+{"id": "poster-repairing-bugs-2024", "title": "Poster: Repairing Bugs with the Introduction of New Variables: A Multi-Agent Large Language Model", "authors": ["Elisa Zhang", "Shiyu Sun", "Yunlong Xing", "Kun Sun"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3658644.3691412", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Trained on billions of tokens, large language models (LLMs) have a broad range of empirical knowledge which enables them to generate software patches with complex repair patterns. We leverage the powe", "doi": "10.1145/3658644.3691412"}
+{"id": "neural-program-repair-2023", "title": "Neural Program Repair with Program Dependence Analysis and Effective Filter Mechanism", "authors": ["Yuwei Zhang", "Ge Li", "Zhi Jin", "Ying Xing"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2305.09315", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair is a crucial task for improving the efficiency of software developers. Recently, neural-based techniques have demonstrated significant promise in generating correct patches fo", "arxiv_id": "2305.09315", "doi": "10.48550/arXiv.2305.09315"}
+{"id": "automated-program-improvement-2023", "title": "Automated program improvement with reinforcement learning and graph neural networks", "authors": ["Nataša Sukur", "Nemanja Milošević", "Doni Pracner", "Z. Budimac"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s00500-023-08559-1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s00500-023-08559-1"}
+{"id": "repaca-leveraging-reasoning-2025", "title": "RePaCA: Leveraging Reasoning Large Language Models for Static Automated Patch Correctness Assessment", "authors": ["Marcos Fuster-Pena", "David de-Fitero-Dominguez", "Antonio Garcia-Cabot", "Eva García-López"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.22580", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) seeks to automatically correct software bugs without requiring human intervention. However, existing tools tend to generate patches that satisfy test cases without fixin", "arxiv_id": "2507.22580", "doi": "10.48550/arXiv.2507.22580"}
+{"id": "investigating-large-language-2024", "title": "Investigating large language models capabilities for automatic code repair in Python", "authors": ["Safwan Omari", "Kshitiz Basnet", "Mohammad Wardat"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10586-024-04490-8", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10586-024-04490-8"}
+{"id": "evaluation-effectiveness-openais-2023", "title": "An Evaluation of the Effectiveness of OpenAI's ChatGPT for Automated Python Program Bug Fixing using QuixBugs", "authors": ["Marchel Christhoper Wuisang", "Marcel Kurniawan", "Komang Andika Wira Santosa", "Alexander Agung Santoso Gunawan", "Karen Etania Saputra"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/iSemantic59612.2023.10295323", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, the use of Artificial Intelligence (AI) has become increasingly common in various fields, including in software development. One such field is where AI can automatically detect and fi", "doi": "10.1109/iSemantic59612.2023.10295323"}
+{"id": "reliability-explainability-language-2023", "title": "On the Reliability and Explainability of Language Models for Program Generation", "authors": ["Yue Liu", "C. Tantithamthavorn", "Yonghui Liu", "Li Li"], "year": 2023, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2302.09587", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent studies have adopted pre-trained language models, such as CodeT5 and CodeGPT, for automated program generation tasks like code generation, repair, and translation. Numerous language model based", "arxiv_id": "2302.09587", "doi": "10.1145/3641540"}
+{"id": "parameterefficient-finetuning-attributed-2025", "title": "Parameter-Efficient Fine-Tuning with Attributed Patch Semantic Graph for Automated Patch Correctness Assessment", "authors": ["Zhen Yang", "Jingwen Wu", "Zhen Yang", "Zhongxing Yu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.02629", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) aims to automatically repair program errors without human intervention, and recent years have witnessed a growing interest on this research topic. While much progress ha", "arxiv_id": "2505.02629", "doi": "10.48550/arXiv.2505.02629"}
+{"id": "seeing-fixing-crossmodal-2025", "title": "Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Repair", "authors": ["Kai Huang", "Jian Zhang", "Xiaofei Xie", "Chunyang Chen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASE63991.2025.00100", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM)-based automated program repair (APR) techniques have shown promising results in resolving real-world github issue tasks. Existing APR systems are primarily evaluated in unim", "doi": "10.1109/ASE63991.2025.00100"}
+{"id": "tracing-errors-constructing-2025", "title": "Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval", "authors": ["Xiao Cheng", "Zhihao Guo", "Huan Huo", "Yulei Sui"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.18394", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Memory-related errors in C programming continue to pose significant challenges in software development, primarily due to the complexities of manual memory management inherent in the language. These er", "arxiv_id": "2506.18394", "doi": "10.48550/arXiv.2506.18394"}
+{"id": "repairr1-better-test-2025", "title": "Repair-R1: Better Test Before Repair", "authors": ["Haichuan Hu", "Xiaochen Xie", "Quanjun Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.22853", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: APR (Automated Program Repair) aims to automatically locate program defects, generate patches and validate the repairs. Existing techniques for APR are often combined with LLMs (Large Language Models)", "arxiv_id": "2507.22853", "doi": "10.48550/arXiv.2507.22853"}
+{"id": "endtoend-secure-code-2025", "title": "End-to-End Secure Code Repair with Context-Aware Anonymization and Isolated Agent Execution", "authors": ["Chao Wang", "Zan Zhou", "Chao Wang", "Yi Sun", "Shujie Yang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICBCTIS66509.2025.11387695", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, large language models (LLMs) have demonstrated remarkable capabilities in the field of automated program repair (APR). However, integrating LLM-based repair solutions into enterprise p", "doi": "10.1109/ICBCTIS66509.2025.11387695"}
+{"id": "leveraging-mutation-analysis-2026", "title": "Leveraging Mutation Analysis for LLM-based Repair of Quantum Programs", "authors": ["Chihiro Yoshida", "Yuta Ishimoto", "Olivier Nourry", "Masanari Kondo", "Makoto Matsushita"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.12273", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, Automated Program Repair (APR) techniques specifically designed for quantum programs have been proposed. However, existing approaches often suffer from low repair success rates or poo", "arxiv_id": "2601.12273", "doi": "10.48550/arXiv.2601.12273"}
+{"id": "invalidator-automated-patch-2023", "title": "Invalidator: Automated Patch Correctness Assessment Via Semantic and Syntactic Reasoning", "authors": ["Thanh Le-Cong", "Duc M. Luong", "X. Le", "David Lo", "Nhat-Hoa Tran"], "year": 2023, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2301.01113", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) faces the challenge of test overfitting, where generated patches pass validation tests but fail to generalize. Existing methods for patch assessment involve generating n", "arxiv_id": "2301.01113", "doi": "10.1109/TSE.2023.3255177"}
+{"id": "smelldetector-multilabel-code-2025", "title": "SmellDetector: Multi-Label Code Smell Detection and Refactoring with Large Language Models", "authors": ["Wenjie Liang", "Jiale Wang", "Hai-Tao Zheng", "Yinghui Li", "Haiye Lin"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IJCNN64981.2025.11227837", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in many tasks such as code generation and automated program repair. However, code LLMs have ignored another important task in pro", "doi": "10.1109/IJCNN64981.2025.11227837"}
+{"id": "extracting-fix-ingredients-2025", "title": "Extracting Fix Ingredients using Language Models", "authors": ["Julian Aron Prenner", "Romain Robbes"], "year": 2025, "venue": "2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)", "source_url": "https://arxiv.org/abs/2503.04214", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deep learning and language models are increasingly dominating automated program repair research. While previous generate-and-validate approaches were able to find and use fix ingredients on a file or ", "arxiv_id": "2503.04214", "doi": "10.1109/Forge66646.2025.00028"}
+{"id": "large-language-models-2025-4", "title": "Large Language Models for Fault Localization: An Empirical Study", "authors": ["YingJian Xiao", "Rongqun Hu", "Weiwei Gong", "Hongwei Li", "AnQuan Jie"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.20521", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, particularly in automated program repair. However, the effectiveness of such repairs is highly dependent o", "arxiv_id": "2510.20521", "doi": "10.48550/arXiv.2510.20521"}
+{"id": "program-synthesis-dataset-2025", "title": "A Program Synthesis Dataset for LLM Temperature Analysis", "authors": ["Zoltán Ságodi", "István Kolláth", "Péter Hegedűs", "Rudolf Ferenc"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2025.3625443", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) play an increasingly critical role in software engineering research, aiding tasks such as program synthesis, automated program repair, and test case generation. While exte", "doi": "10.1109/ACCESS.2025.3625443"}
+{"id": "improving-patch-correctness-2024", "title": "Improving Patch Correctness Analysis via Random Testing and Large Language Models", "authors": ["Facundo Molina", "Juan Manuel Copia", "Alessandra Gorla"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICST60714.2024.00036", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Patch correctness assessment represents a crucial step in the patch validation process, with the potential to enhance the practical adoption of automated program repair (APR) techniques and substantia", "doi": "10.1109/ICST60714.2024.00036"}
+{"id": "detectlocalizerepair-unified-framework-2022", "title": "Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5", "authors": ["Nghi D. Q. Bui", "Yue Wang", "Steven C. H. Hoi"], "year": 2022, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2211.14875", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated software debugging is a crucial task for improving the productivity of software developers. Many neural-based techniques have been proven effective for debugging-related tasks such as bug lo", "arxiv_id": "2211.14875", "doi": "10.48550/arXiv.2211.14875"}
+{"id": "repairing-bugs-python-2022", "title": "Repairing Bugs in Python Assignments Using Large Language Models", "authors": ["Jialu Zhang", "J. Cambronero", "Sumit Gulwani", "Vu Le", "R. Piskac"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2209.14876", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Students often make mistakes on their introductory programming assignments as part of their learning process. Unfortunately, providing custom repairs for these mistakes can require a substantial amoun", "arxiv_id": "2209.14876", "doi": "10.48550/arXiv.2209.14876"}
+{"id": "large-language-models-2023", "title": "Using Large Language Models for Bug Localization and Fixing", "authors": ["Do Viet Tung", "K. Markov"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/iCAST57874.2023.10359304", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As part of their learning journey, students frequently encounter challenges and make errors, especially with algorithmic programming questions. Regrettably, providing tailored solutions for these mist", "doi": "10.1109/iCAST57874.2023.10359304"}
+{"id": "glad-neural-predicate-2022", "title": "GLAD: Neural Predicate Synthesis to Repair Omission Faults", "authors": ["Sungmin Kang", "S. Yoo"], "year": 2022, "venue": "2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)", "source_url": "https://arxiv.org/abs/2204.06771", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing template and learning-based Automated Program Repair (APR) tools have successfully found patches for many benchmark faults. However, our analysis of existing results shows that omission fault", "arxiv_id": "2204.06771", "doi": "10.1109/ICSE-Companion58688.2023.00087"}
+{"id": "understanding-software-engineering-2025", "title": "Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories", "authors": ["Islem Bouzenia", "Michael Pradel"], "year": 2025, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2506.18824", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-based agents are increasingly employed to automate complex software engineering tasks, such as program repair and issue resolution. These agents operate by autonomously gene", "arxiv_id": "2506.18824", "doi": "10.1109/ASE63991.2025.00234"}
+{"id": "pydex-repairing-bugs-2024", "title": "PyDex: Repairing Bugs in Introductory Python Assignments using LLMs", "authors": ["Jialu Zhang", "J. Cambronero", "Sumit Gulwani", "Vu Le", "R. Piskac"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3649850", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Students often make mistakes in their introductory programming assignments as part of their learning process. Unfortunately, providing custom repairs for these mistakes can require a substantial amoun", "doi": "10.1145/3649850"}
+{"id": "seeing-fixing-crossmodal-2025-2", "title": "Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing", "authors": ["Kai Huang", "Jian Zhang", "Xiaofei Xie", "Chunyang Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.16136", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model-(LLM) based automated program repair (APR) techniques have shown promising results in resolving real-world GitHub issue tasks. Existing APR systems are primarily evaluated in unim", "arxiv_id": "2506.16136", "doi": "10.48550/arXiv.2506.16136"}
+{"id": "llmbscvm-llmbased-blockchain-2025", "title": "LLM-BSCVM: An LLM-Based Blockchain Smart Contract Vulnerability Management Framework", "authors": ["Yanli Jin", "Chunpei Li", "Peng Fan", "Peng Liu", "Xianxian Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.17416", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Smart contracts are a key component of the Web 3.0 ecosystem, widely applied in blockchain services and decentralized applications. However, the automated execution feature of smart contracts makes th", "arxiv_id": "2505.17416", "doi": "10.48550/arXiv.2505.17416"}
+{"id": "finding-trojan-triggers-2025", "title": "Finding Trojan Triggers in Code LLMs: An Occlusion-Based Human-in-the-Loop Approach", "authors": ["Aftab Hussain", "Md Rafiqul Islam Rabin", "Toufique Ahmed", "Mohammad Amin Alipour", "Bowen Xu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CAIN66642.2025.00050", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs), e.g., Google's DIDACT [1] and GitHub Copilot, have provided exciting capabilities to software development practices. Automated code generation, code review, vulnerability", "doi": "10.1109/CAIN66642.2025.00050"}
+{"id": "synthetic-code-surgery-2025", "title": "Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data", "authors": ["David de-Fitero-Dominguez", "Antonio Garcia-Cabot", "Eva García-López"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.07372", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a novel methodology for enhancing Automated Program Repair (APR) through synthetic data generation utilizing Large Language Models (LLMs). Current APR systems are constrained by th", "arxiv_id": "2505.07372", "doi": "10.48550/arXiv.2505.07372"}
+{"id": "basics-binary-analysis-2025", "title": "BASICS: Binary Analysis and Stack Integrity Checker System for Buffer Overflow Mitigation", "authors": ["Luís Ferreirinha", "Ibéria Medeiros"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.19670", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Cyber-Physical Systems have played an essential role in our daily lives, providing critical services such as power and water, whose operability, availability, and reliability must be ensured. The C pr", "arxiv_id": "2511.19670", "doi": "10.48550/arXiv.2511.19670"}
+{"id": "learning-effects-software-2024", "title": "Learning the Effects of Software Changes", "authors": ["Laura Plein"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3650212.3685550", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software development requires several stages of code iterations, each one requiring debugging, testing, localizing and fixing bugs. While several tools have been developed to automate one of those tas", "doi": "10.1145/3650212.3685550"}
+{"id": "patchzero-zeroshot-automatic-2023", "title": "PatchZero: Zero-Shot Automatic Patch Correctness Assessment", "authors": ["Xin Zhou", "Bowen Xu", "Kisub Kim", "Donggyun Han", "Thanh Le-Cong"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2303.00202", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) techniques have shown more and more promising results in fixing real-world bugs. Despite the effectiveness, APR techniques still face an overfitting problem: a generated", "arxiv_id": "2303.00202", "doi": "10.48550/arXiv.2303.00202"}
+{"id": "can-chatgpt-fix-2023", "title": "Can ChatGPT Fix My Code?", "authors": ["Viktor Csuvik", "T. Gyimóthy", "László Vidács"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.5220/0012120800003538", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: ChatGPT, a large language model (LLM) developed by OpenAI, fine-tuned on a massive dataset of text and source code, has recently gained significant attention on the internet. The model, built using th", "doi": "10.5220/0012120800003538"}
+{"id": "automatic-refactoring-conditions-2023", "title": "Automatic refactoring of conditions and substitutions for B state transition models", "authors": ["Chenghao Cai", "Jing Sun", "G. Dobbie"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1002/spe.3255", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The automation of programming, which lies at the intersection of software engineering and artificial intelligence, enables machines to automatically generate programs that satisfy given requirements. ", "doi": "10.1002/spe.3255"}
+{"id": "next-syntacticunit-code-2022", "title": "Next Syntactic-Unit Code Completion and Applications", "authors": ["A. Nguyen", "Aashish Yadavally", "T. Nguyen"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3551349.3559544", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion is an important feature in an IDE to improve developers’ productivity. Existing code completion approaches focus on completing the current code token, next token or statement, or code ", "doi": "10.1145/3551349.3559544"}
+{"id": "comprehensive-finetuning-large-2025", "title": "Comprehensive Fine-Tuning Large Language Models of Code for Automated Program Repair", "authors": ["Kai Huang", "Jian Zhang", "Xinlei Bao", "Xu Wang", "Yang Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2025.3532759", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) research has entered the era of large language models (LLM), and researchers have conducted several empirical studies to explore the repair capabilities of LLMs for APR.", "doi": "10.1109/TSE.2025.3532759"}
+{"id": "automated-program-refinement-2025", "title": "Automated Program Refinement: Guide and Verify Code Large Language Model with Refinement Calculus", "authors": ["Yufan Cai", "Zhe Hou", "David Sanán", "Xiaokun Luan", "Yun Lin"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3704905", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, the rise of code-centric Large Language Models (LLMs) has reshaped the software engineering world with low-barrier tools like Copilot that can easily generate code. However, there is no corr", "doi": "10.1145/3704905"}
+{"id": "assessing-effectiveness-recent-2025", "title": "Assessing the effectiveness of recent closed-source large language models in fault localization and automated program repair", "authors": ["Bo Wang", "Ming Deng", "Mingda Chen", "Youfang Lin", "Jianyi Zhou"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10515-025-00549-x", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10515-025-00549-x"}
+{"id": "impact-finetuning-large-2025", "title": "The Impact of Fine-Tuning Large Language Models on Automated Program Repair", "authors": ["Roman Machácek", "Anastasiia Grishina", "Max Hort", "Leon Moonen"], "year": 2025, "venue": "IEEE International Conference on Software Maintenance and Evolution", "source_url": "https://arxiv.org/abs/2507.19909", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) uses various tools and techniques to help developers achieve functional and error-free code faster. In recent years, Large Language Models (LLMs) have gained popularity a", "arxiv_id": "2507.19909", "doi": "10.1109/ICSME64153.2025.00042"}
+{"id": "exploring-generalizable-automated-2025", "title": "Exploring Generalizable Automated Program Repair with Large Language Models", "authors": ["Viola Campos", "Ridwan Shariffdeen", "A. Ulges", "Yannic Noller"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2506.03283", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) proposes bug fixes to aid developers in maintaining software. The state of the art in this domain focuses on LLMs, leveraging their strong capabilities to comprehend spe", "arxiv_id": "2506.03283"}
+{"id": "comparative-study-large-2025", "title": "A comparative study of large language models with chain-of thought prompting for automated program repair", "authors": ["Eko Darwiyanto", "Rizky Akbar Gusnaen", "R. Nurtantyana"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.11591/ijai.v14.i6.pp4579-4589", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic code repair is an important task in software development to reduce bugs efficiently. This research focuses on developing and evaluating a chain-of-thought (CoT) prompting approach to improve", "doi": "10.11591/ijai.v14.i6.pp4579-4589"}
+{"id": "instructrepair-instruct-large-2025", "title": "InstructRepair: Instruct Large Language Models With Rich Bug Information for Automated Program Repair", "authors": ["Anmin Fu", "Pengyu Xu", "Jichunyang Li", "Boyu Kuang", "Yansong Gao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TIFS.2025.3618407", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) repairs software bugs based on buggy code snippets automatically. It is instrumental in reducing the time and effort required for software maintenance. Recently, large l", "doi": "10.1109/TIFS.2025.3618407"}
+{"id": "hybrid-automated-program-2024", "title": "Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis", "authors": ["Fengjie Li", "Jiajun Jiang", "Jiajun Sun", "Hongyu Zhang"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2406.00992", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. Recently, LLM-based APR methods have shown promise in ", "arxiv_id": "2406.00992", "doi": "10.1145/3715004"}
+{"id": "impact-code-language-2023", "title": "Impact of Code Language Models on Automated Program Repair", "authors": ["Nan Jiang", "Kevin Liu", "Thibaud Lutellier", "Lin Tan"], "year": 2023, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2302.05020", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) aims to help developers improve software reliability by generating patches for buggy programs. Although many code language models (CLM) are developed and effective in ma", "arxiv_id": "2302.05020", "doi": "10.1109/ICSE48619.2023.00125"}
+{"id": "automated-cc-program-2024", "title": "Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models", "authors": ["Kangwei Xu", "Grace Li Zhang", "Xunzhao Yin", "Cheng Zhuo", "Ulf Schlichtmann"], "year": 2024, "venue": "Workshop on Machine Learning for CAD", "source_url": "https://arxiv.org/abs/2407.03889", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In High-Level Synthesis (HLS), converting a regular C/C++ program into its HLS-compatible counterpart (HLS-C) still requires tremendous manual effort. Various program scripts have been introduced to a", "arxiv_id": "2407.03889", "doi": "10.1145/3670474.3685953"}
+{"id": "assessing-latent-automated-2024", "title": "Assessing the Latent Automated Program Repair Capabilities of Large Language Models using Round-Trip Translation", "authors": ["Fernando Vallecillos Ruiz", "Anastasiia Grishina", "Max Hort", "Leon Moonen"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2401.07994", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Research shows that errors in natural language can be corrected by translating texts to another language and back using language models. We explore to what extent this latent correction capability ext", "arxiv_id": "2401.07994", "doi": "10.1145/3771922"}
+{"id": "exploring-potential-pretrained-2024", "title": "Exploring the Potential of Pre-Trained Language Models of Code for Automated Program Repair", "authors": ["Sichong Hao", "Xianjun Shi", "Hongwei Liu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics13071200", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the realm of software development, automated program repair (APR) emerges as a pivotal technique, autonomously debugging faulty code to boost productivity. Despite the notable advancements of large", "doi": "10.3390/electronics13071200"}
+{"id": "revisiting-unnaturalness-automated-2024", "title": "Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models", "authors": ["Aidan Z. H. Yang", "Sophia Kolak", "Vincent J. Hellendoorn", "Ruben Martins", "Claire Le Goues"], "year": 2024, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2404.15236", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The problem of software quality has motivated the development of a variety of techniques for Automatic Program Repair (APR). Meanwhile, recent advances in AI and Large Language Models (LLMs) have prod", "arxiv_id": "2404.15236", "doi": "10.1109/ICSE55347.2025.00089"}
+{"id": "large-language-models-2024-2", "title": "Large Language Models Meet Automated Program Repair: Innovations, Challenges and Solutions", "authors": ["Yi Tang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.54254/2755-2721/2024.18303", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As the field of Automated Program Repair (APR) continues to evolve, traditional Neural Program Repair (NPR) methods, while successful in low-resource computing scenarios, still confront numerous chall", "doi": "10.54254/2755-2721/2024.18303"}
+{"id": "automated-program-repair-2022", "title": "Automated Program Repair in the Era of Large Pre-trained Language Models", "authors": ["Chun Xia", "Yuxiang Wei", "Lingming Zhang"], "year": 2022, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2210.14179", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited pa", "arxiv_id": "2210.14179", "doi": "10.1109/ICSE48619.2023.00129"}
+{"id": "empirical-study-finetuning-2023", "title": "An Empirical Study on Fine-Tuning Large Language Models of Code for Automated Program Repair", "authors": ["Kai Huang", "Xiangxin Meng", "Jian Zhang", "Yang Liu", "Wenjie Wang"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASE56229.2023.00181", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of large language models (LLMs) has opened up new opportunities for automated program repair (APR). In particular, some recent studies have explored how to leverage large language models of", "doi": "10.1109/ASE56229.2023.00181"}
+{"id": "can-test-cases-2026", "title": "Can test cases generated by large language models facilitate automated program repair?", "authors": ["Chengming Zhang", "Haoye Wang", "Chuyang Xu", "Jiakun Liu", "Kui Liu"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10664-026-10802-w", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10664-026-10802-w"}
+{"id": "leveraging-large-language-2024-2", "title": "Leveraging Large Language Models for Automated Program Repair in Programming Education", "authors": ["Pavithra Sripathanallur Murali"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3703408", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1145/3703408"}
+{"id": "minimal-edits-automated-2024", "title": "Towards Minimal Edits in Automated Program Repair: A Hybrid Framework Integrating Graph Neural Networks and Large Language Models", "authors": ["Zhenyu Xu", "Victor S. Sheng"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/978-3-031-72344-5_27", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/978-3-031-72344-5_27"}
+{"id": "large-language-models-2023-2", "title": "Large Language Models for Automated Program Repair", "authors": ["Francisco Ribeiro"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3618305.3623587", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces two methods for automated program repair (APR) utilizing pre-trained language models. The first method demonstrates program repair as a code completion task and is validated on a", "doi": "10.1145/3618305.3623587"}
+{"id": "thinkrepair-selfdirected-automated-2024", "title": "ThinkRepair: Self-Directed Automated Program Repair", "authors": ["Xin Yin", "Chao Ni", "Shaohua Wang", "Zhenhao Li", "Limin Zeng"], "year": 2024, "venue": "International Symposium on Software Testing and Analysis", "source_url": "https://arxiv.org/abs/2407.20898", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Though many approaches have been proposed for Automated Program Repair (APR) and indeed achieved remarkable performance, they still have limitations in fixing bugs that require analyzing and reasoning", "arxiv_id": "2407.20898", "doi": "10.1145/3650212.3680359"}
+{"id": "contrastrepair-enhancing-conversationbased-2024", "title": "ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs", "authors": ["Jiaolong Kong", "Xiaofei Xie", "Mingfei Cheng", "Shangqing Liu", "Xiaoning Du"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2403.01971", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) aims to automatically generate patches for rectifying software bugs. Recent strides in Large Language Models (LLM), such as ChatGPT, have yielded encouraging outcomes in", "arxiv_id": "2403.01971", "doi": "10.1145/3719345"}
+{"id": "making-case-llmgenerated-2025", "title": "Making the Case for LLM-Generated Automated Program Repair Benchmarks", "authors": ["Yasser Ebrahim"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SNPD65828.2025.11252591", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) has made significant strides in recent years, particularly with the integration of large language models (LLMs) and deep learning techniques. Yet despite this progress, ", "doi": "10.1109/SNPD65828.2025.11252591"}
+{"id": "collaborative-agents-automated-2025", "title": "Collaborative Agents for Automated Program Repair in Ruby", "authors": ["Nikta Akbarpour", "Mahdieh Sadat Benis", "F. H. Fard", "Ali Ouni", "M. Saied"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.03925", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) has advanced rapidly with Large Language Models (LLMs), but most existing methods remain computationally expensive, and focused on a small set of languages. Ruby, despit", "arxiv_id": "2511.03925", "doi": "10.48550/arXiv.2511.03925"}
+{"id": "relrepair-enhancing-automated-2025", "title": "RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code", "authors": ["Shunyu Liu", "Guangdong Bai", "M. Utting", "Guowei Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.16701", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) has emerged as a promising paradigm for reducing debugging time and improving the overall efficiency of software development. Recent advances in Large Language Models (L", "arxiv_id": "2509.16701", "doi": "10.48550/arXiv.2509.16701"}
+{"id": "enhancing-automated-program-2025", "title": "Enhancing Automated Program Repair via Faulty Token Localization and Quality-Aware Patch Refinement", "authors": ["Jiaolong Kong", "Xiaofei Xie", "Yiheng Xiong", "Yuekun Wang", "Jian Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.18001", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have recently demonstrated strong potential for automated program repair (APR). However, existing LLM-based techniques primarily rely on coarse-grained external feedback (", "arxiv_id": "2511.18001", "doi": "10.48550/arXiv.2511.18001"}
+{"id": "analysis-research-status-2025", "title": "Analysis of Research Status in the Field of Automated Program Repair", "authors": ["Jishang Han", "De-An Huang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.61173/3k7v9734", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As software systems keep developing and becoming more widely used, it’s impossible to avoid the program bugs that come with them. To reduce the pressure on developers when modifying programs and make ", "doi": "10.61173/3k7v9734"}
+{"id": "hafixagent-historyaware-automated-2025", "title": "HAFixAgent: History-Aware Automated Program Repair Agent", "authors": ["Yu Shi", "Hao Li", "Bram Adams", "Ahmed E. Hassan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.01047", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) has recently shifted toward large language models and agent-based systems, yet most systems rely on local snapshot context, overlooking repository history. Prior work sh", "arxiv_id": "2511.01047", "doi": "10.48550/arXiv.2511.01047"}
+{"id": "tsapr-tree-search-2025", "title": "TSAPR: A Tree Search Framework For Automated Program Repair", "authors": ["Haichuan Hu", "Ye Shang", "Weifeng Sun", "Quanjun Zhang"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2507.01827", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of Large Language Models (LLMs), traditional Automated Program Repair (APR) techniques have undergone significant transformation. Training-free approaches, such as zero-shot", "arxiv_id": "2507.01827"}
+{"id": "sustainability-face-automated-2025", "title": "The Sustainability Face of Automated Program Repair Tools", "authors": ["Matias Martinez", "Silverio Martínez-Fernández", "Xavier Franch"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3744900", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) aims to automatize the process of repairing software bugs in order to reduce the cost of maintaining software programs. While APR accuracy has significantly improved in ", "doi": "10.1145/3744900"}
+{"id": "dynafix-iterative-automated-2025", "title": "DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information", "authors": ["Zhilin Huang", "Ling Xu", "Chao Liu", "Weifeng Sun", "Xu Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.24635", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) aims to automatically generate correct patches for buggy programs. Recent approaches leveraging large language models (LLMs) have shown promise but face limitations. Mos", "arxiv_id": "2512.24635", "doi": "10.48550/arXiv.2512.24635"}
+{"id": "automated-program-repair-2025-2", "title": "Automated Program Repair of Uncompilable Student Code", "authors": ["Griffin Pitts", "Aum Pandya", "Darsh Rank", "Muntasir Hoq", "T. Bhatt"], "year": 2025, "venue": "Technical Symposium on Computer Science Education", "source_url": "https://arxiv.org/abs/2510.06187", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A significant portion of student programming submissions in CS1 learning environments are uncompilable, limiting their use in student modeling and downstream knowledge tracing. Traditional modeling pi", "arxiv_id": "2510.06187", "doi": "10.1145/3770761.3777323"}
+{"id": "how-safe-aigenerated-2025", "title": "How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench", "authors": ["Amirali Sajadi", "Kostadin Damevski", "Preetha Chatterjee"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2507.02976", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) and their agentic frameworks are increasingly adopted to perform development tasks such as automated program repair (APR). While prior work has identified security risks i", "arxiv_id": "2507.02976"}
+{"id": "extensive-study-model-2023", "title": "An Extensive Study on Model Architecture and Program Representation in the Domain of Learning-based Automated Program Repair", "authors": ["Dániel Horváth", "Viktor Csuvik", "T. Gyimóthy", "László Vidács"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/APR59189.2023.00013", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Bug fixing is one of the most time-consuming and resource-intensive tasks in the software development life cycle. Automated Program Repair (APR) might be able to help in this process, but it still has", "doi": "10.1109/APR59189.2023.00013"}
+{"id": "interactionaware-patch-assessment-2025", "title": "Interaction-Aware Patch Assessment for Multi-Fault Automated Program Repair", "authors": ["Omar I. Al-Bataineh"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASE63991.2025.00319", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Patch overfitting remains a persistent challenge in automated program repair (APR), especially when validation depends on incomplete test suites. We argue that this problem is significantly exacerbate", "doi": "10.1109/ASE63991.2025.00319"}
+{"id": "exploring-experiences-automated-2024", "title": "Exploring Experiences with Automated Program Repair in Practice", "authors": ["Fairuz Nawer Meem", "Justin Smith", "Brittany Johnson"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3597503.3639182", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair, also known as APR, is an approach for automatically repairing software faults. There is a large amount of research on automated program repair, but very little offers in-dept", "doi": "10.1145/3597503.3639182"}
+{"id": "automated-program-repair-2024", "title": "Automated Program Repair: Emerging Trends Pose and Expose Problems for Benchmarks", "authors": ["Joseph Renzullo", "Pemma Reiter", "Westley Weimer", "Stephanie Forrest"], "year": 2024, "venue": "ACM Computing Surveys", "source_url": "https://arxiv.org/abs/2405.05455", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Machine learning (ML) pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other task", "arxiv_id": "2405.05455", "doi": "10.1145/3704997"}
+{"id": "comprehensive-survey-aidriven-2024", "title": "A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation", "authors": ["Avinash Anand", "Akshita Gupta", "Nishchay Yadav", "Shaurya Bajaj"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.07586", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Bug fixing and code generation have been core research topics in software development for many years. The recent explosive growth in Large Language Models has completely transformed these spaces, putt", "arxiv_id": "2411.07586", "doi": "10.48550/arXiv.2411.07586"}
+{"id": "llm-fault-localisation-2024", "title": "LLM Fault Localisation within Evolutionary Computation Based Automated Program Repair", "authors": ["Sardar Bin Murtaza", "Aidan Mccoy", "Z. Ren", "Aidan Murphy", "Wolfgang Banzhaf"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3638530.3664174", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Repairing bugs can be a daunting task for even a human experienced in debugging, so naturally, attempting to automatically repair programs with a computer system is quite challenging. The existing met", "doi": "10.1145/3638530.3664174"}
+{"id": "automated-program-repair-2024-2", "title": "Automated Program Repair with the GPT Family, Including GPT-2, GPT-3 and CodeX", "authors": ["Márk Lajkó", "Viktor Csuvik", "T. Gyimóthy", "László Vidács"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3643788.3648021", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) is a promising approach for addressing software defects and improving software reliability. There are various approaches to APR, including using Machine Learning (ML) te", "doi": "10.1145/3643788.3648021"}
+{"id": "templateguided-program-repair-2025", "title": "Template-Guided Program Repair in the Era of Large Language Models", "authors": ["Kai Huang", "Jian Zhang", "Xiangxin Meng", "Yang Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE55347.2025.00030", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in automated program repair (APR) have been significantly driven by the application of Large Language Models (LLMs). In particular, the integration of LLMs with traditional templat", "doi": "10.1109/ICSE55347.2025.00030"}
+{"id": "automated-program-repair-2023", "title": "Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT", "authors": ["Chun Xia", "Lingming Zhang"], "year": 2023, "venue": "International Symposium on Software Testing and Analysis", "source_url": "https://arxiv.org/abs/2304.00385", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) aims to automatically generate patches for buggy programs. Traditional APR techniques suffer from a lack of patch variety as they rely heavily on handcrafted or mined bu", "arxiv_id": "2304.00385", "doi": "10.1145/3650212.3680323"}
+{"id": "searchbased-automated-program-2024", "title": "Search-based Automated Program Repair of CPS Controllers Modeled in Simulink-Stateflow", "authors": ["Aitor Arrieta", "Pablo Valle", "Shaukat Ali"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.04688", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Stateflow models are widely used in the industry to model the high-level control logic of Cyber-Physical Systems (CPSs) in Simulink--the defacto CPS simulator. Many approaches exist to test Simulink m", "arxiv_id": "2404.04688", "doi": "10.48550/arXiv.2404.04688"}
+{"id": "automated-program-repair-2024-3", "title": "Automated Program Repair for Introductory Programming Assignments", "authors": ["Han Wan", "Hongzhen Luo", "Mengying Li", "Xiaoyan Luo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TLT.2024.3403710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic program repair (APR) tools are valuable for students to assist them with debugging tasks since program repair captures the code modification to make a buggy program pass the given test-suite", "doi": "10.1109/TLT.2024.3403710"}
+{"id": "survey-learningbased-automated-2023", "title": "A Survey of Learning-based Automated Program Repair", "authors": ["Quanjun Zhang", "Chunrong Fang", "Yuxiang Ma", "Weisong Sun", "Zhenyu Chen"], "year": 2023, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2301.03270", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) aims to fix software bugs automatically and plays a crucial role in software development and maintenance. With the recent advances in deep learning (DL), an increasing n", "arxiv_id": "2301.03270", "doi": "10.48550/arXiv.2301.03270"}
+{"id": "gamma-revisiting-templatebased-2023", "title": "Gamma: Revisiting Template-Based Automated Program Repair Via Mask Prediction", "authors": ["Quanjun Zhang", "Chunrong Fang", "Tongke Zhang", "Bo-Chen Yu", "Weisong Sun"], "year": 2023, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2309.09308", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) aims to fix software bugs without manual debugging efforts and plays a crucial role in software development and maintenance. Template-based APR has been widely investiga", "arxiv_id": "2309.09308", "doi": "10.1109/ASE56229.2023.00063"}
+{"id": "enhancing-automated-program-2023", "title": "Enhancing Automated Program Repair through Fine-tuning and Prompt Engineering", "authors": ["Rishov Paul", "Md. Mohib Hossain", "Mohammed Latif Siddiq", "Masum Hasan", "Anindya Iqbal"], "year": 2023, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2304.07840", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Sequence-to-sequence models have been used to transform erroneous programs into correct ones when trained with a large enough dataset. Some recent studies also demonstrated strong empirical evidence t", "arxiv_id": "2304.07840"}
+{"id": "patchagent-practical-program-2025", "title": "PATCHAGENT: A Practical Program Repair Agent Mimicking Human Expertise", "authors": ["Zheng Yu", "Ziyi Guo", "Yuhang Wu", "Jiahao Yu", "Meng Xu"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/b07de7825deb942c548489371657935d78f8bf8b", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "pathfix-automated-program-2025", "title": "PathFix: Automated Program Repair with Expected Path", "authors": ["Xu He", "Shu Wang", "Kun Sun"], "year": 2025, "venue": "IEEE Cybersecurity Development", "source_url": "https://arxiv.org/abs/2510.14341", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) techniques are effective in fixing inevitable defects in software, enhancing development efficiency and software robustness. However, due to the difficulty of generating", "arxiv_id": "2510.14341", "doi": "10.1109/SecDev66745.2025.00018"}
+{"id": "automated-program-repair-2023-2", "title": "Automated Program Repair Based on Code Review: How do Pre-trained Transformer Models Perform?", "authors": ["Rishov Paul", "Md. Mohib Hossain", "Masum Hasan", "Anindya Iqbal"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2304.07840", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2304.07840"}
+{"id": "reinforcement-learning-mutation-2023", "title": "Reinforcement learning for mutation operator selection in automated program repair", "authors": ["Carol Hanna", "Aymeric Blot", "J. Petke"], "year": 2023, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2306.05792", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair techniques aim to aid software developers with the challenging task of fixing bugs. In heuristic-based program repair, a search space of mutated program variants is explored t", "arxiv_id": "2306.05792", "doi": "10.1007/s10515-025-00501-z"}
+{"id": "evoapr-enhancing-large-2025", "title": "EvoAPR: Enhancing Large Language Models for Automatic Program Repair with Genetic Algorithm and Dynamic LoRA", "authors": ["Huan Zhang", "Qingyang Yan", "Weihuan Min", "Chenyuan Zhang", "Li Kuang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICWS67624.2025.00123", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) aims to automate the patch generation for buggy code and is vital in software development and maintenance. While large language models (LLMs) excel in various tasks, ou", "doi": "10.1109/ICWS67624.2025.00123"}
+{"id": "t3-multilevel-treebased-2025", "title": "T3: Multi-level Tree-based Automatic Program Repair with Large Language Models", "authors": ["Quanming Liu", "Xupeng Bu", "Zhichao Yan", "Ru Li"], "year": 2025, "venue": "IEEE International Joint Conference on Neural Network", "source_url": "https://arxiv.org/abs/2506.21211", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic Program Repair (APR) is a core technology in software development and maintenance, with aims to enable automated defect repair with minimal human intervention. In recent years, the substanti", "arxiv_id": "2506.21211", "doi": "10.1109/IJCNN64981.2025.11228000"}
+{"id": "dear-novel-deep-2022", "title": "DEAR: A Novel Deep Learning-based Approach for Automated Program Repair", "authors": ["Yi Li", "Shaohua Wang", "T. Nguyen"], "year": 2022, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2205.01859", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The existing deep learning (DL)-based automated program repair (APR) models are limited in fixing general software defects. We present DEAR, a DL-based approach that supports fixing for the general bu", "arxiv_id": "2205.01859", "doi": "10.1145/3510003.3510177"}
+{"id": "benchmarking-automated-program-2024", "title": "Benchmarking Automated Program Repair: An Extensive Study on Both Real-World and Artificial Bugs", "authors": ["Yicheng Ouyang", "Jun Yang", "Lingming Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3650212.3652140", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As bugs are inevitable and prevalent in real-world programs, many Automated Program Repair (APR) techniques have been proposed to generate patches for them. However, due to the lack of a standard for ", "doi": "10.1145/3650212.3652140"}
+{"id": "autoflow-automated-workflow-2024", "title": "AutoFlow: Automated Workflow Generation for Large Language Model Agents", "authors": ["Zelong Li", "Shuyuan Xu", "Kai Mei", "Wenyue Hua", "Balaji Rama"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.12821", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in Large Language Models (LLMs) have shown significant progress in understanding complex natural language. One important application of LLM is LLM-based AI Agent, which leverages t", "arxiv_id": "2407.12821", "doi": "10.48550/arXiv.2407.12821"}
+{"id": "improving-automatically-generated-2022", "title": "Improving automatically generated code from Codex via Automated Program Repair", "authors": ["Zhiyu Fan", "Xiang Gao", "Abhik Roychoudhury", "Shin Hwei Tan"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2205.10583", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2205.10583"}
+{"id": "evolving-paradigms-automated-2024", "title": "Evolving Paradigms in Automated Program Repair: Taxonomy, Challenges, and Opportunities", "authors": ["Kai Huang", "Zhengzi Xu", "Su Yang", "Hongyu Sun", "Xuejun Li"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3696450", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid development and large-scale popularity of program software, modern society increasingly relies on software systems. However, the problems exposed by software have also come to the fore.", "doi": "10.1145/3696450"}
+{"id": "bugsphp-dataset-automated-2024", "title": "BugsPHP: A dataset for Automated Program Repair in PHP", "authors": ["K. D. Pramod", "Wilson Silva", "W.U.K. Thabrew", "Ridwan Shariffdeen", "Sandareka Wickramanayake"], "year": 2024, "venue": "IEEE Working Conference on Mining Software Repositories", "source_url": "https://arxiv.org/abs/2401.07356", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research on", "arxiv_id": "2401.07356", "doi": "10.1145/3643991.3644878"}
+{"id": "searchbased-automated-program-2024-2", "title": "Search-Based Automated Program Repair: A Survey", "authors": ["Shilong Zhang", "Dongcheng Li", "Man Zhao", "Hui Li", "W. E. Wong"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/QRS-C63300.2024.00063", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a review of state-of-the-art search-based automated program repair techniques, which are crucial for efficiently and accurately fixing bugs and vulnerabilities in software engineer", "doi": "10.1109/QRS-C63300.2024.00063"}
+{"id": "practical-useful-automated-2024", "title": "Towards Practical and Useful Automated Program Repair for Debugging", "authors": ["Qi Xin", "Haojun Wu", "Steven P. Reiss", "Jifeng Xuan"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.08958", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current automated program repair (APR) techniques are far from being practical and useful enough to be considered for realistic debugging. They rely on unrealistic assumptions including the requiremen", "arxiv_id": "2407.08958", "doi": "10.48550/arXiv.2407.08958"}
+{"id": "synergizing-human-expertise-2024", "title": "Synergizing human expertise and AI efficiency with language model for microscopy operation and automated experiment design", "authors": ["Yongtao Liu", "M. Checa", "Rama K. Vasudevan"], "year": 2024, "venue": "Machine Learning: Science and Technology", "source_url": "https://arxiv.org/abs/2401.13803", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the advent of large language models (LLMs), in both the open source and proprietary domains, attention is turning to how to exploit such artificial intelligence (AI) systems in assisting complex ", "arxiv_id": "2401.13803", "doi": "10.1088/2632-2153/ad52e9"}
+{"id": "orllmagent-automating-modeling-2025", "title": "OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model", "authors": ["Bowen Zhang", "Pengcheng Luo"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2503.10009", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2503.10009"}
+{"id": "knod-domain-knowledge-2023", "title": "KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair", "authors": ["Nan Jiang", "Thibaud Lutellier", "Yiling Lou", "Lin Tan", "Dan Goldwasser"], "year": 2023, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2302.01857", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) improves software reliability by generating patches for a buggy program automatically. Recent APR techniques leverage deep learning (DL) to build models to learn to gen", "arxiv_id": "2302.01857", "doi": "10.1109/ICSE48619.2023.00111"}
+{"id": "survey-automated-program-2023", "title": "A Survey on Automated Program Repair Techniques", "authors": ["Kai Huang", "Zhengzi Xu", "Su Yang", "Hongyu Sun", "Xuejun Li"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2303.18184", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid development and large-scale popularity of program software, modern society increasingly relies on software systems. However, the problems exposed by software have also come to the fore.", "arxiv_id": "2303.18184", "doi": "10.48550/arXiv.2303.18184"}
+{"id": "prophetfuzz-fully-automated-2024", "title": "ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model", "authors": ["Dawei Wang", "Geng Zhou", "Li Chen", "Dan Li", "Yukai Miao"], "year": 2024, "venue": "Conference on Computer and Communications Security", "source_url": "https://arxiv.org/abs/2409.00922", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mu", "arxiv_id": "2409.00922", "doi": "10.1145/3658644.3690231"}
+{"id": "llm4cve-enabling-iterative-2025", "title": "LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models", "authors": ["Mohamad Fakih", "Rahul Dharmaji", "Halima Bouzidi", "Gustavo Quiros Araya", "O. Ogundare"], "year": 2025, "venue": "Euromicro Symposium on Digital Systems Design", "source_url": "https://arxiv.org/abs/2501.03446", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software vulnerabilities remain pervasive, even with the rise of AI-powered code assistants, advanced static analysis tools, and comprehensive testing frameworks. It’s clear that we must move beyond m", "arxiv_id": "2501.03446", "doi": "10.1109/DSD67783.2025.00087"}
+{"id": "extending-range-bugs-2022", "title": "Towards Extending the Range of Bugs That Automated Program Repair Can Handle", "authors": ["Omar I. Al-Bataineh", "L. Moonen"], "year": 2022, "venue": "International Conference on Software Quality, Reliability and Security", "source_url": "https://arxiv.org/abs/2211.03911", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern automated program repair (APR) is well-tuned to finding and repairing bugs that introduce observable erroneous behavior to a program. However, a significant class of bugs does not lead to such ", "arxiv_id": "2211.03911", "doi": "10.1109/QRS57517.2022.00031"}
+{"id": "automated-vulnerability-repair-2025", "title": "Automated Vulnerability Repair of Obfuscated and Non-Obfuscated Smart Contracts Using Large Language Models", "authors": ["Chihiro Kado", "Tatsuhiro Tsuchiya"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/PRDC67299.2025.00030", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, automated program repair using large language models (LLMs) has attracted growing attention. In the context of Ethereum smart contracts, where addressing vulnerabilities before deployment is", "doi": "10.1109/PRDC67299.2025.00030"}
+{"id": "cloudfix-automated-policy-2025", "title": "CloudFix: Automated Policy Repair for Cloud Access Control Policies Using Large Language Models", "authors": ["Bethel Hall", "Owen Ungaro", "William Eiers"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.09957", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Access control policies are vital for securing modern cloud computing, where organizations must manage access to sensitive data across thousands of users in distributed system settings. Cloud administ", "arxiv_id": "2512.09957", "doi": "10.48550/arXiv.2512.09957"}
+{"id": "runbugrun-executable-dataset-2023", "title": "RunBugRun - An Executable Dataset for Automated Program Repair", "authors": ["Julian Aron Prenner", "R. Robbes"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2304.01102", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, we can notice a transition to data-driven techniques in Automated Program Repair (APR), in particular towards deep neural networks. This entails training on hundreds of thousands or even mil", "arxiv_id": "2304.01102", "doi": "10.48550/arXiv.2304.01102"}
+{"id": "less-training-more-2022", "title": "Less training, more repairing please: revisiting automated program repair via zero-shot learning", "authors": ["Chun Xia", "Lingming Zhang"], "year": 2022, "venue": "ESEC/SIGSOFT FSE", "source_url": "https://arxiv.org/abs/2207.08281", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Due to the promising future of Automated Program Repair (APR), researchers have proposed various APR techniques, including heuristic-based, template-based, and constraint-based techniques. Among such ", "arxiv_id": "2207.08281", "doi": "10.1145/3540250.3549101"}
+{"id": "automated-test-case-2024", "title": "Automated Test Case Repair Using Language Models", "authors": ["Ahmadreza Saboor Yaraghi", "D. Holden", "N. Kahani", "Lionel C. Briand"], "year": 2024, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2401.06765", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Ensuring the quality of software systems through testing is essential, yet maintaining test cases poses significant challenges and costs. The need for frequent updates to align with the evolving syste", "arxiv_id": "2401.06765", "doi": "10.1109/TSE.2025.3541166"}
+{"id": "semagent-semantics-aware-2025", "title": "SemAgent: A Semantics Aware Program Repair Agent", "authors": ["Anvith Pabba", "Alex Mathai", "Anindya Chakraborty", "Baishakhi Ray"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.16650", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown impressive capabilities in downstream software engineering tasks such as Automated Program Repair (APR). In particular, there has been a lot of research on repo", "arxiv_id": "2506.16650", "doi": "10.48550/arXiv.2506.16650"}
+{"id": "fixing-7400-bugs-2025", "title": "Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair", "authors": ["Han Zheng", "Ilia Shumailov", "Tianqi Fan", "Aiden Hall", "Mathias Payer"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.13103", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of bug-finding techniques has led to the discovery of more vulnerabilities than developers can reasonably fix, creating an urgent need for effective Automated Program Repair (APR", "arxiv_id": "2505.13103", "doi": "10.48550/arXiv.2505.13103"}
+{"id": "contextaware-prompting-llmbased-2025", "title": "Context-aware prompting for LLM-based program repair", "authors": ["Yingling Li", "Muxin Cai", "Junjie Chen", "Yang Xu", "Lei Huang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10515-025-00512-w", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10515-025-00512-w"}
+{"id": "cigar-costefficient-program-2024", "title": "CigaR: Cost-efficient Program Repair with LLMs", "authors": ["Dávid Hidvégi", "K. Etemadi", "Sofia Bobadilla", "Martin Monperrus"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.06598", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLM) have proven to be effective at automated program repair (APR). However, using LLMs can be costly, with companies invoicing users by the number of tokens. In this paper, we ", "arxiv_id": "2402.06598", "doi": "10.48550/arXiv.2402.06598"}
+{"id": "automated-program-repair-2023-3", "title": "Automated Program Repair Using Generative Models for Code Infilling", "authors": ["Charles Koutcheme", "Sami Sarsa", "Juho Leinonen", "Arto Hellas", "Paul Denny"], "year": 2023, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/6584c335547028f0a831b2bacfd615813b83d50d", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "systematic-exploration-ctorust-2025", "title": "A systematic exploration of C-to-rust code translation based on large language models: prompt strategies and automated repair", "authors": ["Ruxin Zhang", "Shanxin Zhang", "Linbo Xie"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10515-025-00570-0", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10515-025-00570-0"}
+{"id": "memorization-llmbased-program-2025", "title": "Memorization in LLM-Based Program Repair", "authors": ["Jiaolong Kong", "Mingfei Cheng", "Xiaofei Xie"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/APR66717.2025.00011", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated Program Repair (APR) is a powerful technique for mitigating the impact of software bugs in software development. The recent remarkable success of Large Language Models (LLMs) has set new sta", "doi": "10.1109/APR66717.2025.00011"}
+{"id": "premm-llmbased-program-2025", "title": "PReMM: LLM-Based Program Repair for Multi-method Bugs via Divide and Conquer", "authors": ["Linna Xie", "Zhong Li", "Yu Pei", "Zhongzhen Wen", "Kui Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3763097", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large-language models (LLMs) have been leveraged to enhance the capability of automated program repair techniques in recent research. While existing LLM-based program repair techniques compared favora", "doi": "10.1145/3763097"}
+{"id": "input-reduction-enhanced-2025", "title": "Input Reduction Enhanced LLM-based Program Repair", "authors": ["Boyang Yang", "Luyao Ren", "Xin Yin", "Jiadong Ren", "Haoye Tian"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.15251", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown great potential in Automated Program Repair (APR). Test inputs, being crucial for reasoning the root cause of failures, are always included in the prompt for LL", "arxiv_id": "2507.15251", "doi": "10.48550/arXiv.2507.15251"}
+{"id": "enhancing-code-language-2023", "title": "Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework", "authors": ["Sichong Hao", "Xianjun Shi", "Hongwei Liu", "Yanjun Shu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSME58846.2023.00024", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair (APR) is a key technique for enhancing software maintenance productivity by fixing buggy code automatically. Recently, large code language models (CLMs) have exhibited impress", "doi": "10.1109/ICSME58846.2023.00024"}
+{"id": "boosting-redundancybased-automated-2023", "title": "Boosting Redundancy-Based Automated Program Repair by Fine-Grained Pattern Mining", "authors": ["Jiajun Jiang", "Fengjie Li", "Zijie Zhao", "Zhirui Ye", "Mengjiao Liu"], "year": 2023, "venue": "IEEE International Conference on Software Maintenance and Evolution", "source_url": "https://arxiv.org/abs/2312.15955", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Redundancy-based automated program repair (APR), which generates patches by referencing existing source code, has gained much attention since they are effective in repairing real-world bugs with good ", "arxiv_id": "2312.15955", "doi": "10.1109/ICSME64153.2025.00018"}
+{"id": "rethinking-kernel-program-2025", "title": "Rethinking Kernel Program Repair: Benchmarking and Enhancing LLMs with RGym", "authors": ["Kareem Shehada", "Yifan Wu", "Wyatt D. Feng", "Adithya Iyer", "Gryphon Kumfert"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.15757", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have revolutionized automated program repair (APR) but current benchmarks like SWE-Bench predominantly focus on userspace applications and overlook the complexities of ker", "arxiv_id": "2511.15757", "doi": "10.48550/arXiv.2511.15757"}
+{"id": "empirical-evaluation-llms-2025", "title": "Empirical Evaluation of LLMs for Automated Program Fault Localisation", "authors": ["Yong Liu", "Xin Wang", "Hengyuan Liu", "Ruishi Huang", "Yonghao Wu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/QRS-C65679.2025.00063", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many recent service interruptions caused by software faults have shown that fault localisation is crucial for automated debugging and repair. In this context, Large Language Models (LLMs) have emerged", "doi": "10.1109/QRS-C65679.2025.00063"}
+{"id": "monte-carlo-tree-2026", "title": "Monte Carlo Tree Search for Execution-Guided Program Repair with Large Language Models", "authors": ["Yi Liang"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.00129", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated program repair with large language models remains challenging at the repository level due to long-horizon reasoning requirements and the limitations of autoregressive decoding. We present Co", "arxiv_id": "2602.00129"}
+{"id": "aligning-objective-llmbased-2024", "title": "Aligning the Objective of LLM-Based Program Repair", "authors": ["Junjielong Xu", "Ying Fu", "Shin Hwei Tan", "Pinjia He"], "year": 2024, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2404.08877", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with", "arxiv_id": "2404.08877", "doi": "10.1109/ICSE55347.2025.00169"}
+{"id": "how-far-can-2024", "title": "How Far Can We Go with Practical Function-Level Program Repair?", "authors": ["Jiahong Xiang", "Xiaoyang Xu", "Fanchu Kong", "Mingyuan Wu", "Haotian Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.12833", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to enhance the repair performance. While these techniques mainly focus on the sing", "arxiv_id": "2404.12833", "doi": "10.48550/arXiv.2404.12833"}
+{"id": "t5apr-empowering-automated-2023", "title": "T5APR: Empowering Automated Program Repair across Languages through Checkpoint Ensemble", "authors": ["Reza Gharibi", "M. Sadreddini", "S. M. Fakhrahmad"], "year": 2023, "venue": "Journal of Systems and Software", "source_url": "https://arxiv.org/abs/2309.15742", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "arxiv_id": "2309.15742", "doi": "10.1016/j.jss.2024.112083"}
+{"id": "large-language-model-2025-2", "title": "Large Language Model Powered Automated Modeling and Optimization of Active Distribution Network Dispatch Problems", "authors": ["Xu Yang", "Chenhui Lin", "Yuelin Yang", "Qi Wang", "Hao Liu"], "year": 2025, "venue": "IEEE Transactions on Smart Grid", "source_url": "https://arxiv.org/abs/2507.21162", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing penetration of distributed energy resources into active distribution networks (ADNs) has made effective ADN dispatch imperative. However, the numerous newly-integrated ADN operators, su", "arxiv_id": "2507.21162", "doi": "10.1109/TSG.2025.3621438"}
+{"id": "integrating-large-language-2025", "title": "Integrating Large Language Models in Automated Program Verification", "authors": ["Nina Narodytska"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.34727/2025/isbn.978-3-85448-084-6_4", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.34727/2025/isbn.978-3-85448-084-6_4"}
+{"id": "reapr-automatic-program-2025", "title": "ReAPR: Automatic program repair via retrieval-augmented large language models", "authors": ["Zixin Liu", "Xiaozhi Du", "Hairui Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s11219-025-09728-1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s11219-025-09728-1"}
+{"id": "art-repair-optimizing-2025", "title": "The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models", "authors": ["Fernando Vallecillos Ruiz", "Max Hort", "Leon Moonen"], "year": 2025, "venue": "International Conference on Evaluation & Assessment in Software Engineering", "source_url": "https://arxiv.org/abs/2505.02931", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic program repair (APR) aims at reducing the manual efforts required to identify and fix errors in source code. Before the rise of Large Language Model (LLM)-based agents, a common strategy was", "arxiv_id": "2505.02931", "doi": "10.1145/3756681.3756966"}
+{"id": "effectively-leveraging-execution-2025", "title": "Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs", "authors": ["Mirazul Haque", "Petr Babkin", "Farima Farmahinifarahani", "Manuela Veloso"], "year": 2025, "venue": "Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing", "source_url": "https://arxiv.org/abs/2505.04441", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) show promising performance on various programming tasks, including Automatic Program Repair (APR). However, most approaches to LLM-based APR are limited to the static anal", "arxiv_id": "2505.04441", "doi": "10.18653/v1/2025.knowledgenlp-1.17"}
+{"id": "practical-program-repair-2022", "title": "Practical Program Repair in the Era of Large Pre-trained Language Models", "authors": ["Chun Xia", "Yuxiang Wei", "Lingming Zhang"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2210.14179", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2210.14179"}
+{"id": "when-finetuning-llms-2024", "title": "When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair", "authors": ["Wenqiang Luo", "Jacky W. Keung", "Boyang Yang", "He Ye", "Claire Le Goues"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2412.01072", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software systems have been evolving rapidly and inevitably introducing bugs at an increasing rate, leading to significant maintenance costs. While large language models (LLMs) have demonstrated remark", "arxiv_id": "2412.01072", "doi": "10.1145/3733599"}
+{"id": "experepair-dualmemory-enhanced-2025", "title": "EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair", "authors": ["Fangwen Mu", "Junjie Wang", "Lin Shi", "Song Wang", "Shoubin Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.10484", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatically repairing software issues remains a fundamental challenge at the intersection of software engineering and AI. Although recent advancements in Large Language Models (LLMs) have demonstrat", "arxiv_id": "2506.10484", "doi": "10.48550/arXiv.2506.10484"}
+{"id": "when-large-language-2024", "title": "When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done?", "authors": ["Yuxiao Chen", "Jingzheng Wu", "Xiang Ling", "Changjiang Li", "Zhiqing Rui"], "year": 2024, "venue": "2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)", "source_url": "https://arxiv.org/abs/2403.00448", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, large language models (LLMs) have demonstrated substantial potential in addressing automatic program repair (APR) tasks. However, the current evaluation of these models for APR tasks ", "arxiv_id": "2403.00448", "doi": "10.1145/3639478.3647633"}
+{"id": "demystifying-memorization-llmbased-2025", "title": "Demystifying Memorization in LLM-Based Program Repair via a General Hypothesis Testing Framework", "authors": ["Jiaolong Kong", "Xiaofei Xie", "Shangqing Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3729390", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved remarkable success in various applications, particularly in code-related tasks such as code generation and program repair, setting new performance benchmarks", "doi": "10.1145/3729390"}
+{"id": "dlap-deep-learning-2024", "title": "DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection", "authors": ["Yanjing Yang", "Xin Zhou", "Runfeng Mao", "Jinwei Xu", "Lanxin Yang"], "year": 2024, "venue": "Journal of Systems and Software", "source_url": "https://arxiv.org/abs/2405.01202", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, despite the superior performance ", "arxiv_id": "2405.01202", "doi": "10.48550/arXiv.2405.01202"}
+{"id": "siadafix-issue-description-2025", "title": "SIADAFIX: issue description response for adaptive program repair", "authors": ["Xin Cao", "Nan Yu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.16059", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose utilizing fast and slow thinking to enhance the capabilities of large language model-based agents on complex tasks such as program repair. In particular, we design an adaptive program repai", "arxiv_id": "2510.16059", "doi": "10.48550/arXiv.2510.16059"}
+{"id": "evaluating-fault-localization-2024", "title": "Evaluating Fault Localization and Program Repair Capabilities of Existing Closed-Source General-Purpose LLMs", "authors": ["She-ming Jiang", "Jiabao Zhang", "Wei Chen", "Bo Wang", "Jianyi Zhou"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3643795.3648390", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated debugging is an emerging research field that aims to automatically find and repair bugs. In this field, Fault Localization (FL) and Automated Program Repair (APR) gain the most research effo", "doi": "10.1145/3643795.3648390"}
+{"id": "daloapr-llmbased-automatic-2025", "title": "DALO-APR: LLM-based automatic program repair with data augmentation and loss function optimization", "authors": ["Shaosheng Wang", "Lu Lu", "Shaojian Qiu", "Qingyan Tian", "Haisha Lin"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s11227-025-07102-3", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s11227-025-07102-3"}
+{"id": "aicode-wizard-ai-2025", "title": "AI-Code Wizard an AI Code Review & Generation Assistant", "authors": ["Chandnani Lucky"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.55041/ijsrem50852", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Designed to simplify and enhance the development of code, AI CodeWizard is a web-based application that employs artificial intelligence and provides dependable online code management for developers.", "doi": "10.55041/ijsrem50852"}
+{"id": "ai-code-review-2025", "title": "AI Code Review Assistant: A Modern Web Based Solution for Automated Code Analysis and Developer Productivity Enhancement", "authors": ["Mohanakshi K M"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.22214/ijraset.2025.73682", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents the development and implementation of an AI Code Review Assistant, a comprehensive web-based application designed to enhance developer productivity through automated code analysis ", "doi": "10.22214/ijraset.2025.73682"}
+{"id": "does-ai-code-2025", "title": "Does AI Code Review Lead to Code Changes? A Case Study of GitHub Actions", "authors": ["Kexin Sun", "Hongyu Kuang", "Sebastian Baltes", "Xin Zhou", "He Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.18771", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI-based code review tools automatically review and comment on pull requests to improve code quality. Despite their growing presence, little is known about their actual impact. We present a large-scal", "arxiv_id": "2508.18771", "doi": "10.48550/arXiv.2508.18771"}
+{"id": "aipowered-code-review-2024", "title": "AI-powered Code Review with LLMs: Early Results", "authors": ["Zeeshan Rasheed", "Malik Abdul Sami", "Muhammad Waseem", "Kai-Kristian Kemell", "Xiaofeng Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.18496", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we present a novel approach to improving software quality and efficiency through a Large Language Model (LLM)-based model designed to review code and identify potential issues. Our prop", "arxiv_id": "2404.18496", "doi": "10.48550/arXiv.2404.18496"}
+{"id": "crscore-reinforcement-learning-2025", "title": "CRScore++: Reinforcement Learning with Verifiable Tool and AI Feedback for Code Review", "authors": ["M. Kapadnis", "Atharva Naik", "Carolyn P. Rosé"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.00296", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement learning (RL) to improve code review comment generation requires handling unstructured outputs, making reinforcement learning (RL) feedback challenging. The two main RL approaches, namel", "arxiv_id": "2506.00296", "doi": "10.48550/arXiv.2506.00296"}
+{"id": "support-not-automation-2025", "title": "Support, Not Automation: Towards AI-supported Code Review For Code Quality and Beyond", "authors": ["Lo Heander", "Emma Söderberg", "Christofer Rydenfält"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3696630.3728505", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review is a well-established and valuable software development practice associated with code quality, interpersonal, and team benefits. However, it is also time-consuming, with developers spendin", "doi": "10.1145/3696630.3728505"}
+{"id": "bugdar-aiaugmented-secure-2025", "title": "Bugdar: AI-Augmented Secure Code Review for GitHub Pull Requests", "authors": ["John Naulty", "Eason Chen", "Joy Wang", "George Digkas", "K. Chalkias"], "year": 2025, "venue": "IEEE Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2503.17302", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As software systems grow increasingly complex, ensuring security during development poses significant challenges. Traditional manual code audits are often expensive, time-intensive, and ill-suited for", "arxiv_id": "2503.17302", "doi": "10.1109/CAI64502.2025.00113"}
+{"id": "integrating-aidriven-automated-2025", "title": "Integrating AI-Driven Automated Code Review in Agile Development: Benefits, Challenges, and Best Practices", "authors": ["Saad Ahmed"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.22161/ijaems.112.1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of AI-powered automated code review tools has significantly transformed Agile software development by improving efficiency, maintaining coding standards, and enhancing developer produc", "doi": "10.22161/ijaems.112.1"}
+{"id": "adaptive-multiagent-ai-2025", "title": "Adaptive Multi-Agent AI Framework for Real-Time Energy Optimization and Context-Aware Code Review in Software Development", "authors": ["Tanush Sharanarthi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ISCTIS65944.2025.11066037", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/ISCTIS65944.2025.11066037"}
+{"id": "aiassisted-fixes-code-2025", "title": "AI-Assisted Fixes to Code Review Comments at Scale", "authors": ["Chandra Maddila", "Negar Ghorbani", "James Saindon", "Parth Thakkar", "V. Murali"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.13499", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Aim. There are 10s of thousands of code review comments each week at Meta. We developed Metamate for Code Review (MetaMateCR) that provides AI-assisted fixes for reviewer comments in production at sca", "arxiv_id": "2507.13499", "doi": "10.48550/arXiv.2507.13499"}
+{"id": "deputydev-ai-powered-2025", "title": "DeputyDev - AI Powered Developer Assistant: Breaking the Code Review Logjam through Contextual AI to Boost Developer Productivity", "authors": ["Vishal Khare", "V. Saini", "Deepak Sharma", "Anand Kumar", "Ankit Rana"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.09676", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study investigates the implementation and efficacy of DeputyDev, an AI-powered code review assistant developed to address inefficiencies in the software development process. The process of code r", "arxiv_id": "2508.09676", "doi": "10.48550/arXiv.2508.09676"}
+{"id": "aiassisted-assessment-coding-2024", "title": "AI-Assisted Assessment of Coding Practices in Modern Code Review", "authors": ["Manushree Vijayvergiya", "M. Salawa", "Ivan Budiselic", "Dan Zheng", "Pascal Lamblin"], "year": 2024, "venue": "AIware", "source_url": "https://arxiv.org/abs/2405.13565", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern code review is a process in which an incremental code contribution made by a code author is reviewed by one or more peers before it is committed to the version control system. An important elem", "arxiv_id": "2405.13565", "doi": "10.1145/3664646.3665664"}
+{"id": "aiassisted-code-review-2025", "title": "AI-Assisted Code Review and Defect Prediction Research in Software Engineering", "authors": ["Yuheng Du"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICCR67387.2025.11291978", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code programming review in software engineering is its main task, but in the actual programming process, there are complex problems in code review, which can not be effectively implemented, so it shou", "doi": "10.1109/ICCR67387.2025.11291978"}
+{"id": "aidriven-continuous-integration-2025", "title": "AI-Driven Continuous Integration: Automating Code Review and Deployment with LLMs", "authors": ["Dina Omar Salem", "Yazan Alahmed", "Ragad Fnich", "Meryem Mazroub", "Mohammad Fnich"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FMEC65595.2025.11119365", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of a Large Language Models (LLMs) into Continuous Integration (CI) pipelines greatly enhances the efficiency of the software development process. AI-based CI improves code quality by d", "doi": "10.1109/FMEC65595.2025.11119365"}
+{"id": "tales-from-trenches-2024", "title": "Tales From the Trenches: Expectations and Challenges From Practice for Code Review in the Generative AI Era", "authors": ["Nicole Davila", "Jorge Melegati", "Igor Wiese"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2024.3428439", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this study, we investigate what has been discussed about generative AI in the code review context by performing a gray literature review. We analyzed 42 documents and found insights from practice a", "doi": "10.1109/MS.2024.3428439"}
+{"id": "aipowered-code-review-2024-2", "title": "AI-Powered Code Review Assistant for Streamlining Pull Request Merging", "authors": ["Chathurya Adapa", "Sai Sindhuri Avulamanda", "A. Anjana", "Ajay Victor"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICWITE59797.2024.10503540", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: WatsonX, a comprehensive data and AI platform, adeptly addresses contemporary challenges by meticulously training, validating, tuning, and deploying data to drive impactful business outcomes. The intr", "doi": "10.1109/ICWITE59797.2024.10503540"}
+{"id": "aipowered-code-review-2023", "title": "AI-Powered Code Review Enhancing Software Quality with Intelligent Agents", "authors": ["Ravikanth Konda"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.70528/ijlrp.v4.i3.1541", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The constantly changing world of software development requires incessant advancements in quality control measures. Code review, the essential practice for detecting bugs, imposing coding standards, an", "doi": "10.70528/ijlrp.v4.i3.1541"}
+{"id": "codingcare-ai-code-2025", "title": "CodingCare: AI Code Generation Security Framework for Common Vulnerability Mitigation", "authors": ["Zhiguo Ding", "Songyang Wu", "Yiling Zhang", "Tao Yang", "Yingna Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3732945.3732990", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article provides a comprehensive review of code generation LLMs (Large Language Models) focusing on security issues and possible solutions to software development workflows. Recent literature sug", "doi": "10.1145/3732945.3732990"}
+{"id": "enhancing-software-quality-2023", "title": "Enhancing Software Quality through AI - Assisted Code Review: Insights from AWS Cloud Infrastructure Development", "authors": ["Sai Tarun Kaniganti"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.21275/sr24716230727", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.21275/sr24716230727"}
+{"id": "evaluating-large-language-2025", "title": "Evaluating Large Language Models for Code Review", "authors": ["Umut Cihan", "Arda Içöz", "Vahid Haratian", "Eray Tüzün"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.20206", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: Code reviews are crucial for software quality. Recent AI advances have allowed large language models (LLMs) to review and fix code; now, there are tools that perform these reviews. However, t", "arxiv_id": "2505.20206", "doi": "10.48550/arXiv.2505.20206"}
+{"id": "multiagent-llm-collaboration-2025", "title": "Multi-Agent LLM Collaboration for Adaptive Code Review, Debugging, and Security Analysis", "authors": ["Tanush Sharanarthi", "Sreenidhi Polineni"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MRAI65197.2025.11135756", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated code review systems have improved software development, yet many lack contextual awareness, leading to redundant feedback and limited adaptability to user-specific coding styles. This paper ", "doi": "10.1109/MRAI65197.2025.11135756"}
+{"id": "ai-code-generators-2024", "title": "AI Code Generators for Security: Friend or Foe?", "authors": ["R. Natella", "Pietro Liguori", "Cristina Improta", "B. Cukic", "Domenico Cotroneo"], "year": 2024, "venue": "IEEE Security and Privacy", "source_url": "https://arxiv.org/abs/2402.01219", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code gener", "arxiv_id": "2402.01219", "doi": "10.1109/MSEC.2024.3355713"}
+{"id": "quo-vadis-code-2025", "title": "Quo Vadis, Code Review? Exploring the Future of Code Review", "authors": ["Michael Dorner", "Andreas Bauer", "Darja Šmite", "Lukas Thode", "Daniel Méndez"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.06879", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review has long been a core practice in collaborative software engineering, yet its future trajectory is unclear. In this research, we examine how professional developers experience code review t", "arxiv_id": "2508.06879", "doi": "10.48550/arXiv.2508.06879"}
+{"id": "rethinking-code-review-2025", "title": "Rethinking Code Review Workflows with LLM Assistance: An Empirical Study", "authors": ["Fannar Steinn Aðalsteinsson", "Björn Borgar Magnússon", "Mislav Milicevic", "Adam Nirving Davidsson", "Chih-hong Cheng"], "year": 2025, "venue": "International Symposium on Empirical Software Engineering and Measurement", "source_url": "https://arxiv.org/abs/2505.16339", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background: Code reviews are a critical yet time-consuming aspect of modern software development, increasingly challenged by growing system complexity and the demand for faster delivery. Aims: We exami", "arxiv_id": "2505.16339", "doi": "10.1109/ESEM64174.2025.00013"}
+{"id": "gptbased-code-review-2024", "title": "A GPT-based Code Review System for Programming Language Learning", "authors": ["Lee Dong-Kyu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.04722", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing demand for programming language education and growing class sizes require immediate and personalized feedback. However, traditional code review methods have limitations in providing thi", "arxiv_id": "2407.04722", "doi": "10.48550/arXiv.2407.04722"}
+{"id": "literature-review-aipowered-2025", "title": "A Literature Review on AI-Powered Smart Code Base Navigator", "authors": ["Sukanya G", "Radhika S K", "Rashmitha R", "Sanjana N", "Shanthala M N"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.55041/ijsrem52774", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In contemporary software development, the vast size of codebases poses challenges in locating, comprehending, and reusing code. Traditional search tools that rely on keywords often fall sho", "doi": "10.55041/ijsrem52774"}
+{"id": "use-aidriven-code-2024", "title": "Use of AI-driven Code Generation Models in Teaching and Learning Programming: a Systematic Literature Review", "authors": ["Doga Cambaz", "Xiaoling Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3626252.3630958", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent emergence of LLM-based code generation models can potentially transform programming education. To pinpoint the current state of research on using LLM-based code generators to support the te", "doi": "10.1145/3626252.3630958"}
+{"id": "systematic-literature-review-2024-2", "title": "A systematic literature review on the impact of AI models on the security of code generation", "authors": ["Claudia Negri-Ribalta", "Rémi Geraud-Stewart", "A. Sergeeva", "Gabriele Lenzini"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3389/fdata.2024.1386720", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Introduction Artificial Intelligence (AI) is increasingly used as a helper to develop computing programs. While it can boost software development and improve coding proficiency, this practice offers n", "doi": "10.3389/fdata.2024.1386720"}
+{"id": "comparative-review-ai-2024", "title": "A Comparative Review of AI Techniques for Automated Code Generation in Software Development: Advancements, Challenges, and Future Directions", "authors": ["A. Odeh", "Nada Odeh", "Abdul Salam Mohammed"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.18421/tem131-76", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI), as one of the most important fields of computer science, plays a significant role in the software development life cycle process, especially in the implementation phase, ", "doi": "10.18421/tem131-76"}
+{"id": "leveraging-generative-ai-2025", "title": "Leveraging Generative AI for Automated Code Generation and Security Compliance in Cloud-Based DevOps Pipelines: A Review", "authors": ["Rahul Vadisetty", "Anand Polamarasetti", "Dr. Sateesh kumar Rongali", "Sameerkumar Prajapati", "Jinal Bhanubhai Butani"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.2139/ssrn.5218298", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.2139/ssrn.5218298"}
+{"id": "aiassisted-code-editors-2025", "title": "AI-Assisted Code Editors with Real-Time Collaboration: A Comprehensive Review", "authors": ["P. Chavan", "Narasimha Dixit", "Aniket Patil", "Ayaan Shilledar", "Krutika Sambranikar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.55041/ijsrem55648", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid growth of distributed software development requires advanced collaborative coding tools to fulfill market demands. AI-assisted code editors serve as revolutionary platforms which ", "doi": "10.55041/ijsrem55648"}
+{"id": "exploring-role-large-2023", "title": "Exploring the Role of Large Language Models in Automated Code Review and Software Quality Enhancement", "authors": ["Gopinath Kathiresan"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.15680/ijirset.2023.1209004", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As the newest, most savvy technology domain, Large Language Models (LLMs) are the technological force which is now revolutionizing Automated Code Review and Software Quality Assurance, making the cont", "doi": "10.15680/ijirset.2023.1209004"}
+{"id": "assignment-incentives-reduce-2023", "title": "Using Assignment Incentives to Reduce Student Procrastination and Encourage Code Review Interactions", "authors": ["K. Wang", "Ramon Lawrence"], "year": 2023, "venue": "2023 International Conference on Computational Science and Computational Intelligence (CSCI)", "source_url": "https://arxiv.org/abs/2311.15125", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Procrastination causes student stress, reduced learning and performance, and results in very busy help sessions immediately before deadlines. A key challenge is encouraging students to complete assign", "arxiv_id": "2311.15125", "doi": "10.1109/CSCI62032.2023.00270"}
+{"id": "human-machine-how-2025", "title": "Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers", "authors": ["Adam Alami", "Neil A. Ernst"], "year": 2025, "venue": "IEEE/ACM International Conference on Cooperative and Human Aspects of Software Engineering", "source_url": "https://arxiv.org/abs/2501.02092", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of artificial intelligence (AI) continues to increase and evolve, including in software engineering (SE). This integration involves processes traditionally entrusted to humans, such as", "arxiv_id": "2501.02092", "doi": "10.1109/CHASE66643.2025.00016"}
+{"id": "code-beyond-review-2024", "title": "Code and Beyond: A Review on the Impact of AI on Modern Software Development", "authors": ["Aneeta Lohana", "Sumbul Ghulamani", "Kamran Khowaja", "Kazim Raza Talpur", "Asadullah Shah"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.22555/pjets.v12i2.1210", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This comprehensive review examines the link between AI and software development by examining their interplay throughout all development life cycle stages. The research combines insights from current l", "doi": "10.22555/pjets.v12i2.1210"}
+{"id": "aidriven-innovations-software-2025", "title": "AI-Driven Innovations in Software Engineering: A Review of Current Practices and Future Directions", "authors": ["M. Alenezi", "Mohammed Akour"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/app15031344", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The software engineering landscape is undergoing a significant transformation with the advent of artificial intelligence (AI). AI technologies are poised to redefine traditional software development p", "doi": "10.3390/app15031344"}
+{"id": "from-code-life-2025", "title": "From Code to Life: The AI‐Driven Revolution in Genome Editing", "authors": ["Zhidong Li", "Wasi Ullah Khan", "Genxiang Bai", "Chao Dong", "Jungang Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1002/advs.202417029", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Genome editing has revolutionized modern biotechnology, enabling precise modifications to DNA sequences with far‐reaching applications in medicine, agriculture, and synthetic biology. Recent advanceme", "doi": "10.1002/advs.202417029"}
+{"id": "systematic-literature-review-2025", "title": "A Systematic Literature Review of Parameter-Efficient Fine-Tuning for Large Code Models", "authors": ["Saima Afrin", "Md Zahidul Haque", "A. Mastropaolo"], "year": 2025, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2504.21569", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rise of Artificial Intelligence (AI), and particularly Large Language Models (LLMs) for code, has reshaped Software Engineering (SE) by enabling the automation of tasks such as code generation, bug ", "arxiv_id": "2504.21569", "doi": "10.1145/3796522"}
+{"id": "poetics-code-generative-2025", "title": "The Poetics of Code: Generative AI and the Redefinition of Literary Creativity", "authors": ["Jihan Abdul Rahman Oshiesh"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.53032/tvcr/2025.v7n1.23", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) has emerged as a transformative force across multiple domains, notably influencing the field of literature. This article investigates the creative potential of generative ", "doi": "10.53032/tvcr/2025.v7n1.23"}
+{"id": "quantitative-analysis-quality-2024", "title": "A Quantitative Analysis of Quality and Consistency in AI-generated Code", "authors": ["Autumn Clark", "Daniel Igbokwe", "Samantha Ross", "M. Zibran"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICoSSE62619.2024.00014", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the recent emergence of generative AI (Artificial intelligence), Large Language Model (LLM) based tools such as ChatGPT have become popular assistants to humans in diverse tasks. ChatGPT has also", "doi": "10.1109/ICoSSE62619.2024.00014"}
+{"id": "rapid-gcoding-obtaining-2025", "title": "Rapid G-Coding – Obtaining G-code Using AI", "authors": ["Dragi Tiro"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1088/1757-899X/1339/1/012014", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The traditional process of creating G-code is time-intensive and requires expertise in CNC programming. Recently, several AI software have appeared. They communicate using prompts, and the user can al", "doi": "10.1088/1757-899X/1339/1/012014"}
+{"id": "teaching-learning-computer-2025", "title": "Teaching and learning computer programming using ChatGPT: A rapid review of literature amid the rise of generative AI technologies", "authors": ["M. Garcia"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10639-025-13452-5", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10639-025-13452-5"}
+{"id": "aiassisted-programming-tasks-2024", "title": "AI-Assisted Programming Tasks Using Code Embeddings and Transformers", "authors": ["S. Kotsiantis", "V. Verykios", "M. Tzagarakis"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics13040767", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This review article provides an in-depth analysis of the growing field of AI-assisted programming tasks, specifically focusing on the use of code embeddings and transformers. With the increasing compl", "doi": "10.3390/electronics13040767"}
+{"id": "aipowered-code-reviews-2024", "title": "AI-Powered Code Reviews: Leveraging Large Language Models", "authors": ["Md Shain Shahid Chowdhury", "Md. Naseef-Ur-Rahman Chowdhury", "Fariha Ferdous Neha", "Ahshanul Haque"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SPARC61891.2024.10829223", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As the complexity and volume of software development continue to grow, the need for efficient and thorough code review processes becomes increasingly critical. This paper explores the integration of L", "doi": "10.1109/SPARC61891.2024.10829223"}
+{"id": "humancentered-ai-product-2024", "title": "Human-Centered AI Product Prototyping with No-Code AutoML: Conceptual Framework, Potentials and Limitations", "authors": ["Mario Truss", "Marc Schmitt"], "year": 2024, "venue": "International journal of human computer interactions", "source_url": "https://arxiv.org/abs/2402.07933", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper addresses AI product prototyping, focusing on the challenges posed by the probabilistic nature of AI behavior and the limited accessibility of prototyping tools to AI non-experts. ", "arxiv_id": "2402.07933", "doi": "10.1080/10447318.2024.2425454"}
+{"id": "integrating-generative-ai-2024", "title": "Integrating Generative AI into the Software Development Lifecycle: Impacts on Code Quality and Maintenance", "authors": ["Ayyappa Sajja", "Dheerender Thakur", "Aditya Mehra"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.30574/ijsra.2024.13.1.1837", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in generative AI have depicted it as a revolutionary approach in the software development technologies pioneered to improve the codes' reliability and sustain their quality and perform", "doi": "10.30574/ijsra.2024.13.1.1837"}
+{"id": "review-tools-zerocode-2025", "title": "Review of Tools for Zero-Code LLM Based Application Development", "authors": ["Priyaranjan Pattnayak", "Hussain Bohra"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19747", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are transforming software creation by enabling zero code development platforms. Our survey reviews recent platforms that let users build applications without writing code,", "arxiv_id": "2510.19747", "doi": "10.48550/arXiv.2510.19747"}
+{"id": "exploring-role-ai-2024", "title": "Exploring the Role of AI in Web Design and Development: A Voyage through Automated Code Generation", "authors": ["Veera Harish Muthazhagu", "B Surendiran"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IITCEE59897.2024.10467409", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Web design now plays a crucial part in how people interact with websites thanks to the quick development of the digital world. Traditional web design methods that rely on manual coding have run afoul ", "doi": "10.1109/IITCEE59897.2024.10467409"}
+{"id": "exploring-synergy-between-2024", "title": "Exploring the synergy between generative AI and software engineering: Automating code optimization and bug fixing", "authors": ["Kodamasimham Krishna", "P. Murthy", "Saumya Sarangi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.30574/wjaets.2024.13.1.0464", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As applied to software engineering, generative AI is quickly transitioning from a zero-sum industry game changer into the primary automation tool for code optimization, bug identification, and problem", "doi": "10.30574/wjaets.2024.13.1.0464"}
+{"id": "what-can-youth-2024", "title": "What Can Youth Learn About Artificial Intelligence and Machine Learning in One Hour? Examining How Hour of Code Activities Address the Five Big Ideas of AI", "authors": ["Luis Morales-Navarro", "Yasmin B. Kafai", "Eric Yang", "A. Suryana"], "year": 2024, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2412.11911", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The prominence of artificial intelligence and machine learning in everyday life has led to efforts to foster AI literacy for all K–12 students. In this paper, we review how Hour of Code activities eng", "arxiv_id": "2412.11911", "doi": "10.1609/aaai.v39i28.35193"}
+{"id": "large-language-model-2025-3", "title": "Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead", "authors": ["Guang Yang", "Wei Zheng", "Xiang Chen", "Dong Liang", "Pengfei Hu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.00020", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation has emerged as a critical research area at the intersection of Software Engineering (SE) and Artificial Intelligence (AI), attracting significant attention from both academia and indus", "arxiv_id": "2512.00020", "doi": "10.48550/arXiv.2512.00020"}
+{"id": "impact-eu-laws-2025", "title": "Impact of EU Laws on AI Adoption in Smart Grids: A Review of Regulatory Barriers, Technological Challenges, and Stakeholder Benefits", "authors": ["B. Jørgensen", "S. Gunasekaran", "Z. Ma"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/en18123002", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This scoping review examines the evolving landscape of European Union (EU) legislation, as it pertains to the implementation of artificial intelligence (AI) in smart grid systems. By outlining the cur", "doi": "10.3390/en18123002"}
+{"id": "impact-artificial-intelligence-2025", "title": "The Impact of Artificial Intelligence Enhanced No-Code Software Development Platforms on Software Processes: A Literature Review", "authors": ["O. Koç", "I. Yücedag", "Ümit Şentürk"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.29130/dubited.1554356", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This literature review examines the impact of artificial intelligence-based (AI-based) no-code software development platforms on software processes. The study primarily focuses on accelerating softwa", "doi": "10.29130/dubited.1554356"}
+{"id": "beyond-hype-comprehensive-2024", "title": "Beyond the Hype: A Comprehensive Review of Current Trends in Generative AI Research, Teaching Practices, and Tools", "authors": ["James Prather", "Juho Leinonen", "Natalie Kiesler", "Jamie Gorson Benario", "Sam Lau"], "year": 2024, "venue": "ITiCSE-WGR", "source_url": "https://arxiv.org/abs/2412.14732", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (GenAI) is advancing rapidly, and the literature in computing education is expanding almost as quickly. Initial responses to GenAI tools were mixed between panic and utopian optimism. Ma", "arxiv_id": "2412.14732", "doi": "10.1145/3689187.3709614"}
+{"id": "chatgpt-not-all-2023", "title": "ChatGPT is not all you need. A State of the Art Review of large Generative AI models", "authors": ["Roberto Gozalo-Brizuela", "E.C. Garrido-Merchán"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2301.04655", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: During the last two years there has been a plethora of large generative models such as ChatGPT or Stable Diffusion that have been published. Concretely, these models are able to perform tasks such as ", "arxiv_id": "2301.04655", "doi": "10.48550/arXiv.2301.04655"}
+{"id": "artificial-intelligence-software-2025", "title": "Artificial Intelligence in Software Development: A Review of Code Generation, Testing, Maintenance and Security", "authors": ["Ms. Prajakta Sudhir Khade", "Dr. Rajeshkumar U. Sambhe"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.47191/ijcsrr/v8-i4-08", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) is transforming software development by automating key processes such as code generation, testing, maintenance, and security. AI-powered tools like OpenAI Codex, GitHub Co", "doi": "10.47191/ijcsrr/v8-i4-08"}
+{"id": "ai-scientistv2-workshoplevel-2025", "title": "The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search", "authors": ["Yutaro Yamada", "R. Lange", "Cong Lu", "Shengran Hu", "Chris Lu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.08066", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI is increasingly playing a pivotal role in transforming how scientific discoveries are made. We introduce The AI Scientist-v2, an end-to-end agentic system capable of producing the first entirely AI", "arxiv_id": "2504.08066", "doi": "10.48550/arXiv.2504.08066"}
+{"id": "perspective-review-will-2025", "title": "Perspective review: Will generative AI make common data models obsolete in future analyses of distributed data networks?", "authors": ["Jeffery L. Painter", "D. Ramcharran", "Andrew Bate"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1177/20420986251332743", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Integrating real-world healthcare data is challenging due to diverse formats and terminologies, making standardization resource-intensive. While Common Data Models (CDMs) facilitate interoperability, ", "doi": "10.1177/20420986251332743"}
+{"id": "review-generative-ai-2024", "title": "Review of Generative AI Methods in Cybersecurity", "authors": ["Yagmur Yigit", "William J. Buchanan", "Madjid G Tehrani", "Leandros A. Maglaras"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.08701", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Over the last decade, Artificial Intelligence (AI) has become increasingly popular, especially with the use of chatbots such as ChatGPT, Gemini, and DALL-E. With this rise, large language models (LLMs", "arxiv_id": "2403.08701", "doi": "10.48550/arXiv.2403.08701"}
+{"id": "role-genai-automated-2023", "title": "Role of GenAI in Automated Code Generation within DevOps Practices: Explore how Generative AI", "authors": ["Prachi Tembhekar", "Munivel Devan", "Jawaharbabu Jeyaraman"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.60087/jklst.vol2.n2.p512", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) is a pivotal domain within computer science, profoundly influencing the software development lifecycle, particularly during the implementation phase. Here, developers grap", "doi": "10.60087/jklst.vol2.n2.p512"}
+{"id": "decade-progress-systematic-2024", "title": "A Decade of Progress: A Systematic Literature Review on the Integration of AI in Software Engineering Phases and Activities (2013-2023)", "authors": ["U. Durrani", "Mustafa Akpınar", "M. Fatih Adak", "Abdullah Talha Kabakus", "Muhammed Maruf Öztürk"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2024.3488904", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The synergy between software engineering (SE) and artificial intelligence (AI) catalyzes software development, as numerous recent studies illustrate an intensified intersection between these domains. ", "doi": "10.1109/ACCESS.2024.3488904"}
+{"id": "state-art-security-2024", "title": "State of the Art of the Security of Code Generated by LLMs: A Systematic Literature Review", "authors": ["Leonardo Criollo Ramírez", "X. Limón", "Á. Sánchez-García", "Juan Carlos Pérez-Arriaga"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CONISOFT63288.2024.00050", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI-assisted programming has experienced a surge in popularity over the past few years, largely thanks to advancements in Large Language Model technologies. This has led to the emergence of tools like", "doi": "10.1109/CONISOFT63288.2024.00050"}
+{"id": "advancements-generative-ai-2023", "title": "Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers", "authors": ["Staphord Bengesi", "Hoda El-Sayed", "Md Kamruzzaman Sarker", "Yao Houkpati", "John Irungu"], "year": 2023, "venue": "IEEE Access", "source_url": "https://arxiv.org/abs/2311.10242", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The launch of ChatGPT in 2022 garnered global attention, marking a significant milestone in the Generative Artificial Intelligence (GAI) field. While GAI has been in effect for the past decade, the in", "arxiv_id": "2311.10242", "doi": "10.1109/ACCESS.2024.3397775"}
+{"id": "review-generative-ai-2025", "title": "A Review of Generative AI and DevOps Pipelines: CI/CD, Agentic Automation, MLOps Integration, and LLMs", "authors": ["Satyadhar Joshi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.55524/ijircst.2025.13.4.1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a comprehensive review of Generative AI applications in DevOps automation, covering 50 key research works published between 2023-2025. By synthesizing insights from recent research", "doi": "10.55524/ijircst.2025.13.4.1"}
+{"id": "artificial-intelligence-drug-2025", "title": "Artificial Intelligence in Drug Discovery: A Review of AI Approaches for Target Identification", "authors": ["Faustino Faustino"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.54216/mor.030102", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) has become a revolutionary solution in drug discovery and development in aspects including high costs, long times, and high failure rates. This review describes the develo", "doi": "10.54216/mor.030102"}
+{"id": "systematic-literature-review-2025-2", "title": "Systematic Literature Review on Generative AI: Ethical Challenges and Opportunities", "authors": ["F. P. Surbakti"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.14569/ijacsa.2025.0160530", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Artificial Intelligence (GAI) has rapidly emerged as a transformative technology capable of autonomously creating human-like content across domains such as text, images, code, and media. W", "doi": "10.14569/ijacsa.2025.0160530"}
+{"id": "benchmarking-ai-models-2025", "title": "Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol", "authors": ["Roham Koohestani", "Philippe de Bekker", "Maliheh Izadi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2503.05860", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2503.05860"}
+{"id": "orthopaedics-entering-age-2025", "title": "Is orthopaedics entering the age of generative AI?—A narrative review of current applications challenges and future directions", "authors": ["F. Oettl", "James A Pruneski", "Bálint Zsidai", "Yinan Yu", "Ting Cong"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1002/ksa.70145", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI) in medicine is undergoing a pivotal transformation, evolving from discriminative models that classify data to generative AI systems capable of creating novel cont", "doi": "10.1002/ksa.70145"}
+{"id": "aidriven-scholarly-peer-2025", "title": "AI-Driven Scholarly Peer Review via Persistent Workflow Prompting, Meta-Prompting, and Meta-Reasoning", "authors": ["Evgeny Markhasin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.03332", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Critical peer review of scientific manuscripts presents a significant challenge for Large Language Models (LLMs), partly due to data limitations and the complexity of expert reasoning. This report int", "arxiv_id": "2505.03332", "doi": "10.48550/arXiv.2505.03332"}
+{"id": "cloud-platforms-developing-2024", "title": "Cloud Platforms for Developing Generative AI Solutions: A Scoping Review of Tools and Services", "authors": ["Dhavalkumar Patel", "Ganesh Raut", "S. N. Cheetirala", "Girish N. Nadkarni", "Robert Freeman"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.06044", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI is transforming enterprise application development by enabling machines to create content, code, and designs. These models, however, demand substantial computational power and data manag", "arxiv_id": "2412.06044", "doi": "10.48550/arXiv.2412.06044"}
+{"id": "review-advances-aipowered-2024", "title": "Review of Advances in AI-Powered Monitoring and Diagnostics for CI/CD Pipelines", "authors": ["Teemu Myllynen", "Eunice Kamau", "Sikirat Damilola Mustapha", "Gideon Opeyemi Babatunde", "Anuoluwapo Collins"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.54660/.ijmrge.2024.5.1.1119-1130", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Continuous Integration and Continuous Deployment (CI/CD) pipelines are critical components of modern software development, enabling rapid delivery of reliable applications. However, ensuring the seaml", "doi": "10.54660/.ijmrge.2024.5.1.1119-1130"}
+{"id": "chaos-engineering-20-2025", "title": "Chaos Engineering 2.0: A Review of AI-Driven, Policy-Guided Resilience for Multi-Cloud Systems", "authors": ["Lasbrey Chibuzo Opara", "Ogheneruemu Nathaniel Akatakpo", "Ifeanyi Charles Ironuru", "Kingsley Anyaene", "Benjamin Osaze Enobakhare"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.69739/jcsp.v2i2.846", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-cloud has become the default posture; 89% of large enterprises now run workloads across two or more providers, yet most failure-testing playbooks were written for a single-vendor world. Chaos E", "doi": "10.69739/jcsp.v2i2.846"}
+{"id": "systematic-review-generative-2025", "title": "A Systematic Review of Generative AI in K-12: Mapping Goals, Activities, Roles, and Outcomes via the 3P Model", "authors": ["Xiaolin Lin", "Hao Tan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/systems13100840", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI is reshaping k–12 learning as a multi-agent system in which goals, activities, and roles co-evolve across formal and informal environments. Following PRISMA and appraising quality with M", "doi": "10.3390/systems13100840"}
+{"id": "short-review-responsible-2025", "title": "A Short Review of Responsible AI Music Generation", "authors": ["Elizabeth Wilson", "Anna Wszeborowska", "Nick Bryan-Kinns"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.5281/zenodo.16946342", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.5281/zenodo.16946342"}
+{"id": "systematic-literature-review-2024-2-2", "title": "Systematic Literature Review on Analyzing the Impact of Prompt Engineering on Efficiency, Code Quality, and Security in Crud Application Development", "authors": ["K. Shanuka", "J. Wijayanayake", "K. Vidanage"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.4038/jdrra.v2i1.57", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This research investigates the impact of prompt engineering on the efficiency, code quality, and security of CRUD (Create, Read, Update, Delete) operations in software development using large language", "doi": "10.4038/jdrra.v2i1.57"}
+{"id": "aipowered-software-development-2025", "title": "AI-Powered Software Development: A Systematic Review of Recommender Systems for Programmers", "authors": ["Efthimia Mavridou", "Eleni Vrochidou", "T. Kalampokas", "V. Kanakaris", "G. Papakostas"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/computers14040119", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software engineering is a field that demands extensive knowledge and involves numerous challenges in managing information. The information landscapes in software engineering encompass source code and ", "doi": "10.3390/computers14040119"}
+{"id": "benchmarking-ai-models-2025-2", "title": "Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality", "authors": ["Roham Koohestani", "Philippe de Bekker", "Begüm Koç", "Maliheh Izadi"], "year": 2025, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2503.05860", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Benchmarks are essential for unified evaluation and reproducibility. The rapid rise of Artificial Intelligence for Software Engineering (AI4SE) has produced numerous benchmarks for tasks such as code ", "arxiv_id": "2503.05860", "doi": "10.1109/TSE.2025.3644183"}
+{"id": "ai-scientist-fully-2024", "title": "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery", "authors": ["Chris Lu", "Cong Lu", "R. Lange", "J. Foerster", "Jeff Clune"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.06292", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been use", "arxiv_id": "2408.06292"}
+{"id": "cracking-code-scoping-2024", "title": "Cracking the code: a scoping review to unite disciplines in tackling legal issues in health artificial intelligence", "authors": ["Sophie Nunnelley", "Colleen M. Flood", "Michael Da Silva", "Tanya Horsley", "Sarathy Kanathasan"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1136/bmjhci-2024-101112", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Objectives The rapid integration of artificial intelligence (AI) in healthcare requires robust legal safeguards to ensure safety, privacy and non-discrimination, crucial for maintaining trust. Yet, un", "doi": "10.1136/bmjhci-2024-101112"}
+{"id": "artificial-intelligence-infrastructureascodea-2026", "title": "Artificial Intelligence for Infrastructure-as-Code—A Systematic Literature Review", "authors": ["Claus Pahl", "Övgüm Can Sezen", "Florian Hofer"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics15040755", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Infrastructure-as-Code (IaC) is a systems management practice that involves managing and provisioning computing infrastructure through machine-readable definition files, rather than physical hardwa", "doi": "10.3390/electronics15040755"}
+{"id": "harnessing-large-language-2025", "title": "Harnessing Large Language Models for Curated Code Reviews", "authors": ["O. Sghaier", "M. Weyssow", "H. Sahraoui"], "year": 2025, "venue": "IEEE Working Conference on Mining Software Repositories", "source_url": "https://arxiv.org/abs/2502.03425", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In code review, generating structured and relevant comments is crucial for identifying code issues and facilitating accurate code changes that ensure an efficient code review process. Well-crafted com", "arxiv_id": "2502.03425", "doi": "10.1109/MSR66628.2025.00039"}
+{"id": "role-generative-ai-2024", "title": "The Role of Generative AI Tools in Application Development: A Comprehensive Review of Current Technologies and Practices", "authors": ["Anujkumarsinh Donvir", "Sriram Panyam", "Gunjan Paliwal", "Praveen Gujar"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/EMCTECH63049.2024.10741797", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper provides a comprehensive review of the role of Generative AI (GenAI) tools in modern software application development. It highlights the advancements in machine learning, natural language p", "doi": "10.1109/EMCTECH63049.2024.10741797"}
+{"id": "empowering-business-transformation-2023", "title": "Empowering Business Transformation: The Positive Impact and Ethical Considerations of Generative AI in Software Product Management - A Systematic Literature Review", "authors": ["N. Parikh"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2306.04605", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Artificial Intelligence (GAI) has made outstanding strides in recent years, with a good-sized impact on software product management. Drawing on pertinent articles from 2016 to 2023, this sy", "arxiv_id": "2306.04605", "doi": "10.48550/arXiv.2306.04605"}
+{"id": "review-ai-assistant-2024", "title": "Review on AI Assistant Systems for Programming Language Learning in Learning Environments", "authors": ["S. Senanayake", "K.T. Karunanayaka", "KB Ekanayake"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SLAAI-ICAI63667.2024.10844969", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI) has made it possible to improve student engagement in the learning environment in many ways, including fusing the adaptive, individualized support of AI assistants with th", "doi": "10.1109/SLAAI-ICAI63667.2024.10844969"}
+{"id": "building-understandable-messaging-2024", "title": "Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI", "authors": ["Katherine A. Rosenfeld", "Maike Sonnewald", "S. Jindal", "Kevin A. McCarthy", "J. Proctor"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.12812", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce a framework for the use of large language models (LLMs) in Building Understandable Messaging for Policy and Evidence Review (BUMPER). LLMs are proving capable of providing interfaces for ", "arxiv_id": "2407.12812", "doi": "10.48550/arXiv.2407.12812"}
+{"id": "embodied-ai-education-2023", "title": "Embodied AI in education: A review on the body, environment, and mind", "authors": ["Bahar Memarian", "Tenzin Doleck"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10639-023-12346-8", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10639-023-12346-8"}
+{"id": "finetuned-large-language-2025", "title": "A fine-tuned large language model based molecular dynamics agent for code generation to obtain material thermodynamic parameters", "authors": ["Zhuo-Fan Shi", "Chunxiao Xin", "Tong Huo", "Yun-Tao Jiang", "Bowen Wu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1038/s41598-025-92337-6", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the field of materials science, addressing the complex relationship between the material structure and properties has increasingly involved leveraging the text generation capabilities of AI-generat", "doi": "10.1038/s41598-025-92337-6"}
+{"id": "chatgpt-utility-healthcare-2023", "title": "ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns", "authors": ["Malik Sallam"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.3390/healthcare11060887", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: ChatGPT is an artificial intelligence (AI)-based conversational large language model (LLM). The potential applications of LLMs in health care education, research, and practice could be promising if th", "doi": "10.3390/healthcare11060887"}
+{"id": "aipowered-peer-review-2023", "title": "AI-powered peer review process", "authors": ["Eduardo A. Oliveira", "Shannon A. Rios", "Zhuoxuan Jiang"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.14742/apubs.2023.482", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review is a common type of peer review in Computer Science (CS) education. It’s a peer review process that involves CS students other than the original author examining source code and is widely", "doi": "10.14742/apubs.2023.482"}
+{"id": "cocreating-automated-mhealth-2023", "title": "Cocreating an Automated mHealth Apps Systematic Review Process With Generative AI: Design Science Research Approach", "authors": ["Guido Giunti", "Colin P. Doherty"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.2196/48949", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background The use of mobile devices for delivering health-related services (mobile health [mHealth]) has rapidly increased, leading to a demand for summarizing the state of the art and practice throu", "doi": "10.2196/48949"}
+{"id": "factool-factuality-detection-2023", "title": "FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios", "authors": ["Ethan Chern", "Steffi Chern", "Shiqi Chen", "Weizhe Yuan", "Kehua Feng"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2307.13528", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: ", "arxiv_id": "2307.13528", "doi": "10.48550/arXiv.2307.13528"}
+{"id": "evaluation-impact-code-2025", "title": "An Evaluation of the Impact of Code Generation Tools on Software Development", "authors": ["Luiz Fernando Mendes Osório", "P. D. A. S. Neto", "Guilherme Avelino", "Werney Ayala Luz Lira"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.5753/sbsi.2025.246605", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: The rise of AI-assisted tools like GitHub Copilot aims to improve productivity in software development, raising questions about their practical impact on developer performance and code qualit", "doi": "10.5753/sbsi.2025.246605"}
+{"id": "neural-network-decoder-2023", "title": "Neural network decoder for near-term surface-code experiments", "authors": ["B. Varbanov", "Marc Serra-Peralta", "David Byfield", "B. Terhal"], "year": 2023, "venue": "Physical Review Research", "source_url": "https://arxiv.org/abs/2307.03280", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural network decoders can achieve a lower logical error rate compared to conventional decoders, like minimum-weight perfect matching, when decoding the surface code. Furthermore, these decoders requ", "arxiv_id": "2307.03280", "doi": "10.1103/PhysRevResearch.7.013029"}
+{"id": "deepreview-improving-llmbased-2025", "title": "DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process", "authors": ["Minjun Zhu", "Yixuan Weng", "Linyi Yang", "Yue Zhang"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.08569", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly utilized in scientific research assessment, particularly in automated paper review. However, existing LLM-based review systems face significant challenges", "arxiv_id": "2503.08569", "doi": "10.48550/arXiv.2503.08569"}
+{"id": "review-aidriven-approaches-2025", "title": "Review of AI-Driven Approaches for Automated Defect Detection and Classification in Software Testing", "authors": ["Alex Thomas Thomas"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.52403/ijrr.20250619", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Increased size and complexity of modern software systems have necessitated novel, smart approaches to detecting and classifying defects. Within this review, we provide a comprehensive perspective on a", "doi": "10.52403/ijrr.20250619"}
+{"id": "inide-humanai-experience-2024", "title": "In-IDE Human-AI Experience in the Era of Large Language Models; A Literature Review", "authors": ["Agnia Sergeyuk", "Sergey Titov", "M. Izadi"], "year": 2024, "venue": "Ide", "source_url": "https://arxiv.org/abs/2401.10739", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Integrated Development Environments (IDEs) have become central to modern software development, especially with the integration of Artificial Intelligence (AI) to enhance programming efficiency and dec", "arxiv_id": "2401.10739", "doi": "10.1145/3643796.3648463"}
+{"id": "applications-attitudes-ethical-2026", "title": "Applications, attitudes and ethical considerations of Generative Artificial Intelligence (Gen AI) in nursing education: a scoping review", "authors": ["Philip Hardie", "Andrew Darley", "Rosemarie Derwin", "Jessica Eustace-Cook", "S. Kearns"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1186/s12912-025-04253-9", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Artificial Intelligence (Gen AI) is a type of artificial intelligence that can learn from and mimic large amounts of data to create content such as text, images, music, videos, code, and mo", "doi": "10.1186/s12912-025-04253-9"}
+{"id": "systematic-review-infrastructure-2023", "title": "Systematic Review of Infrastructure as Code (IaC) and GitOps for Cloud Automation and Governance", "authors": ["Nneka Adaobi Ochuba", "Denis Kisina", "O. S. Adanigbo", "Abel Chukwuemeke Uzoka", "Oyinomomo-emi Emmanuel Akpe"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.54660/.ijmrge.2023.3.2.664-670", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a systematic review of Infrastructure as Code (IaC) and GitOps, exploring their transformative roles in cloud automation and governance. IaC, an approach that defines and provision", "doi": "10.54660/.ijmrge.2023.3.2.664-670"}
+{"id": "democratizing-digital-transformation-2025", "title": "Democratizing Digital Transformation: A Multisector Study of Low-Code Adoption Patterns, Limitations, and Emerging Paradigms", "authors": ["Zhengwu Shi", "Junyu Dong", "Yanhai Gan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/app15126481", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Low-code development platforms (LCDPs) have emerged as transformative tools for accelerating digital transformation across industries by enabling rapid application development with minimal hand-coding", "doi": "10.3390/app15126481"}
+{"id": "how-generative-ai-2024", "title": "How Generative AI Is Transforming Journalism: Development, Application and Ethics", "authors": ["Yi Shi", "Lin Sun"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/journalmedia5020039", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative artificial intelligence (GAI) is a technology based on algorithms, models, etc., that creates content such as text, audio, images, videos, and code. GAI is deeply integrated into journalism", "doi": "10.3390/journalmedia5020039"}
+{"id": "combining-costconstrained-runtime-2025", "title": "Combining Cost-Constrained Runtime Monitors for AI Safety", "authors": ["Tim Tian Hua", "James Baskerville", "Henri Lemoine", "Mia Hopman", "Aryan Bhatt"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.15886", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Monitoring AIs at runtime can help us detect and stop harmful actions. In this paper, we study how to efficiently combine multiple runtime monitors into a single monitoring protocol. The protocol's ob", "arxiv_id": "2507.15886", "doi": "10.48550/arXiv.2507.15886"}
+{"id": "language-models-code-2025", "title": "Language Models for Code Optimization: Survey, Challenges and Future Directions", "authors": ["Jingzhi Gong", "Vardan K. Voskanyan", "Paul Brookes", "Fan Wu", "Wei Jie"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.01277", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks such as code generation, completion, and repair. This h", "arxiv_id": "2501.01277", "doi": "10.48550/arXiv.2501.01277"}
+{"id": "robots-here-navigating-2023", "title": "The Robots Are Here: Navigating the Generative AI Revolution in Computing Education", "authors": ["J. Prather", "Paul Denny", "Juho Leinonen", "Brett A. Becker", "Ibrahim Albluwi"], "year": 2023, "venue": "ITiCSE-WGR", "source_url": "https://arxiv.org/abs/2310.00658", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in artificial intelligence (AI) and specifically generative AI (GenAI) are threatening to fundamentally reshape computing and society. Largely driven by large language models (LLMs", "arxiv_id": "2310.00658", "doi": "10.1145/3623762.3633499"}
+{"id": "improving-automated-secure-2025", "title": "Improving Automated Secure Code Reviews: A Synthetic Dataset for Code Vulnerability Flaws", "authors": ["Leonardo Centellas-Claros", "Juan J. Alonso-Lecaros", "Juan Pablo Sandoval Alcocer", "Andres Neyem"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.16310", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automation of code reviews using AI models has garnered substantial attention in the software engineering community as a strategy to reduce the cost and effort associated with traditional peer review ", "arxiv_id": "2504.16310", "doi": "10.48550/arXiv.2504.16310"}
+{"id": "generative-ai-software-2025", "title": "Generative AI for Software Architecture. Applications, Trends, Challenges, and Future Directions", "authors": ["Matteo Esposito", "Xiaozhou Li", "Sergio Moreschini", "Noman Ahmad", "Tomás Cerný"], "year": 2025, "venue": "Journal of Systems and Software", "source_url": "https://arxiv.org/abs/2503.13310", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: Generative Artificial Intelligence (GenAI) is transforming much of software development, yet its application in software architecture is still in its infancy, and no prior study has systemati", "arxiv_id": "2503.13310", "doi": "10.48550/arXiv.2503.13310"}
+{"id": "enhancing-software-development-2024", "title": "Enhancing software development practices with AI insights in high-tech companies", "authors": ["Daniel Ajiga", "Patrick Azuka Okeleke", "Samuel Olaoluwa Folorunsho", "Chinedu Ezeigweneme"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.51594/csitrj.v5i8.1450", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) is revolutionizing software development practices in high-tech companies, providing transformative insights and tools that enhance productivity, quality, and efficiency. T", "doi": "10.51594/csitrj.v5i8.1450"}
+{"id": "aipowered-learning-support-2025", "title": "AI-Powered Learning Support: A Study of Retrieval-Augmented Generation (RAG) Chatbot Effectiveness in an Online Course", "authors": ["Guido Lang", "Tan Gurpinar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.62273/zklk5988", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study investigates the effectiveness of a Retrieval-Augmented Generation (RAG) chatbot to enhance learning and engagement in a self-paced, asynchronous online R programming course. To contextuali", "doi": "10.62273/zklk5988"}
+{"id": "aipowered-solutions-computer-2025", "title": "AI-Powered Solutions in Computer Science: A Comprehensive COPRAS Evaluation", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.46632/jdaai/3/1/9", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) has become a transformative force in computer science, revolutionizing technological advancement across multiple domains. This research explores the multifaceted applicati", "doi": "10.46632/jdaai/3/1/9"}
+{"id": "agentmesh-cooperative-multiagent-2025", "title": "AgentMesh: A Cooperative Multi-Agent Generative AI Framework for Software Development Automation", "authors": ["Sourena Khanzadeh"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.19902", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software development is a complex, multi-phase process traditionally requiring collaboration among individuals with diverse expertise. We propose AgentMesh, a Python-based framework that uses multiple", "arxiv_id": "2507.19902", "doi": "10.48550/arXiv.2507.19902"}
+{"id": "autonomous-supplier-evaluation-2025", "title": "Autonomous Supplier Evaluation and Data Stewardship with AI: Building Transparent and Resilient Supply Chains", "authors": ["Chandra Bonthu", "Ganpati Goel"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.22399/ijcesen.3854", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Global supply chains remain fragile with geopolitical tensions, pandemic disruption, port congestion, and climate shocks. Conventional supplier scorecards are sluggish, passive, and rarely audit-worth", "doi": "10.22399/ijcesen.3854"}
+{"id": "weaknesses-llmgenerated-code-2024", "title": "Weaknesses in LLM-Generated Code for Embedded Systems Networking", "authors": ["Murray Dunne", "Kylee Schram", "Sebastian Fischmeister"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/QRS62785.2024.00033", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern firmware development is done in a fast-paced, time-constrained environment. This pressure tempts developers to use generative AI to write code for them to save time. While this is a powerful to", "doi": "10.1109/QRS62785.2024.00033"}
+{"id": "evaluating-dental-ai-2025", "title": "Evaluating Dental AI Research Papers: Key Considerations for Editors and Reviewers.", "authors": ["Sergio E. Uribe", "Manal Hamdan", "Nicola Alberto Valente", "S. Yamaguchi", "F. Umer"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.jdent.2025.105867", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: OBJECTIVE\nArtificial intelligence (AI) is increasingly used in dental research for diagnosis, treatment planning, and disease prediction. However, many dental AI studies lack methodological rigor, tra", "doi": "10.1016/j.jdent.2025.105867"}
+{"id": "role-artificial-intelligence-2025-2", "title": "The Role of Artificial Intelligence in Computer Science Education: A Systematic Review with a Focus on Database Instruction", "authors": ["Alkmini Gaitantzi", "Ioannis Kazanidis"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/app15073960", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of artificial intelligence (AI) into computer science (CS) education is evolving, yet its specific application in database instruction remains underexplored. This systematic review ana", "doi": "10.3390/app15073960"}
+{"id": "use-generative-ai-2024", "title": "The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation", "authors": ["Y. Gwon", "Jae Heon Kim", "Hyunsuk Chung", "E. Jung", "Joey Chun"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.2196/51187", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background A large language model is a type of artificial intelligence (AI) model that opens up great possibilities for health care practice, research, and education, although scholars have e", "doi": "10.2196/51187"}
+{"id": "assessment-cs50-ai-2025", "title": "Assessment in CS50 with AI: Leveraging Generative Artificial Intelligence for Personalized Student Evaluation", "authors": ["Rong Liu", "Benjamin Xu", "Christopher Perez", "Julianna Zhao", "Yuliia Zhukovets"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3641555.3705061", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The scalability challenges of code review and pair-programming assessments in large computer science courses, such as CS50 at Harvard University, have opened up opportunities for the application of Ge", "doi": "10.1145/3641555.3705061"}
+{"id": "comparing-codefree-bespoke-2024", "title": "Comparing code-free and bespoke deep learning approaches in ophthalmology", "authors": ["Carolyn Yu Tung Wong", "Ciara O’Byrne", "Priyal Taribagil", "Timing Liu", "F. Antaki"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s00417-024-06432-x", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code-free deep learning (CFDL) allows clinicians without coding expertise to build high-quality artificial intelligence (AI) models without writing code. In this review, we comprehensively review the ", "doi": "10.1007/s00417-024-06432-x"}
+{"id": "navigating-copyright-aienhanced-2025", "title": "Navigating Copyright in AI-Enhanced Game Design: Legal Challenges in Multimodal and Dynamic Content Creation", "authors": ["Andrew Begemann", "James Hutson"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.58567/jie03010001", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of artificial intelligence (AI) in video game design has transformed traditional workflows, allowing for the generation of text, images, music, videos, and code at unprecedented scales", "doi": "10.58567/jie03010001"}
+{"id": "transforming-software-development-2024", "title": "Transforming Software Development: A Study on the Integration of Multi-Agent Systems and Large Language Models for Automatic Code Generation", "authors": ["Rolando Ramírez-Rueda", "E. Benítez-Guerrero", "Carmen Mezura-Godoy", "E. Bárcenas"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CONISOFT63288.2024.00013", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper explores the integration of Multi-Agent Systems (MAS) and Large Language Models (LLMs) for automatic code generation, addressing the limitations of traditional manual coding. By conducting", "doi": "10.1109/CONISOFT63288.2024.00013"}
+{"id": "mixrevdetect-detecting-aigenerated-2025", "title": "MixRevDetect: Towards Detecting AI-Generated Content in Hybrid Peer Reviews", "authors": ["Sandeep Kumar", "Samarth Garg", "Sagnik Sengupta", "Tirthankar Ghosal", "Asif Ekbal"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.naacl-short.79", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The growing use of large language models (LLMs) in academic peer review poses significant challenges, particularly in distinguishing AI-generated content from human-written feedback. This research add", "doi": "10.18653/v1/2025.naacl-short.79"}
+{"id": "large-language-models-2025-5", "title": "Large Language Models (LLMs) and Generative AI in Cybersecurity and Privacy: A Survey of Dual-Use Risks, AI-Generated Malware, Explainability, and Defensive Strategies", "authors": ["Kiarash Ahi", "Saeed Valizadeh"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SVCC65277.2025.11133642", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) and generative AI (GenAI) systems, such as ChatGPT, Claude, Gemini, LLaMA, Copilot, Stable Diffusion by OpenAI, Anthropic, Google, Meta, Microsoft, Stability AI, respectiv", "doi": "10.1109/SVCC65277.2025.11133642"}
+{"id": "concordance-randomised-controlled-2024", "title": "Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines", "authors": ["Alexander P. L. Martindale", "Carrie D. Llewellyn", "R. D. de Visser", "Benjamin Ng", "V. Ngai"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1038/s41467-024-45355-3", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The Consolidated Standards of Reporting Trials extension for Artificial Intelligence interventions (CONSORT-AI) was published in September 2020. Since its publication, several randomised controlled tr", "doi": "10.1038/s41467-024-45355-3"}
+{"id": "policyasprompt-turning-ai-2025", "title": "Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents", "authors": ["Gauri Kholkar", "Ratinder Ahuja"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.23994", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As autonomous AI agents are used in regulated and safety-critical settings, organizations need effective ways to turn policy into enforceable controls. We introduce a regulatory machine learning frame", "arxiv_id": "2509.23994"}
+{"id": "efficiency-fairness-security-2025", "title": "On Efficiency, Fairness and Security in AI Accelerator Resource Sharing: A Survey", "authors": ["Jiahua Huang", "Weiwei Lin", "Wentai Wu", "Yang Wang", "Haocheng Zhong"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3721427", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The effective and efficient utilization of AI accelerators represents a critical issue for the practitioners engaged in the field of deep learning. Practical evidence from companies such as Alibaba, S", "doi": "10.1145/3721427"}
+{"id": "developing-custom-computer-2025", "title": "Developing custom computer vision models with Njobvu‐AI: A collaborative, user‐friendly platform for ecological research", "authors": ["Cara Appel", "Ashwin Subramanian", "Jonathan S. Koning", "Marnet Ngosi", "Christopher M Sullivan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1002/eap.70096", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Computer vision models show great promise for assisting researchers with rapid processing of ecological data from many sources, including images from camera traps. Access to user‐friendly wor", "doi": "10.1002/eap.70096"}
+{"id": "rathandravidianlangtech-2025-annaparavai-2025", "title": "RATHAN@DravidianLangTech 2025: Annaparavai - Separate the Authentic Human Reviews from AI-generated one", "authors": ["Jubeerathan Thevakumar", "Luheerathan Thevakumar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.dravidianlangtech-1.66", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Detecting AI-generated reviews is crucial for maintaining the authenticity of online feedback in low-resource languages like Tamil and Malayalam. We propose a transfer learning-based approach using e", "doi": "10.18653/v1/2025.dravidianlangtech-1.66"}
+{"id": "application-generative-ai-2025", "title": "Application of Generative AI to Enhance Obstetrics and Gynecology Research", "authors": ["Tetsuya Kawakita", "Melissa S. Wong", "Kelly S Gibson", "Megha Gupta", "A. Gimovsky"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1055/a-2616-4182", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid evolution of large-language models such as ChatGPT, Claude, and Gemini is reshaping the methodological landscape of obstetrics and gynecology (OBGYN) research. This narrative review", "doi": "10.1055/a-2616-4182"}
+{"id": "introduction-generative-ai-2025", "title": "Introduction to Generative AI and DevOps: Synergies, Challenges and Applications", "authors": ["Satyadhar Joshi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48175/ijarsct-23634", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper provides a comprehensive review of the applications of Generative AI in DevOps, analyzing recent advancements, methodologies, and challenges. We examine key contributions from the literatur", "doi": "10.48175/ijarsct-23634"}
+{"id": "ai-nonmem-coding-2025", "title": "AI for NONMEM Coding in Pharmacometrics Research and Education: Shortcut or Pitfall?", "authors": ["Wenhao Zheng", "Wanbing Wang", "Carl M J Kirkpatrick", "Cornelia B Landersdorfer", "Huaxiu Yao"], "year": 2025, "venue": "CPT: Pharmacometrics & Systems Pharmacology", "source_url": "https://arxiv.org/abs/2507.08144", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI) is increasingly being explored as a tool to support pharmacometric modeling, particularly in addressing the coding challenges associated with NONMEM. In this study, we eva", "arxiv_id": "2507.08144", "doi": "10.1002/psp4.70125"}
+{"id": "teaching-programming-age-2025", "title": "Teaching Programming in the Age of Generative AI: Insights from Literature, Pedagogical Proposals, and Student Perspectives", "authors": ["C. Rubio-Manzano", "Jazna Meza", "Rodolfo Fernandez-Santibanez", "Christian Vidal-Castro"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.00108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Computer programming is undergoing a true transformation driven by powerful new tools for automatic source code generation based on large language models. This transformation is also manifesting in in", "arxiv_id": "2507.00108", "doi": "10.48550/arXiv.2507.00108"}
+{"id": "we-need-talk-2023", "title": "“We Need To Talk About ChatGPT”: The Future of AI and Higher Education", "authors": ["Michael Neumann", "Maria Rauschenberger", "Eva-Maria Schön"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SEENG59157.2023.00010", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: On November 30th, 2022, OpenAI released the large language model ChatGPT, an extension of GPT-3. The AI chatbot provides real-time communication in response to users’ requests. The quality of ChatGPT’", "doi": "10.1109/SEENG59157.2023.00010"}
+{"id": "aipowered-software-development-2025-2", "title": "AI-Powered Software Development Life Cycle: From Requirements to Maintenance", "authors": ["Sandeep Burte"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.64229/ykh1jf83", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of artificial intelligence into software development represents a paradigmatic shift with profound implications for productivity, quality, and efficiency in modern software engineering", "doi": "10.64229/ykh1jf83"}
+{"id": "diagnostic-codes-ai-2025", "title": "Diagnostic Codes in AI prediction models and Label Leakage of Same-admission Clinical Outcomes", "authors": ["Bashar Ramadan", "Ming-Chieh Liu", "Michael C. Burkhart", "William F Parker", "Brett K. Beaulieu-Jones"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2025.08.09.25333360", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1101/2025.08.09.25333360"}
+{"id": "case-study-ai-2025", "title": "Case Study of Using AI as Co-Pilot in Biotech Research: Functional Network Analysis of Invasive Cancer", "authors": ["Hongda Jiang", "Jahziel K Chase", "L. Fu", "Sathvik Shivaram", "Anton V. Sinitskiy"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2025.05.14.654152", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1101/2025.05.14.654152"}
+{"id": "implementing-evaluating-aiassisted-2025", "title": "Towards Implementing and Evaluating AI-Assisted Pull Requests in Software Engineering Education", "authors": ["Esteban Parra", "Sophia Willingham"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CSEET66350.2025.00008", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Pull requests allow developers to suggest and review codebase changes collaboratively. This process is standard for maintaining code quality and following best practices. The recent emergence of Large", "doi": "10.1109/CSEET66350.2025.00008"}
+{"id": "aiassisted-programming-decreases-2025", "title": "AI-Assisted Programming Decreases the Productivity of Experienced Developers by Increasing the Technical Debt and Maintenance Burden", "authors": ["Feiyang Xu", "Poonacha K. Medappa", "M. M. Tunç", "Martijn Vroegindeweij", "Jan C Fransoo"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2510.10165", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: GenAI solutions like GitHub Copilot have been shown to increase the productivity of software developers. Yet prior work remains unclear on the quality of code produced and the challenges of maintainin", "arxiv_id": "2510.10165"}
+{"id": "aiempowered-superresolution-microscopy-2025", "title": "AI-empowered super-resolution microscopy: a revolution in nanoscale cellular imaging.", "authors": ["Sen Li", "Xiangjie Meng", "Bo Zhou", "Wenfeng Tian", "Liangyi Chen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1038/s41592-025-02871-4", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1038/s41592-025-02871-4"}
+{"id": "role-generative-ai-2025", "title": "The Role of Generative AI in Strengthening Secure Software Coding Practices: A Systematic Perspective", "authors": ["H. Alwageed", "Rafiq Ahmad Khan"], "year": 2025, "venue": "EASE Companion", "source_url": "https://arxiv.org/abs/2504.19461", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As software security threats continue to evolve, the demand for innovative ways of securing coding has tremendously grown. The integration of Generative AI (GenAI) into software development holds sign", "arxiv_id": "2504.19461", "doi": "10.1145/3727967.3756840"}
+{"id": "meet-your-new-2025", "title": "Meet your new AI teacher: hypes, promises, and realities in AI-powered language education platforms", "authors": ["Ali Fuad Selvi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1515/applirev-2025-0224", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Amid the “breathless hype” surrounding generative AI in language education, this paper critically analyzes 48 online platforms claiming to use AI technologies through Critical Discourse Analy", "doi": "10.1515/applirev-2025-0224"}
+{"id": "secure-coding-ai-2025", "title": "Secure Coding with AI -- From Detection to Repair", "authors": ["Vladislav Belozerov", "Peter J. Barclay", "Ashkan Sami"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2504.20814", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While several studies have examined the security of code generated by GPT and other Large Language Models (LLMs), most have relied on controlled experiments rather than real developer interactions. Th", "arxiv_id": "2504.20814"}
+{"id": "impact-ai-tools-2025", "title": "The Impact of AI Tools on Software Development: A Case Study with GitHub Copilot and Other AI Assistants", "authors": ["Sérgio Cavalcante", "Erick Ribeiro", "Ana Carolina Oran"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.5220/0013294700003929", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background - With the increasing complexity of software projects and the demand for rapid and high-quality deliveries, Generative Artificial Intelligence (GenAI) tools have emerged as powerful allie", "doi": "10.5220/0013294700003929"}
+{"id": "it-takes-two-2023", "title": "It takes two to code: a comparative analysis of collective bargaining and artificial intelligence", "authors": ["O. Molina", "Florian Butollo", "C. Mako", "A. Godino", "Ursula Holtgrewe"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1177/10242589231156515", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The extension of artificial intelligence (AI) and algorithmic management mechanisms by companies has led to growing trade union demands to regulate their use. This article explores the role of collect", "doi": "10.1177/10242589231156515"}
+{"id": "systematic-literature-review-2022", "title": "Systematic Literature Review on Solving Competitive Programming Problem with Artificial Intelligence (AI)", "authors": ["Francis Alexander", "Alexander Agung", "Santoso Gunawan", "Edwin Ario Abdiwijaya", "Felix Pherry"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICoSEIT55604.2022.10029949", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Computer programming has emerged in research, industry, and everyday life as a general-purpose problem-solving tool. From this expansion, there has been a steady rise in demand for tools that can help", "doi": "10.1109/ICoSEIT55604.2022.10029949"}
+{"id": "generative-ai-pull-2024", "title": "Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions", "authors": ["Tao Xiao", "Hideaki Hata", "Christoph Treude", "Kenichi Matsumoto"], "year": 2024, "venue": "Proc. ACM Softw. Eng.", "source_url": "https://arxiv.org/abs/2402.08967", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: GitHub's Copilot for Pull Requests (PRs) is a promising service aiming to automate various developer tasks related to PRs, such as generating summaries of changes or providing complete walkthroughs wi", "arxiv_id": "2402.08967", "doi": "10.1145/3643773"}
+{"id": "precedentbased-professional-role-2025", "title": "Precedent-Based Professional Role Ethics for AI Decision Analysis", "authors": ["Chris Rauch"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1609/aies.v8i3.36794", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly being used in professional fields such as healthcare, law, and engineering. In these domains, errors can lead to serious consequences. Many current AI eth", "doi": "10.1609/aies.v8i3.36794"}
+{"id": "aibased-toolchain-vision-2025", "title": "An AI-Based Toolchain Vision for Developing Safety Critical and Compliant Systems", "authors": ["Oscar Slotosch"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.65391/r3356", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose a vision for an ISO 26262-compliant toolchain designed for the development of safety-critical automotive software systems. The distinguishing feature of this toolchain is its extensive inte", "doi": "10.65391/r3356"}
+{"id": "comparative-study-ai-2025", "title": "Comparative Study of AI and Human Evaluation for Student Website Projects", "authors": ["L. Feklistova", "Artur Kašnikov"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.34190/icair.5.1.4301", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence tools based on large language models are increasingly being adopted across a wide range of fields, including higher education. Given the substantial workload often faced by edu", "doi": "10.34190/icair.5.1.4301"}
+{"id": "code-less-traveled-2024", "title": "The Code Less Traveled By", "authors": ["Nathan Ahlgrim"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.5840/adc20245324", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Can AI be trusted to make life and death decisions? In this work of philosophical short story fiction, the AI that was created to “prevent the most harm and protect the most good” is telling the story", "doi": "10.5840/adc20245324"}
+{"id": "explainable-ai-software-2024", "title": "Explainable AI In Software Engineering: Enhancing Developer-AI Collaboration", "authors": ["Jyoti Kunal Shah"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.37547/tajet/volume06issue07-11", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) tools are increasingly integrated into software engineering tasks such as code generation, defect prediction, and project planning. However, widespread adoption is hindere", "doi": "10.37547/tajet/volume06issue07-11"}
+{"id": "vibe-coding-practice-2025", "title": "Vibe Coding in Practice: Motivations, Challenges, and a Future Outlook - a Grey Literature Review", "authors": ["Ahmed Fawzy", "Amjed Tahir", "Kelly Blincoe"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.00328", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI code generation tools are transforming software development, especially for novice and non-software developers, by enabling them to write code and build applications faster and with little to no hu", "arxiv_id": "2510.00328", "doi": "10.48550/arXiv.2510.00328"}
+{"id": "generative-ai-construction-2024", "title": "Generative AI in the Construction Industry: A State-of-the-art Analysis", "authors": ["R. Taiwo", "I. T. Bello", "S. Abdulai", "Abdul-Mugis Yussif", "B. Salami"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.09939", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The construction industry is a vital sector of the global economy, but it faces many productivity challenges in various processes, such as design, planning, procurement, inspection, and maintenance. G", "arxiv_id": "2402.09939", "doi": "10.48550/arXiv.2402.09939"}
+{"id": "low-code-smart-2023", "title": "Low Code for Smart Software Development", "authors": ["Jordi Cabot", "R. Clarisó", "T. Menzies"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2022.3211352", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The more we know about patterns in code, the better we can support those patterns. In this article, Jordi Cabot and Robert Clarisó discuss the promise and perils of AI enhanced low-code environments", "doi": "10.1109/MS.2022.3211352"}
+{"id": "best-practices-ai-2023", "title": "Best Practices for Using AI Tools as an Author, Peer Reviewer, or Editor", "authors": ["Tiffany I. Leung", "Taiane de Azevedo Cardoso", "A. Mavragani", "G. Eysenbach"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.2196/51584", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The ethics of generative artificial intelligence (AI) use in scientific manuscript content creation has become a serious matter of concern in the scientific publishing community. Generative AI has com", "doi": "10.2196/51584"}
+{"id": "transforming-software-development-2024-2", "title": "Transforming Software Development Through Generative AI : A Systematic Analysis of Automated Development Practices", "authors": ["Mitul Dilip Bhai Modi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.32628/cseit24106197", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article presents a comprehensive analysis of the transformative impact of Generative Artificial Intelligence (GenAI) on software development practices. Through systematic evaluation of implementa", "doi": "10.32628/cseit24106197"}
+{"id": "where-do-ai-2026", "title": "Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub", "authors": ["Ramtin Ehsani", "Sakshi Pathak", "S. Rawal", "Abdullah Al Mujahid", "M. M. Imran"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.15195", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI coding agents are now submitting pull requests (PRs) to software projects, acting not just as assistants but as autonomous contributors. As these agentic contributions are rapidly increasing across", "arxiv_id": "2601.15195", "doi": "10.48550/arXiv.2601.15195"}
+{"id": "natural-language-outlines-2024", "title": "Natural Language Outlines for Code: Literate Programming in the LLM Era", "authors": ["Kensen Shi", "Deniz Altinbüken", "Saswat Anand", "Mihai Christodorescu", "Katja Grünwedel"], "year": 2024, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2408.04820", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose using natural language outlines as a novel modality and interaction surface for providing AI assistance to developers throughout the software development process. An NL outline for a code f", "arxiv_id": "2408.04820", "doi": "10.1145/3696630.3728541"}
+{"id": "prompt-sapper-llmempowered-2023", "title": "Prompt Sapper: A LLM-Empowered Production Tool for Building AI Chains", "authors": ["Yu Cheng", "Jieshan Chen", "Qing Huang", "Zhenchang Xing", "Xiwei Xu"], "year": 2023, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2306.12028", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of foundation models, such as large language models (LLMs) GPT-4 and text-to-image models DALL-E, has opened up numerous possibilities across various domains. People can now use natural ", "arxiv_id": "2306.12028", "doi": "10.1145/3638247"}
+{"id": "generative-ai-software-2024", "title": "Generative Ai in Software Development : an Overview and Evaluation of Modern Coding Tools", "authors": ["Aarti"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.36948/ijfmr.2024.v06i03.23271", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI has significantly transformed software development by leveraging advanced machine learning models to automate coding tasks, generate code, and enhance productivity. This paper provides a", "doi": "10.36948/ijfmr.2024.v06i03.23271"}
+{"id": "measuring-technical-debt-2024", "title": "Measuring Technical Debt in AI-Based Competition Platforms", "authors": ["Dionysios Sklavenitis", "Dimitris Kalles"], "year": 2024, "venue": "Hellenic Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2405.11825", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Advances in AI have led to new types of technical debt in software engineering projects. AI-based competition platforms face challenges due to rapid prototyping and a lack of adherence to software eng", "arxiv_id": "2405.11825", "doi": "10.1145/3688671.3688783"}
+{"id": "exploring-adversarial-robustness-2024", "title": "Exploring adversarial robustness of JPEG AI: methodology, comparison and new methods", "authors": ["Egor Kovalev", "Georgii Bychkov", "Khaled Abud", "A. Gushchin", "A. Chistyakova"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.11795", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Adversarial robustness of neural networks is an increasingly important area of research, combining studies on computer vision models, large language models (LLMs), and others. With the release of JPEG", "arxiv_id": "2411.11795", "doi": "10.48550/arXiv.2411.11795"}
+{"id": "ai-software-engineering-2024", "title": "AI in Software Engineering at Google: Progress and the Path Ahead (Invited Talk)", "authors": ["Satish Chandra"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3664646.3676277", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Over a period of just about 5 years, the use of AI-based tools for software engineering has gone from being a very promising research investigation to indispensable features in modern developer enviro", "doi": "10.1145/3664646.3676277"}
+{"id": "determinants-chromatin-organization-2024", "title": "Determinants of Chromatin Organization in Aging and Cancer—Emerging Opportunities for Epigenetic Therapies and AI Technology", "authors": ["R. M. Castilho", "Leonard S Castilho", "B. H. Palomares", "C. Squarize"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/genes15060710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This review article critically examines the pivotal role of chromatin organization in gene regulation, cellular differentiation, disease progression and aging. It explores the dynamic between the euch", "doi": "10.3390/genes15060710"}
+{"id": "from-cuffs-code-2025", "title": "From Cuffs to Code: Machine Learning in Non-Invasive Blood Pressure Monitoring.", "authors": ["R. Pal", "Joshua Le", "Theodora Wingert", "Oren Avram", "Yu Jiayu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.accpm.2025.101655", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Blood pressure (BP) measurement in both acute care and outpatient settings is essential, as conditions like hypertension and hypotension are common and often asymptomatic until organ damage occurs. Th", "doi": "10.1016/j.accpm.2025.101655"}
+{"id": "advancing-software-quality-2025", "title": "Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques", "authors": ["Avinash Patil"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.13766", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software Quality Assurance (SQA) is critical for delivering reliable, secure, and efficient software products. The Software Quality Assurance Process aims to provide assurance that work products and p", "arxiv_id": "2505.13766", "doi": "10.48550/arXiv.2505.13766"}
+{"id": "building-trustworthy-ai-2023", "title": "Building Trustworthy AI Solutions: A Case for Practical Solutions for Small Businesses", "authors": ["Keeley A. Crockett", "Edwin Colyer", "Luciano Gerber", "A. Latham"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TAI.2021.3137091", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Building trustworthy artificial intelligence (AI) solutions, whether in academia or industry, must take into consideration a number of dimensions including legal, social, ethical, public opinion, and ", "doi": "10.1109/TAI.2021.3137091"}
+{"id": "how-do-data-2023", "title": "How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study", "authors": ["Ken Gu", "Madeleine Grunde-McLaughlin", "Andrew M. McNutt", "Jeffrey Heer", "Tim Althoff"], "year": 2023, "venue": "International Conference on Human Factors in Computing Systems", "source_url": "https://arxiv.org/abs/2309.10108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Data analysis is challenging as analysts must navigate nuanced decisions that may yield divergent conclusions. AI assistants have the potential to support analysts in planning their analyses, enabling", "arxiv_id": "2309.10108", "doi": "10.1145/3613904.3641891"}
+{"id": "source-code-comprehension-2023", "title": "Source Code Comprehension: A Contemporary Definition and Conceptual Model for Empirical Investigation", "authors": ["Marvin Wyrich"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.11301", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Be it in debugging, testing, code review or, more recently, pair programming with AI assistance: in all these activities, software engineers need to understand source code. Accordingly, plenty of rese", "arxiv_id": "2310.11301", "doi": "10.48550/arXiv.2310.11301"}
+{"id": "ai-pair-programming-2024", "title": "AI Pair Programming Acceptance: A Value-Based Approach with AHP Analysis", "authors": ["Murat Tahir Çaldağ"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CoDIT62066.2024.10708135", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of Artificial Intelligence (AI) tools is transforming every aspect of life with new opportunities and risks. An impact of AI tools can be seen in AI pair programming which is defined as ", "doi": "10.1109/CoDIT62066.2024.10708135"}
+{"id": "ewallet-delivery-technology-2025", "title": "E-wallet Delivery Technology Architecture Adoption: A Review", "authors": ["Kalaivani Chellappan", "Tharsshinee Elanchselvan", "Asma’ Abu-Samah"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.17576/jkukm-2025-37(1)-14", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: E-wallet is a fintech digital tool that allows to make cashless, quick, and easy transactions, and to review and analyze payment histories. Meanwhile, the expansion of digital wallet usage has contrib", "doi": "10.17576/jkukm-2025-37(1)-14"}
+{"id": "intelligent-devops-leveraging-2024", "title": "Intelligent DevOps : Leveraging AI to Revolutionize Software Delivery", "authors": ["Apurva Reddy Kistampally"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.32628/cseit241061165", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article examines the transformative impact of artificial intelligence on DevOps practices and software delivery methodologies, presenting a comprehensive analysis of current implementations, chal", "doi": "10.32628/cseit241061165"}
+{"id": "exploring-potential-locally-2024", "title": "Exploring the Potential of Locally Run Large Language (AI) Models for Automated Grading in Introductory Computer Science Courses", "authors": ["Samuel B. Mazzone", "Jack Forden", "Dennis Brylow"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FIE61694.2024.10892816", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This innovative practice full paper describes the effectiveness of self-hosted large language models (LLMs) in assisting with the automatic grading of CSI assignments. Educators often rely on automate", "doi": "10.1109/FIE61694.2024.10892816"}
+{"id": "leveraging-generative-ai-2023", "title": "Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study", "authors": ["Danissa V. Rodriguez", "K. Lawrence", "Javier González", "Beatrix Brandfield-Harvey", "Lynn Xu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.2196/52885", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background Generative artificial intelligence has the potential to revolutionize health technology product development by improving coding quality, efficiency, documentation, quality assessment and re", "doi": "10.2196/52885"}
+{"id": "tracking-moving-target-2025", "title": "Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry", "authors": ["Maider Azanza", "Beatriz Pérez Lamancha", "Eneko Pizarro"], "year": 2025, "venue": "International Conference on Evaluation & Assessment in Software Engineering", "source_url": "https://arxiv.org/abs/2504.18985", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown great potential in automating software testing tasks, including test generation. However, their rapid evolution poses a critical challenge for companies impleme", "arxiv_id": "2504.18985", "doi": "10.1145/3756681.3756946"}
+{"id": "grammllm-grammarguided-llm-2025", "title": "GrammLLM: Grammar-Guided LLM Test Generation for Compiler Validation", "authors": ["Mohamed Gamal Talaat", "M. Hassan", "Mohamed Sayed", "Mohanad Mohamed", "I. Ahmed"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICM66518.2025.11322450", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a method for generating comprehensive test cases of a specific language-construct by leveraging grammar rules defined in a Language Reference Manual (LRM). The approach begins by analyzing ", "doi": "10.1109/ICM66518.2025.11322450"}
+{"id": "semantic-compression-memory-2026", "title": "Semantic Compression for Memory Retention in LLM Test Generation", "authors": ["Gan Wang", "Hiroaki Hashiura"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1587/transinf.2025kbl0001", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1587/transinf.2025kbl0001"}
+{"id": "rug-turbo-llm-2025", "title": "Rug: Turbo Llm for Rust Unit Test Generation", "authors": ["Xiang Cheng", "Fan Sang", "Yizhuo Zhai", "Xiaokuan Zhang", "Taesoo Kim"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE55347.2025.00097", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing improves software quality by evaluating isolated sections of the program. This approach alleviates the need for comprehensive program-wide testing and confines the potential error scope w", "doi": "10.1109/ICSE55347.2025.00097"}
+{"id": "citywalk-enhancing-llmbased-2025", "title": "CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge", "authors": ["Yuwei Zhang", "Qingyuan Lu", "Kai Liu", "Wensheng Dou", "Jiaxin Zhu"], "year": 2025, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2501.16155", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing plays a pivotal role in the software development lifecycle, as it ensures code quality. However, writing high-quality unit tests remains a time-consuming task for developers in practice. ", "arxiv_id": "2501.16155", "doi": "10.1145/3763791"}
+{"id": "test-wars-comparative-2025", "title": "Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation", "authors": ["A. Abdullin", "P. Derakhshanfar", "Annibale Panichella"], "year": 2025, "venue": "International Conference on Information Control Systems & Technologies", "source_url": "https://arxiv.org/abs/2501.10200", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generating tests automatically is a key and ongoing area of focus in software engineering research. The emergence of Large Language Models (LLMs) has opened up new opportunities, given their ability ", "arxiv_id": "2501.10200", "doi": "10.1109/ICST62969.2025.10989033"}
+{"id": "hits-highcoverage-llmbased-2024", "title": "HITS: High-coverage LLM-based Unit Test Generation via Method Slicing", "authors": ["Zejun Wang", "Kaibo Liu", "Ge Li", "Zhi Jin"], "year": 2024, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2408.11324", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have behaved well in generating unit tests for Java projects. However, the performance for covering the complex focal methods within the projects is poor. Complex methods ", "arxiv_id": "2408.11324", "doi": "10.1145/3691620.3695501"}
+{"id": "prompt-alchemist-automated-2025", "title": "The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation", "authors": ["Shuzheng Gao", "Chaozheng Wang", "Cuiyun Gao", "Xiaoqi Jiao", "Chun Yong Chong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.01329", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test cases are essential for validating the reliability and quality of software applications. Recent studies have demonstrated the capability of Large Language Models (LLMs) to generate useful test ca", "arxiv_id": "2501.01329", "doi": "10.48550/arXiv.2501.01329"}
+{"id": "verilogreader-llmaided-hardware-2024", "title": "VerilogReader: LLM-Aided Hardware Test Generation", "authors": ["Ruiyang Ma", "Yuxin Yang", "Ziqian Liu", "Jiaxi Zhang", "Min Li"], "year": 2024, "venue": "2024 IEEE LLM Aided Design Workshop (LAD)", "source_url": "https://arxiv.org/abs/2406.04373", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test generation has been a critical and labor-intensive process in hardware design verification. Recently, the emergence of Large Language Model (LLM) with their advanced understanding and inference c", "arxiv_id": "2406.04373", "doi": "10.1109/LAD62341.2024.10691801"}
+{"id": "typeaware-llmbased-regression-2025", "title": "Type-aware LLM-based Regression Test Generation for Python Programs", "authors": ["Runlin Liu", "Zhe Zhang", "Yunge Hu", "Yuhang Lin", "Xiang Gao"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2503.14000", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated regression test generation has been extensively explored, yet generating high-quality tests for Python programs remains particularly challenging. Because of the Python's dynamic typing featu", "arxiv_id": "2503.14000"}
+{"id": "ktester-leveraging-domain-2025", "title": "KTester: Leveraging Domain and Testing Knowledge for More Effective LLM-based Test Generation", "authors": ["Anji Li", "Mingwei Liu", "Zhenxi Chen", "Zheng Pei", "Zi Li"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2511.14224", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated unit test generation using large language models (LLMs) holds great promise but often struggles with generating tests that are both correct and maintainable in real-world projects. This pape", "arxiv_id": "2511.14224", "doi": "10.1145/3744916.3787769"}
+{"id": "test-generation-from-2025", "title": "Test Generation from Use Case Specifications for IoT Systems: Custom, LLM-Based, and Hybrid Approaches", "authors": ["Zacharie Chenail-Larcher", "Jean Baptiste Minani", "Naouel Moha"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICST62969.2025.10988996", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: IoT systems are increasingly developed and deployed across various domains, where End-to-End (E2E) testing is critical to ensure reliability and expected behavior. However, generating comprehensive te", "doi": "10.1109/ICST62969.2025.10988996"}
+{"id": "primg-efficient-llmdriven-2025", "title": "PRIMG : Efficient LLM-driven Test Generation Using Mutant Prioritization", "authors": ["Mohamed Salah Bouafif", "Mohammad Hamdaqa", "Ed Zulkoski"], "year": 2025, "venue": "International Conference on Evaluation & Assessment in Software Engineering", "source_url": "https://arxiv.org/abs/2505.05584", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Mutation testing is a widely recognized technique for assessing and enhancing the effectiveness of software test suites by introducing deliberate code mutations. However, its application often results", "arxiv_id": "2505.05584", "doi": "10.1145/3756681.3756991"}
+{"id": "llmbased-unit-test-2024", "title": "LLM-based Unit Test Generation via Property Retrieval", "authors": ["Zhe Zhang", "Xingyu Liu", "Yuanzhang Lin", "Xiang Gao", "Hailong Sun"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.13542", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. Moreover, in the context of unit test generation, these tools prioriti", "arxiv_id": "2410.13542", "doi": "10.48550/arXiv.2410.13542"}
+{"id": "chatunitest-framework-llmbased-2023", "title": "ChatUniTest: A Framework for LLM-Based Test Generation", "authors": ["Yinghao Chen", "Zehao Hu", "Chen Zhi", "Junxiao Han", "Shuiguang Deng"], "year": 2023, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2305.04764", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing is an essential yet frequently arduous task. Various automated unit test generation tools have been introduced to mitigate this challenge. Notably, methods based on large language models ", "arxiv_id": "2305.04764", "doi": "10.1145/3663529.3663801"}
+{"id": "static-program-analysis-2024", "title": "Static Program Analysis Guided LLM Based Unit Test Generation", "authors": ["Sujoy Roy Chowdhury", "G. Sridhara", "A. Raghavan", "Joy Bose", "Sourav Mazumdar"], "year": 2024, "venue": "COMAD/CODS", "source_url": "https://arxiv.org/abs/2503.05394", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We describe a novel approach to automating unit test generation for Java methods using large language models (LLMs). Existing LLM-based approaches rely on sample usage(s) of the method to test (focal ", "arxiv_id": "2503.05394", "doi": "10.1145/3703323.3703742"}
+{"id": "threatlens-llmguided-threat-2025", "title": "ThreatLens: LLM-guided Threat Modeling and Test Plan Generation for Hardware Security Verification", "authors": ["Dipayan Saha", "Hasan Al Shaikh", "Shams Tarek", "Farimah Farahmandi"], "year": 2025, "venue": "IACR Cryptology ePrint Archive", "source_url": "https://arxiv.org/abs/2505.06821", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current hardware security verification processes predominantly rely on manual threat modeling and test plan generation, which are labor-intensive, error-prone, and struggle to scale with increasing de", "arxiv_id": "2505.06821", "doi": "10.48550/arXiv.2505.06821"}
+{"id": "evogpt-leveraging-llmdriven-2025", "title": "EvoGPT: Leveraging LLM-Driven Seed Diversity to Improve Search-Based Test Suite Generation", "authors": ["Lior Broide", "Roni Stern", "Argaman Mordoch"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2505.12424", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Search-Based Software Testing (SBST) is a well-established approach for automated unit test generation, yet it often suffers from premature convergence and limited diversity in the generated test suit", "arxiv_id": "2505.12424"}
+{"id": "llmpowered-test-case-2024", "title": "LLM-Powered Test Case Generation for Detecting Tricky Bugs", "authors": ["Kaibo Liu", "Yiyang Liu", "Zhenpeng Chen", "Jie M. Zhang", "Yudong Han"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2404.10304", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2404.10304"}
+{"id": "test-case-generation-2025", "title": "Test Case Generation for Requirements in Natural Language - An LLM Comparison Study", "authors": ["Brahma Reddy Korraprolu", "Pavitra Pinninti", "Y. R. Reddy"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3717383.3717389", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid evolution of Large Language Models (LLMs) have opened new possibilities in automating tasks across the software developing life cycle, including test case generation This paper presents a co", "doi": "10.1145/3717383.3717389"}
+{"id": "trae-agent-llmbased-2025", "title": "Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling", "authors": ["Pengfei Gao", "Zhao Tian", "Xiangxin Meng", "Xinchen Wang", "Ruida Hu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.23370", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software issue resolution is a critical challenge in software engineering and has garnered increasing attention in recent years. With the rapid advancement of large language models (LLMs), substantial", "arxiv_id": "2507.23370", "doi": "10.48550/arXiv.2507.23370"}
+{"id": "llm4fin-fully-automating-2024", "title": "LLM4Fin: Fully Automating LLM-Powered Test Case Generation for FinTech Software Acceptance Testing", "authors": ["Zhiyi Xue", "Liangguo Li", "Senyue Tian", "Xiaohong Chen", "Pingping Li"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3650212.3680388", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: FinTech software, crucial for both safety and timely market deployment, presents a compelling case for automated acceptance testing against regulatory business rules. However, the inherent challenges ", "doi": "10.1145/3650212.3680388"}
+{"id": "haven-hallucinationmitigated-llm-2025", "title": "HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers", "authors": ["Yiyao Yang", "Fu Teng", "Pengju Liu", "Mengnan Qi", "Chenyang Lv"], "year": 2025, "venue": "Design, Automation and Test in Europe", "source_url": "https://arxiv.org/abs/2501.04908", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, the use of large language models (LLMs) for Verilog code generation has attracted great research interest to enable hardware design automation. However, previous works have shown a gap betwe", "arxiv_id": "2501.04908", "doi": "10.23919/DATE64628.2025.10993072"}
+{"id": "target-traffic-rulebased-2023", "title": "TARGET: Traffic Rule-Based Test Generation for Autonomous Driving via Validated LLM-Guided Knowledge Extraction", "authors": ["Yao Deng", "Zhi Tu", "Jiaohong Yao", "Mengshi Zhang", "Tianyi Zhang"], "year": 2023, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2305.06018", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent incidents with autonomous vehicles highlight the need for rigorous testing to ensure safety and robustness. Constructing test scenarios for autonomous driving systems (ADSs), however, is labor-", "arxiv_id": "2305.06018", "doi": "10.1109/TSE.2025.3569086"}
+{"id": "domain-knowledge-all-2024", "title": "Domain Knowledge is All You Need: A Field Deployment of LLM-Powered Test Case Generation in FinTech Domain", "authors": ["Zhiyi Xue", "Liangguo Li", "Senyue Tian", "Xiaohong Chen", "Pingping Li"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3639478.3643087", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite the promise of automation, general-purpose Large Language Models (LLMs) face difficulties in generating complete and accurate test cases from informal software requirements, primarily due to c", "doi": "10.1145/3639478.3643087"}
+{"id": "llmpowered-test-case-2024-2", "title": "LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs", "authors": ["Kaibo Liu", "Zhenpeng Chen", "Yiyang Liu", "Jie M. Zhang", "Mark Harman"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2404.10304", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Detecting tricky bugs in plausible programs, those that pass existing test suites yet still contain bugs, remains a significant challenge in software testing. To address this problem, we propose Trick", "arxiv_id": "2404.10304", "doi": "10.18653/v1/2025.acl-long.20"}
+{"id": "llm-tg-automated-2024", "title": "LLM - TG: Towards Automated Test Case Generation for Processors Using Large Language Models", "authors": ["Yifei Deng", "Renzhi Chen", "Chao Xiao", "Zhijie Yang", "Yuanfeng Luo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICCD63220.2024.00066", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Design verification (DV) has existed for decades and is crucial for identifying potential bugs before chip tape-out. Hand-crafting test cases is time-consuming and error-prone, even for experienced v", "doi": "10.1109/ICCD63220.2024.00066"}
+{"id": "ragmcp-mitigating-prompt-2025", "title": "RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation", "authors": ["Tiantian Gan", "Qiyao Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.03275", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) struggle to effectively utilize a growing number of external tools, such as those defined by the Model Context Protocol (MCP), due to prompt bloat and", "arxiv_id": "2505.03275", "doi": "10.48550/arXiv.2505.03275"}
+{"id": "hypothesis-generation-materials-2025", "title": "Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents", "authors": ["Shrinidhi Kumbhar", "Venkatesh Mishra", "Kevin Coutinho", "Divij Handa", "Ashif Iquebal"], "year": 2025, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2501.13299", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Materials discovery and design are essential for advancing technology across various industries by enabling the development of application-specific materials. Recent research has leveraged Large Langu", "arxiv_id": "2501.13299", "doi": "10.48550/arXiv.2501.13299"}
+{"id": "navigating-confidentiality-test-2024", "title": "Navigating Confidentiality in Test Automation: A Case Study in LLM Driven Test Data Generation", "authors": ["Hrishikesh Karmarkar", "Supriya Agrawal", "Avriti Chauhan", "Pranav Shete"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SANER60148.2024.00041", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In outsourced industrial projects for testing of web applications, often neither the application to be tested, nor its source code are provided to the testing team, due to confidentiality reasons, ma", "doi": "10.1109/SANER60148.2024.00041"}
+{"id": "do-we-truly-2025", "title": "Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute", "authors": ["Jianhao Chen", "Zishuo Xun", "Bocheng Zhou", "Han Qi", "Qiaosheng Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.00762", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with", "arxiv_id": "2504.00762", "doi": "10.48550/arXiv.2504.00762"}
+{"id": "aipowered-multiagent-framework-2024", "title": "AI-Powered Multi-Agent Framework for Automated Unit Test Case Generation: Enhancing Software Quality through LLM’s", "authors": ["Anusha Garlapati", "Satya Sai", "Dr Muni Parmesh", "Jaisri S Savitha"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/GCAT62922.2024.10923987", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent years have witnessed an enormous rise in the design, repair and the enhancement of software automation tests. The reliability of program’s unit testing has major impact on its overall performan", "doi": "10.1109/GCAT62922.2024.10923987"}
+{"id": "llm-test-script-2023", "title": "LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities", "authors": ["Shengcheng Yu", "Chunrong Fang", "Yucheng Ling", "Chentian Wu", "Zhenyu Chen"], "year": 2023, "venue": "International Conference on Software Quality, Reliability and Security", "source_url": "https://arxiv.org/abs/2309.13574", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper investigates the application of large language models (LLM) in the domain of mobile application test script generation. Test script generation is a vital component of software testing, enab", "arxiv_id": "2309.13574", "doi": "10.1109/QRS60937.2023.00029"}
+{"id": "liredroid-llmenhanced-test-2024", "title": "LIReDroid: LLM-Enhanced Test Case Generation for Static Sensitive Behavior Replication", "authors": ["Yin Wang", "Ming Fan", "Xicheng Zhang", "Jifei Shi", "Zhaoyu Qiu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3671016.3671404", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Malicious Android applications often employ covert behaviors to exfiltrate sensitive data, thereby compromising user privacy. Traditional detection techniques predominantly utilize static analysis of ", "doi": "10.1145/3671016.3671404"}
+{"id": "evaluating-effectiveness-large-2025", "title": "Evaluating the Effectiveness of Large Language Models in Automated Unit Test Generation", "authors": ["Tharindu Godage", "Sivaraj Nimishan", "S. Vasanthapriyan", "Palanisamy Vigneshwaran", "Charles Joseph"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICARC64760.2025.10962997", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing use of Artificial Intelligence (AI) in software development underscores the need to select suitable Large Language Models (LLMs) for automating software unit test generation. No prior w", "doi": "10.1109/ICARC64760.2025.10962997"}
+{"id": "vericoder-enhancing-llmbased-2025", "title": "VeriCoder: Enhancing LLM-Based RTL Code Generation through Functional Correctness Validation", "authors": ["Anjiang Wei", "Huanmi Tan", "Tarun Suresh", "Daniel Mendoza", "Thiago S. F. X. Teixeira"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.15659", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in Large Language Models (LLMs) have sparked growing interest in applying them to Electronic Design Automation (EDA) tasks, particularly Register Transfer Level (RTL) code generation. ", "arxiv_id": "2504.15659", "doi": "10.48550/arXiv.2504.15659"}
+{"id": "hallucination-consensus-multiagent-2025", "title": "Hallucination to Consensus: Multi-Agent LLMs for End-to-End Test Generation", "authors": ["Qinghua Xu", "Guancheng Wang", "Lionel C. Briand", "Kui Liu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2506.02943", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing plays a critical role in ensuring software correctness. However, writing unit tests manually is labor-intensive, especially for strongly typed languages like Java, motivating the need for", "arxiv_id": "2506.02943"}
+{"id": "benchmarking-llms-unit-2025", "title": "Benchmarking LLMs for Unit Test Generation from Real-World Functions", "authors": ["Dong Huang", "Jie M. Zhang", "Mark Harman", "Qianru Zhang", "Mingzhe Du"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.00408", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, large language models (LLMs) have shown great promise in automating unit test generation, significantly reducing the manual effort required by developers. To effectively evaluate the capabil", "arxiv_id": "2508.00408", "doi": "10.48550/arXiv.2508.00408"}
+{"id": "coverup-effective-high-2024", "title": "CoverUp: Effective High Coverage Test Generation for Python", "authors": ["Juan Altmayer Pizzorno", "E. Berger"], "year": 2024, "venue": "Proc. ACM Softw. Eng.", "source_url": "https://arxiv.org/abs/2403.16218", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains challeng", "arxiv_id": "2403.16218", "doi": "10.1145/3729398"}
+{"id": "github-copilot-test-2024", "title": "Using GitHub Copilot for Test Generation in Python: An Empirical Study", "authors": ["Khalid El Haji", "C. Brandt", "A. Zaidman"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3644032.3644443", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Writing unit tests is a crucial task in software development, but it is also recognized as a time-consuming and tedious task. As such, numerous test generation approaches have been proposed and invest", "doi": "10.1145/3644032.3644443"}
+{"id": "holistic-framework-multimodal-2024", "title": "Towards a holistic framework for multimodal LLM in 3D brain CT radiology report generation", "authors": ["Cheng-Yi Li", "Kao-Jung Chang", "Cheng-Fu Yang", "Hsin-Yu Wu", "Wenting Chen"], "year": 2024, "venue": "Nature Communications", "source_url": "https://arxiv.org/abs/2407.02235", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-modal large language models (MLLMs) have transformed the landscape of modern healthcare, with automated radiology report generation (RRG) emerging as a cutting-edge application. While 2D MLLM-ba", "arxiv_id": "2407.02235", "doi": "10.1038/s41467-025-57426-0"}
+{"id": "llmdriven-selfimproving-framework-2025", "title": "LLM-Driven, Self-Improving Framework for Security Test Automation: Leveraging Karate DSL for Augmented API Resilience", "authors": ["E. Pasca", "Daniela Delinschi", "Rudolf Erdei", "O. Matei"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2025.3554960", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern software architectures heavily rely on APIs, yet face significant security challenges, particularly with Broken Object Level Authorization (BOLA) vulnerabilities, which remain the most critical", "doi": "10.1109/ACCESS.2025.3554960"}
+{"id": "system-automated-unit-2024", "title": "A System for Automated Unit Test Generation using Large Language Models and Assessment of Generated Test Suites", "authors": ["Andrea Lops", "F. Narducci", "Azzurra Ragone", "Michelantonio Trizio", "Claudio Bartolini"], "year": 2024, "venue": "International Conference on Software Testing, Verification and Validation Workshops", "source_url": "https://arxiv.org/abs/2408.07846", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit tests are fundamental for ensuring software correctness but are costly and time-intensive to design and create. Recent advances in Large Language Models (LLMs) have shown potential for automating", "arxiv_id": "2408.07846", "doi": "10.1109/ICSTW64639.2025.10962454"}
+{"id": "hardtests-synthesizing-highquality-2025", "title": "HardTests: Synthesizing High-Quality Test Cases for LLM Coding", "authors": ["Zhongmou He", "Yee Man Choi", "Kexun Zhang", "Jiabao Ji", "Junting Zhou"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.24098", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Verifiers play a crucial role in large language model (LLM) reasoning, needed by post-training techniques such as reinforcement learning. However, reliable verifiers are hard to get for difficult codi", "arxiv_id": "2505.24098", "doi": "10.48550/arXiv.2505.24098"}
+{"id": "use-propertybased-testing-2025", "title": "Use Property-Based Testing to Bridge LLM Code Generation and Validation", "authors": ["Lehan He", "Zeren Chen", "Zhe Zhang", "Jing Shao", "Xiang Gao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.18315", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) excel at code generation, but ensuring their outputs to be functionally correct, especially in complex programming tasks, is a persistent challenge. While traditional Test", "arxiv_id": "2506.18315", "doi": "10.48550/arXiv.2506.18315"}
+{"id": "llms-prompting-unit-2024", "title": "LLMs and Prompting for Unit Test Generation: A Large-Scale Evaluation", "authors": ["Wendkûuni C. Ouédraogo", "Kader Kabore", "Haoye Tian", "Yewei Song", "Anil Koyuncu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3691620.3695330", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing, essential for identifying bugs, is often neglected due to time constraints. Automated test generation tools exist but typically lack readability and require developer intervention. Large", "doi": "10.1145/3691620.3695330"}
+{"id": "agentsllm-augmentative-generation-2025", "title": "AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework", "authors": ["Yuguang Yao", "S. Bhatnagar", "Markus Mazzola", "Vasileios Belagiannis", "Igor Gilitschenski"], "year": 2025, "venue": "IEEE/RSJ International Conference on Intelligent Robots and Systems", "source_url": "https://arxiv.org/abs/2507.13729", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Rare, yet critical, scenarios pose a significant challenge in testing and evaluating autonomous driving planners. Relying solely on real-world driving scenes requires collecting massive datasets to ca", "arxiv_id": "2507.13729", "doi": "10.1109/IROS60139.2025.11246348"}
+{"id": "laagrv-llm-assisted-2024", "title": "LAAG-RV: LLM Assisted Assertion Generation for RTL Design Verification", "authors": ["Karthik Maddala", "Bhabesh Mali", "C. Karfa"], "year": 2024, "venue": "2024 IEEE 8th International Test Conference India (ITC India)", "source_url": "https://arxiv.org/abs/2409.15281", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Writing SystemVerilog Assertions (SVA) is an important but complex step in verifying Register Transfer Level (RTL) designs. Conventionally, experts need to understand the design specifications and wri", "arxiv_id": "2409.15281", "doi": "10.1109/ITCIndia62949.2024.10651860"}
+{"id": "promptpex-automatic-test-2025", "title": "PromptPex: Automatic Test Generation for Language Model Prompts", "authors": ["Reshabh K Sharma", "J. D. Halleux", "Shraddha Barke", "Benjamin Zorn"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.05070", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are being used in many applications and prompts for these models are integrated into software applications as code-like artifacts. These prompts behave much like tradition", "arxiv_id": "2503.05070", "doi": "10.48550/arXiv.2503.05070"}
+{"id": "texttoaudio-generation-instructiontuned-2023", "title": "Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model", "authors": ["Deepanway Ghosal", "Navonil Majumder", "Ambuj Mehrish", "Soujanya Poria"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2304.13731", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The immense scale of the recent large language models (LLM) allows many interesting properties, such as, instruction- and chain-of-thought-based fine-tuning, that has significantly improved zero- and ", "arxiv_id": "2304.13731", "doi": "10.48550/arXiv.2304.13731"}
+{"id": "automated-test-generation-2024", "title": "Automated test generation to evaluate tool-augmented LLMs as conversational AI agents", "authors": ["Samuel Arcadinho", "David Aparício", "Mariana Almeida"], "year": 2024, "venue": "GENBENCH", "source_url": "https://arxiv.org/abs/2409.15934", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Tool-augmented LLMs are a promising approach to create AI agents that can have realistic conversations, follow procedures, and call appropriate functions. However, evaluating them is challenging due t", "arxiv_id": "2409.15934", "doi": "10.48550/arXiv.2409.15934"}
+{"id": "beyond-testtime-compute-2025", "title": "Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference", "authors": ["Patrick Wilhelm", "Thorsten Wittkopp", "Odej Kao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3721146.3721953", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks but come with substantial energy and computational costs, particularly in request-heavy scenarios. In many real-wo", "doi": "10.1145/3721146.3721953"}
+{"id": "python-symbolic-execution-2024", "title": "Python Symbolic Execution with LLM-powered Code Generation", "authors": ["Wenhan Wang", "Kaibo Liu", "An Ran Chen", "Ge Li", "Zhi Jin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.09271", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Symbolic execution is a key technology in software testing, which generates test cases by collecting symbolic path constraints and then solving constraints with SMT solvers. Symbolic execution has bee", "arxiv_id": "2409.09271", "doi": "10.48550/arXiv.2409.09271"}
+{"id": "empirical-study-unit-2024", "title": "An Empirical Study of Unit Test Generation with Large Language Models", "authors": ["Lin Yang", "Chen Yang", "Shutao Gao", "Weijing Wang", "Bo Wang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2406.18181", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2406.18181"}
+{"id": "understandable-test-generation-2024", "title": "Understandable Test Generation Through Capture/Replay and LLMs", "authors": ["Amirhossein Deljouyi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3639478.3639789", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic unit test generators, particularly search-based software testing (SBST) tools such as EvoSuite, efficiently generate unit test suites with acceptable coverage. Although this removes the burd", "doi": "10.1145/3639478.3639789"}
+{"id": "optimizing-searchbased-unit-2024", "title": "Optimizing Search-Based Unit Test Generation with Large Language Models: An Empirical Study", "authors": ["Danni Xiao", "Yimeng Guo", "Yanhui Li", "Lin Chen"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3671016.3674813", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Search-based unit test generation methods have been considered effective and widely applied, and Large Language Models (LLMs) have also demonstrated their powerful generation ability. Therefore, some ", "doi": "10.1145/3671016.3674813"}
+{"id": "no-more-manual-2023", "title": "No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation", "authors": ["Zhiqiang Yuan", "Yiling Lou", "Mingwei Liu", "Shiji Ding", "Kaixin Wang"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2305.04207", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing is essential in detecting bugs in functionally-discrete program units. Manually writing high-quality unit tests is time-consuming and laborious. Although traditional techniques can genera", "arxiv_id": "2305.04207", "doi": "10.48550/arXiv.2305.04207"}
+{"id": "autocypher-improving-llms-2024", "title": "Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework", "authors": ["Aman Tiwari", "Shiva Krishna Reddy Malay", "Vikas Yadav", "Masoud Hashemi", "Sathwik Tejaswi Madhusudhan"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2412.12612", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Graph databases like Neo4j are gaining popularity for handling complex, interconnected data, over traditional relational databases in modeling and querying relationships. While translating natural lan", "arxiv_id": "2412.12612", "doi": "10.18653/v1/2025.naacl-short.53"}
+{"id": "scissorhands-exploiting-persistence-2023", "title": "Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time", "authors": ["Zichang Liu", "Aditya Desai", "Fangshuo Liao", "Weitao Wang", "Victor Xie"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2305.17118", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models(LLMs) have sparked a new wave of exciting AI applications. Hosting these models at scale requires significant memory resources. One crucial memory bottleneck for the deployment s", "arxiv_id": "2305.17118", "doi": "10.48550/arXiv.2305.17118"}
+{"id": "zeroshot-prompting-approaches-2024", "title": "Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation", "authors": ["Kristian Kolthoff", "Felix Kretzer", "Lennart Fiebig", "Christian Bartelt", "Alexander Maedche"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.11328", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Graphical user interface (GUI) prototyping represents an essential activity in the development of interactive systems, which are omnipresent today. GUI prototypes facilitate elicitation of requirement", "arxiv_id": "2412.11328", "doi": "10.48550/arXiv.2412.11328"}
+{"id": "highquality-generation-approach-2024", "title": "A High-Quality Generation Approach for Educational Programming Projects Using LLM", "authors": ["Tian Song", "Hang Zhang", "Yijia Xiao"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TLT.2024.3499751", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: High-quality programming projects for education are critically required in teaching. However, it is hard to develop those projects efficiently and artificially constrained by the lecturers' experience", "doi": "10.1109/TLT.2024.3499751"}
+{"id": "llmaided-testbench-generation-2024", "title": "LLM-Aided Testbench Generation and Bug Detection for Finite-State Machines", "authors": ["Jitendra Bhandari", "J. Knechtel", "Ramesh Narayanaswamy", "Siddharth Garg", "Ramesh Karri"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.17132", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work investigates the potential of tailoring Large Language Models (LLMs), specifically GPT3.5 and GPT4, for the domain of chip testing. A key aspect of chip design is functional testing, which r", "arxiv_id": "2406.17132", "doi": "10.48550/arXiv.2406.17132"}
+{"id": "african-woman-rhythmic-2024", "title": "The African Woman is Rhythmic and Soulful: An Investigation of Implicit Biases in LLM Open-ended Text Generation", "authors": ["Serene Lim", "María Pérez-Ortiz"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2407.01270", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper investigates the subtle and often concealed biases present in Large Language Models (LLMs), focusing on implicit biases that may remain despite passing explicit bias tests. Implicit biases ", "arxiv_id": "2407.01270"}
+{"id": "zeroshot-llmguided-counterfactual-2024", "title": "Zero-shot LLM-guided Counterfactual Generation: A Case Study on NLP Model Evaluation", "authors": ["Amrita Bhattacharjee", "Raha Moraffah", "Joshua Garland", "Huan Liu"], "year": 2024, "venue": "BigData Congress [Services Society]", "source_url": "https://arxiv.org/abs/2405.04793", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the development and proliferation of large, complex, black-box models for solving many natural language processing (NLP) tasks, there is also an increasing necessity of methods to stress-test the", "arxiv_id": "2405.04793", "doi": "10.1109/BigData62323.2024.10825537"}
+{"id": "chasing-progress-not-2024", "title": "Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation", "authors": ["Sukai Huang", "Trevor Cohn", "N. Lipovetzky"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.10675", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs' reasoning skills are ineffective in planning tasks, while others rep", "arxiv_id": "2412.10675", "doi": "10.48550/arXiv.2412.10675"}
+{"id": "test-smells-llmgenerated-2024", "title": "Test smells in LLM-Generated Unit Tests", "authors": ["Wendkûuni C. Ouédraogo", "Yinghua Li", "A. Kaboré", "Xunzhu Tang", "Anil Koyuncu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.10628", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs promise to transform unit test generation from a manual burden into an automated solution. Yet, beyond metrics such as compilability or coverage, little is known about the quality of LLM-generate", "arxiv_id": "2410.10628", "doi": "10.48550/arXiv.2410.10628"}
+{"id": "model-cascading-code-2024", "title": "Model Cascading for Code: Reducing Inference Costs with Model Cascading for LLM Based Code Generation", "authors": ["Boyuan Chen", "Mingzhi Zhu", "Brendan Dolan-Gavitt", "Muhammad Shafique", "Siddharth Garg"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2405.15842", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2405.15842"}
+{"id": "initial-investigation-chatgpt-2023", "title": "An initial investigation of ChatGPT unit test generation capability", "authors": ["V. Guilherme", "A. Vincenzi"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3624032.3624035", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: Software testing ensures software quality, but developers often disregard it. The use of automated testing generation is pursued to reduce the consequences of overlooked test cases in a softw", "doi": "10.1145/3624032.3624035"}
+{"id": "adaptive-test-generation-2023", "title": "Adaptive Test Generation Using a Large Language Model", "authors": ["Max Schäfer", "Sarah Nadi", "A. Eghbali", "Frank Tip"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2302.06527", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2302.06527"}
+{"id": "llmbased-code-generation-2023", "title": "LLM-Based Code Generation Method for Golang Compiler Testing", "authors": ["Qi Gu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3611643.3617850", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern optimizing compilers are among the most complex software systems humans build. One way to identify subtle compiler bugs is fuzzing. Both the quantity and the quality of testcases are crucial to", "doi": "10.1145/3611643.3617850"}
+{"id": "reinforcement-learning-from-2023", "title": "Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation", "authors": ["Benjamin Steenhoek", "Michele Tufano", "Neel Sundaresan", "Alexey Svyatkovskiy"], "year": 2023, "venue": "Workshop on Deep Learning for Testing and Testing for Deep Learning", "source_url": "https://arxiv.org/abs/2412.14308", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software testing is a crucial but time-consuming aspect of software development, and recently, Large Language Models (LLMs) have gained popularity for automated test case generation. However, because ", "arxiv_id": "2412.14308", "doi": "10.1109/DeepTest66595.2025.00011"}
+{"id": "aart-aiassisted-redteaming-2023", "title": "AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications", "authors": ["Bhaktipriya Radharapu", "Kevin Robinson", "Lora Aroyo", "Preethi Lahoti"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2311.08592", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. We introduce a novel approach for automated generation of adversarial evaluation datasets to t", "arxiv_id": "2311.08592", "doi": "10.48550/arXiv.2311.08592"}
+{"id": "automating-rest-api-2024", "title": "Automating REST API Postman Test Cases Using LLM", "authors": ["S. Sri", "S. MohammedAadil", "R. SanjjushriVarshini", "Raja CSP Raman", "G. Rajagopal"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.10678", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the contemporary landscape of technological advancements, the automation of manual processes is crucial, compelling the demand for huge datasets to effectively train and test machines. This researc", "arxiv_id": "2404.10678", "doi": "10.48550/arXiv.2404.10678"}
+{"id": "floodbrain-flood-disaster-2023", "title": "FloodBrain: Flood Disaster Reporting by Web-based Retrieval Augmented Generation with an LLM", "authors": ["Grace Colverd", "Paul Darm", "Leonard Silverberg", "Noah Kasmanoff"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.02597", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fast disaster impact reporting is crucial in planning humanitarian assistance. Large Language Models (LLMs) are well known for their ability to write coherent text and fulfill a variety of tasks relev", "arxiv_id": "2311.02597", "doi": "10.48550/arXiv.2311.02597"}
+{"id": "codecontests-highquality-test-2025", "title": "CodeContests+: High-Quality Test Case Generation for Competitive Programming", "authors": ["Zihan Wang", "Siyao Liu", "Yang Sun", "Hongyan Li", "Kai Shen"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2506.05817", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Competitive programming, due to its high reasoning difficulty and precise correctness feedback, has become a key task for both training and evaluating the reasoning capabilities of large language mode", "arxiv_id": "2506.05817", "doi": "10.48550/arXiv.2506.05817"}
+{"id": "demystifying-llmbased-software-2025", "title": "Demystifying LLM-Based Software Engineering Agents", "authors": ["Chun Xia", "Yinlin Deng", "S. Dunn", "Lingming Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3715754", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recent", "doi": "10.1145/3715754"}
+{"id": "videot1-testtime-scaling-2025", "title": "Video-T1: Test-Time Scaling for Video Generation", "authors": ["Fangfu Liu", "Hanyang Wang", "Yimo Cai", "Kaiyan Zhang", "Xiaohang Zhan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.18942", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the scale capability of increasing training data, model size, and computational cost, video generation has achieved impressive results in digital creation, enabling users to express creativity ac", "arxiv_id": "2503.18942", "doi": "10.48550/arXiv.2503.18942"}
+{"id": "rescue-ranking-llm-2023", "title": "Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation", "authors": ["Yikun Wang", "Rui Zheng", "Haoming Li", "Qi Zhang", "Tao Gui"], "year": 2023, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2311.09136", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Customizing LLMs for a specific task involves separating high-quality responses from lower-quality ones. This skill can be developed using supervised fine-tuning with extensive human preference data. ", "arxiv_id": "2311.09136", "doi": "10.18653/v1/2024.acl-srw.32"}
+{"id": "hallulens-llm-hallucination-2025", "title": "HalluLens: LLM Hallucination Benchmark", "authors": ["Yejin Bang", "Ziwei Ji", "A. Schelten", "A. Hartshorn", "T. Fowler"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2504.17550", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as \"hallucination.\" These hallucinations undermine user trust and hinder the adopt", "arxiv_id": "2504.17550", "doi": "10.48550/arXiv.2504.17550"}
+{"id": "artificial-intelligence-health-2022", "title": "Artificial intelligence for health message generation: an empirical study using a large language model (LLM) and prompt engineering", "authors": ["Sue Lim", "Ralf Schmälzle"], "year": 2022, "venue": "Frontiers in Communication", "source_url": "https://arxiv.org/abs/2212.07507", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Introduction This study introduces and examines the potential of an AI system to generate health awareness messages. The topic of folic acid, a vitamin that is critical during pregnancy, served as a t", "arxiv_id": "2212.07507", "doi": "10.3389/fcomm.2023.1129082"}
+{"id": "dynacode-dynamic-complexityaware-2025", "title": "DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation", "authors": ["Wenhao Hu", "Jinhao Duan", "Chunchen Wei", "Li Zhang", "Yue-feng Zhang"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.10452", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models (LLMs) has significantly improved their performance in code generation tasks. However, existing code benchmarks remain static, consisting of fixed datase", "arxiv_id": "2503.10452", "doi": "10.48550/arXiv.2503.10452"}
+{"id": "dynamic-benchmarking-reasoning-2025", "title": "Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination", "authors": ["Simin Chen", "Pranav Pusarla", "Baishakhi Ray"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2503.04149", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid evolution of code large language models underscores the need for effective and transparent benchmarking of their reasoning capabilities. However, the current benchmarking approach heavily dep", "arxiv_id": "2503.04149", "doi": "10.48550/arXiv.2503.04149"}
+{"id": "your-benchmark-still-2025", "title": "Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models", "authors": ["Batu Guan", "Xiao Wu", "Yuanyuan Yuan", "Shaohua Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.06643", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we tackle a critical challenge in model evaluation: how to keep code benchmarks useful when models might have already seen them during training. We introduce a novel solution, dynamic b", "arxiv_id": "2503.06643", "doi": "10.48550/arXiv.2503.06643"}
+{"id": "advancing-large-language-2025", "title": "Advancing Large Language Models in Code Generation: Usaco Benchmark and Bug Mitigation Insights", "authors": ["Jacob Trentini", "Victor Liu", "Yi Peng", "Ziliang Zong"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICPC66645.2025.00057", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, Large Language Models (LLMs) have made substantial progress in code generation, but they still frequently generate code containing logic errors or syntax bugs. While research has focused on ", "doi": "10.1109/ICPC66645.2025.00057"}
+{"id": "cruxevalx-benchmark-multilingual-2024", "title": "CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution", "authors": ["Ruiyang Xu", "Jialun Cao", "Yaojie Lu", "Hongyu Lin", "Xianpei Han"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2408.13001", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code benchmarks such as HumanEval are widely adopted to evaluate Large Language Models' (LLMs) coding capabilities. However, there is an unignorable programming language bias in existing code benchmar", "arxiv_id": "2408.13001", "doi": "10.48550/arXiv.2408.13001"}
+{"id": "devbench-realistic-developerinformed-2026", "title": "DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models", "authors": ["P. Golnari", "Adarsh Kumarappan", "Wen Wen", "Xiaoyu Liu", "Gabriel Ryan"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.11895", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on realistic code completion tasks. It includes 1,800 evaluation instances across six programming languages a", "arxiv_id": "2601.11895", "doi": "10.48550/arXiv.2601.11895"}
+{"id": "realmath-continuous-benchmark-2025", "title": "RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics", "authors": ["Jie Zhang", "Cezara Petrui", "Kristina Nikolić", "F. Tramèr"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.12575", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions -- failing to cap", "arxiv_id": "2505.12575", "doi": "10.48550/arXiv.2505.12575"}
+{"id": "planning-natural-language-2024", "title": "Planning In Natural Language Improves LLM Search For Code Generation", "authors": ["Evan Z. Wang", "Federico Cassano", "Catherine Wu", "Yunfeng Bai", "Will Song"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.03733", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing com", "arxiv_id": "2409.03733", "doi": "10.48550/arXiv.2409.03733"}
+{"id": "importance-sampling-all-2025", "title": "Importance Sampling is All You Need: Predict LLM's performance on new benchmark by reusing existing benchmark", "authors": ["Ju-Ran Shi", "Wei Ma", "Shi Ying", "Lingxiao Jiang", "Yang Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.01203", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of large language models, code generation has become a key benchmark for evaluating LLM capabilities. However, existing benchmarks face two major challenges: (1) the escala", "arxiv_id": "2508.01203", "doi": "10.48550/arXiv.2508.01203"}
+{"id": "leakage-code-generation-2024", "title": "On Leakage of Code Generation Evaluation Datasets", "authors": ["Alexandre Matton", "Tom Sherborne", "Dennis Aumiller", "Elena Tommasone", "Milad Alizadeh"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2407.07565", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we consider contamination by code generation test sets, in particular in their use in modern large language models. We discuss three possible sources of such contamination and show find", "arxiv_id": "2407.07565", "doi": "10.48550/arXiv.2407.07565"}
+{"id": "evaluation-llms-syntaxaware-2024", "title": "Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks", "authors": ["Linyuan Gong", "Sida Wang", "Mostafa Elhoushi", "Alvin Cheung"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2403.04814", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task. This benchmark focuses on syntax-aware comp", "arxiv_id": "2403.04814", "doi": "10.48550/arXiv.2403.04814"}
+{"id": "turning-tide-repositorybased-2025", "title": "Turning the Tide: Repository-based Code Reflection", "authors": ["Wei Zhang", "Jian Yang", "Jiaxin Yang", "Ya Wang", "Zhoujun Li"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2507.09866", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code large language models (LLMs) enhance programming by understanding and generating code across languages, offering intelligent feedback, bug detection, and code updates through reflection, improvin", "arxiv_id": "2507.09866", "doi": "10.48550/arXiv.2507.09866"}
+{"id": "codereviewqa-code-review-2025", "title": "CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models", "authors": ["H. Lin", "Chunhua Liu", "Haoyu Gao", "Patanamon Thongtanunam", "Christoph Treude"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.16167", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: State-of-the-art large language models (LLMs) have demonstrated impressive code generation capabilities but struggle with real-world software engineering tasks, such as revising source code to address", "arxiv_id": "2503.16167", "doi": "10.48550/arXiv.2503.16167"}
+{"id": "bamboo-comprehensive-benchmark-2023", "title": "BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models", "authors": ["Zican Dong", "Tianyi Tang", "Junyi Li", "Wayne Xin Zhao", "Ji-Rong Wen"], "year": 2023, "venue": "International Conference on Language Resources and Evaluation", "source_url": "https://arxiv.org/abs/2309.13345", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have achieved dramatic proficiency over NLP tasks with normal length. Recently, multiple studies have committed to extending the context length and enhancing the long text", "arxiv_id": "2309.13345", "doi": "10.48550/arXiv.2309.13345"}
+{"id": "beyond-accuracy-evaluating-2025", "title": "Beyond Accuracy: Evaluating and Explaining the Capability Boundaries of Large Language Models in Syntax-Preserving Code Translation", "authors": ["Yaxin Zhao", "Qi Han", "Hui Shu", "Yan Guang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.32604/cmc.2025.070511", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly applied in the field of code translation. However, existing evaluation methodologies suffer from two major limitations: (1) the high overlap between tes", "doi": "10.32604/cmc.2025.070511"}
+{"id": "corecodebench-decoupling-code-2025", "title": "CoreCodeBench: Decoupling Code Intelligence via Fine-Grained Repository-Level Tasks", "authors": ["Lingyue Fu", "Hao Guan", "Bolun Zhang", "Haowei Yuan", "Yaoming Zhu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2507.05281", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The evaluation of Large Language Models (LLMs) for software engineering has shifted towards complex, repository-level tasks. However, existing benchmarks predominantly rely on coarse-grained pass rate", "arxiv_id": "2507.05281"}
+{"id": "featbench-more-realistic-2025", "title": "FeatBench: Towards More Realistic Evaluation of Feature-level Code Generation", "authors": ["Hao Chen", "Chengze Li", "Jia Li"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.22237", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating Large Language Models (LLMs) on repository-level feature implementation is a critical frontier in software engineering. However, establishing a benchmark that faithfully mirrors realistic d", "arxiv_id": "2509.22237"}
+{"id": "specializing-llms-code-2025", "title": "Specializing LLMs for Code: A Framework for Efficient Adaptation and Robust Evaluation", "authors": ["Aditya Singh", "Jianhua Chen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FLLM67465.2025.11391170", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models (LLMs) show promise in code generation, they often lack the specialized reasoning ability needed for complex, novel programming tasks. This paper introduces a complete fram", "doi": "10.1109/FLLM67465.2025.11391170"}
+{"id": "can-llms-write-2025", "title": "Can LLMs Write Fast System-Aware Numerical Computation Code?", "authors": ["Xin Yang", "Bintao Tang", "Yuhao Wang", "Zimo Ji", "Wenyuan Jiang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SMC58881.2025.11343280", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Large Language Models (LLMs) have demonstrated impressive capabilities in code generation and mathematical reasoning, their ability to produce correct and highly optimized numerical computation ", "doi": "10.1109/SMC58881.2025.11343280"}
+{"id": "beyond-correctness-benchmarking-2024", "title": "Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models", "authors": ["Jiasheng Zheng", "Boxi Cao", "Zheng Ma", "Ruotong Pan", "Hongyu Lin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.11470", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, researchers have proposed numerous benchmarks to evaluate the impressive coding capabilities of large language models (LLMs). However, current benchmarks primarily assess the accuracy", "arxiv_id": "2407.11470", "doi": "10.48550/arXiv.2407.11470"}
+{"id": "probing-language-models-2024", "title": "Probing Language Models for Pre-training Data Detection", "authors": ["Zhenhua Liu", "Tong Zhu", "Chuanyuan Tan", "Haonan Lu", "Bing Liu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.01333", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown their impressive capabilities, while also raising concerns about the data contamination problems due to privacy issues and leakage of benchmark datasets in the ", "arxiv_id": "2406.01333", "doi": "10.48550/arXiv.2406.01333"}
+{"id": "codemorph-mitigating-data-2025", "title": "CODEMORPH: Mitigating Data Leakage in Large Language Model Assessment", "authors": ["Hongzhou Rao", "Yanjie Zhao", "Wenjie Zhu", "Ling Xiao", "Meizhen Wang"], "year": 2025, "venue": "2025 IEEE/ACM 47th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)", "source_url": "https://arxiv.org/abs/2506.17627", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Concerns about benchmark leakage in large language models for code (Code LLMs) have raised issues of data contamination and inflated evaluation metrics. The diversity and inaccessibility of many train", "arxiv_id": "2506.17627", "doi": "10.1109/ICSE-Companion66252.2025.00081"}
+{"id": "throwbench-benchmarking-llms-2025", "title": "ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions", "authors": ["Julian Aron Prenner", "Romain Robbes"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.04241", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern Large Language Models (LLMs) have shown astounding capabilities of code understanding and synthesis. In order to assess such capabilities, several benchmarks have been devised (e.g., HumanEval)", "arxiv_id": "2503.04241", "doi": "10.48550/arXiv.2503.04241"}
+{"id": "narrowing-complexity-gap-2026", "title": "Narrowing the Complexity Gap in the Evaluation of Large Language Models", "authors": ["Yang Chen", "Shu-Ya Liu", "Reyhaneh Jabbarvand"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.18928", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating Large Language Models (LLMs) with respect to real-world code complexity is essential. Otherwise, there is a risk of overestimating LLMs' programming abilities based on simplistic benchmarks,", "arxiv_id": "2602.18928"}
+{"id": "evaluating-large-language-2024-2", "title": "Evaluating Large Language Models for Generalization and Robustness via Data Compression", "authors": ["Yucheng Li", "Yunhao Guo", "Frank Guerin", "Chenghua Lin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.00861", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Existing methods for evaluating large language models face challenges such as data contamination, sensitivity to prompts, and the high cost of benchmark creation. To address this, we propose a lossles", "arxiv_id": "2402.00861", "doi": "10.48550/arXiv.2402.00861"}
+{"id": "codeinsight-curated-dataset-2024", "title": "CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow", "authors": ["Nathanaël Beau", "Benoît Crabbé"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2409.16819", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce a novel dataset tailored for code generation, aimed at aiding developers in common tasks. Our dataset provides examples that include a clarified intent, code snippets associated, and an a", "arxiv_id": "2409.16819", "doi": "10.18653/v1/2024.findings-acl.354"}
+{"id": "structtest-benchmarking-llms-2024", "title": "StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs", "authors": ["Hailin Chen", "Fangkai Jiao", "Mathieu Ravaut", "Nawshad Farruque", "Xuan-Phi Nguyen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.18011", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models (LLMs) demands robust, unbiased, and scalable evaluation methods. However, human annotations are costly to scale, model-based evaluations are susceptible", "arxiv_id": "2412.18011", "doi": "10.48550/arXiv.2412.18011"}
+{"id": "benchmarking-large-language-2024", "title": "Benchmarking Large Language Models with Integer Sequence Generation Tasks", "authors": ["Dan O’Malley", "Manish Bhattarai", "Javier E. Santos"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.04372", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a novel benchmark designed to rigorously evaluate the capabilities of large language models (LLMs) in mathematical reasoning and algorithmic code synthesis tasks. The benchmark comprises in", "arxiv_id": "2411.04372", "doi": "10.48550/arXiv.2411.04372"}
+{"id": "webbench-llm-code-2025", "title": "Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks", "authors": ["Kai Xu", "Yi Mao", "Xin Guan", "Zilong Feng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.07473", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The application of large language models (LLMs) in the field of coding is evolving rapidly: from code assistants, to autonomous coding agents, and then to generating complete projects through natural ", "arxiv_id": "2505.07473", "doi": "10.48550/arXiv.2505.07473"}
+{"id": "feabench-benchmark-evaluating-2025", "title": "FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation", "authors": ["Wei Li", "Xin Zhang", "Zhongxin Guo", "Shaoguang Mao", "Wen Luo"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2503.06680", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Implementing new features in repository-level codebases is a crucial application of code generation models. However, current benchmarks lack a dedicated evaluation framework for this capability. To fi", "arxiv_id": "2503.06680", "doi": "10.48550/arXiv.2503.06680"}
+{"id": "ojbench-competition-level-2025", "title": "OJBench: A Competition Level Code Benchmark For Large Language Models", "authors": ["Zhexu Wang", "Yiping Liu", "Yejie Wang", "Wen He", "Bofei Gao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.16395", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in large language models (LLMs) have demonstrated significant progress in math and code reasoning capabilities. However, existing code benchmarks are limited in their ability to eva", "arxiv_id": "2506.16395", "doi": "10.48550/arXiv.2506.16395"}
+{"id": "autocodebench-large-language-2025", "title": "AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators", "authors": ["Jason Chou", "Ao Liu", "Yuchi Deng", "Zhiyin Zeng", "Tao Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.09101", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, with code generation emerging as a key area of focus. While numerous benchmarks have been proposed to eva", "arxiv_id": "2508.09101", "doi": "10.48550/arXiv.2508.09101"}
+{"id": "detecting-benchmark-contamination-2025", "title": "Detecting Benchmark Contamination Through Watermarking", "authors": ["Tom Sander", "Pierre Fernandez", "Saeed Mahloujifar", "Alain Durmus", "Chuan Guo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.17259", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Benchmark contamination poses a significant challenge to the reliability of Large Language Models (LLMs) evaluations, as it is difficult to assert whether a model has been trained on a test set. We in", "arxiv_id": "2502.17259", "doi": "10.48550/arXiv.2502.17259"}
+{"id": "fragility-benchmark-contamination-2025", "title": "On The Fragility of Benchmark Contamination Detection in Reasoning Models", "authors": ["Han Wang", "Haoyu Li", "Brian Ko", "Huan Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.02386", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Leaderboards for LRMs have turned evaluation into a competition, incentivizing developers to optimize directly on benchmark suites. A shortcut to achieving higher rankings is to incorporate evaluation", "arxiv_id": "2510.02386", "doi": "10.48550/arXiv.2510.02386"}
+{"id": "cruxeval-benchmark-code-2024", "title": "CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution", "authors": ["Alex Gu", "Baptiste Rozière", "Hugh Leather", "Armando Solar-Lezama", "Gabriel Synnaeve"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2401.03065", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an input-output pair, leading to tw", "arxiv_id": "2401.03065", "doi": "10.48550/arXiv.2401.03065"}
+{"id": "plot2code-comprehensive-benchmark-2025", "title": "Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots", "authors": ["Chengyue Wu", "Yixiao Ge", "Qiushan Guo", "Jiahao Wang", "Zhixuan Liang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.findings-naacl.164", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.18653/v1/2025.findings-naacl.164"}
+{"id": "livebench-challenging-contaminationfree-2024", "title": "LiveBench: A Challenging, Contamination-Free LLM Benchmark", "authors": ["Colin White", "Samuel Dooley", "Manley Roberts", "Arka Pal", "Ben Feuer"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2406.19314", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2406.19314"}
+{"id": "beyond-memorization-reasoningdriven-2025", "title": "Beyond Memorization: Reasoning-Driven Synthesis as a Mitigation Strategy Against Benchmark Contamination", "authors": ["Terry Jingchen Zhang", "Gopal Dev", "Ning Wang", "Nicole Ni", "Wenyuan Jiang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.00072", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Capability evaluation of large language models (LLMs) is increasingly shadowed by rising concerns of data contamination that cast doubts on whether static benchmarks measure genuine reasoning or mere ", "arxiv_id": "2509.00072", "doi": "10.48550/arXiv.2509.00072"}
+{"id": "designbench-comprehensive-benchmark-2025", "title": "DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation", "authors": ["Jingyu Xiao", "Ming Wang", "Man Ho Lam", "Yuxuan Wan", "Junliang Liu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.06251", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in automated front-end engineering, e.g., generating UI code from visual designs. However, existing front-end UI code", "arxiv_id": "2506.06251", "doi": "10.48550/arXiv.2506.06251"}
+{"id": "astrovisbench-code-benchmark-2025", "title": "AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy", "authors": ["S. Joseph", "Syed Murtaza Husain", "S. Offner", "Stéphanie Juneau", "Paul Torrey"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.20538", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are being explored for applications in scientific research, including their capabilities to synthesize literature, answer research questions, generate research ideas, and ", "arxiv_id": "2505.20538", "doi": "10.48550/arXiv.2505.20538"}
+{"id": "pacost-paired-confidence-2024", "title": "PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models", "authors": ["Huixuan Zhang", "Yun Lin", "Xiaojun Wan"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.18326", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are known to be trained on vast amounts of data, which may unintentionally or intentionally include data from commonly used benchmarks. This inclusion can lead to cheating", "arxiv_id": "2406.18326", "doi": "10.48550/arXiv.2406.18326"}
+{"id": "livebench-challenging-contaminationlimited-2024", "title": "LiveBench: A Challenging, Contamination-Limited LLM Benchmark", "authors": ["Colin White", "Samuel Dooley", "Manley Roberts", "Arka Pal", "Ben Feuer"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2406.19314", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To", "arxiv_id": "2406.19314"}
+{"id": "codecriticbench-holistic-code-2025", "title": "CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models", "authors": ["Alex L. Zhang", "Marcus Dong", "Jiaheng Liu", "Wei Zhang", "Yejie Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.16614", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The critique capacity of Large Language Models (LLMs) is essential for reasoning abilities, which can provide necessary suggestions (e.g., detailed analysis and constructive feedback). Therefore, how ", "arxiv_id": "2502.16614", "doi": "10.48550/arXiv.2502.16614"}
+{"id": "webmmu-benchmark-multimodal-2025", "title": "WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation", "authors": ["Rabiul Awal", "Mahsa Massoud", "Zichao Li", "Aarash Feizi", "Suyuchen Wang"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2508.16763", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing involving HTML/CSS/JavaScript, and (3) mockup-to-code generatio", "arxiv_id": "2508.16763", "doi": "10.48550/arXiv.2508.16763"}
+{"id": "proving-coding-interview-2025", "title": "Proving the Coding Interview: A Benchmark for Formally Verified Code Generation", "authors": ["Quinn Dougherty", "Ronak D. Mehta"], "year": 2025, "venue": "2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)", "source_url": "https://arxiv.org/abs/2502.05714", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce the Formally Verified Automated Programming Progress Standards, or FVAPPS, a benchmark of 4715 samples for writing programs and proving their correctness, the largest formal verification ", "arxiv_id": "2502.05714", "doi": "10.1109/LLM4Code66737.2025.00014"}
+{"id": "emperors-new-clothes-2025", "title": "The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination", "authors": ["Yifan Sun", "Hanru Wang", "Dongbai Li", "Ganghui Wang", "Huan Zhang"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2503.16402", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Benchmark Data Contamination (BDC)-the inclusion of benchmark testing samples in the training set-has raised increasing concerns in Large Language Model (LLM) evaluation, leading to falsely inflated p", "arxiv_id": "2503.16402", "doi": "10.48550/arXiv.2503.16402"}
+{"id": "mmlucf-contaminationfree-multitask-2024", "title": "MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark", "authors": ["Qihao Zhao", "Yangyu Huang", "Tengchao Lv", "Lei Cui", "Qinzheng Sun"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.15194", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multiple-choice question (MCQ) datasets like Massive Multitask Language Understanding (MMLU) are widely used to evaluate the commonsense, understanding, and problem-solving abilities of large language", "arxiv_id": "2412.15194", "doi": "10.48550/arXiv.2412.15194"}
+{"id": "exploring-code-language-2025", "title": "Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis", "authors": ["Jiahao Gai", "H. Chen", "Zhican Wang", "Hongyu Zhou", "Wanru Zhao"], "year": 2025, "venue": "Asia and South Pacific Design Automation Conference", "source_url": "https://arxiv.org/abs/2502.13921", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in code generation have illuminated the potential of employing large language models (LLMs) for general-purpose programming languages such as Python and C++, opening new opportunities ", "arxiv_id": "2502.13921", "doi": "10.1145/3658617.3697616"}
+{"id": "humanevalxl-multilingual-code-2024", "title": "HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization", "authors": ["Qiwei Peng", "Yekun Chai", "Xuhong Li"], "year": 2024, "venue": "International Conference on Language Resources and Evaluation", "source_url": "https://arxiv.org/abs/2402.16694", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have made significant progress in generating codes from textual prompts. However, existing benchmarks have mainly concentrated on translating English prompts to multilingu", "arxiv_id": "2402.16694", "doi": "10.48550/arXiv.2402.16694"}
+{"id": "evocodebench-evolving-code-2024", "title": "EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories", "authors": ["Jia Li", "Ge Li", "Xuanming Zhang", "Yihong Dong", "Zhi Jin"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.00599", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: How to evaluate Large Language Models (LLMs) in code generation is an open question. Existing benchmarks demonstrate poor alignment with real-world code repositories and are insufficient to evaluate t", "arxiv_id": "2404.00599", "doi": "10.48550/arXiv.2404.00599"}
+{"id": "deveval-manuallyannotated-code-2024", "title": "DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories", "authors": ["Jia Li", "Ge Li", "Yunfei Zhao", "Yongming Li", "Huanyu Liu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2405.19856", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: How to evaluate the coding abilities of Large Language Models (LLMs) remains an open question. We find that existing benchmarks are poorly aligned with real-world code repositories and are insufficien", "arxiv_id": "2405.19856", "doi": "10.48550/arXiv.2405.19856"}
+{"id": "geoanalystbench-geoai-benchmark-2025", "title": "GeoAnalystBench: A GeoAI Benchmark for Assessing Large Language Models for Spatial Analysis Workflow and Code Generation", "authors": ["Qianheng Zhang", "Song Gao", "Chen Wei", "Yibo Zhao", "Ying Nie"], "year": 2025, "venue": "Trans. GIS", "source_url": "https://arxiv.org/abs/2509.05881", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in large language models (LLMs) have fueled growing interest in automating geospatial analysis and GIS workflows, yet their actual capabilities remain uncertain. In this work, we call ", "arxiv_id": "2509.05881", "doi": "10.1111/tgis.70135"}
+{"id": "redcode-risky-code-2024", "title": "RedCode: Risky Code Execution and Generation Benchmark for Code Agents", "authors": ["Chengquan Guo", "Xun Liu", "Chulin Xie", "Andy Zhou", "Yi Zeng"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2411.07781", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding, safety concerns, such as generating or executing risky code, have become significant barriers to the real-w", "arxiv_id": "2411.07781", "doi": "10.48550/arXiv.2411.07781"}
+{"id": "how-should-i-2025", "title": "How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs", "authors": ["Jialun Cao", "Yuk-Kit Chan", "Zixuan Ling", "Wenxuan Wang", "Shuqing Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2501.10711", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2501.10711"}
+{"id": "webuibench-comprehensive-benchmark-2025", "title": "WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code", "authors": ["Zhiyu Lin", "Zhengda Zhou", "Zhiyuan Zhao", "Tianrui Wan", "Yi-An Ma"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2506.07818", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of Generative AI technology, Multimodal Large Language Models (MLLMs) have the potential to act as AI software engineers capable of executing complex web application developm", "arxiv_id": "2506.07818", "doi": "10.48550/arXiv.2506.07818"}
+{"id": "dacode-agent-data-2024", "title": "DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models", "authors": ["Yiming Huang", "Jianwen Luo", "Yang Yu", "Yitong Zhang", "Fangyu Lei"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2410.07331", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code a", "arxiv_id": "2410.07331", "doi": "10.48550/arXiv.2410.07331"}
+{"id": "mercury-code-efficiency-2024", "title": "Mercury: A Code Efficiency Benchmark for Code Large Language Models", "authors": ["Mingzhe Du", "A. Luu", "Bin Ji", "Qian Liu", "See-Kiong Ng"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2402.07844", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Amidst the recent strides in evaluating Large Language Models for Code (Code LLMs), existing benchmarks have mainly focused on the functional correctness of generated code, neglecting the importance o", "arxiv_id": "2402.07844", "doi": "10.52202/079017-0529"}
+{"id": "evocodebench-evolving-code-2024-2", "title": "EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations", "authors": ["Jia Li", "Ge Li", "Xuanming Zhang", "Yunfei Zhao", "Yihong Dong"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2410.22821", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: How to evaluate Large Language Models (LLMs) in code generation remains an open question. Existing benchmarks have two limitations - data leakage and lack of domain-specific evaluation. The former hur", "arxiv_id": "2410.22821", "doi": "10.48550/arXiv.2410.22821"}
+{"id": "complexcodeeval-benchmark-evaluating-2024", "title": "ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code", "authors": ["Jia Feng", "Jiachen Liu", "Cuiyun Gao", "Chun Yong Chong", "Chaozheng Wang"], "year": 2024, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2409.10280", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, with the widespread attention of academia and industry on the application of large language models (LLMs) to code-related tasks, an increasing number of large code models (LCMs) have ", "arxiv_id": "2409.10280", "doi": "10.1145/3691620.3695552"}
+{"id": "mercury-efficiency-benchmark-2024", "title": "Mercury: An Efficiency Benchmark for LLM Code Synthesis", "authors": ["Mingzhe Du", "A. Luu", "Bin Ji", "See-Kiong Ng"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2402.07844", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2402.07844"}
+{"id": "how-efficient-llmgenerated-2024", "title": "How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark", "authors": ["Ruizhong Qiu", "Wei Zeng", "Hanghang Tong", "James Ezick", "Christopher Lott"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2406.06647", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of large language models (LLMs) has significantly pushed the frontiers of program synthesis. Advancement of LLM-based program synthesis calls for a thorough evaluation of LLM-generated c", "arxiv_id": "2406.06647", "doi": "10.48550/arXiv.2406.06647"}
+{"id": "domaineval-autoconstructed-benchmark-2024", "title": "DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation", "authors": ["Qiming Zhu", "Jialun Cao", "Yaojie Lu", "Hongyu Lin", "Xianpei Han"], "year": 2024, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2408.13204", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code benchmarks such as HumanEval are widely adopted to evaluate the capabilities of Large Language Models (LLMs), providing insights into their strengths and weaknesses. However, current benchmarks p", "arxiv_id": "2408.13204", "doi": "10.48550/arXiv.2408.13204"}
+{"id": "quantifying-contamination-evaluating-2024", "title": "Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models", "authors": ["Martin Riddell", "Ansong Ni", "Arman Cohan"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2403.04811", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models have achieved remarkable performance on various code generation benchmarks, there have been growing concerns regarding potential contamination of these benchmarks as they m", "arxiv_id": "2403.04811", "doi": "10.48550/arXiv.2403.04811"}
+{"id": "qiskit-humaneval-evaluation-2024", "title": "Qiskit HumanEval: An Evaluation Benchmark for Quantum Code Generative Models", "authors": ["Sanjay Vishwakarma", "Francis Harkins", "Siddharth Golecha", "Vishal Sharathchandra Bajpe", "Nicolas Dupuis"], "year": 2024, "venue": "International Conference on Quantum Computing and Engineering", "source_url": "https://arxiv.org/abs/2406.14712", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Quantum programs are typically developed using quantum Software Development Kits (SDKs). The rapid advancement of quantum computing necessitates new tools to streamline this development process, and o", "arxiv_id": "2406.14712", "doi": "10.1109/QCE60285.2024.00137"}
+{"id": "iaceval-code-generation-2024", "title": "IaC-Eval: A Code Generation Benchmark for Cloud Infrastructure-as-Code Programs", "authors": ["Patrick Tser Jern Kon", "Jiachen Liu", "Yiming Qiu", "Weijun Fan", "Ting He"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.52202/079017-4273", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Infrastructure-as-Code (IaC), an important component of cloud computing, allows the definition of cloud infrastructure in high-level programs. However, developing IaC programs is challenging, complica", "doi": "10.52202/079017-4273"}
+{"id": "nlp-evaluation-trouble-2023", "title": "NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark", "authors": ["Oscar Sainz", "Jon Ander Campos", "Iker García-Ferrero", "Julen Etxaniz", "Oier López de Lacalle"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2310.18018", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this position paper, we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a ", "arxiv_id": "2310.18018", "doi": "10.48550/arXiv.2310.18018"}
+{"id": "has-my-code-2025", "title": "Has My Code Been Stolen for Model Training? A Naturalness Based Approach to Code Contamination Detection", "authors": ["Haris Ali Khan", "Yanjie Jiang", "Qasim Umer", "Yuxia Zhang", "Waseem Akram"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3715765", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: It is often valuable to know whether a given piece of source code has or hasn’t been used to train a given deep learning model. On one side, it helps avoid data contamination problems that may exagger", "doi": "10.1145/3715765"}
+{"id": "repotransbench-realworld-multilingual-2024", "title": "RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation", "authors": ["Yanlin Wang", "Yanlin Wang", "Suiquan Wang", "Daya Guo", "Jiachi Chen"], "year": 2024, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2412.17744", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Repository-level code translation refers to translating an entire code repository from one programming language to another while preserving the functionality of the source repository. Many benchmarks ", "arxiv_id": "2412.17744", "doi": "10.1109/TSE.2025.3645056"}
+{"id": "classeval-manuallycrafted-benchmark-2023", "title": "ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation", "authors": ["Xueying Du", "Mingwei Liu", "Kaixin Wang", "Hanlin Wang", "Junwei Liu"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2308.01861", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e. class-level code generation. We first manually construct the first class-level code genera", "arxiv_id": "2308.01861", "doi": "10.48550/arXiv.2308.01861"}
+{"id": "crosscodeeval-diverse-multilingual-2023", "title": "CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion", "authors": ["Yangruibo Ding", "Zijian Wang", "Wasi Uddin Ahmad", "Hantian Ding", "Ming Tan"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2310.11248", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion models have made significant progress in recent years, yet current popular evaluation datasets, such as HumanEval and MBPP, predominantly focus on code completion tasks within a single", "arxiv_id": "2310.11248", "doi": "10.48550/arXiv.2310.11248"}
+{"id": "code-benchmark-httf-2023", "title": "Code Benchmark of the HTTF Pressurized Conduction Cooldown Test Using SAM", "authors": ["T. Hua", "L. Zou", "Rui Hu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1080/00295639.2023.2186163", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The High Temperature Test Facility (HTTF) at Oregon State University is an integral system test facility to simulate postulated reactor transients of prismatic high-temperature gas-cooled rea", "doi": "10.1080/00295639.2023.2186163"}
+{"id": "concerned-data-contamination-2024", "title": "Concerned with Data Contamination? Assessing Countermeasures in Code Language Model", "authors": ["Jialun Cao", "Wuqi Zhang", "S. Cheung"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.16898", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Various techniques have been proposed to leverage the capabilities of code language models (CLMs) for SE tasks. While these techniques typically evaluate their effectiveness using publicly available d", "arxiv_id": "2403.16898", "doi": "10.48550/arXiv.2403.16898"}
+{"id": "drawing-pandas-benchmark-2024", "title": "Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code", "authors": ["Timur Galimzyanov", "Sergey Titov", "Yaroslav Golubev", "Egor Bogomolov"], "year": 2024, "venue": "IEEE Working Conference on Mining Software Repositories", "source_url": "https://arxiv.org/abs/2412.02764", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces the human-curated Pandas-PlotBench dataset, designed to evaluate language models’ effectiveness as assistants in visual data exploration. Our benchmark focuses on generating code", "arxiv_id": "2412.02764", "doi": "10.1109/MSR66628.2025.00083"}
+{"id": "pythonsaga-redefining-benchmark-2024", "title": "PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMs", "authors": ["Ankit Yadav", "Himanshu Beniwal", "Mayank Singh"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2401.03855", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Driven by the surge in code generation using large language models (LLMs), numerous benchmarks have emerged to evaluate these LLMs capabilities. We conducted a large-scale human evaluation of HumanEva", "arxiv_id": "2401.03855", "doi": "10.18653/v1/2024.findings-emnlp.996"}
+{"id": "minicodeprops-minimal-benchmark-2024", "title": "miniCodeProps: a Minimal Benchmark for Proving Code Properties", "authors": ["Evan Lohn", "S. Welleck"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.11915", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI agents have shown initial promise in automating mathematical theorem proving in proof assistants such as Lean. The same proof assistants can be used to verify the correctness of code by pairing cod", "arxiv_id": "2406.11915", "doi": "10.48550/arXiv.2406.11915"}
+{"id": "codemmlu-multitask-benchmark-2024", "title": "CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs", "authors": ["Dũng Nguyễn Mạnh", "Thang Phan Chau", "Nam Le Hai", "Thong T. Doan", "Nam V. Nguyen"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2410.01999", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this ", "arxiv_id": "2410.01999"}
+{"id": "codemmlu-multitask-benchmark-2024-2", "title": "CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs", "authors": ["Dũng Nguyễn Mạnh", "Thang Phan Chau", "Nam Le Hai", "Thong T. Doan", "Nam V. Nguyen"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2410.01999", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2410.01999"}
+{"id": "collubench-benchmark-predicting-2024", "title": "Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code", "authors": ["Nan Jiang", "Qi Li", "Lin Tan", "Tian-yu Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.09997", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite their success, large language models (LLMs) face the critical challenge of hallucinations, generating plausible but incorrect content. While much research has focused on hallucinations in mult", "arxiv_id": "2410.09997", "doi": "10.48550/arXiv.2410.09997"}
+{"id": "webapp1k-practical-codegeneration-2024", "title": "WebApp1K: A Practical Code-Generation Benchmark for Web App Development", "authors": ["Yi Cui"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.00019", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce WebApp1K, a practical code-generation benchmark to measure LLM ability to develop web apps. This benchmark aims to calibrate LLM output and aid the models to progressively improve code co", "arxiv_id": "2408.00019", "doi": "10.48550/arXiv.2408.00019"}
+{"id": "metrex-benchmark-verilog-2024", "title": "MetRex: A Benchmark for Verilog Code Metric Reasoning Using LLMs", "authors": ["Manar Abdelatty", "Jingxiao Ma", "Sherief Reda"], "year": 2024, "venue": "Asia and South Pacific Design Automation Conference", "source_url": "https://arxiv.org/abs/2411.03471", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have been applied to various hardware design tasks, including Verilog code generation, EDA tool scripting, and RTL bug fixing. Despite this extensive exploration, LLMs are", "arxiv_id": "2411.03471", "doi": "10.1145/3658617.3697625"}
+{"id": "rethinking-benchmark-contamination-2023", "title": "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples", "authors": ["Shuo Yang", "Wei-Lin Chiang", "Lianmin Zheng", "Joseph Gonzalez", "Ion Stoica"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.04850", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models are increasingly trained on all the data ever produced by humans. Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-t", "arxiv_id": "2311.04850", "doi": "10.48550/arXiv.2311.04850"}
+{"id": "top-general-performance-2024", "title": "Top General Performance = Top Domain Performance? DomainCodeBench: A Multi-domain Code Generation Benchmark", "authors": ["Dewu Zheng", "Yanlin Wang", "Ensheng Shi", "Xilin Liu", "Yuchi Ma"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2412.18573", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of large language models (LLMs), extensive research has been conducted to investigate the code generation capabilities of LLMs. However, existing efforts primarily focus on ", "arxiv_id": "2412.18573"}
+{"id": "javabench-benchmark-objectoriented-2024", "title": "JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models", "authors": ["Jialun Cao", "Zhiyong Chen", "Jiarong Wu", "S. Cheung", "Chang Xu"], "year": 2024, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2406.12902", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs’ capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, i", "arxiv_id": "2406.12902", "doi": "10.1145/3691620.3695470"}
+{"id": "humanevo-evolutionaware-benchmark-2024", "title": "HumanEvo: An Evolution-Aware Benchmark for More Realistic Evaluation of Repository-Level Code Generation", "authors": ["Dewu Zheng", "Yanlin Wang", "Ensheng Shi", "Ruikai Zhang", "Yuchi Ma"], "year": 2024, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2406.06918", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To evaluate the repository-level code generation capabilities of Large Language Models (LLMs) in complex real-world software development scenarios, many evaluation methods have been developed. These m", "arxiv_id": "2406.06918", "doi": "10.1109/ICSE55347.2025.00228"}
+{"id": "crqbench-benchmark-code-2024", "title": "CRQBench: A Benchmark of Code Reasoning Questions", "authors": ["Elizabeth Dinella", "Satish Chandra", "Petros Maniatis"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.08453", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models have demonstrated exceptional proficiency on coding tasks, but it is challenging to precisely evaluate their code reasoning ability. Existing benchmarks are insufficient as they ", "arxiv_id": "2408.08453", "doi": "10.48550/arXiv.2408.08453"}
+{"id": "verification-rmcsaragr-nuclear-2024", "title": "Verification of the RMC-SaraGR Nuclear Design Code System Based on the HTTR Benchmark", "authors": ["Yuan Yuan", "Guoming Liu", "Peng Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1115/icone31-135368", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In order to perform a detailed nuclear analysis for the small modular prismatic HTGRs, the RMC-SaraGR nuclear design code system has been developed. The verification of the code system using the HTT", "doi": "10.1115/icone31-135368"}
+{"id": "good-bad-exploring-2024", "title": "The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)", "authors": ["Shenglai Zeng", "Jiankun Zhang", "Pengfei He", "Yue Xing", "Yiding Liu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.16893", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has dem", "arxiv_id": "2402.16893", "doi": "10.48550/arXiv.2402.16893"}
+{"id": "hypergraphrag-retrievalaugmented-generation-2025", "title": "HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation", "authors": ["Haoran Luo", "E. Haihong", "Guanting Chen", "Yandan Zheng", "Xiaobao Wu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2503.21322", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Standard Retrieval-Augmented Generation (RAG) relies on chunk-based retrieval, whereas GraphRAG advances this approach by graph-based knowledge representation. However, existing graph-based RAG approa", "arxiv_id": "2503.21322"}
+{"id": "lissa-generic-traceability-2025", "title": "LiSSA: Toward Generic Traceability Link Recovery Through Retrieval-Augmented Generation", "authors": ["Dominik Fuchß", "Tobias Hey", "Jan Keim", "Haoyu Liu", "Niklas Ewald"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE55347.2025.00186", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: There are a multitude of software artifacts which need to be handled during the development and maintenance of a software system. These artifacts interrelate in multiple, complex ways. Therefore, many", "doi": "10.1109/ICSE55347.2025.00186"}
+{"id": "cast-enhancing-code-2025", "title": "cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree", "authors": ["Yilin Zhang", "Xinran Zhao", "Z. Z. Wang", "Chenyang Yang", "Jiayi Wei"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2506.15655", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) has become essential for large-scale code generation, grounding predictions in external code corpora to improve actuality. However, a critical yet underexplored as", "arxiv_id": "2506.15655", "doi": "10.48550/arXiv.2506.15655"}
+{"id": "from-code-generation-2025", "title": "From Code Generation to Software Testing: AI Copilot With Context-Based Retrieval-Augmented Generation", "authors": ["Yuchen Wang", "Shangxin Guo", "C. Tan"], "year": 2025, "venue": "IEEE Software", "source_url": "https://arxiv.org/abs/2504.01866", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid pace of large-scale software development places increasing demands on traditional testing methodologies. We propose a novel perspective on software testing, highlighting the transformative p", "arxiv_id": "2504.01866", "doi": "10.1109/MS.2025.3549628"}
+{"id": "prattack-coordinated-promptrag-2025", "title": "PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization", "authors": ["Yang Jiao", "Xiaodong Wang", "Kai Yang"], "year": 2025, "venue": "Annual International ACM SIGIR Conference on Research and Development in Information Retrieval", "source_url": "https://arxiv.org/abs/2504.07717", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications, e.g., medical question-answering, mathematical sciences, and code generation. However, they a", "arxiv_id": "2504.07717", "doi": "10.1145/3726302.3730058"}
+{"id": "empirical-study-retrievalaugmented-2025", "title": "An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities", "authors": ["Zezhou Yang", "Sirong Chen", "Cuiyun Gao", "Zhenhao Li", "Xing Hu"], "year": 2025, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2501.13742", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre", "arxiv_id": "2501.13742", "doi": "10.1145/3717061"}
+{"id": "recode-improving-llmbased-2025", "title": "ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation", "authors": ["Yicong Zhao", "Shisong Chen", "Jiacheng Zhang", "Zhixu Li"], "year": 2025, "venue": "International Conference on Information and Knowledge Management", "source_url": "https://arxiv.org/abs/2509.02330", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in large language models (LLMs) have demonstrated impressive capabilities in code-related tasks such as code generation and automated program repair. Despite their promising performanc", "arxiv_id": "2509.02330", "doi": "10.1145/3746252.3761035"}
+{"id": "codepromptzip-codespecific-prompt-2025", "title": "CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs", "authors": ["Pengfei He", "Shaowei Wang", "T. Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.14925", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) enhances coding tasks by incorporating retrieved code examples into prompts. However, lengthy prompts, often exceeding tens of thousands of tokens, introduce chall", "arxiv_id": "2502.14925", "doi": "10.48550/arXiv.2502.14925"}
+{"id": "deep-dive-into-2025", "title": "A Deep Dive into Retrieval-Augmented Generation for Code Completion: Experience on WeChat", "authors": ["Zezhou Yang", "Ting Peng", "Cuiyun Gao", "Chaozheng Wang", "Hailiang Huang"], "year": 2025, "venue": "IEEE International Conference on Software Maintenance and Evolution", "source_url": "https://arxiv.org/abs/2507.18515", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion, a crucial task in software engineering that enhances developer productivity, has seen substantial improvements with the rapid advancement of large language models (LLMs). In recent ye", "arxiv_id": "2507.18515", "doi": "10.1109/ICSME64153.2025.00062"}
+{"id": "planrag-planthenretrieval-augmented-2024", "title": "PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers", "authors": ["Myeong-ae Lee", "Seonho An", "Min-Soo Kim"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.12430", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we conduct a study to utilize LLMs as a solution for decision making that requires complex data analysis. We define **Decision QA** as the task of answering the best decision, d_{best},", "arxiv_id": "2406.12430", "doi": "10.48550/arXiv.2406.12430"}
+{"id": "ares-automated-evaluation-2023", "title": "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems", "authors": ["Jon Saad-Falcon", "O. Khattab", "Christopher Potts", "Matei Zaharia"], "year": 2023, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2311.09476", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG", "arxiv_id": "2311.09476", "doi": "10.48550/arXiv.2311.09476"}
+{"id": "investigating-retrieval-augmented-2025", "title": "Investigating Retrieval Augmented Generation for LLM-Based Code Generation", "authors": ["Paraskevi Kivroglou", "Tim Schlippe", "Simon Martin"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FLLM67465.2025.11391177", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper investigates the effectiveness of Retrieval Augmented Generation (RAG) in enhancing program code generation capabilities of Large Language Models (LLMs). We conduct a systematic analysis us", "doi": "10.1109/FLLM67465.2025.11391177"}
+{"id": "enhancing-ability-llms-2025", "title": "Enhancing the ability of LLMs for spaceborne equipment code generation via retrieval-augmented generation and contrastive learning", "authors": ["Rui He", "Liang Zhang", "Liangqing Lyu", "Changbin Xue"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10515-025-00545-1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10515-025-00545-1"}
+{"id": "cotrag-integrating-chain-2025", "title": "CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models", "authors": ["Feiyang Li", "Peng Fang", "Zhan Shi", "Arijit Khan", "Fang Wang"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2504.13534", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Chain-of-thought (CoT) reasoning boosts large language models'(LLMs) performance on complex tasks but faces two key limitations: a lack of reliability when solely relying on LLM-generated reasoning ch", "arxiv_id": "2504.13534", "doi": "10.48550/arXiv.2504.13534"}
+{"id": "promptbased-code-completion-2024", "title": "Prompt-Based Code Completion via Multi-Retrieval Augmented Generation", "authors": ["Hanzhuo Tan", "Qi Luo", "Lingixao Jiang", "Zizheng Zhan", "Jing Li"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2405.07530", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated code completion, aiming at generating subsequent tokens from unfinished code, has significantly benefited from recent progress in pre-trained Large Language Models (LLMs). However, these mod", "arxiv_id": "2405.07530", "doi": "10.1145/3725812"}
+{"id": "retrievalaugmented-generation-approach-2025", "title": "A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks", "authors": ["Waleed Khalid", "Dmitry Ignatov", "R. Timofte"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.04329", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains difficult. W", "arxiv_id": "2512.04329", "doi": "10.48550/arXiv.2512.04329"}
+{"id": "multiagent-onboarding-assistant-2025", "title": "A Multi-agent Onboarding Assistant based on Large Language Models, Retrieval Augmented Generation, and Chain-of-Thought", "authors": ["A. Ionescu", "Sergey Titov", "M. Izadi"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2503.23421", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Effective onboarding in software engineering is crucial but difficult due to the fast-paced evolution of technologies. Traditional methods, like exploration and workshops, are costly, time-consuming, ", "arxiv_id": "2503.23421", "doi": "10.1145/3696630.3728611"}
+{"id": "shifting-from-ranking-2025", "title": "Shifting from Ranking to Set Selection for Retrieval Augmented Generation", "authors": ["Dahyun Lee", "Yongrae Jo", "Haeju Park", "Moontae Lee"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2507.06838", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval in Retrieval-Augmented Generation(RAG) must ensure that retrieved passages are not only individually relevant but also collectively form a comprehensive set. Existing approaches primarily re", "arxiv_id": "2507.06838", "doi": "10.18653/v1/2025.acl-long.861"}
+{"id": "enhancing-code-translation-2024", "title": "Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation", "authors": ["Manish Bhattarai", "Javier E. Santos", "Shawn Jones", "Ayan Biswas", "Boian Alexandrov"], "year": 2024, "venue": "IEEE Conference on High Performance Extreme Computing", "source_url": "https://arxiv.org/abs/2407.19619", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of large language models (LLMs) has revolutionized the field of code translation, enabling automated translation between programming languages. Despite these advancements, the accuracy and ", "arxiv_id": "2407.19619", "doi": "10.1109/HPEC62836.2024.10938485"}
+{"id": "mixofgranularity-optimize-chunking-2024", "title": "Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation", "authors": ["Zijie Zhong", "Hanwen Liu", "Xiao Cui", "Xiaofan Zhang", "Zengchang Qin"], "year": 2024, "venue": "International Conference on Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.00456", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Integrating information from various reference databases is a major challenge for Retrieval-Augmented Generation (RAG) systems because each knowledge source adopts a unique data structure and follows ", "arxiv_id": "2406.00456", "doi": "10.48550/arXiv.2406.00456"}
+{"id": "hypergraphrag-retrievalaugmented-generation-2025-2", "title": "HyperGraphRAG: Retrieval-Augmented Generation with Hypergraph-Structured Knowledge Representation", "authors": ["Haoran Luo", "E. Haihong", "Guanting Chen", "Yandan Zheng", "Xiaobao Wu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2503.21322", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2503.21322"}
+{"id": "gitbugs-bug-reports-2025", "title": "GitBugs: Bug Reports for Duplicate Detection, Retrieval Augmented Generation, Triage, and More", "authors": ["Avinash Patil"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.09651", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Bug reports provide critical insights into software quality, yet existing datasets often suffer from limited scope, outdated content, or insufficient metadata for machine learning. To address these li", "arxiv_id": "2504.09651", "doi": "10.48550/arXiv.2504.09651"}
+{"id": "telecomrag-taming-telecom-2024", "title": "TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs", "authors": ["G. M. Yilma", "J. Ayala-Romero", "A. Garcia-Saavedra", "Xavier Pérez Costa"], "year": 2024, "venue": "Computer communication review", "source_url": "https://arxiv.org/abs/2406.07053", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have immense potential to transform the telecommunications industry. They could help professionals understand complex standards, generate code, and accelerate development.", "arxiv_id": "2406.07053", "doi": "10.1145/3711992.3711996"}
+{"id": "what-retrieve-effective-2025", "title": "What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond", "authors": ["Wenchao Gu", "Juntao Chen", "Yanlin Wang", "Tianyue Jiang", "Xing Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.20589", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generat", "arxiv_id": "2503.20589", "doi": "10.48550/arXiv.2503.20589"}
+{"id": "code-review-automation-2025", "title": "Code Review Automation using Retrieval Augmented Generation", "authors": ["Qianru Meng", "Xiao Zhang", "Zhaochen Ren", "Joost Visser"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.05302", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review is essential for maintaining software quality but is labor-intensive. Automated code review generation offers a promising solution to this challenge. Both deep learning-based generative te", "arxiv_id": "2511.05302", "doi": "10.48550/arXiv.2511.05302"}
+{"id": "modeldriven-quantum-code-2025", "title": "Model-Driven Quantum Code Generation Using Large Language Models and Retrieval-Augmented Generation", "authors": ["Nazanin Siavash", "Armin Moin"], "year": 2025, "venue": "ACM/IEEE International Conference on Model Driven Engineering Languages and Systems", "source_url": "https://arxiv.org/abs/2508.21097", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces a novel research direction for model-to-text/code transformations by leveraging Large Language Models (LLMs) that can be enhanced with Retrieval-Augmented Generation (RAG) pipeli", "arxiv_id": "2508.21097", "doi": "10.1109/MODELS67397.2025.00031"}
+{"id": "hierarchical-document-refinement-2025", "title": "Hierarchical Document Refinement for Long-context Retrieval-augmented Generation", "authors": ["Jiajie Jin", "Xiaoxi Li", "Guanting Dong", "Yuyao Zhang", "Yutao Zhu"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2505.10413", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Real-world RAG applications often encounter long-context input scenarios, where redundant information and noise results in higher inference costs and reduced performance. To address these challenges, ", "arxiv_id": "2505.10413", "doi": "10.48550/arXiv.2505.10413"}
+{"id": "ultrarag-modular-automated-2025", "title": "UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation", "authors": ["Yuxuan Chen", "Dewen Guo", "Senkun Mei", "Xinze Li", "Hao Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.08761", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) significantly enhances the performance of large language models (LLMs) in downstream tasks by integrating external knowledge. To facilitate researchers in deployin", "arxiv_id": "2504.08761", "doi": "10.48550/arXiv.2504.08761"}
+{"id": "retrievalaugmented-generation-framework-2025", "title": "Towards a Retrieval-Augmented Generation Framework for Originality Evaluation in Projects-Based Learning Classrooms", "authors": ["S. Luis", "D. Reina", "Sergio L. Toral Marín"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/educsci15060706", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Project-Based Learning is central to modern engineering education, but assessing the originality of student work poses significant challenges, particularly when previous project repositories are acces", "doi": "10.3390/educsci15060706"}
+{"id": "library-llm-intrinsics-2025", "title": "A Library of LLM Intrinsics for Retrieval-Augmented Generation", "authors": ["Marina Danilevsky", "Kristjan H. Greenewald", "Chulaka Gunasekara", "Maeda F Hanafi", "Lihong He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.11704", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the developer community for large language models (LLMs), there is not yet a clean pattern analogous to a software library, to support very large scale collaboration. Even for the commonplace use c", "arxiv_id": "2504.11704", "doi": "10.48550/arXiv.2504.11704"}
+{"id": "reversum-multistaged-retrievalaugmented-2025", "title": "REVERSUM: A Multi-staged Retrieval-Augmented Generation Method to Enhance Wikipedia Tail Biographies through Personal Narratives", "authors": ["Sayantan Adak", "Pauras Mangesh Meher", "Paramita Das", "Animesh Mukherjee"], "year": 2025, "venue": "International Conference on Computational Linguistics", "source_url": "https://arxiv.org/abs/2502.12137", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Wikipedia is an invaluable resource for factual information about a wide range of entities. However, the quality of articles on less-known entities often lags behind that of the well-known ones. This ", "arxiv_id": "2502.12137", "doi": "10.48550/arXiv.2502.12137"}
+{"id": "empowering-lowresource-languages-2025", "title": "Empowering Low-Resource Languages: TraSe Architecture for Enhanced Retrieval-Augmented Generation in Bangla", "authors": ["Atia Shahnaz Ipa", "M. Rony", "M. Islam"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.lm4uc-1.2", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Research on Retrieval-Augmented Generation for low-resource languages has been sparse because of limited resources. To address this, we focus on Bangla, a low-resource language, and have created a dat", "doi": "10.18653/v1/2025.lm4uc-1.2"}
+{"id": "cyberbot-ontologygrounded-retrieval-2025", "title": "CyberBOT: Ontology-Grounded Retrieval Augmented Generation for Reliable Cybersecurity Education", "authors": ["Chengshuai Zhao", "Riccardo De Maria", "Tharindu Kumarage", "Kumar Satvik Chaudhary", "Garima Agrawal"], "year": 2025, "venue": "International Conference on Information and Knowledge Management", "source_url": "https://arxiv.org/abs/2504.00389", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Advancements in large language models (LLMs) have enabled the development of intelligent educational tools that support inquiry-based learning across technical domains. In cybersecurity education, whe", "arxiv_id": "2504.00389", "doi": "10.1145/3746252.3761478"}
+{"id": "regaining-control-enabling-2025", "title": "Regaining Control: Enabling Educators to Build Specialized AI Chat Bots with Retrieval Augmented Generation", "authors": ["Barbara Pampel", "Simon Martin", "Ulrike Padó"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.5220/0013425500003932", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Conversational AI (chat) bots are powerful and helpful tools, but are not suited for the unrestricted use in many classrooms: They may hallucinate, easily veer from the topic of instruction, and are", "doi": "10.5220/0013425500003932"}
+{"id": "beyond-correctness-rewarding-2025", "title": "Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation", "authors": ["Zhichao Xu", "Zongyu Wu", "Yun Zhou", "Aosong Feng", "Kang Zhou"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.13272", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inspired by the success of reinforcement learning (RL) in Large Language Model (LLM) training for domains like math and code, recent works have begun exploring how to train LLMs to use search engines ", "arxiv_id": "2510.13272", "doi": "10.48550/arXiv.2510.13272"}
+{"id": "retrievalaugmented-generation-electrocardiogramlanguage-2025", "title": "Retrieval-Augmented Generation for Electrocardiogram-Language Models", "authors": ["Xiaoyu Song", "William Jongwon Han", "Tony Chen", "Chaojing Duan", "Michael A. Rosenberg"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.00261", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Interest in generative Electrocardiogram-Language Models (ELMs) is growing, as they can produce textual responses conditioned on ECG signals and textual queries. Unlike traditional classifiers that ou", "arxiv_id": "2510.00261", "doi": "10.48550/arXiv.2510.00261"}
+{"id": "automated-vulnerability-repair-2025-2", "title": "Automated Vulnerability Repair Based on Retrieval-Augmented Generation", "authors": ["Shengyi Cheng", "Qiao Yu", "Yi Zhu", "Zirui Huang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ISEAE64934.2025.11041756", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As software scale continues to grow and complexity increases, vulnerabilities have become a significant issue affecting system security and stability. In recent years, advancements in machine learning", "doi": "10.1109/ISEAE64934.2025.11041756"}
+{"id": "advancing-engineering-research-2025", "title": "Advancing engineering research through context-aware and knowledge graph–based retrieval-augmented generation", "authors": ["Soham Ghosh", "Gaurav Mittal"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3389/frai.2025.1697169", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are powerful in language understanding and content generation but frequently fall short of technical accuracy when they are applied to engineering code, standards, and des", "doi": "10.3389/frai.2025.1697169"}
+{"id": "enhancing-android-malware-2025", "title": "Enhancing Android Malware Detection with Retrieval-Augmented Generation", "authors": ["S. Saraga", "S. AnaghaM.", "Dincy R. Arikkat", "A. RafidhaRehimanK.", "S. Nicolazzo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.22750", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The widespread use of Android applications has made them a prime target for cyberattacks, significantly increasing the risk of malware that threatens user privacy, security, and device functionality. ", "arxiv_id": "2506.22750", "doi": "10.48550/arXiv.2506.22750"}
+{"id": "retrievalaugmented-generation-interpreting-2025", "title": "Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models", "authors": ["S. Nanua", "Raven Steward", "Benjamin Neely", "Michael Datto", "Kenneth Youens"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.jpi.2025.100520", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated strong performance on general knowledge tasks, but they have important limitations as standalone tools for question answering in specialized domains wher", "doi": "10.1016/j.jpi.2025.100520"}
+{"id": "agentic-ai-retrievalaugmented-2025", "title": "Agentic AI with retrieval-augmented generation for automated compliance assistance in finance", "authors": ["Varun Pandey"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.30574/ijsra.2025.15.2.1522", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Maintaining compliance with complex Know Your Customer (KYC) and Anti-Money Laundering (AML) regulations is a resource-intensive challenge for financial institutions. This paper presents an agentic AI", "doi": "10.30574/ijsra.2025.15.2.1522"}
+{"id": "beyond-chunks-graphs-2025", "title": "Beyond Chunks and Graphs: Retrieval-Augmented Generation through Triplet-Driven Thinking", "authors": ["Shengbo Gong", "Xianfeng Tang", "Carl Yang", "Wei Jin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.02435", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) is critical for reducing hallucinations and incorporating external knowledge into Large Language Models (LLMs). However, advanced RAG systems face a trade-off betw", "arxiv_id": "2508.02435", "doi": "10.48550/arXiv.2508.02435"}
+{"id": "marag-automating-role-2025", "title": "MA-RAG: Automating Role Engineering for RESTful APIs with Multi-Head Attention and Retrieval-Augmented Generation", "authors": ["Yang Luo", "Qingni Shen", "Zhonghai Wu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.24963/ijcai.2025/846", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper addresses the role engineering problem for RESTful applications and proposes a role engineering method based on multi-head attention and Retrieval Augmented Generation called MA-RAG. The me", "doi": "10.24963/ijcai.2025/846"}
+{"id": "classifying-addressing-diversity-2025", "title": "Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems", "authors": ["Kin Kwan Leung", "Mouloud Belbahri", "Yi Sui", "Alex Labach", "Xueying Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.13975", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) is a prevalent approach for building LLM-based question-answering systems that can take advantage of external knowledge databases. Due to the complexity of real-wo", "arxiv_id": "2510.13975", "doi": "10.48550/arXiv.2510.13975"}
+{"id": "xgenq-explainable-domainadaptive-2025", "title": "XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security", "authors": ["Hamed Jelodar", "Mohammad Meymani", "R. Razavi-Far", "Ali A. Ghorbani"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19006", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI and large language models (LLMs) have shown strong capabilities in code understanding, but their use in cybersecurity, particularly for malware detection and analysis, remains limited. E", "arxiv_id": "2510.19006", "doi": "10.48550/arXiv.2510.19006"}
+{"id": "seakr-selfaware-knowledge-2024", "title": "SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation", "authors": ["Zijun Yao", "Weijian Qi", "Liangming Pan", "S. Cao", "Linmei Hu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.19215", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLM", "arxiv_id": "2406.19215", "doi": "10.48550/arXiv.2406.19215"}
+{"id": "optimizing-code-runtime-2025", "title": "Optimizing Code Runtime Performance Through Context-Aware Retrieval-Augmented Generation", "authors": ["Manish Acharya", "Yifan Zhang", "Kevin Leach", "Yu Huang"], "year": 2025, "venue": "IEEE International Conference on Program Comprehension", "source_url": "https://arxiv.org/abs/2501.16692", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Optimizing software performance through automated code refinement offers a promising avenue for enhancing execution speed and efficiency. Despite recent advancements in LLMs, a significant gap remains", "arxiv_id": "2501.16692", "doi": "10.1109/ICPC66645.2025.00028"}
+{"id": "enhancing-crosslanguage-code-2025", "title": "Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation", "authors": ["Manish Bhattarai", "Minh N. Vu", "Javier E. Santos", "Ismael Ismael", "Dan O'Malley"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.knowledgenlp-1.8", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.18653/v1/2025.knowledgenlp-1.8"}
+{"id": "developing-computerbased-tutor-2024", "title": "Developing a computer-based tutor utilizing Generative Artificial Intelligence (GAI) and Retrieval-Augmented Generation (RAG)", "authors": ["Youngjin Lee"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10639-024-13129-5", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10639-024-13129-5"}
+{"id": "give-llms-security-2025", "title": "Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection", "authors": ["Bo Lin", "Shangwen Wang", "Yihao Qin", "Liqian Chen", "Xiaoguang Mao"], "year": 2025, "venue": "Conference on Computer and Communications Security", "source_url": "https://arxiv.org/abs/2504.16429", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Code Generation (RACG) leverages external knowledge to enhance Large Language Models (LLMs) in code synthesis, improving the functional correctness of the generated code. However, ", "arxiv_id": "2504.16429", "doi": "10.1145/3719027.3765049"}
+{"id": "exploring-security-threats-2025", "title": "Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation", "authors": ["Bo Lin", "Shangwen Wang", "Liqian Chen", "Xiaoguang Mao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.03233", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) into software development has revolutionized the field, particularly through the use of Retrieval-Augmented Code Generation (RACG) systems that enhance ", "arxiv_id": "2502.03233", "doi": "10.48550/arXiv.2502.03233"}
+{"id": "augmenting-code-sequencing-2024", "title": "Augmenting Code Sequencing with Retrieval-Augmented Generation (RAG) for Context-Aware Code Synthesis", "authors": ["S. Rani", "S. G. Deepika", "D. Devdharshini", "Harini Ravindran"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SSITCON62437.2024.10796587", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The growing demand for efficient code generation has driven research into improving Large Language Models (LLMs). This project presents a novel system designed to enhance code generation by leveraging", "doi": "10.1109/SSITCON62437.2024.10796587"}
+{"id": "esapiens-platform-secure-2025", "title": "eSapiens: A Platform for Secure and Auditable Retrieval-Augmented Generation", "authors": ["Isaac Shi", "Zeyuan Li", "Fan Liu", "Wenli Wang", "Lewei He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.09588", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present eSapiens, an AI-as-a-Service (AIaaS) platform engineered around a business-oriented trifecta: proprietary data, operational workflows, and any major agnostic Large Language Model (LLM). eSa", "arxiv_id": "2507.09588", "doi": "10.48550/arXiv.2507.09588"}
+{"id": "timesensitve-retrievalaugmented-generation-2024", "title": "Time-Sensitve Retrieval-Augmented Generation for Question Answering", "authors": ["Feifan Wu", "Lingyuan Liu", "Wentao He", "Ziqi Liu", "Zhiqiang Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3627673.3679800", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by accessing external data sources, offering a promising way to improve accuracy and reliability. Despite its potential, conv", "doi": "10.1145/3627673.3679800"}
+{"id": "droidcoder-enhanced-android-2024", "title": "DroidCoder: Enhanced Android Code Completion with Context-Enriched Retrieval-Augmented Generation", "authors": ["Xinran Yu", "Chun Li", "Minxue Pan", "Xuandong Li"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3691620.3695063", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Android is the most popular mobile operating system. However, Android development requires extensive coding, especially for unique features such as lifecycle callbacks and UI widgets. Existing code co", "doi": "10.1145/3691620.3695063"}
+{"id": "repogenreflex-enhancing-repositorylevel-2024", "title": "RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation", "authors": ["Jicheng Wang", "Yifeng He", "Hao Chen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.13122", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In real-world software engineering tasks, solving a problem often requires understanding and modifying multiple functions, classes, and files across a large codebase. Therefore, on the repository leve", "arxiv_id": "2409.13122", "doi": "10.48550/arXiv.2409.13122"}
+{"id": "analysing-codebased-retrieval-2025", "title": "Analysing Code-Based Retrieval Augmented Generation Methods for Knowledge Retention", "authors": ["A. P.", "Jerit Joshy", "Mohammed Farhan T M", "Praneeth C F", "Dimple Elizabeth Baby"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS65134.2025.11135582", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Traditional Retrieval Augmented Systems (RAG) fails to capture the intricate and complex contextual information within a code repository. This is mainly due to the presence of multiple files, functio", "doi": "10.1109/ACCESS65134.2025.11135582"}
+{"id": "localized-opensource-llm-2024", "title": "Localized Open-Source LLM Aware Retrieval Augmented Generation of Legal Documents: A Case Study on Indian Constitution and Penal Code", "authors": ["Pratiksha Phukon", "Yogesh Lokhar", "P. Ray"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/BITCON63716.2024.10985396", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are advanced artificial intelligence systems which are designed to generate human-like language based on their training data. Retrieval-Augmented Generation (RAG) is a fr", "doi": "10.1109/BITCON63716.2024.10985396"}
+{"id": "optimizing-microservice-deployment-2024", "title": "Optimizing Microservice Deployment in Edge Computing with Large Language Models: Integrating Retrieval Augmented Generation and Chain of Thought Techniques", "authors": ["Kan Feng", "Lijun Luo", "Yongjun Xia", "Bin Luo", "Xingfeng He"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/sym16111470", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in autogenerating code based on natural language instructions provided by humans. We observed that in the microservice models of ", "doi": "10.3390/sym16111470"}
+{"id": "codewisp-ast-guided-2025", "title": "CodeWisp: AST Guided Retrieval Augmented Generation for Code Generation and Completion", "authors": ["Hamza El Atrassi", "Yasmina El Idrissi", "Yahya Benkaouz"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/WINCOM65874.2025.11313399", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the context of software development, code completion has become an essential functionality that helps speed up coding and reduce syntax errors. Most previously proposed code completion modules are ", "doi": "10.1109/WINCOM65874.2025.11313399"}
+{"id": "enhancing-source-code-2024", "title": "Enhancing Source Code Comment Generation via Retrieval-Augmented Generation with Design Document Term Dictionary", "authors": ["Kazumi Nishikawa", "Genta Koreki", "Hideyuki Kanuka"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/APSEC65559.2024.00061", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Effective software development depends on clear code comments for better understanding. We introduce a method for generating automated source-code comments using retrieval-augmented generation (RAG) w", "doi": "10.1109/APSEC65559.2024.00061"}
+{"id": "importsnare-directed-code-2025", "title": "ImportSnare: Directed 'Code Manual' Hijacking in Retrieval-Augmented Code Generation", "authors": ["Kai Ye", "Liangcai Su", "Chenxiong Qian"], "year": 2025, "venue": "Conference on Computer and Communications Security", "source_url": "https://arxiv.org/abs/2509.07941", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation has emerged as a pivotal capability of Large Language Models (LLMs), revolutionizing development efficiency for programmers of all skill levels. However, the complexity of data structu", "arxiv_id": "2509.07941", "doi": "10.1145/3719027.3765161"}
+{"id": "retrievalaugmented-code-review-2025", "title": "Retrieval-Augmented Code Review Comment Generation", "authors": ["Hyunsun Hong", "Jong-Chan Baik"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.11591", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated code review comment generation (RCG) aims to assist developers by automatically producing natural language feedback for code changes. Existing approaches are primarily either generation-base", "arxiv_id": "2506.11591", "doi": "10.48550/arXiv.2506.11591"}
+{"id": "rescue-retrieval-augmented-2025", "title": "RESCUE: Retrieval Augmented Secure Code Generation", "authors": ["Jiahao Shi", "Tianyi Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.18204", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite recent advances, Large Language Models (LLMs) still generate vulnerable code. Retrieval-Augmented Generation (RAG) has the potential to enhance LLMs for secure code generation by incorporating", "arxiv_id": "2510.18204", "doi": "10.48550/arXiv.2510.18204"}
+{"id": "across-programming-language-2025", "title": "Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation", "authors": ["Qiming Zhu", "Jialun Cao", "Xuanang Chen", "Yaojie Lu", "Hongyu Lin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.03535", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current research on large language models (LLMs) with retrieval-augmented code generation (RACG) mainly focuses on single-language settings, leaving cross-lingual effectiveness and security unexplored", "arxiv_id": "2506.03535", "doi": "10.48550/arXiv.2506.03535"}
+{"id": "fidelitygpt-correcting-decompilation-2025", "title": "FidelityGPT: Correcting Decompilation Distortions with Retrieval Augmented Generation", "authors": ["Zhiping Zhou", "Xiaohong Li", "Ruitao Feng", "Yao Zhang", "Yuekang Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19615", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Decompilation converts machine code into human-readable form, enabling analysis and debugging without source code. However, fidelity issues often degrade the readability and semantic accuracy of decom", "arxiv_id": "2510.19615", "doi": "10.14722/ndss.2026.230989"}
+{"id": "accessible-reliable-ai-2025", "title": "Accessible and Reliable AI Coding Tutors: Augmenting Large Language Models with Retrieval-Augmented Generation for Java Programming", "authors": ["Guiu Puigcercos i Vilar", "Parvez Rashid", "N. Tonekaboni"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/EDUCON62633.2025.11016497", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper addresses the challenge of improving the reliability and accuracy of Large Language Models (LLMs) for assisting students in learning Java programming, a critical component of object-oriente", "doi": "10.1109/EDUCON62633.2025.11016497"}
+{"id": "effective-emulation-pss-2025", "title": "Effective Emulation of PSS Systems via Retrieval Augmented Generation", "authors": ["Xiangwei Zhou", "Guanzhong Wang", "Ningning Zhang", "A. Wulamu", "Ao He"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICCBDAI66607.2025.11388283", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a training-free retrieval-augmented generation (RAG) framework that emulates the behavioral interface of legacy Unisys mainframes used in airline Passenger Service Systems (PSS). By dynamic", "doi": "10.1109/ICCBDAI66607.2025.11388283"}
+{"id": "retrievalaugmented-generation-reliable-2025", "title": "Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations", "authors": ["Zakaria El Kassimi", "Fares Fourati", "M. Alouini"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.09651", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We study question answering in the domain of radio regulations, a legally sensitive and high-stakes area. We propose a telecom-specific Retrieval-Augmented Generation (RAG) pipeline and introduce, to ", "arxiv_id": "2509.09651", "doi": "10.48550/arXiv.2509.09651"}
+{"id": "leveraging-lecture-content-2024", "title": "Leveraging Lecture Content for Improved Feedback: Explorations with GPT-4 and Retrieval Augmented Generation", "authors": ["Sven Jacobs", "Steffen Jaschke"], "year": 2024, "venue": "Conference on Software Engineering Education and Training", "source_url": "https://arxiv.org/abs/2405.06681", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents the use of Retrieval Augmented Generation (RAG) to improve the feedback generated by Large Language Models for programming tasks. For this purpose, corresponding lecture recordings", "arxiv_id": "2405.06681", "doi": "10.1109/CSEET62301.2024.10663001"}
+{"id": "seer-selfaligned-evidence-2024", "title": "SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation", "authors": ["Xinping Zhao", "Dongfang Li", "Yan Zhong", "Boren Hu", "Yibin Chen"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2410.11315", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent studies in Retrieval-Augmented Generation (RAG) have investigated extracting evidence from retrieved passages to reduce computational costs and enhance the final RAG performance, yet it remains", "arxiv_id": "2410.11315", "doi": "10.48550/arXiv.2410.11315"}
+{"id": "general-instructionfollowing-alignment-2024", "title": "Toward General Instruction-Following Alignment for Retrieval-Augmented Generation", "authors": ["Guanting Dong", "Xiaoshuai Song", "Yutao Zhu", "Runqi Qiao", "Zhicheng Dou"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.09584", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. Despite recent advancements in Large Language Models (LLMs), research on assess", "arxiv_id": "2410.09584", "doi": "10.48550/arXiv.2410.09584"}
+{"id": "retrieval-augmented-generation-2025", "title": "Retrieval Augmented Generation for HPC Code Optimization", "authors": ["Shalini Mutyala"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.31274/cc-20260223-58", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.31274/cc-20260223-58"}
+{"id": "understanding-design-decisions-2024", "title": "Understanding the Design Decisions of Retrieval-Augmented Generation Systems", "authors": ["Shengming Zhao", "Yuchen Shao", "Yuheng Huang", "Jiayang Song", "Zhijie Wang"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2411.19463", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) has emerged as a critical technique for enhancing large language model (LLM) capabilities. However, practitioners face significant challenges when making RAG deplo", "arxiv_id": "2411.19463"}
+{"id": "bioragent-retrievalaugmented-generation-2024", "title": "BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A", "authors": ["Samy Ateia", "Udo Kruschwitz"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.12358", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present BioRAGent, an interactive web-based retrieval-augmented generation (RAG) system for biomedical question answering. The system uses large language models (LLMs) for query expansion, snippet ", "arxiv_id": "2412.12358", "doi": "10.48550/arXiv.2412.12358"}
+{"id": "morag-multifusion-retrieval-2024", "title": "MoRAG - Multi-Fusion Retrieval Augmented Generation for Human Motion", "authors": ["Kalakonda Sai Shashank", "Shubham Maheshwari", "Ravi Kiran Sarvadevabhatla"], "year": 2024, "venue": "IEEE Workshop/Winter Conference on Applications of Computer Vision", "source_url": "https://arxiv.org/abs/2409.12140", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional", "arxiv_id": "2409.12140", "doi": "10.1109/WACV61041.2025.00448"}
+{"id": "retrieval-augmented-generation-2024", "title": "Retrieval Augmented Generation Systems: Automatic Dataset Creation, Evaluation and Boolean Agent Setup", "authors": ["Tristan Kenneweg", "Philip Kenneweg", "Barbara Hammer"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.00820", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval Augmented Generation (RAG) systems have seen huge popularity in augmenting Large-Language Model (LLM) outputs with domain specific and time sensitive data. Very recently a shift is happening", "arxiv_id": "2403.00820", "doi": "10.48550/arXiv.2403.00820"}
+{"id": "ragviz-diagnose-visualize-2024", "title": "RAGViz: Diagnose and Visualize Retrieval-Augmented Generation", "authors": ["Tevin Wang", "Jingyuan He", "Chenyan Xiong"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2411.01751", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) combines knowledge from domain-specific sources into large language models to ground answer generation. Current RAG systems lack customizable visibility on the con", "arxiv_id": "2411.01751", "doi": "10.18653/v1/2024.emnlp-demo.33"}
+{"id": "codegrag-bridging-gap-2024", "title": "CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation", "authors": ["Kounianhua Du", "Jizheng Chen", "Renting Rui", "Huacan Chai", "Lingyue Fu"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2405.02355", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the large language models, their specificity in code", "arxiv_id": "2405.02355"}
+{"id": "llavacode-compressed-code-2025", "title": "LLavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation", "authors": ["Daria Cherniuk", "Nikita Sukhorukov", "Nikita Sushko", "Daniil Gusak", "Danil Sivtsov"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19644", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation has emerged as one of the most effective approaches for code completion, particularly when context from a surrounding repository is essential. However, incorporating con", "arxiv_id": "2510.19644", "doi": "10.48550/arXiv.2510.19644"}
+{"id": "retrievalaugmented-code-generation-2025-2", "title": "Retrieval-Augmented Code Generation of Low-Resource Programming Languages", "authors": ["Jianbo Lin", "Yuan Zhang", "Chuanyi Li", "Wentao Zou", "Jidong Ge"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/APSEC66846.2025.00106", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The performance of Large Language Models degrades substantially when generating code for low-resource programming languages. While Retrieval-Augmented Generation (RAG) offers a solution, applying it t", "doi": "10.1109/APSEC66846.2025.00106"}
+{"id": "exploring-security-threats-2025-2", "title": "Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation", "authors": ["Tian Li", "Bo Lin", "Shangwen Wang", "Yusong Tan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.21681", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development, yet its security implications remain dangerously underexplored. This paper", "arxiv_id": "2512.21681", "doi": "10.48550/arXiv.2512.21681"}
+{"id": "gracg-graph-retrieval-2025", "title": "GRACG: Graph Retrieval Augmented Code Generation", "authors": ["K. Fedorov", "Boris Zarubin", "Vladimir Ivanov"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASEW67777.2025.00060", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While file-level context is an effective basis for many modern code generation tools powered by Large Language Models (LLMs), it may be insufficient to fully capture the structural and semantic depend", "doi": "10.1109/ASEW67777.2025.00060"}
+{"id": "finetuning-retrieval-augmented-2024", "title": "Fine-Tuning and Retrieval Augmented Generation for Question Answering Using Affordable Large Language Models", "authors": ["Tiberiu Boros", "Radu Chivereanu", "S. Dumitrescu", "Octavian Purcaru"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/618c3c383b0042c7239db09da0dd53a471a6b14e", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "integrating-retrievalaugmented-generation-2024", "title": "Integrating Retrieval-Augmented Generation for Enhanced Code Reuse: A Comprehensive Framework for Efficient Software Development", "authors": ["Kai Wang", "Yujie Ding", "Shuai Jia", "Tianyi Ma", "Yin Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SWC62898.2024.00205", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid development of ubiquitous computing, the demand for efficient software development is growing stronger. Code reuse is an effective means to enhance software development efficiency, sign", "doi": "10.1109/SWC62898.2024.00205"}
+{"id": "irsc-zeroshot-evaluation-2024", "title": "IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios", "authors": ["Hai Lin", "Shaoxiong Zhan", "Junyou Su", "Hai-Tao Zheng", "Hui Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.15763", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In Retrieval-Augmented Generation (RAG) tasks using Large Language Models (LLMs), the quality of retrieved information is critical to the final output. This paper introduces the IRSC benchmark for eva", "arxiv_id": "2409.15763", "doi": "10.48550/arXiv.2409.15763"}
+{"id": "enhancing-gpt35s-proficiency-2024", "title": "Enhancing GPT-3.5's Proficiency in Netlogo Through Few-Shot Prompting and Retrieval-Augmented Generation", "authors": ["Joseph Martínez", "Brian Llinás", "Jhon G. Botello", "Jose J. Padilla", "Erika F. Frydenlund"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/WSC63780.2024.10838967", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recognizing the limited research on Large Language Models (LLMs) capabilities with low-resource languages, this study evaluates and increases the proficiency of the LLM GPT-3.5 in generating interface", "doi": "10.1109/WSC63780.2024.10838967"}
+{"id": "easyrag-efficient-retrievalaugmented-2024", "title": "EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations", "authors": ["Zhangchi Feng", "Dongdong Kuang", "Zhongyuan Wang", "Zhijie Nie", "Yaowei Zheng"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.10315", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents EasyRAG, a simple, lightweight, and efficient retrieval-augmented generation framework for automated network operations. Our framework has three advantages. The first is accurate q", "arxiv_id": "2410.10315", "doi": "10.48550/arXiv.2410.10315"}
+{"id": "laura-enhancing-code-2025", "title": "LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM", "authors": ["Yuxia Zhang", "Yuxia Zhang", "Zeyu Sun", "Yanjie Jiang", "Hui Liu"], "year": 2025, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2512.01356", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review is critical for ensuring software quality and maintainability. With the rapid growth in software scale and complexity, code review has become a bottleneck in the development process becaus", "arxiv_id": "2512.01356", "doi": "10.1109/ASE63991.2025.00245"}
+{"id": "retrieval-augmented-generation-2024-2", "title": "Retrieval Augmented Generation in Large Language Models: Development of AI Chatbot for Student Support", "authors": ["D. Oreški", "Dino Vlahek"], "year": 2024, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/addeb94f6618041809684b5e775f49268e00cd85", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "enhancing-scientific-reproducibility-2024", "title": "Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications", "authors": ["Sean Kim", "Raja Mazumder"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.15076", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The exponential growth in computational power and accessibility has transformed the complexity and scale of bioinformatics research, necessitating standardized documentation for transparency, reproduc", "arxiv_id": "2409.15076", "doi": "10.48550/arXiv.2409.15076"}
+{"id": "how-build-ai-2023", "title": "How to Build an AI Tutor that Can Adapt to Any Course and Provide Accurate Answers Using Large Language Model and Retrieval-Augmented Generation", "authors": ["Chenxi Dong"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2311.17696", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2311.17696"}
+{"id": "preferenceguided-refactored-tuning-2024", "title": "Preference-Guided Refactored Tuning for Retrieval Augmented Code Generation", "authors": ["Xinyu Gao", "Yun Xiong", "Deze Wang", "Zhenhan Guan", "Zejian Shi"], "year": 2024, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2409.15895", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented code generation utilizes Large Language Models as the generator and significantly expands their code generation capabilities by providing relevant code, documentation, and more via", "arxiv_id": "2409.15895", "doi": "10.1145/3691620.3694987"}
+{"id": "retroli-smallscale-retrieval-2024", "title": "RETRO-LI: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization", "authors": ["Gentiana Rashiti", "G. Karunaratne", "Mrinmaya Sachan", "Abu Sebastian", "Abbas Rahimi"], "year": 2024, "venue": "European Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2410.00004", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The retrieval augmented generation (RAG) system such as Retro has been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-param", "arxiv_id": "2410.00004", "doi": "10.3233/FAIA240837"}
+{"id": "improving-llmassisted-secure-2026", "title": "Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback", "authors": ["Vidyut Sriram", "Sawan Pandita", "Achintya Lakshmanan", "Aneesh Shamraj", "Suman Saha"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.00509", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) can generate code but often introduce security vulnerabilities, logical inconsistencies, and compilation errors. Prior work demonstrates that LLMs benefit substantially fr", "arxiv_id": "2601.00509", "doi": "10.48550/arXiv.2601.00509"}
+{"id": "retrievalaugmented-generation-elevate-2023", "title": "Using retrieval-augmented generation to elevate low-code developer skills", "authors": ["Nakhod O"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.15407/jai2023.03.126", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article proposes applying retrieval-augmented generation (RAG) to improve the skills of low-code developers by augmenting large language models with up-to-date domain-specific knowledge. As low-c", "doi": "10.15407/jai2023.03.126"}
+{"id": "improving-retrievalaugmented-code-2024", "title": "Improving Retrieval-Augmented Code Comment Generation by Retrieving for Generation", "authors": ["Hanzhen Lu", "Zhongxin Liu"], "year": 2024, "venue": "IEEE International Conference on Software Maintenance and Evolution", "source_url": "https://arxiv.org/abs/2408.03623", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code comment generation aims to generate high-quality comments from source code automatically and has been studied for years. Recent studies proposed to integrate information retrieval techniques with", "arxiv_id": "2408.03623", "doi": "10.1109/ICSME58944.2024.00040"}
+{"id": "rar-retrievalaugmented-retrieval-2024", "title": "RAR: Retrieval-augmented retrieval for code generation in low resource languages", "authors": ["Avik Dutta", "Mukul Singh", "Gust Verbruggen", "Sumit Gulwani", "Vu Le"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2024.emnlp-main.1199", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language models struggle in generating code for low-resource programming languages, since these are underrepresented in training data. Either examples or documentation are commonly used for improved c", "doi": "10.18653/v1/2024.emnlp-main.1199"}
+{"id": "predict-retrieval-test-2026", "title": "Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation", "authors": ["Xin Sun", "Zhongqi Chen", "Q. Liu", "Shu Wu", "Bowen Song"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.11443", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful approach for enhancing large language models' question-answering capabilities through the integration of external knowledge. However, when", "arxiv_id": "2601.11443", "doi": "10.48550/arXiv.2601.11443"}
+{"id": "assessing-answerability-queries-2024", "title": "Assessing the Answerability of Queries in Retrieval-Augmented Code Generation", "authors": ["Geonmin Kim", "Jaeyeon Kim", "Hancheol Park", "Wooksu Shin", "Tae-Ho Kim"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.05547", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Thanks to unprecedented language understanding and generation capabilities of large language model (LLM), Retrieval-augmented Code Generation (RaCG) has recently been widely utilized among software de", "arxiv_id": "2411.05547", "doi": "10.48550/arXiv.2411.05547"}
+{"id": "retrievalaugmented-code-generation-2023", "title": "Retrieval-Augmented Code Generation for Universal Information Extraction", "authors": ["Yucan Guo", "Zixuan Li", "Xiaolong Jin", "Yantao Liu", "Yutao Zeng"], "year": 2023, "venue": "Natural Language Processing and Chinese Computing", "source_url": "https://arxiv.org/abs/2311.02962", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts, which brings challenges to existing methods due to task-specific schem", "arxiv_id": "2311.02962", "doi": "10.48550/arXiv.2311.02962"}
+{"id": "dynamic-retrievalaugmented-generation-2023", "title": "Dynamic Retrieval-Augmented Generation", "authors": ["Anton Shapkin", "D. Litvinov", "Yaroslav Zharov", "Egor Bogomolov", "Timur Galimzyanov"], "year": 2023, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2312.08976", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current state-of-the-art large language models are effective in generating high-quality text and encapsulating a broad spectrum of world knowledge. These models, however, often hallucinate and lack lo", "arxiv_id": "2312.08976"}
+{"id": "retrieval-augmented-generation-2023", "title": "Retrieval Augmented Generation of Symbolic Music with LLMs", "authors": ["Nicolas Jonason", "Luca Casini", "Carl Thomé", "Bob L. T. Sturm"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.10384", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We explore the use of large language models (LLMs) for music generation using a retrieval system to select relevant examples. We find promising initial results for music generation in a dialogue with ", "arxiv_id": "2311.10384", "doi": "10.48550/arXiv.2311.10384"}
+{"id": "syntaxaware-retrieval-augmented-2023", "title": "Syntax-Aware Retrieval Augmented Code Generation", "authors": ["Xiangyu Zhang", "Yu Zhou", "Guang Yang", "Taolue Chen"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2023.findings-emnlp.90", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural code generation models are nowadays widely adopted to generate code from natural language descriptions automatically. Recently, pre-trained neural models equipped with token-level retrieval cap", "doi": "10.18653/v1/2023.findings-emnlp.90"}
+{"id": "retrievalaugmented-code-generation-2024", "title": "Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft", "authors": ["Chalamalasetti Kranti", "Sherzod Hakimov", "David Schlangen"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.17553", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the Minecraft Collaborative Building Task, two players collaborate: an Architect (A) provides instructions to a Builder (B) to assemble a specified structure using 3D blocks. In this work, we inves", "arxiv_id": "2406.17553", "doi": "10.48550/arXiv.2406.17553"}
+{"id": "ai-can-look-2023", "title": "AI Can Look Up StackOverflow too: Retrieval-Augmented Code Generation", "authors": ["Shreyas Vinayakumar", "Swagata Ashwani", "Minh-Tue Vo-Thanh"], "year": 2023, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/ff9545fb640071eadbfb40111318d7bc9c083e07", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "bashexplainer-retrievalaugmented-bash-2022", "title": "BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT", "authors": ["Chi Yu", "Guang Yang", "Xiang Chen", "Ke Liu", "Yanlin Zhou"], "year": 2022, "venue": "IEEE International Conference on Software Maintenance and Evolution", "source_url": "https://arxiv.org/abs/2206.13325", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Developers use shell commands for many tasks, such as file system management, network control, and process management. Bash is one of the most commonly used shells and plays an important role in Linux", "arxiv_id": "2206.13325", "doi": "10.1109/ICSME55016.2022.00016"}
+{"id": "rat-retrieval-augmented-2024", "title": "RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation", "authors": ["Zihao Wang", "Anji Liu", "Haowei Lin", "Jiaqi Li", "Xiaojian Ma"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.05313", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We explore how iterative revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation t", "arxiv_id": "2403.05313", "doi": "10.48550/arXiv.2403.05313"}
+{"id": "arcs-agentic-retrievalaugmented-2025", "title": "ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement", "authors": ["Manish Bhattarai", "Miguel Cordova", "Javier E. Santos", "Dan O'Malley"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.20434", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present Agentic Retrieval-Augmented Code Synthesis (ARCS), a system that improves LLM-based code generation without fine-tuning. ARCS operates through a budgeted synthesize-execute-repair loop over", "arxiv_id": "2504.20434", "doi": "10.48550/arXiv.2504.20434"}
+{"id": "improving-deep-assertion-2025", "title": "Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-Trained Language Models", "authors": ["Quanjun Zhang", "Chunrong Fang", "Yi Zheng", "Yaxin Zhang", "Yuan Zhao"], "year": 2025, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2502.16071", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing validates the correctness of the units of the software system under test and serves as the cornerstone in improving software quality and reliability. To reduce manual efforts in writing u", "arxiv_id": "2502.16071", "doi": "10.1145/3721128"}
+{"id": "hyracc-hybrid-retrievalaugmented-2025", "title": "HyRACC: A Hybrid Retrieval-Augmented Framework for More Efficient Code Completion", "authors": ["Chuanyi Li", "Jiwei Shang", "Yi Feng", "Bin Luo"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/Forge66646.2025.00013", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) approaches have significantly advanced code completion tasks, addressing limitations like the use of updated third-party libraries and new project dependencies. Ho", "doi": "10.1109/Forge66646.2025.00013"}
+{"id": "retrievalaugmented-finetuning-preference-2025", "title": "Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation", "authors": ["Deokhyung Kang", "Jeonghun Cho", "Yejin Jeon", "Sunbin Jang", "Minsub Lee"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2502.16529", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Visual programming languages (VPLs) allow users to create programs through graphical interfaces, which results in easier accessibility and their widespread usage in various domains. To further enhance", "arxiv_id": "2502.16529", "doi": "10.48550/arXiv.2502.16529"}
+{"id": "referencebased-retrievalaugmented-unit-2025", "title": "Reference-Based Retrieval-Augmented Unit Test Generation", "authors": ["Zhe Zhang", "Xingyu Liu", "Yuanzhang Lin", "Xiang Gao", "Hailong Sun"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3765758", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. LLMs like GPT-4, trained in vast text and code data, excel in various ", "doi": "10.1145/3765758"}
+{"id": "p4omp-retrievalaugmented-prompting-2025", "title": "P4OMP: Retrieval-Augmented Prompting for OpenMP Parallelism in Serial Code", "authors": ["Wali Mohammad Abdullah", "Azmain Kabir"], "year": 2025, "venue": "IEEE Conference on High Performance Extreme Computing", "source_url": "https://arxiv.org/abs/2506.22703", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present P4OMP, a retrieval-augmented framework for transforming serial C/C++ code into OpenMP-annotated parallel code using large language models (LLMs). To our knowledge, this is the first system ", "arxiv_id": "2506.22703", "doi": "10.1109/HPEC67600.2025.11196284"}
+{"id": "enhancing-code-transformation-2025", "title": "Enhancing Code Transformation in Large Language Models Through Retrieval-Augmented Fine-Tuning", "authors": ["Jing-Ming Guo", "Po-Yang Liu", "Yi-Chong Zeng", "Ting Chen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TCE.2025.3565294", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have made substantial advancements in knowledge reasoning and are increasingly utilized in specialized domains such as code completion, legal analysis, and medical transcr", "doi": "10.1109/TCE.2025.3565294"}
+{"id": "propertygpt-llmdriven-formal-2024", "title": "PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation", "authors": ["Ye Liu", "Yue Xue", "Daoyuan Wu", "Yuqiang Sun", "Yi Li"], "year": 2024, "venue": "Network and Distributed System Security Symposium", "source_url": "https://arxiv.org/abs/2405.02580", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those fro", "arxiv_id": "2405.02580", "doi": "10.14722/ndss.2025.241357"}
+{"id": "retrieval-augmented-comic-2025", "title": "Retrieval Augmented Comic Image Generation", "authors": ["Yunhao Shui", "Xuekuan Wang", "Feng Qiu", "Yuqiu Huang", "Jinzhu Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.12517", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present RaCig, a novel system for generating comic-style image sequences with consistent characters and expressive gestures. RaCig addresses two key challenges: (1) maintaining character identity a", "arxiv_id": "2506.12517", "doi": "10.48550/arXiv.2506.12517"}
+{"id": "automated-smart-contract-2025", "title": "Towards Automated Smart Contract Generation: Evaluation, Benchmarking, and Retrieval-Augmented Repair", "authors": ["Zaoyu Chen", "Haoran Qin", "Nuo Chen", "Xiangyu Zhao", "Lei Xue"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2503.01098", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Smart contracts, predominantly written in Solidity and deployed on blockchains such as Ethereum, are immutable after deployment, making functional correctness critical. However, existing evaluations o", "arxiv_id": "2503.01098"}
+{"id": "retrievalaugmented-code-completion-2024", "title": "Retrieval-augmented code completion for local projects using large language models", "authors": ["Marko Hostnik", "Marko Robnik-Sikonja"], "year": 2024, "venue": "Expert systems with applications", "source_url": "https://arxiv.org/abs/2408.05026", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The use of large language models (LLMs) is becoming increasingly widespread among software developers. However, privacy and computational requirements are problematic with commercial solutions and the", "arxiv_id": "2408.05026", "doi": "10.1016/j.eswa.2025.128596"}
+{"id": "rapgen-retrievalaugmented-patch-2023", "title": "RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair", "authors": ["Weishi Wang", "Yue Wang", "Shafiq R. Joty", "Steven C. H. Hoi"], "year": 2023, "venue": "ESEC/SIGSOFT FSE", "source_url": "https://arxiv.org/abs/2309.06057", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic program repair (APR) is crucial to reduce manual debugging efforts for developers and improve software reliability. While conventional search-based techniques typically rely on heuristic rul", "arxiv_id": "2309.06057", "doi": "10.1145/3611643.3616256"}
+{"id": "mitigating-prompt-dependency-2026", "title": "Mitigating Prompt Dependency in Large Language Models: A Retrieval-Augmented Framework for Intelligent Code Assistance", "authors": ["Saja Abufarha", "A. Marouf", "J. Rokne", "Reda Alhajj"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.3390/software5010004", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background: The implementation of Large Language Models (LLMs) in software engineering has provided new and improved approaches to code synthesis, testing, and refactoring. However, even with these ne", "doi": "10.3390/software5010004"}
+{"id": "race-retrievalaugmented-commit-2022", "title": "RACE: Retrieval-augmented Commit Message Generation", "authors": ["Ensheng Shi", "Yanlin Wang", "Wei Tao", "Lun Du", "Hongyu Zhang"], "year": 2022, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2203.02700", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Commit messages are important for software development and maintenance. Many neural network-based approaches have been proposed and shown promising results on automatic commit message generation. Howe", "arxiv_id": "2203.02700", "doi": "10.18653/v1/2022.emnlp-main.372"}
+{"id": "hashrag-bridging-deep-2025", "title": "HASH-RAG: Bridging Deep Hashing with Retriever for Efficient, Fine Retrieval and Augmented Generation", "authors": ["Jinyu Guo", "Xun Chen", "Qiyang Xia", "Zhaokun Wang", "Jie Ou"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2505.16133", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) encounters efficiency challenges when scaling to massive knowledge bases while preserving contextual relevance. We propose Hash-RAG, a framework that integrates de", "arxiv_id": "2505.16133", "doi": "10.48550/arXiv.2505.16133"}
+{"id": "openfoamgpt-retrievalaugmented-large-2025", "title": "OpenFOAMGPT: A retrieval-augmented large language model (LLM) agent for OpenFOAM-based computational fluid dynamics", "authors": ["Sandeep Pandey", "Ran Xu", "Wenkang Wang", "Xu Chu"], "year": 2025, "venue": "The Physics of Fluids", "source_url": "https://arxiv.org/abs/2501.06327", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work presents a large language model (LLM)-based agent OpenFOAMGPT tailored for OpenFOAM-centric computational fluid dynamics (CFD) simulations, leveraging two foundation models from OpenAI: the ", "arxiv_id": "2501.06327", "doi": "10.1063/5.0257555"}
+{"id": "coderagbench-can-retrieval-2024", "title": "CodeRAG-Bench: Can Retrieval Augment Code Generation?", "authors": ["Z. Z. Wang", "Akari Asai", "Xinyan Velocity Yu", "Frank F. Xu", "Yiqing Xie"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.14497", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While language models (LMs) have proven remarkably adept at generating code, many programs are challenging for LMs to generate using their parametric knowledge alone. Providing external contexts such ", "arxiv_id": "2406.14497", "doi": "10.48550/arXiv.2406.14497"}
+{"id": "refine-medical-diagnosis-2025", "title": "Refine Medical Diagnosis Using Generation Augmented Retrieval and Clinical Practice Guidelines", "authors": ["Wenhao Li", "Hongkuan Zhang", "Hongwei Zhang", "Zheng Li", "Z. Dong"], "year": 2025, "venue": "IEEE journal of biomedical and health informatics", "source_url": "https://arxiv.org/abs/2506.21615", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current medical language models, adapted from large language models, typically predict ICD code-based diagnosis from electronic health records (EHRs) because these labels are readily available. Howeve", "arxiv_id": "2506.21615", "doi": "10.48550/arXiv.2506.21615"}
+{"id": "sacl-understanding-combating-2025", "title": "SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization", "authors": ["Dhruv Gupta", "Gayathri Ganesh Lakshmy", "Yiqing Xie"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2506.20081", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Code Generation (RACG) is a critical technique for enhancing code generation by retrieving relevant information. In this work, we conduct an in-depth analysis of code retrieval by ", "arxiv_id": "2506.20081", "doi": "10.48550/arXiv.2506.20081"}
+{"id": "when-llms-meet-2025", "title": "When LLMs Meet API Documentation: Can Retrieval Augmentation Aid Code Generation Just as It Helps Developers?", "authors": ["Jingyi Chen", "Songqiang Chen", "Jialun Cao", "Jiasi Shen", "S. Cheung"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.15231", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) has increasingly shown its power in extending large language models' (LLMs') capability beyond their pre-trained knowledge. Existing works have shown that RAG can ", "arxiv_id": "2503.15231", "doi": "10.48550/arXiv.2503.15231"}
+{"id": "evor-evolving-retrieval-2024", "title": "EvoR: Evolving Retrieval for Code Generation", "authors": ["Hongjin Su", "Shuyang Jiang", "Yuhang Lai", "Haoyuan Wu", "Boao Shi"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2402.12317", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently the retrieval-augmented generation (RAG) has been successfully applied in code generation. However, existing pipelines for retrieval-augmented code generation (RACG) employ static knowledge b", "arxiv_id": "2402.12317", "doi": "10.18653/v1/2024.findings-emnlp.143"}
+{"id": "racgdrt-retrieval-augumented-2025", "title": "RACG-DRT: Retrieval Augumented Code Generation Based on Dynamic Revision of Thoughts", "authors": ["Jingkun Shang", "Chao Zhang", "Yusong Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICBDSE65491.2025.11220101", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large models have the problems of hallucinations in the field of code generation. The causes of these hallucinations include errors in the model’s reasoning process, inadequate overall knowledge, and ", "doi": "10.1109/ICBDSE65491.2025.11220101"}
+{"id": "test-code-generation-2025", "title": "Test code generation at Ericsson using Program Analysis Augmented Fine Tuned LLMs", "authors": ["S. Krishna", "Balvinder Singh", "Sujoy Roychowdhury", "G. Sridhara", "Sourav Mazumdar"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.11006", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We describe test code generation using Large Language Models (LLMs) in Ericsson. Our input is a test step in natural language (English) and our output is code (Java) which accomplishes the test step. ", "arxiv_id": "2506.11006", "doi": "10.48550/arXiv.2506.11006"}
+{"id": "arks-active-retrieval-2024", "title": "ARKS: Active Retrieval in Knowledge Soup for Code Generation", "authors": ["Hongjin Su", "Shuyang Jiang", "Yuhang Lai", "Haoyuan Wu", "Boao Shi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2402.12317", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2402.12317"}
+{"id": "generationaugmented-query-expansion-2022", "title": "Generation-Augmented Query Expansion For Code Retrieval", "authors": ["Dong Li", "Yelong Shen", "Ruoming Jin", "Yi Mao", "Kuan Wang"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2212.10692", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, exi", "arxiv_id": "2212.10692", "doi": "10.48550/arXiv.2212.10692"}
+{"id": "perc-planasquery-example-2024", "title": "PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation", "authors": ["Jaeseok Yoo", "Hojae Han", "Youngwon Lee", "Jaejin Kim", "Seung-won Hwang"], "year": 2024, "venue": "International Conference on Computational Linguistics", "source_url": "https://arxiv.org/abs/2412.12447", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code generation with large language models has shown significant promise, especially when employing retrieval-augmented generation (RAG) with few-shot examples. However, selecting effective examples t", "arxiv_id": "2412.12447", "doi": "10.48550/arXiv.2412.12447"}
+{"id": "enhancing-code-generation-2024", "title": "Enhancing Code Generation Through Retrieval of Cross-Lingual Semantic Graphs", "authors": ["Zhijie Jiang", "Zejian Shi", "Xinyu Gao", "Yun Xiong"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/APSEC65559.2024.00026", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the field of software engineering automation, code language models have made significant strides in code generation tasks. However, due to the cost of updating knowledge and the issue of hallucinat", "doi": "10.1109/APSEC65559.2024.00026"}
+{"id": "verirag-retrievalaugmented-framework-2025", "title": "VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair", "authors": ["Haomin Qi", "Yuyang Du", "Lihao Zhang", "S. Liew", "Kexin Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.15664", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated immense potential in computer-aided design (CAD), particularly for automated debugging and verification within electronic design automation (EDA) tools. ", "arxiv_id": "2507.15664", "doi": "10.48550/arXiv.2507.15664"}
+{"id": "automated-formalization-conceptual-2025", "title": "Automated Formalization via Conceptual Retrieval-Augmented LLMs", "authors": ["Wangyue Lu", "Lun Du", "Sirui Li", "Ke Weng", "Haozhe Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.06931", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Interactive theorem provers (ITPs) require manual formalization, which is labor-intensive and demands expert knowledge. While automated formalization offers a potential solution, it faces two major ch", "arxiv_id": "2508.06931", "doi": "10.48550/arXiv.2508.06931"}
+{"id": "rallmpoi-retrievalaugmented-llm-2025", "title": "RALLM-POI: Retrieval-Augmented LLM for Zero-shot Next POI Recommendation with Geographical Reranking", "authors": ["Kunrong Li", "Kwan Hui Lim"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.17066", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Next point-of-interest (POI) recommendation predicts a user's next destination from historical movements. Traditional models require intensive training, while LLMs offer flexible and generalizable zer", "arxiv_id": "2509.17066", "doi": "10.48550/arXiv.2509.17066"}
+{"id": "oiassistant-retrieval-augmented-2025", "title": "OI-Assistant: A Retrieval Augmented System for Similar Problem Discovery and Interactive Learning in Competitive Programming", "authors": ["Yu Su", "Ping Nie", "Xin Meng"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.15388/ioi.2025.09", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Competitive programming (CP) often requires quickly identifying relevant problems and solutions, yet current online judge (OJ) platforms offer only limited keyword or tag-based search. This makes it di", "doi": "10.15388/ioi.2025.09"}
+{"id": "generative-retrievalaugmented-ontologic-2024", "title": "Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design", "authors": ["Markus J. Buehler"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1021/acsengineeringau.3c00058", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design, and manufacturing, including their capacity to work effectively with human language, symb", "doi": "10.1021/acsengineeringau.3c00058"}
+{"id": "raml-retrievalaugmented-localization-2025", "title": "RAML: Toward Retrieval-Augmented Localization of Malicious Payloads in Android Apps", "authors": ["Tiezhu Sun", "Marco Alecci", "Yewei Song", "Xunzhu Tang", "Kisub Kim"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ASE63991.2025.00351", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Android malware detection and family classification have been extensively studied, yet localizing the exact malicious payloads within a detected sample remains a challenging and labor-intensive task. ", "doi": "10.1109/ASE63991.2025.00351"}
+{"id": "contextaugmented-code-generation-2024", "title": "Context-Augmented Code Generation Using Programming Knowledge Graphs", "authors": ["Iman Saberi", "Fatemeh Fard"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.18251", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) excel at code generation but struggle with complex problems. Retrieval-Augmented Generation (RAG) mitigates this issue by integrating external knowledge, yet retrieval mod", "arxiv_id": "2410.18251", "doi": "10.48550/arXiv.2410.18251"}
+{"id": "building-coding-assistant-2024", "title": "Building a Coding Assistant via the Retrieval-Augmented Language Model", "authors": ["Xinze Li", "Hanbin Wang", "Zhenghao Liu", "Shi Yu", "Shuo Wang"], "year": 2024, "venue": "ACM Trans. Inf. Syst.", "source_url": "https://arxiv.org/abs/2410.16229", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Pretrained language models have shown strong effectiveness in code-related tasks, such as code retrieval, code generation, code summarization, and code completion tasks. In this article, we propose CO", "arxiv_id": "2410.16229", "doi": "10.1145/3695868"}
+{"id": "retrievalaugmented-instruction-tuning-2024", "title": "Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations : A Tool-Chaining Problem-Solving Framework with Attributable Reflection", "authors": ["Sakhinana Sagar Srinivas", "Geethan Sannidhi", "Venkataramana Runkana"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.15866", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The current technology landscape lacks a foundational AI model for solving process engineering calculations. In this work, we introduce a novel autonomous agent framework leveraging Retrieval-Augmente", "arxiv_id": "2408.15866", "doi": "10.48550/arXiv.2408.15866"}
+{"id": "satyrn-platform-analytics-2024", "title": "Satyrn: A Platform for Analytics Augmented Generation", "authors": ["Marko Sterbentz", "Cameron Barrie", "Shubham Shahi", "Abhratanu Dutta", "Donna Hooshmand"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.12069", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. H", "arxiv_id": "2406.12069", "doi": "10.48550/arXiv.2406.12069"}
+{"id": "goodtriever-adaptive-toxicity-2023", "title": "Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models", "authors": ["Luiza Pozzobon", "B. Ermiş", "Patrick Lewis", "Sara Hooker"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2310.07589", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Fu", "arxiv_id": "2310.07589", "doi": "10.48550/arXiv.2310.07589"}
+{"id": "ralle-framework-developing-2023", "title": "RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models", "authors": ["Yasuto Hoshi", "D. Miyashita", "Youyang Ng", "Kento Tatsuno", "Yasuhiro Morioka"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2308.10633", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented large language models (R-LLMs) combine pre-trained large language models (LLMs) with information retrieval systems to improve the accuracy of factual question-answering. However, c", "arxiv_id": "2308.10633", "doi": "10.48550/arXiv.2308.10633"}
+{"id": "repoformer-selective-retrieval-2024", "title": "Repoformer: Selective Retrieval for Repository-Level Code Completion", "authors": ["Di Wu", "W. Ahmad", "Dejiao Zhang", "M. K. Ramanathan", "Xiaofei Ma"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2403.10059", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in bot", "arxiv_id": "2403.10059", "doi": "10.48550/arXiv.2403.10059"}
+{"id": "synthetic-data-generation-2025", "title": "Synthetic Data Generation Using Large Language Models: Advances in Text and Code", "authors": ["Mihai Nadǎş", "Laura Dioşan", "Andreea Tomescu"], "year": 2025, "venue": "IEEE Access", "source_url": "https://arxiv.org/abs/2503.14023", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This survey reviews how large language models (LLMs) are transforming synthetic training data generation in both natural language and code domains. By producing artificial but task-relevant examples, ", "arxiv_id": "2503.14023", "doi": "10.1109/ACCESS.2025.3589503"}
+{"id": "graphcoder-enhancing-repositorylevel-2024", "title": "GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model", "authors": ["Wei Liu", "Ailun Yu", "Daoguang Zan", "Bo Shen", "Wei Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.07003", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general cod", "arxiv_id": "2406.07003", "doi": "10.48550/arXiv.2406.07003"}
+{"id": "rtlrepocoder-repositorylevel-rtl-2025", "title": "RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation", "authors": ["Peiyang Wu", "Nan Guo", "Junliang Lv", "Xiao Xiao", "Xiaochun Ye"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.08862", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As an essential part of modern hardware design, manually writing Register Transfer Level (RTL) code such as Verilog is often labor-intensive. Following the tremendous success of large language models ", "arxiv_id": "2504.08862", "doi": "10.48550/arXiv.2504.08862"}
+{"id": "graphcoder-enhancing-repositorylevel-2024-2", "title": "GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph", "authors": ["Wei Liu", "Ailun Yu", "Daoguang Zan", "Bo Shen", "Wei Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3691620.3695054", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general cod", "doi": "10.1145/3691620.3695054"}
+{"id": "generative-retrievalaugmented-ontologic-2023", "title": "Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design", "authors": ["Markus J. Buehler"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.19998", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design and manufacturing, including their capacity to work effectively with both human language, ", "arxiv_id": "2310.19998", "doi": "10.48550/arXiv.2310.19998"}
+{"id": "rewriting-code-simple-2024", "title": "Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search", "authors": ["Haochen Li", "Xin Zhou", "Zhiqi Shen"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2401.04514", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In code search, the Generation-Augmented Retrieval (GAR) framework, which generates exemplar code snippets to augment queries, has emerged as a promising strategy to address the principal challenge of", "arxiv_id": "2401.04514", "doi": "10.48550/arXiv.2401.04514"}
+{"id": "chorus-zeroshot-hierarchical-2025", "title": "CHORUS: Zero-shot Hierarchical Retrieval and Orchestration for Generating Linear Programming Code", "authors": ["Tasnim Ahmed", "Salimur Choudhury"], "year": 2025, "venue": "Learning and Intelligent Optimization", "source_url": "https://arxiv.org/abs/2505.01485", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Linear Programming (LP) problems aim to find the optimal solution to an objective under constraints. These problems typically require domain knowledge, mathematical skills, and programming ability, pr", "arxiv_id": "2505.01485", "doi": "10.48550/arXiv.2505.01485"}
+{"id": "combining-large-language-2025", "title": "Combining Large Language Models with Static Analyzers for Code Review Generation", "authors": ["Imen Jaoua", "O. Sghaier", "H. Sahraoui"], "year": 2025, "venue": "IEEE Working Conference on Mining Software Repositories", "source_url": "https://arxiv.org/abs/2502.06633", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code review is a crucial but often complex, subjective, and time-consuming activity in software development. Over the past decades, significant efforts have been made to automate this process. Early a", "arxiv_id": "2502.06633", "doi": "10.1109/MSR66628.2025.00038"}
+{"id": "suffix-retrievalaugmented-language-2022", "title": "Suffix Retrieval-Augmented Language Modeling", "authors": ["Zecheng Wang", "Yik-Cheung Tam"], "year": 2022, "venue": "IEEE International Conference on Acoustics, Speech, and Signal Processing", "source_url": "https://arxiv.org/abs/2211.03053", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Causal language modeling (LM) uses word history to predict the next word. BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions. Whil", "arxiv_id": "2211.03053", "doi": "10.1109/ICASSP49357.2023.10096450"}
+{"id": "lightweight-framework-adaptive-2024", "title": "A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model", "authors": ["Wenrui Zhang", "Tiehang Fu", "Ting Yuan", "Ge Zhang", "Dong Chen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.10263", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in Retrieval-Augmented Generation have significantly enhanced code completion at the repository level. Various RAG-based code completion systems are proposed based on different des", "arxiv_id": "2406.10263", "doi": "10.48550/arXiv.2406.10263"}
+{"id": "singrag-translationaugmented-framework-2024", "title": "SingRAG: A Translation-Augmented Framework for Code-Mixed Singlish Processing", "authors": ["S.M.M. Rukshan J. Senanayaka", "A.W.A.D. Nethmin Dulsara Abeysekara", "M.G. Nipuni Nikeshala Premadasa"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICITR64794.2024.10857714", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces SingRAG, a novel framework for processing Singlish (Sinhala-English code-mixed language) that combines translation capabilities with retrieval-augmented generation. Built on the ", "doi": "10.1109/ICITR64794.2024.10857714"}
+{"id": "csfir-leveraging-codespecific-2024", "title": "CSFIR: Leveraging Code-Specific Features to Augment Information Retrieval in Low-Resource Code Datasets", "authors": ["Zhenyu Tong", "Chenxi Luo", "Tiejian Luo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SMC54092.2024.10831541", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: From search engines like Google to advanced applications such as Retrieval Augmented Generation integrating Large Language Model (LLM), Information Retrieval (IR) serves a crucial role. To facilitate ", "doi": "10.1109/SMC54092.2024.10831541"}
+{"id": "autovcoder-systematic-framework-2024", "title": "AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs", "authors": ["Mingzhe Gao", "Jieru Zhao", "Zhe Lin", "Wenchao Ding", "Xiaofeng Hou"], "year": 2024, "venue": "ICCD", "source_url": "https://arxiv.org/abs/2407.18333", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, the use of large language models (LLMs) for software code generation, e.g., C/C++ and Python, has proven a great success. However, LLMs still suffer from low syntactic and functional correct", "arxiv_id": "2407.18333", "doi": "10.1109/ICCD63220.2024.00033"}
+{"id": "sosecure-safer-code-2025", "title": "SOSecure: Safer Code Generation with RAG and StackOverflow Discussions", "authors": ["Manisha Mukherjee", "Vincent J. Hellendoorn"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.13654", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are widely used for automated code generation. Their reliance on infrequently updated pretraining data leaves them unaware of newly discovered vulnerabilities and evolving", "arxiv_id": "2503.13654", "doi": "10.48550/arXiv.2503.13654"}
+{"id": "gllm-selfcorrective-gcode-2025", "title": "GLLM: Self-Corrective G-Code Generation using Large Language Models with User Feedback", "authors": ["Mohamed Abdelaal", "Samuel Lokadjaja", "Gilbert Engert"], "year": 2025, "venue": "Datenbanksysteme für Business, Technologie und Web", "source_url": "https://arxiv.org/abs/2501.17584", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces GLLM, an innovative tool that leverages Large Language Models (LLMs) to automatically generate G-code from natural language instructions for Computer Numerical Control (CNC) mach", "arxiv_id": "2501.17584", "doi": "10.18420/BTW2025-29"}
+{"id": "retrieve-refine-both-2025", "title": "Retrieve, Refine, or Both? Using Task-Specific Guidelines for Secure Python Code Generation", "authors": ["Catherine Tony", "Emanuele Iannone", "Riccardo Scandariato"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSME64153.2025.00041", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly used for code generation, but they often produce code with security vulnerabilities. While techniques like fine-tuning and instruction tuning can improve ", "doi": "10.1109/ICSME64153.2025.00041"}
+{"id": "qhackbench-benchmarking-large-2025", "title": "QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges", "authors": ["Abdul Basit", "Minghao Shao", "Haider Asif", "Nouhaila Innan", "Muhammad Kashif"], "year": 2025, "venue": "2025 IEEE International Conference on Quantum Artificial Intelligence (QAI)", "source_url": "https://arxiv.org/abs/2506.20008", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in Large Language Models (LLMs) have demonstrated strong potential in code generation, yet their effectiveness in quantum computing remains underexplored. This paper benchmarks LLMs fo", "arxiv_id": "2506.20008", "doi": "10.1109/QAI63978.2025.00056"}
+{"id": "complexvcoder-llmdriven-framework-2025", "title": "ComplexVCoder: An LLM-Driven Framework for Systematic Generation of Complex Verilog Code", "authors": ["Jian Zuo", "Junzhe Liu", "Xianyong Wang", "Yicheng Liu", "Navya Goli"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.20653", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances have demonstrated the promising capabilities of large language models (LLMs) in generating register-transfer level (RTL) code, such as Verilog. However, existing LLM-based frameworks s", "arxiv_id": "2504.20653", "doi": "10.48550/arXiv.2504.20653"}
+{"id": "guarddualagent-based-backdoor-2025", "title": "GUARD:Dual-Agent based Backdoor Defense on Chain-of-Thought in Neural Code Generation", "authors": ["Naizhu Jin", "Zhong Li", "Tian Zhang", "Qingkai Zeng"], "year": 2025, "venue": "International Conference on Software Engineering and Knowledge Engineering", "source_url": "https://arxiv.org/abs/2505.21425", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the widespread application of large language models in code generation, recent studies demonstrate that employing additional Chain-of-Thought generation models can significantly enhance code gene", "arxiv_id": "2505.21425", "doi": "10.18293/SEKE2025-018"}
+{"id": "secure-code-generation-2025", "title": "Towards Secure Code Generation With LLMs: A Study on Common Weakness Enumeration", "authors": ["Jianguo Zhao", "Yuqiang Sun", "Cheng Huang", "Chengwei Liu", "YaoHui Guan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2025.3619281", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated code generation has revolutionized software development, enabling developers to accelerate project timelines and reduce manual coding errors significantly. As reliance on these technologies ", "doi": "10.1109/TSE.2025.3619281"}
+{"id": "geocolab-llmbased-multiagent-2025", "title": "GeoColab: an LLM-based multi-agent collaborative framework for geospatial code generation", "authors": ["Huayi Wu", "Haoyue Jiao", "Shuyang Hou", "Jianyuan Liang", "Zhangxiao Shen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1080/17538947.2025.2569405", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated geospatial code generation using large language models (LLMs) faces challenges in requirement parsing, syntax adaptation, path retrieval, code validation, and spatial recognition, o", "doi": "10.1080/17538947.2025.2569405"}
+{"id": "beyond-synthetic-benchmarks-2025", "title": "Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation", "authors": ["Musfiqur Rahman", "S. Khatoonabadi", "Emad Shihab"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.26130", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated strong performance on function-level code generation benchmarks, yet real-world software development increasingly demands class-level implementations tha", "arxiv_id": "2510.26130", "doi": "10.48550/arXiv.2510.26130"}
+{"id": "sagehls-syntaxaware-astguided-2025", "title": "SAGE-HLS: Syntax-Aware AST-Guided LLM for High-Level Synthesis Code Generation", "authors": ["M. Zafir", "Sadik Khan", "Nowfel Mashnoor", "Mohammad Akyash", "K. Azar"], "year": 2025, "venue": "ICCD", "source_url": "https://arxiv.org/abs/2508.03558", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In today's rapidly evolving field of electronic design automation (EDA), the complexity of hardware designs is increasing, necessitating more sophisticated automation solutions. High-level synthesis (", "arxiv_id": "2508.03558", "doi": "10.1109/ICCD65941.2025.00088"}
+{"id": "modigen-large-language-2025", "title": "ModiGen: A Large Language Model-Based Workflow for Multi-Task Modelica Code Generation", "authors": ["Jiahui Xiang", "Tong Ye", "Peiyu Liu", "Yinan Zhang", "Wenhai Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.18460", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modelica is a widely adopted language for simulating complex physical systems, yet effective model creation and optimization require substantial domain expertise. Although large language models (LLMs)", "arxiv_id": "2503.18460", "doi": "10.48550/arXiv.2503.18460"}
+{"id": "paracoder-parallel-code-2025", "title": "ParaCoder: Parallel Code Generation with Large Language Model", "authors": ["Xiaowen Huang", "Xu Zhang", "Lvfang Tao", "Renjie Mao", "Nan Zhou"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3711708.3723442", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: High-performance parallel code generation is a complex and fascinating area in computer science that focuses on producing code that executes as quickly and efficiently as possible. In our paper, we de", "doi": "10.1145/3711708.3723442"}
+{"id": "apicoder-multirole-large-2025", "title": "APICoder: A Multi-Role Large Language Model Framework for API Service Call Code Generation", "authors": ["Conghui Yang", "Lei Yu", "Huafeng Su", "Xiang Zhou"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICWS67624.2025.00109", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of large language models (LLMs) in code generation, the task of API service invocation code generation faces increasing challenges such as inaccurate context understanding, ", "doi": "10.1109/ICWS67624.2025.00109"}
+{"id": "automated-code-generation-2025", "title": "Automated Code Generation and Validation for Software Components of Microcontrollers", "authors": ["Sebastian Haug", "Christoph Böhm", "D. Mayer"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.18905", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper proposes a method for generating software components for embedded systems, integrating seamlessly into existing implementations without developer intervention. We demonstrate this by automa", "arxiv_id": "2502.18905", "doi": "10.48550/arXiv.2502.18905"}
+{"id": "completion-by-comprehension-2025", "title": "Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding", "authors": ["Xinkui Zhao", "Rongkai Liu", "Yifan Zhang", "Chen Zhi", "Lufei Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.04538", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As code completion task from function-level to repository-level, leveraging contextual information from large-scale codebases becomes a core challenge. However, existing retrieval-augmented generation", "arxiv_id": "2512.04538", "doi": "10.48550/arXiv.2512.04538"}
+{"id": "llmempowered-eventchain-driven-2025", "title": "LLM-Empowered Event-Chain Driven Code Generation for ADAS in SDV systems", "authors": ["Nenad Petrovic", "Norbert Kroth", "Axel Torschmied", "Yinglei Song", "F. Pan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.21877", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents an event-chain-driven, LLM-empowered workflow for generating validated, automotive code from natural-language requirements. A Retrieval-Augmented Generation (RAG) layer retrieves r", "arxiv_id": "2511.21877", "doi": "10.48550/arXiv.2511.21877"}
+{"id": "growing-your-embodied-2025", "title": "Growing with Your Embodied Agent: A Human-in-the-Loop Lifelong Code Generation Framework for Long-Horizon Manipulation Skills", "authors": ["Yuan Meng", "Zhenguo Sun", "Max Fest", "Xukun Li", "Zhenshan Bing"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.18597", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs)-based code generation for robotic manipulation has recently shown promise by directly translating human instructions into executable code, but existing methods remain nois", "arxiv_id": "2509.18597", "doi": "10.48550/arXiv.2509.18597"}
+{"id": "empowering-ai-generate-2025", "title": "Empowering AI to Generate Better AI Code: Guided Generation of Deep Learning Projects with LLMs", "authors": ["Chengxing Xie", "Mingsheng Jiao", "Xiaodong Gu", "Beijun Shen"], "year": 2025, "venue": "Annual International Computer Software and Applications Conference", "source_url": "https://arxiv.org/abs/2504.15080", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models (LLMs) have been widely applied to code generation, they struggle with generating entire deep learning projects, which are characterized by complex structures, longer funct", "arxiv_id": "2504.15080", "doi": "10.1109/COMPSAC65507.2025.00175"}
+{"id": "ragpull-imperceptible-attacks-2025", "title": "RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation", "authors": ["Vasilije Stambolic", "Aritra Dhar", "Lukas Cavigelli"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.11195", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) increases the reliability and trustworthiness of the LLM response and reduces hallucination by eliminating the need for model retraining. It does so by adding exte", "arxiv_id": "2510.11195", "doi": "10.48550/arXiv.2510.11195"}
+{"id": "llmpowered-code-generation-2025", "title": "LLM-Powered Code Generation Using RAG Framework with LLaMA 3", "authors": ["V. Vethika", "Jenifer", "R. Rohitha", "M. Revathi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CSITSS67709.2025.11294234", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As the need for rapid and efficient software development increases, developers often encounter difficulties in creating code that is both accurate and suited to its context. Conventional code generati", "doi": "10.1109/CSITSS67709.2025.11294234"}
+{"id": "reactnex-modular-component-2025", "title": "React-Nex – A Modular Component Library with AI-Driven Code Generation", "authors": ["S. Wagh", "Smit Vadhel"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.55041/ijsrem44477", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: React-Nex is a modular and efficient NPM package designed to streamline the development process by enabling selective installation of only the necessary components. The project comprises three main re", "doi": "10.55041/ijsrem44477"}
+{"id": "banglaforge-llm-collaboration-2025", "title": "BanglaForge: LLM Collaboration with Self-Refinement for Bangla Code Generation", "authors": ["Mahir Labib Dihan", "Sadif Ahmed", "Md Nafiu Rahman"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.19122", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Bangla is a low-resource language for code generation, lacking large-scale annotated datasets and tools to transform natural language specifications into executable programs. This makes Bangla-to-code", "arxiv_id": "2512.19122", "doi": "10.48550/arXiv.2512.19122"}
+{"id": "repomincoder-improving-repositorylevel-2024", "title": "RepoMinCoder: Improving Repository-Level Code Generation Based on Information Loss Screening", "authors": ["Yifan Li", "Ensheng Shi", "Dewu Zheng", "Kefeng Duan", "Jiachi Chen"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3671016.3674819", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Repository-level code generation task involves generating code at a specified location based on unfinished code with repository context. Existing research mainly rely on retrieval-augmented generation", "doi": "10.1145/3671016.3674819"}
+{"id": "cover-me-mitigating-2025", "title": "Cover Me: Mitigating Multi-Agent System Failure through Reinforcement Learning—A Technology Demonstration", "authors": ["Eric M. S. P. Veith", "Arlena Wellßow", "Torben Logemann", "Emilie Frost"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3679240.3735501", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agent systems are common subjects of investigation in power systems research. Multi-agent systems are robust and can provide guarantees with regards to convergence and solution quality, but cannot sho", "doi": "10.1145/3679240.3735501"}
+{"id": "clinnoteagents-llm-multiagent-2025", "title": "ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes", "authors": ["Rongjia Zhou", "Chengzhuo Li", "Carl Yang", "Jiaying Lu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.07081", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Heart failure (HF) is one of the leading causes of rehospitalization among older adults in the United States. Although clinical notes contain rich, detailed patient information and make up a large por", "arxiv_id": "2512.07081", "doi": "10.48550/arXiv.2512.07081"}
+{"id": "solving-iot-cascading-2023", "title": "Solving the IoT Cascading Failure Dilemma Using a Semantic Multi-agent System", "authors": ["Amal Guittoum", "François Aïssaoui", "Sébastien Bolle", "F. Boyer", "N. D. Palma"], "year": 2023, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/b378d606403aa26bae73bd9b583a86b331374e42", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "visual-multiagent-system-2025", "title": "Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow", "authors": ["Xinlei Yu", "Chengming Xu", "Gui-Min Zhang", "Yongbo He", "Zhangquan Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.21789", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-Agent System (MAS) powered by Visual Language Models (VLMs) enables challenging tasks but suffers from a novel failure term, multi-agent visual hallucination snowballing, where hallucinations ar", "arxiv_id": "2509.21789", "doi": "10.48550/arXiv.2509.21789"}
+{"id": "optimizing-llmbased-multiagent-2025", "title": "Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development", "authors": ["Ming Shen", "Raphael Shu", "Anurag Pratik", "James Gung", "Yubin Ge"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.16086", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We have seen remarkable progress in large language models (LLMs) empowered multi-agent systems solving complex tasks necessitating cooperation among experts with diverse skills. However, optimizing LL", "arxiv_id": "2505.16086", "doi": "10.48550/arXiv.2505.16086"}
+{"id": "stratus-multiagent-system-2025", "title": "STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds", "authors": ["Yinfang Chen", "Jiaqi Pan", "Jackson Clark", "Yiming Su", "Noah Zheutlin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.02009", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In cloud-scale systems, failures are the norm. A distributed computing cluster exhibits hundreds of machine failures and thousands of disk failures; software bugs and misconfigurations are reported to", "arxiv_id": "2506.02009", "doi": "10.48550/arXiv.2506.02009"}
+{"id": "profileaware-maneuvering-dynamic-2025", "title": "Profile-Aware Maneuvering: A Dynamic Multi-Agent System for Robust GAIA Problem Solving by AWorld", "authors": ["Zhitian Xie", "Qintong Wu", "Chengyue Yu", "Chenyi Zhuang", "Jinjie Gu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.09889", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models (LLMs) has empowered intelligent agents to leverage diverse external tools for solving complex real-world problems. However, this reliance introduces new", "arxiv_id": "2508.09889", "doi": "10.48550/arXiv.2508.09889"}
+{"id": "decentralized-multiagent-system-2025", "title": "Decentralized Multi-Agent System with Trust-Aware Communication", "authors": ["Yepeng Ding", "Ahmed Twabi", "Junwei Yu", "Lingfeng Zhang", "Tohru Kondo"], "year": 2025, "venue": "International Symposium on Image and Signal Processing and Analysis", "source_url": "https://arxiv.org/abs/2512.02410", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of Large Language Models (LLMs) is rapidly accelerating the development of autonomous multiagent systems (MAS), paving the way for the Internet of Agents. However, traditional centralize", "arxiv_id": "2512.02410", "doi": "10.1109/ISPA67752.2025.00198"}
+{"id": "survagent-hierarchical-cotenhanced-2025", "title": "SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction", "authors": ["Guolin Huang", "Wenting Chen", "Jiaqi Yang", "Xinheng Lyu", "Xiaoling Luo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.16635", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Survival analysis is critical for cancer prognosis and treatment planning, yet existing methods lack the transparency essential for clinical adoption. While recent pathology agents have demonstrated e", "arxiv_id": "2511.16635", "doi": "10.48550/arXiv.2511.16635"}
+{"id": "chatofthought-collaborative-multiagent-2025", "title": "Chat-of-Thought: Collaborative Multi-Agent System for Generating Domain Specific Information", "authors": ["Christodoulos Constantinides", "Shuxin Lin", "Nianjun Zhou", "Dhaval C Patel"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.10086", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a novel multi-agent system called Chat-of-Thought, designed to facilitate the generation of Failure Modes and Effects Analysis (FMEA) documents for industrial assets. Chat-of-Thoug", "arxiv_id": "2506.10086", "doi": "10.48550/arXiv.2506.10086"}
+{"id": "tiered-agentic-oversight-2025", "title": "Tiered Agentic Oversight: A Hierarchical Multi-Agent System for Healthcare Safety", "authors": ["Y. Kim", "H. Jeong", "Chanwoo Park", "Eugene Park", "Haipeng Zhang"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2506.12482", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) deployed as agents introduce significant safety risks in clinical settings due to their potential for error and single points of failure. We introduce Tiered Agentic Overs", "arxiv_id": "2506.12482"}
+{"id": "tiered-agentic-oversight-2025-2", "title": "Tiered Agentic Oversight: A Hierarchical Multi-Agent System for AI Safety in Healthcare", "authors": ["Y. Kim", "H. Jeong", "Chanwoo Park", "Eugene Park", "Haipeng Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2506.12482", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2506.12482"}
+{"id": "deep-reinforcement-learningenhanced-2025", "title": "A Deep Reinforcement Learning-Enhanced Multi-Agent System for Ontology-Based Health Management in Nanotechnology", "authors": ["Azanu Mirolgn Mequanenit", "Eyerusalem Alebachew Nibret", "Pilar Herrero-Martín", "Rodrigo Martínez-Béjar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/electronics14234580", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study provides a novel approach to the field of prognostics and health management (PHM) in nanotechnology: multi-agent systems integrated with ontology-based knowledge representation and Deep Rei", "doi": "10.3390/electronics14234580"}
+{"id": "abstract-4364623-development-2025", "title": "Abstract 4364623: Development of a Multi-Agent System for Cardiovascular Diagnostic", "authors": ["Sampson Kontomah", "Tamanna Nahar"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1161/circ.152.suppl_3.4364623", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a multi-agent AI system designed to provide accurate diagnostic and personalized treatment recommendations for heart attack, heart failure, cardiac arrhythmia, coronary artery dise", "doi": "10.1161/circ.152.suppl_3.4364623"}
+{"id": "netmas-efficient-network-2025", "title": "NetMAS: Efficient Network Configuration Translation with Multi-Agent System", "authors": ["Chenyang Liu", "Fuliang Li", "Naigong Zheng", "Xingwei Wang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IWQoS65803.2025.11143253", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/IWQoS65803.2025.11143253"}
+{"id": "effective-approach-reducing-2024", "title": "An effective approach for reducing data redundancy in multi-agent system communication", "authors": ["Awais Qasim", "A. Ghouri", "Adeel Munawar"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3233/MGS-230089", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The redundancy of the data is an active research topic. While an agent works in a multi-agent system, the number of messages between them increases. This is due to the fact that the functionalities da", "doi": "10.3233/MGS-230089"}
+{"id": "enhancing-multiagent-system-2024", "title": "Enhancing Multi-agent System Testing with Diversity-Guided Exploration and Adaptive Critical State Exploitation", "authors": ["Xuyan Ma", "Yawen Wang", "Junjie Wang", "Xiaofei Xie", "Boyu Wu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3650212.3680376", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems (MASs) have achieved remarkable success in multi-robot control, intelligent transportation, and multiplayer games, etc. Thorough testing for MAS is urgently needed to ensure its ro", "doi": "10.1145/3650212.3680376"}
+{"id": "adaptive-output-consensus-2024", "title": "Adaptive output consensus of heterogeneous nonlinear multi‐agent system under random link failures with partially unknown transition rates", "authors": ["Yaning Liu", "H. Fan", "Lei Liu", "Bo Wang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1002/rnc.7367", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Considering the complexity of communication environment in reality, this paper investigates the adaptive output consensus problem for heterogeneous nonlinear multi‐agent system (MAS) under random link", "doi": "10.1002/rnc.7367"}
+{"id": "multiagent-system-approach-2024", "title": "A Multi-Agent System Approach for Mitigating Partial Display Failures", "authors": ["Jacob Cappi", "Jacob D. Hauenstein"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3603287.3651189", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Digital displays are often used to display mission-critical and safety-critical data, and even just a partial failure (or obstruction) of a digital display can make some or all such data unreadable. H", "doi": "10.1145/3603287.3651189"}
+{"id": "faulttolerant-control-random-2024", "title": "Fault-tolerant control of random switching topology multi-agent system based on event triggering", "authors": ["Lingcong Ouyang", "Kaijun Yang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.15770", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, the formation control of multi-agent systems in random switching communication topology is studied, and the problem of excessive bandwidth and low control efficiency among multi-agents ", "arxiv_id": "2406.15770", "doi": "10.48550/arXiv.2406.15770"}
+{"id": "eventtriggered-adaptive-consensus-2024", "title": "Event-triggered adaptive consensus of heterogeneous multi-agent system under communication and actuator faults", "authors": ["Leyi Zheng", "Yimin Zhou"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2401.13492", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, a heterogeneous leader-followers multiagent system is studied under simultaneous time-varying communication faults and actuator faults. First, the state of the leader is modelled as the", "arxiv_id": "2401.13492"}
+{"id": "who-deserves-reward-2026", "title": "Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System", "authors": ["Yanming Li", "Xuelin Zhang", "Wenjie Lu", "Ziye Tang", "Maodong Wu"], "year": 2026, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2602.08335", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remai", "arxiv_id": "2602.08335"}
+{"id": "who-introducing-failure-2025", "title": "Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis", "authors": ["Yu Ge", "Linna Xie", "Zhong Li", "Yu Pei", "Tian Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.13782", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model Powered Multi-Agent Systems (MASs) are increasingly employed to automate complex real-world problems, such as programming and scientific discovery. Despite their promising, MASs a", "arxiv_id": "2509.13782", "doi": "10.48550/arXiv.2509.13782"}
+{"id": "resilient-formation-control-2025", "title": "Resilient formation control in multi-agent systems considering leader failure", "authors": ["Takuya Murakami", "Toru Namerikawa"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1080/18824889.2025.2510766", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This research investigates how to control multi-agent systems to maintain formation even when any agent, including the leader fails. We introduce cooperative control approach to solve the formation pr", "doi": "10.1080/18824889.2025.2510766"}
+{"id": "timevarying-topology-formation-2023", "title": "Time-Varying Topology Formation Reconfiguration Control of the Multi-Agent System Based on the Improved Hungarian Algorithm", "authors": ["Yingxue Zhang", "Meng Chen", "Jinbao Chen", "Chuanzhi Chen", "Hongzhi Yu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.3390/app132011581", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Distributed time-varying formation technology for multi-agent systems is recently become a research hotspot in formation control field. However, the formation reconfiguration control technology for ag", "doi": "10.3390/app132011581"}
+{"id": "neural-networkbased-optimal-2023", "title": "Neural network–based optimal fault compensation control of the nonlinear multi-agent system and its application to UAVs formation flight", "authors": ["D. Duan", "Chunsheng Liu", "Jiao Dai", "Jingliang Sun"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1177/09596518231162759", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article investigates the optimal consensus problem for unmanned aerial vehicle formation systems with actuator faults based on nonlinear multi-agent systems. Initially, for fault-free multi-agent", "doi": "10.1177/09596518231162759"}
+{"id": "divergent-thoughts-one-2025", "title": "Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation", "authors": ["Haoyuan Wu", "Haisheng Zheng", "Zhuolun He", "Bei Yu"], "year": 2025, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2502.10857", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, with the development of tool-calling capabilities in large language models (LLMs), these models have demonstrated significant potential for automating electronic design automation (EDA) flow", "arxiv_id": "2502.10857", "doi": "10.48550/arXiv.2502.10857"}
+{"id": "experimental-evaluation-multiagent-2023", "title": "Experimental Evaluation of Multi-Agent System Collision-Free Coordination Using Ideal Fluid-Flow Models", "authors": ["Harshvardhan Uppaluru", "H. Rastgoftar"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2301.05833", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2301.05833"}
+{"id": "optimized-leaderfollower-formation-2025", "title": "Optimized Leader‐Follower Formation Fault‐Tolerant Control Using Reinforcement Learning for a Class of Nonlinear Multi‐Agent Systems Having Actuator Failure", "authors": ["Lingyu Zhang", "Guanlong Li", "Rong-Yue Liu", "Guoxing Wen"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1002/acs.4045", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work aims to address the optimized formation fault‐tolerant control issue by utilizing reinforcement learning (RL) for the single integral dynamic multi‐agent system (MAS) having actuator faults.", "doi": "10.1002/acs.4045"}
+{"id": "robust-replanning-multiagent-2025", "title": "Robust Replanning for Multi-Agent SmallSat Inspection in Failure Scenarios", "authors": ["Markus Iversflaten", "Alexander Hansson", "David C. Sternberg", "Oliver Jia-Richards", "Keenan Albee"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.2514/6.2025-0183", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Future space structures like the proposed Lunar Gateway may benefit from autonomous inspection and maintenance, especially during extended uncrewed periods. Small satellites (SmallSats) offer a low-co", "doi": "10.2514/6.2025-0183"}
+{"id": "adaptive-faulttolerant-control-2025", "title": "Adaptive Fault-Tolerant Control of Multi-Agent Systems Under Actuator Failure", "authors": ["Hongyan Xie", "Zhao Kun", "Dong Zhao", "Wenjing Ren"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/RCAE66389.2025.11355294", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article discusses the cooperative control problem of multi-agent systems (MASs) under actuator failures, proposing a distributed fault-tolerant control strategy based on an adaptive control archi", "doi": "10.1109/RCAE66389.2025.11355294"}
+{"id": "multiagentbased-failure-modeling-2025", "title": "Multi-agent-based failure modeling for uncrewed swarm systems considering cross-layer diffusion characteristics", "authors": ["Xing Guo", "Qiang Feng", "Zeyu Wu", "Meng Liu", "Yi Ren"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.ress.2025.110831", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.ress.2025.110831"}
+{"id": "semantic-communications-partially-2025", "title": "Semantic Communications for Partially Observable Multi-Agent Reinforcement Learning-Based Unmanned Aerial Vehicles Monitoring System", "authors": ["Tiange Xiang", "Seungwoo Seo", "Sungwon Yi", "Minseok Choi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICC52391.2025.11161345", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In multi-agent reinforcement learning (MARL) environments with partial observability, agents lack access to a global state, limiting their decision-making capabilities. To address this, agents can sha", "doi": "10.1109/ICC52391.2025.11161345"}
+{"id": "comprehensive-survey-transplantmatch-2025", "title": "Comprehensive Survey on TransplantMatch Agent: A Multi-Agent ML System for Kidney and Liver Transplant Compatibility Prediction", "authors": ["Deepak Na", "N. Harshitha Reddy", "Revanth Reddy", "Sahana P Jain", "Tanisha Sharma"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICoICI65217.2025.11253079", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Organ transplantation is a important treatment choice for patients with end-stage kidney and liver failure. However, powerful donor-recipient matching and long-term transplant success continue to be h", "doi": "10.1109/ICoICI65217.2025.11253079"}
+{"id": "datadriven-adaptive-distributed-2024", "title": "Data-Driven Adaptive Distributed Localization of Multi-Agent Systems With Sensor Failure", "authors": ["Yunkai Lv", "Hongliang Ren", "Hao Zhang", "Zhuping Wang", "Huaicheng Yan"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TIE.2023.3342272", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work solves the localization estimation of dynamic multi-agent systems (MASs) with sensor multiplicative failures, which is more general yet challenging to address than static sensor networks wit", "doi": "10.1109/TIE.2023.3342272"}
+{"id": "research-formation-selfhealing-2022", "title": "Research on Formation Self-healing Method of Multi-agent System Based on Manta Ray Foraging Optimization Algorithm", "authors": ["Yong-Han Guo", "Zonglei Mou", "Youqing Wang"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CAC57257.2022.10055355", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems are composed of a series of independent or interacting individuals. Faults are inevitable due to the fact that they often work in complex environments. Fault of agents in a formati", "doi": "10.1109/CAC57257.2022.10055355"}
+{"id": "faulttolerant-control-multiagent-2023", "title": "Fault‐tolerant control of multi‐agent systems with input delay and sensor failure", "authors": ["M. Syed Ali", "M. Mubeen Tajudeen", "G. Rajchakit", "Porpattama Hammachukiattikul", "Jinde Cao"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1002/asjc.3157", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The problem of fault‐tolerant control (FTC) for a multi‐agent system (MAS) with input delay and sensor failures is addressed in this study. The topology of communication is an undirected subgraph havi", "doi": "10.1002/asjc.3157"}
+{"id": "mtfiltersbased-eventtriggered-adaptive-2023", "title": "MT‐filters‐based event‐triggered adaptive prescribed performance tracking control of multi‐agent systems with unknown direction actuator failure", "authors": ["Penghao Chen", "X. Luan", "Fei Liu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1002/rnc.6817", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this article, to ensure better performance and simultaneously save resources, an event‐triggered adaptive prescribed performance cooperative dynamic surface control (DSC) strategy is proposed for a", "doi": "10.1002/rnc.6817"}
+{"id": "dmaidps-distributed-multiagent-2022", "title": "DMAIDPS: a distributed multi-agent intrusion detection and prevention system for cloud IoT environments", "authors": ["A. Javadpour", "P. Pinto", "F. Ja’fari", "Weizhe Zhang"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10586-022-03621-3", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10586-022-03621-3"}
+{"id": "dynamic-charging-path-2025", "title": "Dynamic Charging and Path Planning for UAV-Powered Rechargeable WSNs Using Multi-Agent Deep Reinforcement Learning", "authors": ["Mesfin Leranso Betalo", "Supeng Leng", "Abegaz Mohammed Seid", "Hayla Nahom Abishu", "Aiman M. Erbad"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TASE.2025.3558945", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unmanned Aerial Vehicle (UAV)-powered 5G/6G networks integrated with rechargeable wireless sensor networks (RWSNs) offer promising solutions for extending system lifetime, collecting data, and providi", "doi": "10.1109/TASE.2025.3558945"}
+{"id": "visual-analysis-failure-2023", "title": "Visual analysis on failure mitigation in multi-agent systems", "authors": ["Shu Liu", "Jingyi Xue", "Yingkang Zhang", "Yibo Guo", "Ming Xu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MDM58254.2023.00059", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In many safety-critical applications, the response to emergencies is one of the most important components in maintaining the stability of workshop production and transportation. However, a robust emer", "doi": "10.1109/MDM58254.2023.00059"}
+{"id": "observability-largescale-multiagent-2026", "title": "Observability in Large-Scale Multi-Agent Ecosystems: Coordination, Emergence, and Failure Modes", "authors": ["Kailash Thiyagarajan", "Chirag Agrawal", "Udaya Veeramreddygari"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICAIC67076.2026.11395863", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large scale multi-agent systems like autonomous vehicle fleets, smart grids etc. adds an extra whoop in system complexity. But beneath that something that is sort of I think decentralized and adaptive", "doi": "10.1109/ICAIC67076.2026.11395863"}
+{"id": "cascade-failure-management-2022", "title": "Cascade Failure Management in Distributed Smart Grid Using Multi-Agent Control", "authors": ["Muhammad Ikram", "Salman Ahmed", "Safdar N. Khan"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICET56601.2022.10004671", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The interdependent network of centralized power systems makes the system more vulnerable to failures. The existing cyber-physical architecture of power systems possesses single-point contingency i.e. ", "doi": "10.1109/ICET56601.2022.10004671"}
+{"id": "supply-chain-information-2022", "title": "Supply Chain Information Collaborative Simulation Model Integrating Multi-Agent and System Dynamics", "authors": ["Ning Yang", "Ying-Jan Ding", "Junge Leng", "Lei Zhang"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.7307/ptt.v34i5.4092", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Supply chain collaboration management is a systematic, integrated and agile advanced management mode, which helps to improve the competitiveness of enterprises and the entire supply chain. In order to", "doi": "10.7307/ptt.v34i5.4092"}
+{"id": "robustness-cloud-manufacturing-2022", "title": "Robustness of Cloud Manufacturing System Based on Complex Network and Multi-Agent Simulation", "authors": ["Xin Zheng", "Xiaodong Zhang"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.3390/e25010045", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Cloud manufacturing systems (CMSs) are networked, distributed and loosely coupled, so they face great uncertainty and risk. This paper combines the complex network model with multi-agent simulation in", "doi": "10.3390/e25010045"}
+{"id": "faulttolerant-consensus-control-2022", "title": "Fault-Tolerant Consensus Control for Nonlinear Multi-Agent Systems Based on PDE With Actuator Failure", "authors": ["Q. Qi", "Chuanhai Yang", "Xu Yan", "Chengdong Yang", "Zhaodong Liu"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/YAC57282.2022.10023576", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A class of nonlinear multi-agent systems (MASs) with Spatio-temporal properties is described using partial differential equation(PDE) in this paper, and fault-tolerant control of such systems is furth", "doi": "10.1109/YAC57282.2022.10023576"}
+{"id": "preventing-rogue-agents-2025", "title": "Preventing Rogue Agents Improves Multi-Agent Collaboration", "authors": ["Ohav Barbi", "Ori Yoran", "Mor Geva"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.05986", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems, where specialized agents collaborate to solve a shared task hold great potential, from increased modularity to simulating complex environments. However, they also have a major cav", "arxiv_id": "2502.05986", "doi": "10.48550/arXiv.2502.05986"}
+{"id": "exploring-education-as-2024", "title": "Exploring Education as a Complex System: Computational Educational Research with Multi-Level Agent-Based Modeling", "authors": ["John Vulic", "Michael J. Jacobson", "James A. Levin"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/educsci14050551", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Our study employs multi-level agent-based modeling and computational techniques to explore education as a complex system. With an underlying focus that education should be underpinned by a scientific ", "doi": "10.3390/educsci14050551"}
+{"id": "agentfm-roleaware-failure-2025", "title": "AgentFM: Role-Aware Failure Management for Distributed Databases with LLM-Driven Multi-Agents", "authors": ["Lingzhe Zhang", "Yunpeng Zhai", "Tong Jia", "Xiaosong Huang", "Chiming Duan"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2504.06614", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Distributed databases are critical infrastructures for today's large-scale software systems, making effective failure management essential to ensure software availability. However, existing approaches", "arxiv_id": "2504.06614", "doi": "10.1145/3696630.3728492"}
+{"id": "impact-charging-infrastructure-2025", "title": "Impact of charging infrastructure construction on electric vehicle diffusion based on a multi-agent model", "authors": ["Yingying Zheng", "Donghui Liu", "Feng An", "Jian Wang", "Xiangyun Gao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.isci.2025.112257", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To explore the impact of charging infrastructure on electric vehicles (EVs) diffusion, a multi-agent model of EVs-charging infrastructure construction (EV-CIC) is established based on complex ", "doi": "10.1016/j.isci.2025.112257"}
+{"id": "aidriven-traffic-flow-2025", "title": "AI‐Driven Traffic Flow Prediction and Anomaly Detection in Smart Cities: A Multi‐Agent Approach", "authors": ["M. Shabaz", "K. N. Raju"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1002/ett.70279", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In smart cities, AI‐driven traffic flow prediction and anomaly detection play a crucial role in optimizing urban mobility and reducing congestion. Traditional traffic management systems (TMS) often st", "doi": "10.1002/ett.70279"}
+{"id": "evomarl-coevolutionary-multiagent-2025", "title": "Evo-MARL: Co-Evolutionary Multi-Agent Reinforcement Learning for Internalized Safety", "authors": ["Zhenyu Pan", "Yiting Zhang", "Yutong Zhang", "Jianshu Zhang", "Haozheng Luo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.03864", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems (MAS) built on multimodal large language models exhibit strong collaboration and performance. However, their growing openness and interaction complexity pose serious risks, notably", "arxiv_id": "2508.03864", "doi": "10.48550/arXiv.2508.03864"}
+{"id": "matrix-multiagent-simulation-2025", "title": "MATRIX: Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation", "authors": ["Ernest Lim", "Yajie He", "Jared Joselowitz", "K. Preston", "Mohita Chowdhury"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.19163", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite the growing use of large language models (LLMs) in clinical dialogue systems, existing evaluations focus on task completion or fluency, offering little insight into the behavioral and risk man", "arxiv_id": "2508.19163", "doi": "10.48550/arXiv.2508.19163"}
+{"id": "streamlining-resilient-kubernetes-2025", "title": "Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework", "authors": ["Julien Soulé", "Jean-Paul Jamont", "M. Occello", "Louis-Marie Traonouez", "Paul Théron"], "year": 2025, "venue": "IEEE International Conference on Cloud Computing", "source_url": "https://arxiv.org/abs/2505.21559", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In cloud-native systems, Kubernetes clusters with interdependent services often face challenges to their operational resilience due to poor workload management issues such as resource blocking, bottle", "arxiv_id": "2505.21559", "doi": "10.1109/CLOUD67622.2025.00015"}
+{"id": "architecture-design-multiagent-2025", "title": "Architecture Design of Multi-Agent LLM Systems in Railway Data Governance", "authors": ["Yiyan Cui", "Yanmei Guo", "Chenying Ren", "Chao Zhang", "Shi Shu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/HPCC67675.2025.00184", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: To address the challenges of manual dependency and complex management in railway data governance, and to promote data value realization, we propose DGMAS, a Data Governance Multi-Agent System based on", "doi": "10.1109/HPCC67675.2025.00184"}
+{"id": "unifying-quantitative-security-2025", "title": "Towards Unifying Quantitative Security Benchmarking for Multi Agent Systems", "authors": ["G. Sharma", "Vidhi Kulkarni", "Miles King", "Ken Huang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2507.21146", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evolving AI systems increasingly deploy multi-agent architectures where autonomous agents collaborate, share information, and delegate tasks through developing protocols. This connectivity, while powe", "arxiv_id": "2507.21146", "doi": "10.48550/arXiv.2507.21146"}
+{"id": "from-mas-mars-2025", "title": "From MAS to MARS: Coordination Failures and Reasoning Trade-offs in Hierarchical Multi-Agent Robotic Systems within a Healthcare Scenario", "authors": ["Yuanchen Bai", "Zijian Ding", "Shaoyue Wen", "X. Chang", "Angelique Taylor"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.04691", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent robotic systems (MARS) build upon multi-agent systems by integrating physical and task-related constraints, increasing the complexity of action execution and agent coordination. However, d", "arxiv_id": "2508.04691", "doi": "10.48550/arXiv.2508.04691"}
+{"id": "distributionally-robust-cascading-2025", "title": "Distributionally Robust Cascading Risk Quantification in Multi-Agent Rendezvous: Effects of Time Delay and Network Connectivity", "authors": ["Vivek Pandey", "N. Motee"], "year": 2025, "venue": "IEEE Conference on Decision and Control", "source_url": "https://arxiv.org/abs/2507.23489", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Achieving safety in autonomous multi-agent systems is a critical challenge. In this paper, we propose a distributionally robust risk framework for analyzing cascading failures in multi-agent rendezvou", "arxiv_id": "2507.23489", "doi": "10.1109/CDC57313.2025.11312158"}
+{"id": "faulttolerant-timevarying-formation-2025", "title": "Fault‐Tolerant Time‐Varying Formation Tracking for Multi‐Agent Systems With Varying Number of Agents and Mixed Cyber Attacks", "authors": ["Kunzhong Miao", "Chang Wang", "Yifeng Niu", "Huangzhi Yu", "Tianqing Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1002/rnc.8048", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper proposes a fault‐tolerant time‐varying formation tracking control scheme for second‐order multi‐agent systems with varying numbers of agents and mixed cyber attacks. Initially, compared to ", "doi": "10.1002/rnc.8048"}
+{"id": "adaptive-visionbased-coverage-2025", "title": "Adaptive Vision-Based Coverage Optimization in Mobile Wireless Sensor Networks: A Multi-Agent Deep Reinforcement Learning Approach", "authors": ["Parham Soltani", "Mehrshad Eskandarpour", "Sina Heidari", "Farnaz Alizadeh", "Hossein Soleimani"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.14676", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Traditional Wireless Sensor Networks (WSNs) typically rely on pre-analysis of the target area, network size, and sensor coverage to determine initial deployment. This often results in significant over", "arxiv_id": "2508.14676", "doi": "10.48550/arXiv.2508.14676"}
+{"id": "framework-autonomous-crosscloud-2025", "title": "A Framework for Autonomous, Cross-Cloud Threat Mitigation Using Multi-Agent Reinforcement Learning", "authors": ["Akshay Mittal"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.63412/kb44xf51", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid enterprise adoption of multi-cloud, microservice architectures introduces unprecedented complexity and security challenges. Traditional, reactive security models are proving inadequate, as ", "doi": "10.63412/kb44xf51"}
+{"id": "advevomarl-shaping-internalized-2025", "title": "AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning", "authors": ["Zhenyu Pan", "Yiting Zhang", "Zhuo Liu", "Yolo Yunlong Tang", "Zeliang Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.01586", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based multi-agent systems excel at planning, tool use, and role coordination, but their openness and interaction complexity also expose them to jailbreak, prompt-injection, and adversarial collabo", "arxiv_id": "2510.01586", "doi": "10.48550/arXiv.2510.01586"}
+{"id": "digital-twinbased-cooperative-2025", "title": "Digital Twin-based Cooperative Autonomous Driving in Smart Intersections: A Multi-Agent Reinforcement Learning Approach", "authors": ["Tao Yu", "Kui Wang", "Zongdian Li", "Tao Yu", "Kei Sakaguchi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.15099", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unsignalized intersections pose safety and efficiency challenges due to complex traffic flows and blind spots. In this paper, a digital twin (DT)-based cooperative driving system with roadside unit (R", "arxiv_id": "2509.15099", "doi": "10.48550/arXiv.2509.15099"}
+{"id": "diagnose-localize-align-2025", "title": "Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts", "authors": ["Guancheng Wan", "Lei Sun", "Longxu Dou", "Zitong Shi", "Fang Wu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.23188", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM)-powered multi-agent systems (MAS) have rapidly advanced collaborative reasoning, tool use, and role-specialized coordination in complex tasks. However, reliability-critical ", "arxiv_id": "2509.23188", "doi": "10.48550/arXiv.2509.23188"}
+{"id": "fault-tolerant-multiagent-2025", "title": "Fault Tolerant Multi-Agent Learning with Adversarial Budget Constraints", "authors": ["D. Mguni", "Yaqi Sun", "Haojun Chen", "Amir Darabi", "Larry Olanrewaju Orimoloye"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2508.08800", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2508.08800"}
+{"id": "multiagent-reinforcement-learningbased-2025", "title": "Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections", "authors": ["Tao Yu", "Kui Wang", "Zongdian Li", "Tao Yu", "Kei Sakaguchi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.04231", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unsignalized intersections pose significant safety and efficiency challenges due to complex traffic flows. This paper proposes a novel roadside unit (RSU)-centric cooperative driving system leveraging", "arxiv_id": "2505.04231", "doi": "10.48550/arXiv.2505.04231"}
+{"id": "evaluating-multiagent-ai-2025", "title": "Evaluating Multi-Agent AI Systems for Automated Bug Detection and Code Refactoring", "authors": ["Tanveer Aamina", "Mohammed Zaid", "S. Huda"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.22214/ijraset.2025.74423", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper evaluates multi-agent AI systems for automating software bug detection and code refactoring. We design a cooperative architecture in which specialized agents—static-analysis, test-generatio", "doi": "10.22214/ijraset.2025.74423"}
+{"id": "agentask-multiagent-systems-2025", "title": "AgentAsk: Multi-Agent Systems Need to Ask", "authors": ["Bohan Li", "Kuo Yang", "Ying Lai", "Yudong Zhang", "Chen Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.07593", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent systems (MAS) built on large language models promise improved problem-solving through collaboration, yet they often fail to consistently outperform strong single-agent baselines due to err", "arxiv_id": "2510.07593", "doi": "10.48550/arXiv.2510.07593"}
+{"id": "distributionally-robust-cascading-2025-2", "title": "Distributionally Robust Cascading Risk in Multi-Agent Rendezvous: Extended Analysis of Parameter-Induced Ambiguity", "authors": ["Vivek Pandey", "N. Motee"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.20914", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Ensuring safety in autonomous multi-agent systems during time-critical tasks such as rendezvous is a fundamental challenge, particularly under communication delays and uncertainty in system parameters", "arxiv_id": "2511.20914", "doi": "10.48550/arXiv.2511.20914"}
+{"id": "multiagent-trustworthy-consensus-2025", "title": "Multi-Agent Trustworthy Consensus under Random Dynamic Attacks", "authors": ["Orhan Eren Akgün", "Sarper Aydin", "Stephanie Gil", "Angelia Nedić"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.07189", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we study the consensus problem in which legitimate agents send their values over an undirected communication network in the presence of an unknown subset of malicious or faulty agents. I", "arxiv_id": "2504.07189", "doi": "10.48550/arXiv.2504.07189"}
+{"id": "sqlfixagent-semanticaccurate-texttosql-2024", "title": "SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration", "authors": ["Jipeng Cen", "Jiaxin Liu", "Zhixu Li", "Jingjing Wang"], "year": 2024, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2406.13408", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While fine-tuned large language models (LLMs) excel in generating grammatically valid SQL in Text-to-SQL parsing, they often struggle to ensure semantic accuracy in queries, leading to user confusion ", "arxiv_id": "2406.13408", "doi": "10.1609/aaai.v39i1.31979"}
+{"id": "neural-networkbased-hierarchical-2024", "title": "Neural Network-Based Hierarchical Fault-Tolerant Affine Formation Control for Heterogeneous Nonlinear Multi-Agent Systems", "authors": ["Haiqing Wang", "Jiuxiang Dong"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TITS.2023.3322689", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Under a premise of universal rigidity, the affine formation control method based on stress matrix can solve the formation maneuvering problem well. However, a failure of the agent in the system can ea", "doi": "10.1109/TITS.2023.3322689"}
+{"id": "multiagent-vqa-exploring-2024", "title": "Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering", "authors": ["Bowen Jiang", "Zhijun Zhuang", "Shreyas S. Shivakumar", "Dan Roth", "C. J. Taylor"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2403.14783", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work explores the zero-shot capabilities of foundation models in Visual Question Answering (VQA) tasks. We propose an adaptive multi-agent system, named Multi-Agent VQA, to overcome the limitatio", "arxiv_id": "2403.14783", "doi": "10.48550/arXiv.2403.14783"}
+{"id": "metmapf-metamorphic-testing-2024", "title": "MET-MAPF: A Metamorphic Testing Approach for Multi-Agent Path Finding Algorithms", "authors": ["Xiaoyu Zhang", "Yang Liu", "Paolo Arcaini", "Mingyue Jiang", "Zheng Zheng"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3669663", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The Multi-Agent Path Finding (MAPF) problem, i.e., the scheduling of multiple agents to reach their destinations, has been widely investigated. Testing MAPF systems is challenging, due to the complexi", "doi": "10.1145/3669663"}
+{"id": "sqlfixagent-semanticaccurate-sql-2024", "title": "SQLFixAgent: Towards Semantic-Accurate SQL Generation via Multi-Agent Collaboration", "authors": ["Jipeng Cen", "Jiaxin Liu", "Zhixu Li", "Jingjing Wang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2406.13408", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2406.13408"}
+{"id": "faulttolerant-timevarying-formation-2024", "title": "Fault-Tolerant Time-Varying Formation Trajectory Tracking Control for Multi-Agent Systems with Time Delays and Semi-Markov Switching Topologies", "authors": ["Huangzhi Yu", "Kunzhong Miao", "Zhiqing He", "Hong Zhang", "Yifeng Niu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/drones8120778", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The fault-tolerant time-varying formation (TVF) trajectory tracking control problem is investigated in this paper for uncertain multi-agent systems (MASs) with external disturbances subject to time de", "doi": "10.3390/drones8120778"}
+{"id": "intelligent-elevator-control-2024", "title": "Intelligent Elevator Control Using Decentralized Multi-Agent Systems", "authors": ["Atef Gharbi", "Faheed A. F. Alrslani", "Mohamed Ayari", "Yamen El Touati", "Akil El Kamel"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.52783/jes.6482", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A multi-agent system (MAS) framework optimizes elevator operations in high-rise buildings with dynamic traffic patterns and fluctuating passenger demands. With the proposed MAS approach, which utilize", "doi": "10.52783/jes.6482"}
+{"id": "observerbased-resilient-consensus-2024", "title": "Observer-based Resilient Consensus for Multi-agent Systems Modeled by PDEs under DoS Attacks", "authors": ["Chuanhai Yang", "Qingshan Liu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICICIP60808.2024.10477848", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper addresses the problem of achieving resilient consensus in multi-agent systems (MASs) under denial-of-service (DoS) attacks. The system is described by semi-linear reaction-diffusion partia", "doi": "10.1109/ICICIP60808.2024.10477848"}
+{"id": "multi-robot-cooperative-2024", "title": "Multi Robot Cooperative Control Based on Edge Computing and Multi Agent Reinforcement Learning with Cooperative Planning", "authors": ["Huihong Yuan", "Chaoyang Xu", "Feng Tu", "Aijun Ma", "Hong Shi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICIICS63763.2024.10859967", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this era, multi-robot systems play vital role in the sector of environmental monitoring, disaster response, and smart manufacturing. Previous researchers have suggested various traditional methods ", "doi": "10.1109/ICIICS63763.2024.10859967"}
+{"id": "failuresensoriq-multichoice-qa-2025", "title": "FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes", "authors": ["Christodoulos Constantinides", "Dhaval Patel", "Shuxin Lin", "Claudio Guerrero", "Sunil Dagajirao Patil"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.03278", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce FailureSensorIQ, a novel Multi-Choice Question-Answering (MCQA) benchmarking system designed to assess the ability of Large Language Models (LLMs) to reason and understand complex, domain", "arxiv_id": "2506.03278", "doi": "10.48550/arXiv.2506.03278"}
+{"id": "contested-communication-c2-2024", "title": "Contested Communication in C2 Multi-agent Simulations", "authors": ["Huey Pretila", "Benjamin Campbell", "Claudia Szabo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3615979.3656057", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Mobile Ad-hoc Networks (MANETs) are frequently deployed in dynamic and contested military environments and it is critical that their behaviour is well understood. Existing MANET research is applicatio", "doi": "10.1145/3615979.3656057"}
+{"id": "adaptive-neural-network-2023", "title": "Adaptive neural network control of non‐affine multi‐agent systems with actuator fault and input saturation", "authors": ["Fengyi Yuan", "Yanjun Liu", "Lei Liu", "Jie Lan"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1002/rnc.7161", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: An adaptive neural network fault‐tolerant control method is proposed for the non‐affine multi‐agent systems with actuator failure and input saturation. Through the transformation of the original multi", "doi": "10.1002/rnc.7161"}
+{"id": "interval-typeii-fuzzy-2023", "title": "Interval Type-II Fuzzy Fault-Tolerant Control for Constrained Uncertain 2-DOF Robotic Multi-Agent Systems with Active Fault Detection", "authors": ["Wen Yan", "Haiyan Tu", "Peng Qin", "Tao Zhao"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.3390/s23104836", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This study proposed a novel adaptive interval Type-II fuzzy fault-tolerant control for constrained uncertain 2-DOF robotic multi-agent systems with an active fault-detection algorithm. This control me", "doi": "10.3390/s23104836"}
+{"id": "adaptive-selftriggered-control-2023", "title": "Adaptive Self-Triggered Control for Multi-Agent Systems with Actuator Failures and Time-Varying State Constraints", "authors": ["Jianhui Wang", "Zikai Hu", "Jiarui Liu", "Yuanqing Zhang", "Yixiang Gu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.3390/act12090364", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work focuses on the consensus problem for multi-agent systems (MASs) with actuator failures and time-varying state constraints, and presents a fixed-time self-triggered consensus control protocol", "doi": "10.3390/act12090364"}
+{"id": "multiagent-coordination-fluid-2023", "title": "Multi-Agent Coordination Fluid Flow Modeling and Experimental Evaluation", "authors": ["Harshvardhan Uppaluru", "Mohammad Ghuran", "H. Rastgoftar"], "year": 2023, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2301.05833", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reliability is a critical aspect of multi-agent system coordination as it ensures that the system functions correctly and consistently. If one agent in the system fails or behaves unexpectedly, it can", "arxiv_id": "2301.05833"}
+{"id": "forwardlooking-backwardlooking-responsibility-2023", "title": "Forward-Looking and Backward-Looking Responsibility Attribution in Multi-Agent Sequential Decision Making", "authors": ["Stelios Triantafyllou"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.5555/3545946.3599135", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As AI systems gain more and more agency in modern-day society, the problem of responsibility attribution in AI is no longer just a philosophically interesting one, but a practical one as well. The ris", "doi": "10.5555/3545946.3599135"}
+{"id": "decentralized-smart-charging-2023", "title": "Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits", "authors": ["Sharyal Zafar", "Raphaël Féraud", "A. Blavette", "G. Camilleri", "H. Ahmed"], "year": 2023, "venue": "27th International Conference on Electricity Distribution (CIRED 2023)", "source_url": "https://arxiv.org/abs/2307.10704", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be", "arxiv_id": "2307.10704", "doi": "10.48550/arXiv.2307.10704"}
+{"id": "leaderfollowing-mean-square-2023", "title": "Leader-Following Mean Square Scaled Bipartite Consensus for Second-Order Multi-Agent Systems with Communication Noise and Antagonistic Information", "authors": ["Chongyang Wang", "Yingxue Du", "Zhi Liu", "Jinxin Shang", "Tianwei Zhou"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.23919/CCC58697.2023.10239982", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we study the leader-follower scaled bipartite consensus problem for second-order multi-agent systems with communication noise and antagonistic information. In order to reduce the negati", "doi": "10.23919/CCC58697.2023.10239982"}
+{"id": "resilient-multiagent-collaborative-2023", "title": "Resilient Multi-Agent Collaborative Spacecraft Inspection", "authors": ["Changrak Choi", "Yashwanth Kumar Nakka", "A. Rahmani", "Soon-Jo Chung"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AERO55745.2023.10115886", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Distributed spacecraft systems (DSS) involving SmallSats in low Earth orbit are gaining significant interest both for Earth observation and on-orbit servicing purposes. However, the miniaturized low-c", "doi": "10.1109/AERO55745.2023.10115886"}
+{"id": "supply-chain-network-2023", "title": "Supply Chain Network Model using Multi-Agent Reinforcement Learning for COVID-19", "authors": ["Tomohito Okada", "Hiroshi Sato", "M. Kubo"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.14569/ijacsa.2023.0140208", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The COVID-19 vaccination management in Japan has revealed many problems. The number of vaccines available was clearly less than the number of people who wanted to be vaccinated. Initially, the system", "doi": "10.14569/ijacsa.2023.0140208"}
+{"id": "multiagent-symbiotic-evolution-2026", "title": "A Multi-Agent Symbiotic Evolution Model and Simulation Research of the Entrepreneurial Ecosystem", "authors": ["Xinyue Qin", "Haiqing Hu", "Tong Shi"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.3390/systems14010080", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The healthy evolution of an entrepreneurial ecosystem relies on the symbiotic relationships among its diverse internal actors. This study addresses a gap in entrepreneurial ecosystem research, which h", "doi": "10.3390/systems14010080"}
+{"id": "biotrouble-multiagent-workflow-2026", "title": "BioTrouble: A Multi-Agent Workflow for Troubleshooting Molecular Biology Techniques", "authors": ["M. Ameri", "Hannie Yousefabadi", "Amin Ramezani"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.64898/2025.12.30.697016", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.64898/2025.12.30.697016"}
+{"id": "hierarchical-preemptive-holistic-2026", "title": "Hierarchical Preemptive Holistic Collaborative Systems for Embodied Multi-Agent Systems: Framework, Hybrid Stability, and Scalability Analysis", "authors": ["Ting Peng"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.02779", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The coordination of Embodied Multi-Agent Systems in constrained physical environments requires a rigorous balance between safety, scalability, and efficiency. Traditional decentralized approaches, e.g", "arxiv_id": "2601.02779", "doi": "10.48550/arXiv.2601.02779"}
+{"id": "topology-matters-evaluating-2026", "title": "Topology Matters: Evaluating Multi-Agent Organizations for Resilient Flood Detection", "authors": ["Gaurav Avula", "Hung Du", "Nageswara Rao Pedasingu", "Srikanth Thudumu", "Suresh Vayira"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CCNC65079.2026.11366503", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Flood detection networks often fail when fixed infrastructure such as gateways or cellular towers is damaged. Multi-agent systems (MAS) operating over ad hoc peer-to-peer networks offer a resilient al", "doi": "10.1109/CCNC65079.2026.11366503"}
+{"id": "rise-agentic-testing-2026", "title": "The Rise of Agentic Testing: Multi-Agent Systems for Robust Software Quality Assurance", "authors": ["Saba Naqvi", "Mohammad Baqar", "Nawaz Ali Mohammad"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.02454", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software testing has progressed toward intelligent automation, yet current AI-based test generators still suffer from static, single-shot outputs that frequently produce invalid, redundant, or non-exe", "arxiv_id": "2601.02454", "doi": "10.48550/arXiv.2601.02454"}
+{"id": "improving-llm-reasoning-2024-2", "title": "Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification", "authors": ["Zhenwen Liang", "Ye Liu", "Tong Niu", "Xiangliang Zhang", "Yingbo Zhou"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.05318", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite significant advancements in the general capability of large language models (LLMs), they continue to struggle with consistent and accurate reasoning, especially in complex tasks such as mathem", "arxiv_id": "2410.05318", "doi": "10.48550/arXiv.2410.05318"}
+{"id": "survey-frontiers-llm-2025", "title": "A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems", "authors": ["Zixuan Ke", "Fangkai Jiao", "Yifei Ming", "Xuan-Phi Nguyen", "Austin Xu"], "year": 2025, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2504.09037", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as ", "arxiv_id": "2504.09037", "doi": "10.48550/arXiv.2504.09037"}
+{"id": "wider-deeper-scaling-2025", "title": "Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search", "authors": ["Kou Misaki", "Yuichi Inoue", "Yuki Imajuku", "So Kuroki", "Taishi Nakamura"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.04412", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances demonstrate that increasing inference-time computation can significantly boost the reasoning capabilities of large language models (LLMs). Although repeated sampling (i.e., generating ", "arxiv_id": "2503.04412", "doi": "10.48550/arXiv.2503.04412"}
+{"id": "characterizing-llm-inference-2025", "title": "Characterizing LLM Inference Energy-Performance Tradeoffs across Workloads and GPU Scaling", "authors": ["Paul Joe Maliakel", "Shashikant Ilager", "Ivona Brandić"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2501.08219", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM inference exhibits substantial variability across queries and execution phases, yet inference configurations are often applied uniformly. We present a measurement-driven characterization of worklo", "arxiv_id": "2501.08219"}
+{"id": "effect-sampling-diversity-2025", "title": "On the Effect of Sampling Diversity in Scaling LLM Inference", "authors": ["Tianchun Wang", "Zichuan Liu", "Yuanzhou Chen", "Jonathan Light", "Haifeng Chen"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.11027", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) scaling inference is key to unlocking greater performance, and leveraging diversity has proven an effective way to enhance it. Motivated by the observed relationship between", "arxiv_id": "2502.11027"}
+{"id": "disc-dynamic-decomposition-2025", "title": "DISC: Dynamic Decomposition Improves LLM Inference Scaling", "authors": ["Jonathan Light", "Wei Cheng", "Yue Wu", "Masafumi Oyamada", "Mengdi Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.16706", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inference scaling methods for LLMs often rely on decomposing problems into steps (or groups of tokens), followed by sampling and selecting the best next steps. However, these steps and their sizes are", "arxiv_id": "2502.16706", "doi": "10.48550/arXiv.2502.16706"}
+{"id": "eagle3-scaling-up-2025", "title": "EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test", "authors": ["Yuhui Li", "Fangyun Wei", "Chao Zhang", "Hongyang Zhang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.01840", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The sequential nature of modern LLMs makes them expensive and slow, and speculative sampling has proven to be an effective solution to this problem. Methods like EAGLE perform autoregression at the fe", "arxiv_id": "2503.01840", "doi": "10.48550/arXiv.2503.01840"}
+{"id": "scaling-llm-inference-2025", "title": "Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation", "authors": ["Kexun Zhang", "Shang Zhou", "Danqing Wang", "W. Wang", "Lei Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.naacl-long.404", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Sampling is a basic operation for large language models (LLMs). In reinforcement learning rollouts and meta generation algorithms such as Best-of-N, it is essential to sample high-quality trajectories", "doi": "10.18653/v1/2025.naacl-long.404"}
+{"id": "sfs-smarter-code-2025", "title": "SFS: Smarter Code Space Search improves LLM Inference Scaling", "authors": ["Jonathan Light", "Yue Wu", "Yiyou Sun", "Wenchao Yu", "Yanchi Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/72f5c9a58f5008b72af186df09df95e71c0eb983", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "can-1b-llm-2025", "title": "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling", "authors": ["Runze Liu", "Junqi Gao", "Jian Zhao", "Kaiyan Zhang", "Xiu Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.06703", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test-Time Scaling (TTS) is an important method for improving the performance of Large Language Models (LLMs) by using additional computation during the inference phase. However, current studies do not", "arxiv_id": "2502.06703", "doi": "10.48550/arXiv.2502.06703"}
+{"id": "scaling-llm-inference-2024", "title": "Scaling LLM Inference with Optimized Sample Compute Allocation", "authors": ["Kexun Zhang", "Shang Zhou", "Danqing Wang", "W. Wang", "Lei Li"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.22480", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Sampling is a basic operation in many inference-time algorithms of large language models (LLMs). To scale up inference efficiently with a limited compute, it is crucial to find an optimal allocation f", "arxiv_id": "2410.22480", "doi": "10.48550/arXiv.2410.22480"}
+{"id": "llasa-scaling-traintime-2025", "title": "Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis", "authors": ["Zhen Ye", "Xinfa Zhu", "Chi-Min Chan", "Xinsheng Wang", "Xu Tan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.04128", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in text-based large language models (LLMs), particularly in the GPT series and the o1 model, have demonstrated the effectiveness of scaling both training-time and inference-time comput", "arxiv_id": "2502.04128", "doi": "10.48550/arXiv.2502.04128"}
+{"id": "advancing-language-model-2025", "title": "Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling", "authors": ["Zhenyu Hou", "Xin Lv", "Rui Lu", "Jiajie Zhang", "Yujiang Li"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2501.11651", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, existing approaches mainly rely on imitation learning and struggle to achieve effective test", "arxiv_id": "2501.11651", "doi": "10.48550/arXiv.2501.11651"}
+{"id": "atom-thoughts-markov-2025", "title": "Atom of Thoughts for Markov LLM Test-Time Scaling", "authors": ["Fengwei Teng", "Zhaoyang Yu", "Quan Shi", "Jiayi Zhang", "Chenglin Wu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.12018", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have achieved significant performance gains through test-time scaling methods. However, existing approaches often incur redundant computations due to the accumulation of h", "arxiv_id": "2502.12018", "doi": "10.48550/arXiv.2502.12018"}
+{"id": "sloaware-gpu-frequency-2024", "title": "SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving", "authors": ["A. Kakolyris", "D. Masouros", "Petros Vavaroutsos", "Sotirios Xydis", "Dimitrios Soudris"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2408.05235", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2408.05235"}
+{"id": "break-sequential-dependency-2024", "title": "Break the Sequential Dependency of LLM Inference Using Lookahead Decoding", "authors": ["Yichao Fu", "Peter Bailis", "Ion Stoica", "Hao Zhang"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2402.02057", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded, resulting in high latency and significant wastes of the parallel processing power of modern accelerators. Existing ", "arxiv_id": "2402.02057", "doi": "10.48550/arXiv.2402.02057"}
+{"id": "reasonflux-hierarchical-llm-2025", "title": "ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates", "authors": ["Ling Yang", "Zhaochen Yu", "Bin Cui", "Mengdi Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.06772", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present that hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space and outperform the mathematical reasoning capabilities of powerful LLMs lik", "arxiv_id": "2502.06772", "doi": "10.48550/arXiv.2502.06772"}
+{"id": "inferencetime-computations-llm-2025", "title": "Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights", "authors": ["Shubham Parashar", "Blake Olson", "Sambhav Khurana", "Eric Li", "Hongyi Ling"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.12521", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We examine the reasoning and planning capabilities of large language models (LLMs) in solving complex tasks. Recent advances in inference-time techniques demonstrate the potential to enhance LLM reaso", "arxiv_id": "2502.12521", "doi": "10.48550/arXiv.2502.12521"}
+{"id": "autotom-scaling-modelbased-2025", "title": "AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling", "authors": ["Zhining Zhang", "Chuanyang Jin", "Mung Yao Jia", "Shunchi Zhang", "Tianmin Shu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.15676", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Theory of Mind (ToM), the ability to understand people's minds based on their behavior, is key to developing socially intelligent agents. Current approaches to ToM reasoning either rely on prompting L", "arxiv_id": "2502.15676"}
+{"id": "think-deep-think-2025", "title": "Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods", "authors": ["Junlin Wang", "Shang Zhu", "Jon Saad-Falcon", "Ben Athiwaratkun", "Qingyang Wu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.14047", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: There is intense interest in investigating how inference time compute (ITC) (e.g. repeated sampling, refinements, etc) can improve large language model (LLM) capabilities. At the same time, recent bre", "arxiv_id": "2504.14047", "doi": "10.48550/arXiv.2504.14047"}
+{"id": "dapo-opensource-llm-2025", "title": "DAPO: An Open-Source LLM Reinforcement Learning System at Scale", "authors": ["Qiying Yu", "Zheng Zhang", "Ruofei Zhu", "Yufeng Yuan", "Xiaochen Zuo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.14476", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art ", "arxiv_id": "2503.14476", "doi": "10.48550/arXiv.2503.14476"}
+{"id": "tabler1-inferencetime-scaling-2025", "title": "Table-R1: Inference-Time Scaling for Table Reasoning", "authors": ["Zheyuan Yang", "Lyuhao Chen", "Arman Cohan", "Yilun Zhao"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2505.23621", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we present the first study to explore inference-time scaling on table reasoning tasks. We develop and evaluate two post-training strategies to enable inference-time scaling: distillation", "arxiv_id": "2505.23621", "doi": "10.48550/arXiv.2505.23621"}
+{"id": "xlstm-7b-recurrent-2025", "title": "xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference", "authors": ["Maximilian Beck", "Korbinian Pöppel", "Phillip Lippe", "Richard Kurle", "P. Blies"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2503.13427", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent breakthroughs in solving reasoning, math and coding problems with Large Language Models (LLMs) have been enabled by investing substantial computation budgets at inference time. Therefore, infer", "arxiv_id": "2503.13427", "doi": "10.48550/arXiv.2503.13427"}
+{"id": "retrievalattention-accelerating-longcontext-2024", "title": "RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval", "authors": ["Di Liu", "Meng Chen", "Baotong Lu", "Huiqiang Jiang", "Zhenhua Han"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2409.10516", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Transformer-based Large Language Models (LLMs) have become increasingly important. However, due to the quadratic time complexity of attention computation, scaling LLMs to longer contexts incurs extrem", "arxiv_id": "2409.10516", "doi": "10.48550/arXiv.2409.10516"}
+{"id": "sageserve-optimizing-llm-2025", "title": "SAGESERVE: Optimizing LLM Serving on Cloud Data Centers with Forecast Aware Auto-Scaling", "authors": ["Shashwat Jaiswal", "Kunal Jain", "Yogesh L. Simmhan", "Anjaly Parayil", "Ankur Mallick"], "year": 2025, "venue": "Proceedings of the ACM on Measurement and Analysis of Computing Systems", "source_url": "https://arxiv.org/abs/2502.14617", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Global cloud service providers handle inference workloads for Large Language Models (LLMs) that span latency-sensitive (e.g., chatbots) and insensitive (e.g., report writing) tasks, resulting in diver", "arxiv_id": "2502.14617", "doi": "10.1145/3771576"}
+{"id": "aptserve-adaptive-request-2025", "title": "Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving", "authors": ["Shihong Gao", "Xin Zhang", "Yanyan Shen", "Lei Chen"], "year": 2025, "venue": "Proc. ACM Manag. Data", "source_url": "https://arxiv.org/abs/2504.07494", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) inference serving systems are essential to various LLM-based applications. As demand for LLM services continues to grow, scaling these systems to handle high request rates w", "arxiv_id": "2504.07494", "doi": "10.1145/3725394"}
+{"id": "rethinking-role-prompting-2025", "title": "Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory", "authors": ["Yexiang Liu", "Zekun Li", "Zhi Fang", "Nan Xu", "Ran He"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2505.10981", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, scaling test-time compute on Large Language Models (LLM) has garnered wide attention. However, there has been limited investigation of how various reasoning prompting strategies perform as s", "arxiv_id": "2505.10981", "doi": "10.18653/v1/2025.acl-long.1356"}
+{"id": "llm-inference-serving-2024", "title": "LLM Inference Serving: Survey of Recent Advances and Opportunities", "authors": ["Baolin Li", "Yankai Jiang", "V. Gadepally", "Devesh Tiwari"], "year": 2024, "venue": "IEEE Conference on High Performance Extreme Computing", "source_url": "https://arxiv.org/abs/2407.12391", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research since the year 2023. We specifically examine system-level enhance", "arxiv_id": "2407.12391", "doi": "10.1109/HPEC62836.2024.10938426"}
+{"id": "cumo-scaling-multimodal-2024", "title": "CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts", "authors": ["Jiachen Li", "Xinyao Wang", "Sijie Zhu", "Chia-Wen Kuo", "Lu Xu"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2405.05949", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. Howev", "arxiv_id": "2405.05949", "doi": "10.48550/arXiv.2405.05949"}
+{"id": "tereffic-highly-efficient-2025", "title": "TerEffic: Highly Efficient Ternary LLM Inference on FPGA", "authors": ["Chenyang Yin", "Zhenyu Bai", "Pranav Venkatram", "Shivam Aggarwal", "Zhaoying Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.16473", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deploying Large Language Models (LLMs) efficiently on edge devices is often constrained by limited memory capacity and high power consumption. Low-bit quantization methods, particularly ternary quanti", "arxiv_id": "2502.16473", "doi": "10.48550/arXiv.2502.16473"}
+{"id": "scaling-up-multiturn-2025", "title": "Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers", "authors": ["Ran Xin", "Zeyu Zheng", "Yanchen Nie", "Kun Yuan", "Xia Xiao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.06493", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) into automated theorem proving has shown immense promise, yet is fundamentally constrained by challenges in scaling up both training-time reinforcement ", "arxiv_id": "2509.06493", "doi": "10.48550/arXiv.2509.06493"}
+{"id": "mixture-attention-spans-2024", "title": "Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths", "authors": ["Tianyu Fu", "Haofeng Huang", "Xuefei Ning", "Genghan Zhang", "Boju Chen"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2406.14909", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Sliding-window attention offers a hardware-efficient solution to the memory and throughput challenges of Large Language Models (LLMs) in long-context scenarios. Existing methods typically employ a sin", "arxiv_id": "2406.14909"}
+{"id": "scaling-enhancing-llmbased-2025", "title": "Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach", "authors": ["Umberto Cappellazzo", "Minsu Kim", "Stavros Petridis", "Daniele Falavigna", "A. Brutti"], "year": 2025, "venue": "Interspeech", "source_url": "https://arxiv.org/abs/2505.14336", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Audio-Visual Speech Recognition (AVSR) enhances robustness in noisy environments by integrating visual cues. While recent advances integrate Large Language Models (LLMs) into AVSR, their high computat", "arxiv_id": "2505.14336", "doi": "10.48550/arXiv.2505.14336"}
+{"id": "helios-adaptive-model-2025", "title": "HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving", "authors": ["Avinash Kumar", "Shashank Nag", "Jason Clemons", "L. John", "Poulami Das"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.10724", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Early-Exit Large Language Models (EE-LLMs) enable high throughput inference by allowing tokens to exit early at intermediate layers. However, their throughput is limited by the computational and memor", "arxiv_id": "2504.10724", "doi": "10.48550/arXiv.2504.10724"}
+{"id": "scaling-up-speeding-2025", "title": "Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling", "authors": ["Shengyin Sun", "Yiming Li", "Xing Li", "Yingzhao Lian", "Weizhe Lin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.04474", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test-time scaling has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs) by allocating additional computational resources during inference. However", "arxiv_id": "2509.04474", "doi": "10.48550/arXiv.2509.04474"}
+{"id": "blockdialect-blockwise-finegrained-2025", "title": "BlockDialect: Block-wise Fine-grained Mixed Format Quantization for Energy-Efficient LLM Inference", "authors": ["Wonsuk Jang", "Thierry Tambe"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2501.01144", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapidly increasing size of large language models (LLMs) presents significant challenges in memory usage and computational costs. Quantizing both weights and activations can address these issues, w", "arxiv_id": "2501.01144", "doi": "10.48550/arXiv.2501.01144"}
+{"id": "cmoe-converting-mixtureofexperts-2025", "title": "CMoE: Converting Mixture-of-Experts from Dense to Accelerate LLM Inference", "authors": ["Zehua Pei", "Lancheng Zou", "Hui-Ling Zhen", "Xianzhi Yu", "Wulong Liu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.04416", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling large language models (LLMs) improves performance but dramatically increases inference costs. The feed-forward network (FFN), consuming approximately 70% of inference compute, represents a cr", "arxiv_id": "2502.04416"}
+{"id": "mecla-memorycomputeefficient-llm-2024", "title": "MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition", "authors": ["Yubin Qin", "Yang Wang", "Zhiren Zhao", "Xiaolong Yang", "Yang Zhou"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ISCA59077.2024.00079", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have been showing surprising performance in processing language tasks, bringing a new prevalence to deploy LLM from cloud to edge. However, being a scaling auto-regressive", "doi": "10.1109/ISCA59077.2024.00079"}
+{"id": "lutllm-efficient-large-2025", "title": "LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs", "authors": ["Zifan He", "Shengyu Ye", "Rui Ma", "Yang Wang", "Jason Cong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.06174", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid progress of large language models (LLMs) has advanced numerous applications, yet efficient single-batch inference remains vital for on-device intelligence. While FPGAs offer fine-grained dat", "arxiv_id": "2511.06174", "doi": "10.48550/arXiv.2511.06174"}
+{"id": "frontier-simulating-next-2025", "title": "Frontier: Simulating the Next Generation of LLM Inference Systems", "authors": ["Yicheng Feng", "Xin Tan", "Kin Hang Sew", "Yimin Jiang", "Yibo Zhu"], "year": 2025, "venue": "PACMI@SOSP", "source_url": "https://arxiv.org/abs/2508.03148", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) inference is growing increasingly complex with the rise of Mixture-of-Experts (MoE) models and disaggregated architectures that decouple components like prefill/decode (PD) ", "arxiv_id": "2508.03148", "doi": "10.1145/3766882.3767173"}
+{"id": "scaling-llm-testtime-2025", "title": "Scaling LLM Test-Time Compute with Mobile NPU on Smartphones", "authors": ["Zixu Hao", "Jianyu Wei", "Tuowei Wang", "Minxing Huang", "Huiqiang Jiang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.23324", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deploying Large Language Models (LLMs) on mobile devices faces the challenge of insufficient performance in smaller models and excessive resource consumption in larger ones. This paper highlights that", "arxiv_id": "2509.23324", "doi": "10.48550/arXiv.2509.23324"}
+{"id": "aladdin-joint-placement-2024", "title": "Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving", "authors": ["Chengyi Nie", "Rodrigo Fonseca", "Zhenhua Liu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.06856", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The demand for large language model (LLM) inference is gradually dominating the artificial intelligence workloads. Therefore, there is an urgent need for cost-efficient inference serving. Existing wor", "arxiv_id": "2405.06856", "doi": "10.48550/arXiv.2405.06856"}
+{"id": "scaling-graph-chainofthought-2025", "title": "Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving", "authors": ["Chengying Huan", "Ziheng Meng", "Yongchao Liu", "Zhengyi Yang", "Yun Zhu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.01633", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to perform step-by-step reasoning over graph-structured knowledge, but existing pipelines suffer from low accuracy, excessive to", "arxiv_id": "2511.01633", "doi": "10.48550/arXiv.2511.01633"}
+{"id": "database-perspective-llm-2025", "title": "Database Perspective on LLM Inference Systems", "authors": ["J. Pan", "Guoliang Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.14778/3750601.3750703", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are powering a new wave of language-based applications, including database applications, leading to new techniques and systems for dealing with the enormous compute and me", "doi": "10.14778/3750601.3750703"}
+{"id": "swedev-building-software-2025", "title": "SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling", "authors": ["Haoran Wang", "Zhenyu Hou", "Yao Wei", "Jie Tang", "Yuxiao Dong"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2506.07636", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered toolkit", "arxiv_id": "2506.07636", "doi": "10.48550/arXiv.2506.07636"}
+{"id": "sprout-green-generative-2024", "title": "Sprout: Green Generative AI with Carbon-Efficient LLM Inference", "authors": ["Baolin Li", "Yankai Jiang", "Vijay Gadepally", "Devesh Tiwari"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2024.emnlp-main.1215", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of generative AI has heightened environmental concerns, particularly regarding carbon emissions. Our framework, Sprout, addresses these challenges by reducing the carbon footprin", "doi": "10.18653/v1/2024.emnlp-main.1215"}
+{"id": "scaling-up-llm-2024", "title": "Scaling Up LLM Reviews for Google Ads Content Moderation", "authors": ["Wei Qiao", "Tushar Dogra", "Otilia Stretcu", "Y. Lyu", "Tiantian Fang"], "year": 2024, "venue": "Web Search and Data Mining", "source_url": "https://arxiv.org/abs/2402.14590", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are powerful tools for content moderation, but their inference costs and latency make them prohibitive for casual use on large datasets, such as the Google Ads repository.", "arxiv_id": "2402.14590", "doi": "10.1145/3616855.3635736"}
+{"id": "dynscaling-efficient-verifierfree-2025", "title": "DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling", "authors": ["Fei Wang", "Xingchen Wan", "Ruoxi Sun", "Jiefeng Chen", "Sercan Ö. Arik"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.16043", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inference-time scaling has proven effective in boosting large language model (LLM) performance through increased test-time computation. Yet, its practical application is often hindered by reliance on ", "arxiv_id": "2506.16043", "doi": "10.48550/arXiv.2506.16043"}
+{"id": "efficiently-scaling-llm-2024", "title": "Efficiently Scaling LLM Reasoning with Certaindex", "authors": ["Yichao Fu", "Junda Chen", "Siqi Zhu", "Zheyu Fu", "Zhongdongming Dai"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2412.20993", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test-time reasoning algorithms such as chain-of-thought, self-consistency, and MCTS enhance LLM problem-solving but can wastefully generate many tokens without improving accuracy. At the same time, we", "arxiv_id": "2412.20993"}
+{"id": "deltom-inferencetime-scaling-2025", "title": "DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic", "authors": ["Yuheng Wu", "Jianwen Xie", "Denghui Zhang", "Zhaozhuo Xu"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2505.17348", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Theory-of-Mind (ToM) tasks pose a unique challenge for large language models (LLMs), which often lack the capability for dynamic logical reasoning. In this work, we propose DEL-ToM, a framework that i", "arxiv_id": "2505.17348", "doi": "10.48550/arXiv.2505.17348"}
+{"id": "solvedetectverify-inferencetime-scaling-2025", "title": "Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier", "authors": ["Jianyuan Zhong", "Zeju Li", "Zhijian Xu", "Xiangyu Wen", "Kezhi Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.11966", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) reasoning for complex tasks inherently involves a trade-off between solution accuracy and computational efficiency. The subsequent step of verification, while intended to im", "arxiv_id": "2505.11966", "doi": "10.48550/arXiv.2505.11966"}
+{"id": "locret-enhancing-eviction-2024", "title": "Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads", "authors": ["Yuxiang Huang", "Binhang Yuan", "Xu Han", "Chaojun Xiao", "Zhiyuan Liu"], "year": 2024, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2410.01805", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling the input context length of a large language model (LLM) incurs a significant increase in computation cost and memory footprint to maintain the attention key-value (KV) cache. Existing KV cach", "arxiv_id": "2410.01805", "doi": "10.48550/arXiv.2410.01805"}
+{"id": "throttllem-predictive-gpu-2024", "title": "throttLL’eM: Predictive GPU Throttling for Energy Efficient LLM Inference Serving", "authors": ["A. Kakolyris", "D. Masouros", "Petros Vavaroutsos", "Sotirios Xydis", "Dimitrios Soudris"], "year": 2024, "venue": "International Symposium on High-Performance Computer Architecture", "source_url": "https://arxiv.org/abs/2408.05235", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Large Language Models (LLMs) gain traction, their reliance on power-hungry GPUs places ever-increasing energy demands, raising environmental and monetary concerns. Inference dominates LLM workloads", "arxiv_id": "2408.05235", "doi": "10.1109/HPCA61900.2025.00103"}
+{"id": "sloaware-gpu-dvfs-2024", "title": "SLO-Aware GPU DVFS for Energy-Efficient LLM Inference Serving", "authors": ["A. Kakolyris", "D. Masouros", "Sotirios Xydis", "Dimitrios Soudris"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/LCA.2024.3406038", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing popularity of LLM-based chatbots combined with their reliance on power-hungry GPU infrastructure forms a critical challenge for providers: minimizing energy consumption under Service-Le", "doi": "10.1109/LCA.2024.3406038"}
+{"id": "llmeasyquant-scalable-quantization-2024", "title": "LLMEasyQuant: Scalable Quantization for Parallel and Distributed LLM Inference", "authors": ["Dong Liu", "Yanxuan Yu"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2406.19657", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As large language models (LLMs) grow in size and deployment scale, quantization has become an essential technique for reducing memory footprint and improving inference efficiency. However, existing qu", "arxiv_id": "2406.19657"}
+{"id": "scaling-up-membership-2024", "title": "Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models", "authors": ["Haritz Puerto", "Martin Gubri", "Sangdoo Yun", "Seong Joon Oh"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2411.00154", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Membership inference attacks (MIA) attempt to verify the membership of a given data sample in the training set for a model. MIA has become relevant in recent years, following the rapid development of ", "arxiv_id": "2411.00154", "doi": "10.48550/arXiv.2411.00154"}
+{"id": "no-request-left-2024", "title": "No Request Left Behind: Tackling Heterogeneity in Long-Context LLM Inference with Medha", "authors": ["Amey Agrawal", "Junda Chen", "Íñigo Goiri", "R. Ramjee", "Chaojie Zhang"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2409.17264", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deploying million-token Large Language Models (LLMs) is challenging because production workloads are highly heterogeneous, mixing short queries and long documents. This heterogeneity, combined with th", "arxiv_id": "2409.17264"}
+{"id": "focusllm-scaling-llms-2024", "title": "FocusLLM: Scaling LLM's Context by Parallel Decoding", "authors": ["Zhenyu Li", "Yike Zhang", "Tengyu Pan", "Yutao Sun", "Zhichao Duan"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2408.11745", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2408.11745"}
+{"id": "inference-scaling-bridging-2024", "title": "Inference Scaling for Bridging Retrieval and Augmented Generation", "authors": ["Youngwon Lee", "Seung-won Hwang", "Daniel F. Campos", "Filip Graliński", "Zhewei Yao"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.10684", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) has emerged as a popular approach to steering the output of a large language model (LLM) by incorporating retrieved contexts as inputs. However, existing work obse", "arxiv_id": "2412.10684", "doi": "10.48550/arXiv.2412.10684"}
+{"id": "beyond-chinchillaoptimal-accounting-2023", "title": "Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws", "authors": ["Nikhil Sardana", "Sasha Doubov", "Jonathan Frankle"], "year": 2023, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2401.00448", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) scaling laws are empirical formulas that estimate changes in model quality as a result of increasing parameter count and training data. However, these formulas, including th", "arxiv_id": "2401.00448", "doi": "10.48550/arXiv.2401.00448"}
+{"id": "eellm-largescale-training-2023", "title": "EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism", "authors": ["Yanxi Chen", "Xuchen Pan", "Yaliang Li", "Bolin Ding", "Jingren Zhou"], "year": 2023, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2312.04916", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). While recent works have shown preliminary evidence for the efficacy of early exiting i", "arxiv_id": "2312.04916", "doi": "10.48550/arXiv.2312.04916"}
+{"id": "revisiting-blockbased-quantisation-2023", "title": "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?", "authors": ["Cheng Zhang", "Jianyi Cheng", "Ilia Shumailov", "G. Constantinides", "Yiren Zhao"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2310.05079", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The inference of Large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation ", "arxiv_id": "2310.05079", "doi": "10.18653/v1/2023.emnlp-main.617"}
+{"id": "case-4bit-precision-2022", "title": "The case for 4-bit precision: k-bit Inference Scaling Laws", "authors": ["Tim Dettmers", "Luke Zettlemoyer"], "year": 2022, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2212.09720", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference latencies. However, the final model size de", "arxiv_id": "2212.09720", "doi": "10.48550/arXiv.2212.09720"}
+{"id": "surprising-effectiveness-negative-2025", "title": "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning", "authors": ["Xinyu Zhu", "Mengzhou Xia", "Zhepei Wei", "Wei-Lin Chen", "Danqi Chen"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.01347", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training language models (LMs) on reasoning tasks that elicit emergent long chains of thought (CoTs). Unlike supervise", "arxiv_id": "2506.01347", "doi": "10.48550/arXiv.2506.01347"}
+{"id": "zebralogic-scaling-limits-2025", "title": "ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning", "authors": ["Bill Yuchen Lin", "Ronan Le Bras", "Kyle Richardson", "Ashish Sabharwal", "Radha Poovendran"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.01100", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We investigate the logical reasoning capabilities of large language models (LLMs) and their scalability in complex non-monotonic reasoning. To this end, we introduce ZebraLogic, a comprehensive evalua", "arxiv_id": "2502.01100", "doi": "10.48550/arXiv.2502.01100"}
+{"id": "llm-posttraining-deep-2025", "title": "LLM Post-Training: A Deep Dive into Reasoning Large Language Models", "authors": ["Komal Kumar", "Tajamul Ashraf", "Omkar Thawakar", "R. Anwer", "Hisham Cholakkal"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.21321", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Pretraining on vast web-scale data has laid the foundation for these m", "arxiv_id": "2502.21321", "doi": "10.48550/arXiv.2502.21321"}
+{"id": "quip-even-better-2024", "title": "QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks", "authors": ["Albert Tseng", "Jerry Chee", "Qingyao Sun", "Volodymyr Kuleshov", "Christopher De Sa"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2402.04396", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-th", "arxiv_id": "2402.04396", "doi": "10.48550/arXiv.2402.04396"}
+{"id": "evolving-deeper-llm-2025", "title": "Evolving Deeper LLM Thinking", "authors": ["Kuang-Huei Lee", "Ian Fischer", "Yueh-Hua Wu", "David Marwood", "S. Baluja"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.09891", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We explore an evolutionary search strategy for scaling inference time compute in Large Language Models. The proposed approach, Mind Evolution, uses a language model to generate, recombine and refine c", "arxiv_id": "2501.09891", "doi": "10.48550/arXiv.2501.09891"}
+{"id": "dont-overthink-it-2025", "title": "Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning", "authors": ["Michael Hassid", "Gabriele Synnaeve", "Yossi Adi", "Roy Schwartz"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.17813", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reasoning large language models (LLMs) heavily rely on scaling test-time compute to perform complex reasoning tasks by generating extensive \"thinking\" chains. While demonstrating impressive results, thi", "arxiv_id": "2505.17813", "doi": "10.48550/arXiv.2505.17813"}
+{"id": "coevolving-llm-coder-2025", "title": "Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning", "authors": ["Yinjie Wang", "Ling Yang", "Ye Tian", "Ke Shen", "Mengdi Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.03136", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We propose CURE, a novel reinforcement learning framework with a dedicated reward design that co-evolves coding and unit test generation capabilities based on their interaction outcomes, without any g", "arxiv_id": "2506.03136", "doi": "10.48550/arXiv.2506.03136"}
+{"id": "when-solve-when-2025", "title": "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning", "authors": ["Nishad Singhi", "Hritik Bansal", "Arian Hosseini", "Aditya Grover", "Kai-Wei Chang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.01005", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling test-time compute has emerged as a key strategy for enhancing the reasoning capabilities of large language models (LLMs), particularly in tasks like mathematical problem-solving. A traditional", "arxiv_id": "2504.01005", "doi": "10.48550/arXiv.2504.01005"}
+{"id": "learning-keep-promise-2025", "title": "Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding", "authors": ["Tian Jin", "Ellie Y. Cheng", "Zack Ankner", "Nikunj Saunshi", "Blake Elias"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.11517", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explored parallel decoding by identifying and ", "arxiv_id": "2502.11517", "doi": "10.48550/arXiv.2502.11517"}
+{"id": "overclocking-llm-reasoning-2025", "title": "Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs", "authors": ["Roy Eisenstadt", "Itamar Zimerman", "Lior Wolf"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.07240", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, techniques such as explicit structured reasoning have demonstrated strong test-time scaling behavior by enforcing a separation between the model's internal \"thinking\" process and the final res", "arxiv_id": "2506.07240", "doi": "10.48550/arXiv.2506.07240"}
+{"id": "talk-structurally-act-2025", "title": "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems", "authors": ["Zhao Wang", "Sota Moriyama", "Wei-Yao Wang", "Briti Gangopadhyay", "Shingo Takamatsu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.11098", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in LLM-based multi-agent (LLM-MA) systems have shown promise, yet significant challenges remain in managing communication and refinement when agents collaborate on complex tasks. I", "arxiv_id": "2502.11098", "doi": "10.48550/arXiv.2502.11098"}
+{"id": "linguistic-generalizability-testtime-2025", "title": "Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning", "authors": ["Guijin Son", "Jiwoo Hong", "Hyunwoo Ko", "James Thorne"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2502.17407", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling pre-training compute has proven effective for achieving multilinguality, but does the same hold for test-time scaling? In this work, we introduce MCLM, a multilingual math benchmark featuring ", "arxiv_id": "2502.17407", "doi": "10.48550/arXiv.2502.17407"}
+{"id": "videochatr15-visual-testtime-2025", "title": "VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception", "authors": ["Ziang Yan", "Xinhao Li", "Yinan He", "Zhengrong Yue", "Xiangyun Zeng"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.21100", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inducing reasoning in multimodal large language models (MLLMs) is critical for achieving human-level perception and understanding. Existing methods mainly leverage LLM reasoning to analyze parsed visu", "arxiv_id": "2509.21100", "doi": "10.48550/arXiv.2509.21100"}
+{"id": "cost-dynamic-reasoning-2025", "title": "The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective", "authors": ["Jiin Kim", "Byeong-Gon Shin", "Jin-Won Chung", "Minsoo Rhu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.04301", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large-language-model (LLM)-based AI agents have recently showcased impressive versatility by employing dynamic reasoning, an adaptive, multi-step process that coordinates with external tools. This shi", "arxiv_id": "2506.04301", "doi": "10.48550/arXiv.2506.04301"}
+{"id": "enhancing-llm-reasoning-2024", "title": "Enhancing LLM Reasoning with Reward-guided Tree Search", "authors": ["Jinhao Jiang", "Zhipeng Chen", "Yingqian Min", "Jie Chen", "Xiaoxue Cheng"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2411.11694", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recently, test-time scaling has garnered significant attention from the research community, largely due to the substantial advancements of the o1 model released by OpenAI. By allocating more computati", "arxiv_id": "2411.11694"}
+{"id": "route-reason-adaptive-2025", "title": "Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection", "authors": ["Zhihong Pan", "Kai Zhang", "Yuze Zhao", "Yupeng Han"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.19435", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The inherent capabilities of a language model (LM) and the reasoning strategies it employs jointly determine its performance in reasoning tasks. While test-time scaling is regarded as an effective app", "arxiv_id": "2505.19435", "doi": "10.48550/arXiv.2505.19435"}
+{"id": "arcmemo-abstract-reasoning-2025", "title": "ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory", "authors": ["Matthew Ho", "Chenglei Si", "Zhaoxiang Feng", "Fangxu Yu", "Yichi Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.04439", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While inference-time scaling enables LLMs to carry out increasingly long and capable reasoning traces, the patterns and insights uncovered during these traces are immediately discarded once the contex", "arxiv_id": "2509.04439", "doi": "10.48550/arXiv.2509.04439"}
+{"id": "dancing-critiques-enhancing-2025", "title": "Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique", "authors": ["Yansi Li", "Jiahao Xu", "Tian Liang", "Xingyu Chen", "Zhiwei He"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.17363", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Enhancing the reasoning capabilities of large language models (LLMs), particularly for complex tasks requiring multi-step logical deductions, remains a significant challenge. Traditional inference tim", "arxiv_id": "2503.17363", "doi": "10.13140/RG.2.2.27912.33289"}
+{"id": "llmcloud-complete-leveraging-2024", "title": "LLM-Cloud Complete: Leveraging Cloud Computing for Efficient Large Language Model-based Code Completion", "authors": ["Mingxuan Zhang", "Bo Yuan", "Hanzhe Li", "Kangming Xu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.60087/jaigs.v5i1.200", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces LLM-CloudComplete, a novel cloud-based system for efficient and scalable code completion leveraging large language models (LLMs). We address the challenges of deploying LLMs for ", "doi": "10.60087/jaigs.v5i1.200"}
+{"id": "boosting-llm-reasoning-2025", "title": "Boosting LLM Reasoning via Spontaneous Self-Correction", "authors": ["Xutong Zhao", "Tengyu Xu", "Xuewei Wang", "Zhengxing Chen", "Di Jin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.06923", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models (LLMs) have demonstrated remarkable success on a broad range of tasks, math reasoning remains a challenging one. One of the approaches for improving math reasoning is self-", "arxiv_id": "2506.06923", "doi": "10.48550/arXiv.2506.06923"}
+{"id": "aegaeon-effective-gpu-2025", "title": "Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market", "authors": ["Yuxing Xiang", "Xue Li", "Kun Qian", "Yufan Yang", "Diwen Zhu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3731569.3764815", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Model markets (e.g., Hugging Face) feature a wide variety of models with unique characteristics and varying levels of popularity. Serving sporadic and unpredictable requests in concurrent inference wo", "doi": "10.1145/3731569.3764815"}
+{"id": "empirical-study-llm-2025", "title": "An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint", "authors": ["Yi Sun", "Han Wang", "Jiaqiang Li", "Jiacheng Liu", "Xiangyu Li"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2504.14350", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent work has demonstrated the remarkable potential of Large Language Models (LLMs) in test-time scaling. By making models think before answering, they are able to achieve much higher accuracy with ", "arxiv_id": "2504.14350", "doi": "10.18653/v1/2025.emnlp-main.389"}
+{"id": "cotbased-synthesizer-enhancing-2025", "title": "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis", "authors": ["Bohan Zhang", "Xiaokang Zhang", "Jing Zhang", "Jifan Yu", "Sijia Luo"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2501.01668", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Current inference scaling methods, such as Self-consistency and Best-of-N, have proven effective in improving the accuracy of LLMs on complex reasoning tasks. However, these methods rely heavily on th", "arxiv_id": "2501.01668", "doi": "10.48550/arXiv.2501.01668"}
+{"id": "from-drafts-answers-2025", "title": "From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning", "authors": ["Yafu Li", "Zhilin Wang", "Ting Fu", "Ganqu Cui", "Sen Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.11877", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scaling data and model size has been proven effective for boosting the performance of large language models. In addition to training-time scaling, recent studies have revealed that increasing test-tim", "arxiv_id": "2501.11877", "doi": "10.48550/arXiv.2501.11877"}
+{"id": "fptquant-functionpreserving-transforms-2025", "title": "FPTQuant: Function-Preserving Transforms for LLM Quantization", "authors": ["B. V. Breugel", "Yelysei Bondarenko", "Paul N. Whatmough", "Markus Nagel"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.04985", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) require substantial compute, and thus energy, at inference time. While quantizing weights and activations is effective at improving efficiency, naive quantization of LLMs ", "arxiv_id": "2506.04985", "doi": "10.48550/arXiv.2506.04985"}
+{"id": "evolution-thought-tracking-2025", "title": "The Evolution of Thought: Tracking LLM Overthinking via Reasoning Dynamics Analysis", "authors": ["Zihao Wei", "Liang Pang", "Jiahao Liu", "Jingcheng Deng", "Shicheng Xu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2508.17627", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Test-time scaling via explicit reasoning trajectories significantly boosts large language model (LLM) performance but often triggers overthinking. To explore this, we analyze reasoning through two len", "arxiv_id": "2508.17627"}
+{"id": "repograph-enhancing-ai-2024", "title": "RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph", "authors": ["Siru Ouyang", "Wenhao Yu", "Kaixin Ma", "Zi-Qiang Xiao", "Zhihan Zhang"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2410.14684", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) excel in code generation yet struggle with modern AI software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI software engineering requ", "arxiv_id": "2410.14684", "doi": "10.48550/arXiv.2410.14684"}
+{"id": "autonomous-normative-multiagent-2025", "title": "Towards autonomous normative multi-agent systems for Human-AI software engineering teams", "authors": ["K. Dam", "Geeta Mahala", "Rashina Hoda", "Xi Zheng", "Cristina Conati"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.02329", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper envisions a transformative paradigm in software engineering, where Artificial Intelligence, embodied in fully autonomous agents, becomes the primary driver of the core software development ", "arxiv_id": "2512.02329", "doi": "10.48550/arXiv.2512.02329"}
+{"id": "unified-software-engineering-2025", "title": "Unified Software Engineering agent as AI Software Engineer", "authors": ["Leonhard Applis", "Yuntong Zhang", "Shanchao Liang", "Nan Jiang", "Lin Tan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.14683", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The growth of Large Language Model (LLM) technology has raised expectations for automated coding. However, software engineering is more than coding and is concerned with activities including maintenan", "arxiv_id": "2506.14683", "doi": "10.48550/arXiv.2506.14683"}
+{"id": "navigating-ai-frontier-2024", "title": "Navigating the AI Frontier: A Comprehensive Framework for Career Transition into AI Software Engineering", "authors": ["Subash C Patel"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.22214/ijraset.2024.64188", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid proliferation of artificial intelligence (AI) technologies has created a significant demand for skilled professionals, particularly in AI software engineering. This article present", "doi": "10.22214/ijraset.2024.64188"}
+{"id": "developers-age-ai-2026", "title": "Developers in the Age of AI: Adoption, Policy, and Diffusion of AI Software Engineering Tools", "authors": ["Mark Looi", "J. Quinn"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.21305", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advance of Generative AI into software development prompts this empirical investigation of perceptual effects on practice. We study the usage patterns of 147 professional developers, examini", "arxiv_id": "2601.21305", "doi": "10.48550/arXiv.2601.21305"}
+{"id": "future-aidriven-software-2025", "title": "The Future of AI-Driven Software Engineering", "authors": ["Valerio Terragni", "Annie Vella", "Partha S. Roop", "Kelly Blincoe"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3715003", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A paradigm shift is underway in Software Engineering, with AI systems such as LLMs playing an increasingly important role in boosting software development productivity. This trend is anticipated to pe", "doi": "10.1145/3715003"}
+{"id": "rise-ai-teammates-2025", "title": "The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering", "authors": ["Hao Li", "Haoxiang Zhang", "Ahmed E. Hassan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2507.15003", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2507.15003"}
+{"id": "software-engineering-by-2025", "title": "Software Engineering by and for Humans in an AI Era", "authors": ["S. Abrahão", "John C. Grundy", "Mauro Pezzè", "M. Storey", "D. Tamburri"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3715111", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The landscape of software engineering is undergoing a transformative shift driven by advancements in machine learning, Artificial Intelligence (AI), and autonomous systems. This roadmap article explor", "doi": "10.1145/3715111"}
+{"id": "copiloting-future-how-2025", "title": "Copiloting the future: How generative AI transforms Software Engineering", "authors": ["Leonardo Banh", "Florian Holldack", "G. Strobel"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.infsof.2025.107751", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.infsof.2025.107751"}
+{"id": "how-developers-interact-2025", "title": "How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering", "authors": ["Christoph Treude", "M. Gerosa"], "year": 2025, "venue": "2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)", "source_url": "https://arxiv.org/abs/2501.08774", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI), including large language models and generative AI, is emerging as a significant force in software development, offering developers powerful tools that span the entire dev", "arxiv_id": "2501.08774", "doi": "10.1109/Forge66646.2025.00033"}
+{"id": "challenges-paths-ai-2025", "title": "Challenges and Paths Towards AI for Software Engineering", "authors": ["Alex Gu", "Naman Jain", "Wen-Ding Li", "Manish Shetty", "Yijia Shao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.22625", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI for software engineering has made remarkable progress recently, becoming a notable success within generative AI. Despite this, there are still many challenges that need to be addressed before autom", "arxiv_id": "2503.22625", "doi": "10.48550/arXiv.2503.22625"}
+{"id": "generative-ai-empirical-2025", "title": "Generative AI and Empirical Software Engineering: A Paradigm Shift", "authors": ["Christoph Treude", "M. Storey"], "year": 2025, "venue": "2025 2nd IEEE/ACM International Conference on AI-powered Software (AIware)", "source_url": "https://arxiv.org/abs/2502.08108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The adoption of large language models (LLMs) and autonomous agents in software engineering marks an enduring paradigm shift. These systems create new opportunities for tool design, workflow orchestrat", "arxiv_id": "2502.08108", "doi": "10.1109/AIware69974.2025.00033"}
+{"id": "democratizing-software-engineering-2025", "title": "Democratizing Software Engineering through Generative AI and Vibe Coding: The Evolution of No-Code Development", "authors": ["Akhilesh Gadde"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.32996/jcsts.2025.7.4.66", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of generative artificial intelligence (AI) into software development processes represents a paradigm shift in how individuals interact with technology creation tools. This article exam", "doi": "10.32996/jcsts.2025.7.4.66"}
+{"id": "agentic-ai-software-2025", "title": "Agentic AI for Software: thoughts from Software Engineering community", "authors": ["Abhik Roychoudhury"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.17343", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI agents have recently shown significant promise in software engineering. Much public attention has been transfixed on the topic of code generation from Large Language Models (LLMs) via a prompt. How", "arxiv_id": "2508.17343", "doi": "10.48550/arXiv.2508.17343"}
+{"id": "masai-modular-architecture-2024", "title": "MASAI: Modular Architecture for Software-engineering AI Agents", "authors": ["Daman Arora", "Atharv Sonwane", "Nalin Wadhwa", "Abhav Mehrotra", "Saiteja Utpala"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.11638", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A common method to solve complex problems in software engineering, is to divide the problem into multiple sub-problems. Inspired by this, we propose a Modular Architecture for Software-engineering AI ", "arxiv_id": "2406.11638", "doi": "10.48550/arXiv.2406.11638"}
+{"id": "agentic-ai-software-2025-2", "title": "Agentic AI Software Engineers: Programming with Trust", "authors": ["Abhik Roychoudhury", "C. Păsăreanu", "Michael Pradel", "Baishakhi Ray"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.13767", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown surprising proficiency in generating code snippets, promising to automate large parts of software engineering via artificial intelligence (AI). We argue that su", "arxiv_id": "2502.13767"}
+{"id": "greening-aienabled-systems-2025", "title": "Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices", "authors": ["Luis Cruz", "João Paulo Fernandes", "M. H. Kirkeby", "Silverio Martínez-Fernández", "June Sallou"], "year": 2025, "venue": "Software engineering notes", "source_url": "https://arxiv.org/abs/2506.01774", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The environmental impact of Artificial Intelligence (AI)-enabled systems is increasing rapidly, and software engineering plays a critical role in developing sustainable solutions. The ''Greening AI wi", "arxiv_id": "2506.01774", "doi": "10.1145/3743095.3743099"}
+{"id": "ainative-software-engineering-2024", "title": "Towards AI-Native Software Engineering (SE 3.0): A Vision and a Challenge Roadmap", "authors": ["Ahmed E. Hassan", "G. Oliva", "Dayi Lin", "Boyuan Chen", "Zhen Ming Jiang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.06107", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rise of AI-assisted software engineering (SE 2.0), powered by Foundation Models (FMs) and FM-powered coding assistants, has shown promise in improving developer productivity. However, it has also ", "arxiv_id": "2410.06107", "doi": "10.48550/arXiv.2410.06107"}
+{"id": "innovating-tomorrow-convergence-2025", "title": "Innovating for Tomorrow: The Convergence of Software Engineering and Green AI", "authors": ["Luís Cruz", "Xavier Franch", "Silverio Martínez-Fernández"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3712007", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The latest advancements in machine learning, specifically in foundation models, are revolutionizing the frontiers of existing software engineering (SE) processes. This is a bi-directional phenomenon, ", "doi": "10.1145/3712007"}
+{"id": "rethinking-autonomy-preventing-2025", "title": "Rethinking Autonomy: Preventing Failures in AI-Driven Software Engineering", "authors": ["Satyam Kumar Navneet", "Joydeep Chandra"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.11824", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Large Language Models (LLMs) into software engineering has revolutionized code generation, enabling unprecedented productivity through promptware and autonomous AI agents. However, ", "arxiv_id": "2508.11824", "doi": "10.48550/arXiv.2508.11824"}
+{"id": "when-prompt-engineering-2025", "title": "When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust \"APIs\" for Human-AI Interaction", "authors": ["Zhenchang Xing", "Yang Liu", "Zhuo Cheng", "Qing Huang", "Dehai Zhao"], "year": 2025, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2508.06942", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the growing capabilities of large language models (LLMs), they are increasingly applied in areas like intelligent customer service, code generation, and knowledge management. Natural language (NL", "arxiv_id": "2508.06942", "doi": "10.48550/arXiv.2508.06942"}
+{"id": "generative-ai-future-2025", "title": "Generative AI and the Future of Software Engineering in Saudi Arabia: Governance, Innovation, and Workforce Transformation", "authors": ["Elham Al-baroudi", "Taha Mansouri", "Mohammad Hatamleh", "Moustafa Elbehairy", "Ali Alameer"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.65278/ijtaci.2025.4", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Saudi Arabia advances its Vision 2030 agenda, Generative Artificial Intelligence (GenAI) has emerged as a transformative force in software engineering. This paper is based on socio-technical system", "doi": "10.65278/ijtaci.2025.4"}
+{"id": "exploring-generative-ai-2025", "title": "Exploring Generative AI in Automated Software Engineering", "authors": ["Miroslaw Staron", "S. Abrahão"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2025.3533754", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This issue’s Practitioners' Digest examines the growing impact of generative AI on automated software engineering. It highlights how these AI models are shaping current research, with a particular foc", "doi": "10.1109/MS.2025.3533754"}
+{"id": "generative-ai-software-2025-2", "title": "Generative AI in Software Engineering: Revolutionizing Code Generation and Debugging", "authors": ["V. Saravanan", "S. Kavitha", "S. Ravi", "A. Seetha", "Ch. Rambabu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.22399/ijcesen.1718", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Artificial Intelligence (AI) is rapidly transforming the landscape of software engineering by automating critical development tasks such as code generation, debugging, and optimization. Thi", "doi": "10.22399/ijcesen.1718"}
+{"id": "development-aidriven-model-2025", "title": "Development of an AI-Driven Model for Advancing Software Engineering Practices", "authors": ["Aylin Güzel", "A. Egesoy"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.55524/ijircst.2025.13.1.1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work introduces the Fuzzy Specification Tree Model (FST), a general-purpose framework designed to enhance AI-assisted software engineering. The paper begins by examining the intricate interplay b", "doi": "10.55524/ijircst.2025.13.1.1"}
+{"id": "unpacking-organizational-change-2025", "title": "Unpacking Organizational Change in AI Transformations of Software Engineering", "authors": ["Theocharis Tavantzis", "Robert Feldt"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CHASE66643.2025.00026", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Artificial Intelligence (AI) becomes integral to software development, understanding the social and cooperative dynamics that affect AI-driven organizational change is important. Yet, despite AI's ", "doi": "10.1109/CHASE66643.2025.00026"}
+{"id": "future-generative-ai-2025", "title": "The Future of Generative AI in Software Engineering: A Vision From Industry and Academia in the European Genius Project", "authors": ["Robin Gröpler", "Steffen Klepke", "John E. Johns", "Andreas Dreschinski", "Klaus Schmid"], "year": 2025, "venue": "2025 2nd IEEE/ACM International Conference on AI-powered Software (AIware)", "source_url": "https://arxiv.org/abs/2511.01348", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (GenAI) has recently emerged as a groundbreaking force in Software Engineering, capable of generating code, identifying bugs, recommending fixes, and supporting quality assurance. While ", "arxiv_id": "2511.01348", "doi": "10.1109/AIware69974.2025.00026"}
+{"id": "empirical-study-decisionmaking-2025", "title": "An Empirical Study on Decision-Making Aspects in Responsible Software Engineering for AI", "authors": ["Lekshmi Murali Rani", "Faezeh Mohammadi", "R. Feldt", "Richard Berntsson-Svensson"], "year": 2025, "venue": "2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)", "source_url": "https://arxiv.org/abs/2501.15691", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Incorporating responsible practices into Software Engineering (SE) for Artificial Intelligence (AI) is essential to ensure ethical principles, societal impact, and accountability remain at the forefro", "arxiv_id": "2501.15691", "doi": "10.1109/ICSE-SEIP66354.2025.00056"}
+{"id": "compilernext-searchbased-compiler-2025", "title": "Compiler.next: A Search-Based Compiler to Power the AI-Native Future of Software Engineering", "authors": ["F. Côgo", "Gustavo Oliva", "Ahmed E. Hassan"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.24799", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of AI-assisted software engineering has brought transformative potential to the field of software engineering, but existing tools and paradigms remain limited by cognitive overlo", "arxiv_id": "2510.24799", "doi": "10.48550/arXiv.2510.24799"}
+{"id": "seamful-ai-creative-2025", "title": "Seamful AI for Creative Software Engineering: Use in Software Development Workflows", "authors": ["Sarah Inman", "Ambar Murillo", "Sarah D’Angelo", "Adam Brown", "Collin Green"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2025.3534085", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We explore the differences in goals for designing AI tools for productivity compared to creativity and propose strategies to elevate creativity in the software engineering workflow. Specifically, we c", "doi": "10.1109/MS.2025.3534085"}
+{"id": "fairness-ai-systematic-2025", "title": "Towards Fairness in AI: A Systematic Mapping Study on Software Engineering Solutions", "authors": ["K. Nepomuceno", "Fábio Petrillo"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SANER-C66551.2025.00014", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Fairness in AI systems has become a crucial concern in software engineering, increasing attention since the field has evolved in the last few years. While some guidelines to address fairness have been", "doi": "10.1109/SANER-C66551.2025.00014"}
+{"id": "bridging-ai-robotics-2025", "title": "Bridging AI, Robotics, and Software Engineering: An Interdisciplinary Approach for Learning Emerging Technologies", "authors": ["Lorena B. Martínez Elizalde", "Carlos Astengo Noguez", "Maria Raquel Landa Cavazos", "Luis Ricardo Salgado Garza"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/EDUCON62633.2025.11016335", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As the rapid evolution of emerging technologies continues to reshape industries, the need for an interdisciplinary approach to learning has become critical. Artificial intelligence (AI), robotics, and", "doi": "10.1109/EDUCON62633.2025.11016335"}
+{"id": "designing-fair-scalable-2025", "title": "Designing Fair and Scalable AI-Enhanced Software Engineering Performance Reviews", "authors": ["Aishwarya Babu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.37082/ijirmps.v13.i2.232444", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Performance evaluations in software engineering often struggle with fairness and consistency, particularly in capturing non-code contributions like mentorship and technical leadership. While AI and co", "doi": "10.37082/ijirmps.v13.i2.232444"}
+{"id": "humanmachine-teaming-team-2025", "title": "Human-Machine Teaming and Team Effectiveness in AI Tools for Software Engineering", "authors": ["Irum Rauf", "Helen Sharp", "Tamara Lopez", "Michel Wermelinger"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CHASE66643.2025.00017", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background: Artificial Intelligence (AI) is increasingly being used to support software engineering (SE), shifting the role of AI tools for SE (AI4SE) towards team members rather than simply tools. Hu", "doi": "10.1109/CHASE66643.2025.00017"}
+{"id": "first-look-at-2025", "title": "A First Look at AI Trends in Value-Aligned Software Engineering Publications: Human-LLM Insights", "authors": ["Davoud Mougouei", "Ahmad Azarnik", "M. Fahmideh", "Elahe Mougouei", "K. Dam"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE-SEIS66351.2025.00014", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent criticism of social media platforms by the U.S. Senate Judiciary Committee for neglecting child safety exemplifies how software can undermine human values. This is further complicated by the gr", "doi": "10.1109/ICSE-SEIS66351.2025.00014"}
+{"id": "integrating-ai-into-2025", "title": "Integrating AI into Software Engineering: A Critical Review and Future Directions", "authors": ["A. H", "Sharath K R", "Kavita Babalad", "G. M S", "C. S"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IC363308.2025.10957449", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The field of Artificial Intelligence (AI) has been witnessing a huge demand in the field of research, tools development, and applications of deployment. There are multiple software companies which are", "doi": "10.1109/IC363308.2025.10957449"}
+{"id": "transforming-software-engineering-2025", "title": "Transforming Software Engineering Processes Through Generative AI: A Framework for Integration and Implementation", "authors": ["Aybüke Yalçıner", "Ebru Gökalp", "Ahmet Dikici"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MC.2025.3539347", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In software engineering, integrating artificial intelligence (AI) technologies cuts costs and speeds up project timelines while improving workflow efficiency and software quality. We propose a framewo", "doi": "10.1109/MC.2025.3539347"}
+{"id": "software-reuse-generative-2025", "title": "Software Reuse in the Generative AI Era: From Cargo Cult Towards AI Native Software Engineering", "authors": ["T. Mikkonen", "A. Taivalsaari"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.17937", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software development is currently under a paradigm shift in which artificial intelligence and generative software reuse are taking the center stage in software creation. Consequently, earlier software", "arxiv_id": "2506.17937", "doi": "10.48550/arXiv.2506.17937"}
+{"id": "agile-software-engineering-2025", "title": "Agile Software Engineering in the Age of Artificial Intelligence: Tools and Techniques for AI Projects", "authors": ["Radhakrishnan P", "Margi Patel", "Stalin David", "Mrutyunjay Padhiary", "Ch. Mamatha"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/WorldSUAS66815.2025.11199145", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Traditional construction techniques have been altered by the incorporation of artificial intelligence (AI) within the software engineering process, necessitating the use of cutting-edge tools and meth", "doi": "10.1109/WorldSUAS66815.2025.11199145"}
+{"id": "why-you-shouldnt-2025", "title": "Why you shouldn't fully trust ChatGPT: A synthesis of this AI tool's error rates across disciplines and the software engineering lifecycle", "authors": ["Vahid Garousi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.18858", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: ChatGPT and other large language models (LLMs) are widely used across healthcare, business, economics, engineering, and software engineering (SE). Despite their popularity, concerns persist a", "arxiv_id": "2504.18858", "doi": "10.48550/arXiv.2504.18858"}
+{"id": "interacting-ai-reasoning-2025", "title": "Interacting with AI Reasoning Models: Harnessing \"Thoughts\" for AI-Driven Software Engineering", "authors": ["Christoph Treude", "R. Kula"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.00483", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in AI reasoning models provide unprecedented transparency into their decision-making processes, transforming them from traditional black-box systems into models that articulate step-by", "arxiv_id": "2503.00483", "doi": "10.48550/arXiv.2503.00483"}
+{"id": "case-study-transformative-2025", "title": "A case study on the transformative potential of AI in software engineering on LeetCode and ChatGPT", "authors": ["Manuel Merkel", "Jens Dörpinghaus"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.03639", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent surge in the field of generative artificial intelligence (GenAI) has the potential to bring about transformative changes across a range of sectors, including software engineering and educat", "arxiv_id": "2501.03639", "doi": "10.48550/arXiv.2501.03639"}
+{"id": "rethinking-software-engineering-2024", "title": "Rethinking Software Engineering in the Foundation Model Era: From Task-Driven AI Copilots to Goal-Driven AI Pair Programmers", "authors": ["Ahmed E. Hassan", "G. Oliva", "Dayi Lin", "Boyuan Chen", "Zhen Ming Jiang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.10225", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of Foundation Models (FMs) and AI-powered copilots has transformed the landscape of software development, offering unprecedented code completion capabilities and enhancing developer product", "arxiv_id": "2404.10225", "doi": "10.48550/arXiv.2404.10225"}
+{"id": "ai-agents-software-2025", "title": "AI Agents in Software Engineering Optimizing Software Development Processes and Enhancing Security Management in Learning Management Systems", "authors": ["R. Varma"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.22214/ijraset.2025.73299", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The use of AI agents in software engineering is an area of research that offers remarkable possibilities to improve software development processes and security management in LMS. In this paper, we inv", "doi": "10.22214/ijraset.2025.73299"}
+{"id": "aiaugmented-software-engineering-2025", "title": "Towards AI-Augmented Software Engineering: A Theoretical Framework", "authors": ["Samia Akhtar", "Shabib Aftab"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.62762/jse.2025.407864", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software Engineering (SE) has traditionally relied on rule-based methods and human expertise to deliver reliable systems. As software systems grow more complex and the demand for intelligent and scala", "doi": "10.62762/jse.2025.407864"}
+{"id": "revolutionizing-code-role-2025", "title": "Revolutionizing Code: The Role of AI in Software Engineering", "authors": ["Rahul Sanjay Panchal"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.32628/ijsrset25121152", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Information regarding artificial intelligence in software engineering is presented in this document. Information about AI and its varieties can be found here. A sneak peek at AI agents, learning in AI", "doi": "10.32628/ijsrset25121152"}
+{"id": "ai-software-engineering-2025", "title": "AI in Software Engineering: Perceived Roles and Their Impact on Adoption", "authors": ["Ilya Zakharov", "Ekaterina Koshchenko", "Agnia Sergeyuk"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2504.20329", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper investigates how developers conceptualize AI-powered Development Tools and how these role attributions influence technology acceptance. Through qualitative analysis of 38 interviews and a q", "arxiv_id": "2504.20329", "doi": "10.1145/3696630.3730563"}
+{"id": "trust-transparency-adoption-2025", "title": "Trust, transparency, and adoption in generative AI for software engineering: Insights from Twitter discourse", "authors": ["Manaal Basha", "Gema Rodríguez-Pérez"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.infsof.2025.107804", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.infsof.2025.107804"}
+{"id": "aitutoring-software-engineering-2024", "title": "AI-Tutoring in Software Engineering Education: Experiences with Large Language Models in Programming Assessments", "authors": ["Eduard Frankford", "Clemens Sauerwein", "Patrick Bassner", "Stephan Krusche", "Ruth Breu"], "year": 2024, "venue": "2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET)", "source_url": "https://arxiv.org/abs/2404.02548", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of artificial intelligence (AI) in various domains, the education sector is set for transformation. The potential of AI-driven tools in enhancing the learning experience, es", "arxiv_id": "2404.02548", "doi": "10.1145/3639474.3640061"}
+{"id": "software-engineering-education-2024", "title": "Software engineering education in the era of conversational AI: current trends and future directions", "authors": ["Cigdem Sengul", "Rumyana Neykova", "Giuseppe Destefanis"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3389/frai.2024.1436350", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The developments in conversational AI raised urgent questions about the future direction of many aspects of society, including computing education. The first reactions to the fast-paced evolution of c", "doi": "10.3389/frai.2024.1436350"}
+{"id": "navigating-complexity-generative-2024", "title": "Navigating the Complexity of Generative AI Adoption in Software Engineering—RCR Report", "authors": ["Daniel Russo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3680471", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This Replicated Computational Results (RCR) report complements the study “Navigating the Complexity of Generative AI Adoption in Software Engineering,” which examines the factors influencing the integ", "doi": "10.1145/3680471"}
+{"id": "bridging-mde-ai-2023", "title": "Bridging MDE and AI: a systematic review of domain-specific languages and model-driven practices in AI software systems engineering", "authors": ["Simon Rädler", "Luca Berardinelli", "Karolin Winter", "Abbas Rahimi", "Stefanie Rinderle-Ma"], "year": 2023, "venue": "Journal of Software and Systems Modeling", "source_url": "https://arxiv.org/abs/2307.04599", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Technical systems are becoming increasingly complex due to the increasing number of components, functions, and involvement of different disciplines. In this regard, model-driven engineering techniques", "arxiv_id": "2307.04599", "doi": "10.1007/s10270-024-01211-y"}
+{"id": "what-aiembracing-software-2024", "title": "What an AI-Embracing Software Engineering Curriculum Should Look Like: An Empirical Study", "authors": ["Natasha Randall", "Dennis Wäckerle", "Nils Stein", "Dennis Goßler", "Stefan Bente"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2023.3344682", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: It is not possible to reliably prevent the use of artificial intelligence (AI) tools, nor would that be desirable as AI offers many benefits for students. We recommend that appropriate AI usage be tau", "doi": "10.1109/MS.2023.3344682"}
+{"id": "what-generative-ai-2025", "title": "What is Generative AI good for? Introduction to the special issue on Generative AI in software engineering", "authors": ["V. Stray", "G. Hanssen", "A. Barbala", "Darja Šmite", "Klaas-Jan Stol"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.infsof.2025.107857", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.infsof.2025.107857"}
+{"id": "aiintegrated-software-engineering-2025", "title": "AI-Integrated Software Engineering: Developing Systems that Evolve with Learning Capabilities", "authors": ["Snigdha Gaddam"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.52783/jisem.v10i63s.13893", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The study examines AI-native systems that can be developed with the help of a Random Forest classifier to replicate the method of intelligent decision-making. The data contained variables like \"user-b", "doi": "10.52783/jisem.v10i63s.13893"}
+{"id": "future-software-engineering-2024", "title": "The Future of Software Engineering in an AI-Driven World", "authors": ["Valerio Terragni", "Partha S. Roop", "Kelly Blincoe"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.07737", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A paradigm shift is underway in Software Engineering, with AI systems such as LLMs gaining increasing importance for improving software development productivity. This trend is anticipated to persist. ", "arxiv_id": "2406.07737", "doi": "10.48550/arXiv.2406.07737"}
+{"id": "beyond-log-parsers-2025", "title": "Beyond Log Parsers: A Scalable AI-Driven Framework for Efficient Log Anomaly Detection in Software Engineering", "authors": ["Yicheng Sun", "J. Keung", "Hi Kuen Yu", "Shuo Liu", "Yihan Liao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/COMPSAC65507.2025.00173", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Log anomaly detection is critical for ensuring software system reliability and security, yet challenges persist in log parser dependency, small-scale dataset applicability, and hyperparameter tuning e", "doi": "10.1109/COMPSAC65507.2025.00173"}
+{"id": "bringing-software-engineering-2024", "title": "Bringing Software Engineering Discipline to the Development of AI-Enabled Systems", "authors": ["Miroslaw Staron", "S. Abrahão", "Grace Lewis", "Henry Muccini", "Chetan Honnenahalli"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2024.3408388", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Engineering AI Software systems is starting to evolve from the pure development of machine learning (ML) models to a more structured discipline that treats ML components as part of much larger softwar", "doi": "10.1109/MS.2024.3408388"}
+{"id": "seamful-ai-creative-2025-2", "title": "Seamful AI for Creative Software Engineering", "authors": ["Unknown"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ms.2025.3551876", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/ms.2025.3551876"}
+{"id": "use-ai-software-2024", "title": "The Use of AI in Software Engineering: A Synthetic Knowledge Synthesis of the Recent Research Literature", "authors": ["Peter Kokol"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.3390/info15060354", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI) has witnessed an exponential increase in use in various applications. Recently, the academic community started to research and inject new AI-based approaches to provide so", "doi": "10.3390/info15060354"}
+{"id": "workshop-generative-neurosymbolic-2024", "title": "Workshop Generative and Neurosymbolic AI in Software Engineering (GenSE'2024)", "authors": ["Rubén Ruiz-Torrubiano", "Alois Haselböck", "Danilo Valerio"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.18420/sw2024_58", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.18420/sw2024_58"}
+{"id": "advancements-software-engineering-2024", "title": "Advancements in software engineering using AI", "authors": ["Hazem W. Marar"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.24294/csma.v6i1.3906", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Artificial Intelligence (AI) into the space of software engineering marks a transformative period that reshapes traditional development processes and propels the industry into a new", "doi": "10.24294/csma.v6i1.3906"}
+{"id": "generative-ai-software-2024-2", "title": "Generative AI in Software Engineering Must Be Human-Centered: The Copenhagen Manifesto", "authors": ["Daniel Russo", "Sebastian Baltes", "N. V. Berkel", "Paris Avgeriou", "Fabio Calefato"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1016/j.jss.2024.112115", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1016/j.jss.2024.112115"}
+{"id": "enhancing-software-engineering-2024", "title": "Enhancing Software Engineering Education through AI: An Empirical Study of Tree-Based Machine Learning for Defect Prediction", "authors": ["Ensaf Alhazeem", "Anas Alsobeh", "Bilal Al‐Ahmad"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3686852.3686881", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the rapidly evolving field of information technology education, integrating artificial intelligence (AI) and machine learning (ML) techniques presents opportunities and challenges. This empirical st", "doi": "10.1145/3686852.3686881"}
+{"id": "survey-reliability-engineering-2023", "title": "Survey on Reliability Engineering for AI Software Systems: An Extension Based on the IEEE 1633 Standard", "authors": ["Cong Pan", "Jun You", "Yan Gao"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AIIM60438.2023.10441228", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software reliability stands as a cornerstone in the development and deployment of dependable applications, with the advent of AI intensifying its significance. The intricacies introduced by AI softwar", "doi": "10.1109/AIIM60438.2023.10441228"}
+{"id": "from-triumph-uncertainty-2024", "title": "From Triumph to Uncertainty: The Journey of Software Engineering in the AI Era", "authors": ["A. Mastropaolo", "Camilo Escobar-Velásquez", "Mario Linares-Vásquez"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3709360", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Over the last 10 years, the realm of AI has experienced an explosion of revolutionary breakthroughs, transforming what seemed like a far-off dream into a reality that is now deeply embedded in our eve", "doi": "10.1145/3709360"}
+{"id": "some-things-never-2024", "title": "Some things never change: how far generative AI can really change software engineering practice", "authors": ["Aline de Campos", "Jorge Melegati", "Nicolas Nascimento", "R. Chanin", "Afonso Sales"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2406.09725", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Artificial Intelligence (GenAI) has become an emerging technology with the availability of several tools that could impact Software Engineering (SE) activities. As any other disruptive tech", "arxiv_id": "2406.09725", "doi": "10.48550/arXiv.2406.09725"}
+{"id": "university-students-perception-2024", "title": "University Students' Perception and Expectations of Generative AI Tools for Software Engineering", "authors": ["Mounika Yabaku", "Sofia Ouhbi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CSEET62301.2024.10663035", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Adopting Generative Artificial Intelligence (AI) tools in software engineering represents a shift in how tasks like coding and idea generation are approached. This paper investigates university stude", "doi": "10.1109/CSEET62301.2024.10663035"}
+{"id": "software-engineering-methods-2024", "title": "Software Engineering Methods for AI-Driven Deductive Legal Reasoning", "authors": ["Rohan Padhye"], "year": 2024, "venue": "SIGPLAN symposium on New ideas, new paradigms, and reflections on programming and software", "source_url": "https://arxiv.org/abs/2404.09868", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent proliferation of generative artificial intelligence (AI) technologies such as pre-trained large language models (LLMs) has opened up new frontiers in computational law. An exciting area of ", "arxiv_id": "2404.09868", "doi": "10.1145/3689492.3690050"}
+{"id": "navigating-complexity-generative-2023", "title": "Navigating the Complexity of Generative AI Adoption in Software Engineering", "authors": ["Daniel Russo"], "year": 2023, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2307.06081", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article explores the adoption of Generative Artificial Intelligence (AI) tools within the domain of software engineering, focusing on the influencing factors at the individual, technological, and", "arxiv_id": "2307.06081", "doi": "10.1145/3652154"}
+{"id": "integrating-generative-ai-2024-2", "title": "Integrating Generative AI in Software Engineering Education: Practical Strategies", "authors": ["Yishu Li", "J. Keung", "Xiaoxue Ma"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ISET61814.2024.00019", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The transformative influence of generative artificial intelligence (AI), notably large language models (LLMs), has significantly reshaped the software engineering (SE) landscape, impacting various asp", "doi": "10.1109/ISET61814.2024.00019"}
+{"id": "enhancing-software-engineering-2024-2", "title": "Enhancing Software Engineering with AI: Key Insights from ChatGPT", "authors": ["A. Al-Ahmad", "Hasan Kahtan", "L. Tahat", "Tarek Tahat"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/DASA63652.2024.10836262", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI) is currently a prominent topic in the field of software engineering. AI has greatly transformed software engineering by providing advanced tools that may boost the effecti", "doi": "10.1109/DASA63652.2024.10836262"}
+{"id": "generative-ai-create-2024", "title": "Using Generative AI to Create User Stories in the Software Engineering Classroom", "authors": ["Allan Brockenbrough", "Dominic Salinas"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CSEET62301.2024.10662994", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A user story is used in agile methodology to describe functionality that is valuable to the user and may include criteria to determine if the developer has completed the story. This study investigates", "doi": "10.1109/CSEET62301.2024.10662994"}
+{"id": "integrating-ai-software-2024", "title": "Integrating AI in Software Engineering Teaching and Learning", "authors": ["Akshay Narayan", "Bimlesh Wadhwa", "N. Tan", "Marcus Choo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TALE62452.2024.10834374", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of artificial intelligence (AI) into software engineering education presents a unique opportunity to enhance the learning experience and equip students with cutting-edge skills. In thi", "doi": "10.1109/TALE62452.2024.10834374"}
+{"id": "requirements-engineering-trustworthy-2024", "title": "Requirements Engineering for Trustworthy Human-AI Synergy in Software Engineering 2.0", "authors": ["David Lo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/RE59067.2024.00011", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software Engineering 2.0 envisions trustworthy and synergistic collaborations between humans and AI agents that are diverse, responsible, and autonomous, aiming to build the software of tomorrow – a v", "doi": "10.1109/RE59067.2024.00011"}
+{"id": "teaching-requirements-engineering-2024", "title": "Teaching Requirements Engineering for AI: A Goal-Oriented Approach in Software Engineering Courses", "authors": ["Beatriz Batista", "M. Lima", "Tayana Conte"], "year": 2024, "venue": "Brazilian Symposium on Software Quality", "source_url": "https://arxiv.org/abs/2411.07250", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: Requirements Engineering for AI-based systems (RE4AI) presents unique challenges due to the inherent volatility and complexity of AI technologies, necessitating the development of specialized", "arxiv_id": "2411.07250", "doi": "10.1145/3701625.3701686"}
+{"id": "navigating-ai-frontier-2024-2", "title": "Navigating the AI Frontier: A Critical Literature Review on Integrating Artificial Intelligence into Software Engineering Education", "authors": ["Chandan Kumar Sah", "Xiaoli Lian", "Muhammad Mirajul Islam", "K. Islam"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CSEET62301.2024.10663054", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The swift development of Artificial Intelligence (AI), namely the introduction of Large Language Models (LLMs), is drastically altering various industries and necessitating a major change in the way s", "doi": "10.1109/CSEET62301.2024.10663054"}
+{"id": "continuous-software-engineering-2024", "title": "Continuous Software Engineering Practices in AI/ML Development Past the Narrow Lens of MLOps: Adoption Challenges", "authors": ["Sini Vänskä", "Kai-Kristian Kemell", "T. Mikkonen", "P. Abrahamsson"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.37190/e-inf240102", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background: Continuous software engineering practices are currently considered state of the art in Software Engineering (SE). Recently, this interest in continuous SE has extended to ML system develop", "doi": "10.37190/e-inf240102"}
+{"id": "questioning-questions-we-2024", "title": "Questioning the Questions We Ask About the Impact of AI on Software Engineering : MSR 2024 Keynote", "authors": ["M. Storey"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3643991.3644895", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent advent and wide diffusion of generative AI has initiated a fundamental change in how software is developed. This technology is just one innovation along a long arc of disruptions in softwar", "doi": "10.1145/3643991.3644895"}
+{"id": "workshop-report-generative-2024", "title": "Workshop Report on Generative AI-based Software Engineering", "authors": ["Ravindra Naik", "Asha Rajbhoj", "Manasi S. Patwardhan", "R. Medicherla"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3641399.3641437", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The co-authors have organized and conducting the Generative AI-based Software Engineering workshop, co-located with the 17th Innovations in Software Engineering Conference (ISEC) at Bangalore, India o", "doi": "10.1145/3641399.3641437"}
+{"id": "llms-integration-software-2024", "title": "LLMs Integration in Software Engineering Team Projects: Roles, Impact, and a Pedagogical Design Space for AI Tools in Computing Education", "authors": ["Ahmed Kharrufa", "S. Alghamdi", "Abeer Aziz", "Christopher Bull"], "year": 2024, "venue": "ACM Transactions on Computing Education", "source_url": "https://arxiv.org/abs/2410.23069", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This work takes a pedagogical lens to explore the implications of generative AI (GenAI) models and tools, such as ChatGPT and GitHub Copilot, in a semester-long 2nd-year undergraduate Software Enginee", "arxiv_id": "2410.23069", "doi": "10.1145/3779296"}
+{"id": "steve-jobs-pioneering-2024", "title": "Steve Jobs: Pioneering AI in Software Engineering", "authors": ["Priyadharasini M", "Sriram S N", "Sudhar Aathith T", "Vigneshwaran N"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.47392/irjaeh.2024.0116", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: \"STEVE JOBS: Pioneering AI in Software Engineering\" presents a revolutionary approach to software development, integrating large language models (LLMs) into traditional methodologies. This paradigm, i", "doi": "10.47392/irjaeh.2024.0116"}
+{"id": "exploring-potential-use-2024", "title": "Exploring the Potential Use of Generative AI in Software Engineering Education", "authors": ["Mounika Yabaku", "Nuno Pombo", "Sofia Ouhbi"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AICT61888.2024.10740416", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The integration of Generative AI into software engineering education marks a transformative shift in teaching methodologies. This paper explores its potential, highlighting the benefits of enhancing s", "doi": "10.1109/AICT61888.2024.10740416"}
+{"id": "aidriven-continuous-integration-2024", "title": "AI-Driven Continuous Integration and Continuous Deployment in Software Engineering", "authors": ["Abdul Sajid Mohammed", "Venkata Ramana Saddi", "Santhosh Kumar Gopal", "S. Dhanasekaran", "Mahaveer Singh Naruka"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICDT61202.2024.10489475", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI driven Continuous Integration and Continuous Deployment is a new way of managing and continually updating a software project. This process, powered by Artificial Intelligence, automates the entire ", "doi": "10.1109/ICDT61202.2024.10489475"}
+{"id": "optimal-psychological-functioning-2024", "title": "Toward Optimal Psychological Functioning in AI-Driven Software Engineering Tasks: The Software Evaluation for Well-Being and Optimal Psychological Functioning in a Context-Aware Environment Assessment Framework", "authors": ["O. Sghaier", "J. Boudrias", "H. Sahraoui"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2024.3382364", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Holistic consideration of the technical, psychological, and social aspects of software engineering tasks is essential. We introduce a conceptual framework designed to assess AI-driven software enginee", "doi": "10.1109/MS.2024.3382364"}
+{"id": "humancentered-ai-transformation-2024", "title": "Human-Centered AI Transformation: Exploring Behavioral Dynamics in Software Engineering", "authors": ["Theocharis Tavantzis", "R. Feldt"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.08693", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As Artificial Intelligence (AI) becomes integral to software development, understanding the social and cooperative dynamics that affect AI-driven organizational change is important. Yet, despite AI's ", "arxiv_id": "2411.08693", "doi": "10.48550/arXiv.2411.08693"}
+{"id": "surfing-ai-wave-2024", "title": "Surfing the AI Wave in Software Engineering: Opportunities and Challenges", "authors": ["Nicole Novielli"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3661167.3661271", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The diffusion of generative AI, specifically Large Language Models (LLMs), is profoundly affecting Software Engineering. Thanks to their unprecedented potential for disruptive changes, which mainly re", "doi": "10.1145/3661167.3661271"}
+{"id": "generative-ai-software-2024-2-2", "title": "Generative AI in the Software Engineering Domain: Tensions of Occupational Identity and Patterns of Identity Protection", "authors": ["Anuschka Schmitt", "Krzysztof Z. Gajos", "O. Mokryn"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.03571", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The adoption of generative Artificial Intelligence (GAI) in organizational settings calls into question workers' roles, and relatedly, the implications for their long-term skill development and domain", "arxiv_id": "2410.03571", "doi": "10.48550/arXiv.2410.03571"}
+{"id": "mapping-softwareengineering-industry-2024", "title": "Mapping Software-Engineering Industry AI Use to Software-Engineering Curriculum: Developing the AI-USE Framework", "authors": ["Addison Lilholt", "T. Heverin"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.34190/icair.4.1.3034", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Estimates predict a global deficit of 4 million software engineers by 2025, further complicated by the software engineering (SE) industry's escalating use of artificial intelligence (AI). To tackle th", "doi": "10.34190/icair.4.1.3034"}
+{"id": "practical-application-ai-2024", "title": "Practical Application of AI and Large Language Models in Software Engineering Education", "authors": ["Vasil Kozov", "Galina Ivanova", "Desislava Atanasova"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.14569/ijacsa.2024.0150168", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.14569/ijacsa.2024.0150168"}
+{"id": "how-far-we-2023", "title": "How Far Are We? The Triumphs and Trials of Generative AI in Learning Software Engineering", "authors": ["Rudrajit Choudhuri", "Dylan Liu", "Igor Steinmacher", "M. Gerosa", "Anita Sarma"], "year": 2023, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2312.11719", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Conversational Generative AI (convo-genAI) is revolutionizing Software Engineering (SE) as engineers and academics embrace this technology in their work. However, there is a gap in understanding the c", "arxiv_id": "2312.11719", "doi": "10.1145/3597503.3639201"}
+{"id": "aiaugmented-software-engineering-2024", "title": "AI‐Augmented Software Engineering: Revolutionizing or Challenging Software Quality and Testing?", "authors": ["Tafline Ramos", "Amanda Dean", "D. McGregor"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1002/smr.2741", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With organizations seeking faster, cheaper, and smarter ways of delivering higher quality software, many are looking towards generative artificial intelligence (AI) to drive efficiencies and innovatio", "doi": "10.1002/smr.2741"}
+{"id": "incorporating-ai-teaching-2024", "title": "Incorporating AI in the Teaching of Requirements Tracing Within Software Engineering", "authors": ["J. Couder", "W. C. Pate", "Daniel A. Machado", "Omar Ochoa"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FIE61694.2024.10892858", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: During the Software Development Lifecycle (SDLC), the first stage entails the Requirement Engineering phase. In this phase, engineers gather, analyze, and specify the requirements for a software syste", "doi": "10.1109/FIE61694.2024.10892858"}
+{"id": "can-ai-serve-2023", "title": "Can AI serve as a substitute for human subjects in software engineering research?", "authors": ["M. Gerosa", "Bianca Trinkenreich", "Igor Steinmacher", "Anita Sarma"], "year": 2023, "venue": "International Conference on Automated Software Engineering", "source_url": "https://arxiv.org/abs/2311.11081", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Research within sociotechnical domains, such as software engineering, fundamentally requires the human perspective. Nevertheless, traditional qualitative data collection methods suffer from difficulti", "arxiv_id": "2311.11081", "doi": "10.1007/s10515-023-00409-6"}
+{"id": "ambigswe-interactive-agents-2025", "title": "Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering", "authors": ["Sanidhya Vijayvargiya", "Xuhui Zhou", "Akhila Yerukola", "Maarten Sap", "Graham Neubig"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.13069", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI agents are increasingly being deployed to automate tasks, often based on underspecified user instructions. Making unwarranted assumptions to compensate for the missing information and failing to as", "arxiv_id": "2502.13069"}
+{"id": "enhancing-software-engineering-2024-3", "title": "Enhancing software engineering practices with generative AI: A framework for automated code synthesis and refactoring", "authors": ["Kodamasimham Krishna", "Dheerender Thakur", "Harika Sree Meka"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.30574/wjaets.2024.13.1.0463", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper is based on how software development has been revolutionized using AI in automation, mainly dealing with code synthesis and rewrite frameworks. While there is no focused definition for soft", "doi": "10.30574/wjaets.2024.13.1.0463"}
+{"id": "virtualhr-aidriven-automation-2024", "title": "VIRTUALHR: AI-DRIVEN AUTOMATION FOR EFFICIENT AND UNBIASED CANDIDATE RECRUITMENT IN SOFTWARE ENGINEERING ROLES", "authors": ["Unknown"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.56726/irjmets60905", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recruitment is an integral part of any HR professional's role and critical for helping an organization build a thriving workforce ready to support business growth. The entire recruitment process is ", "doi": "10.56726/irjmets60905"}
+{"id": "generative-ai-software-2023", "title": "Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023", "authors": ["Srijoni Majumdar", "Soumen Paul", "Debjyoti Paul", "Ayan Bandyopadhyay", "S. Chattopadhyay"], "year": 2023, "venue": "Fire", "source_url": "https://arxiv.org/abs/2311.03374", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments in a machine learning framework based on human and large language mod", "arxiv_id": "2311.03374", "doi": "10.48550/arXiv.2311.03374"}
+{"id": "humanai-collaboration-software-2023", "title": "Human-AI Collaboration in Software Engineering: Lessons Learned from a Hands-On Workshop", "authors": ["Muhammad Hamza", "Dominik Siemon", "M. Akbar", "Tahsinur Rahman"], "year": 2023, "venue": "2024 IEEE/ACM International Workshop on Software-Intensive Business (IWSiB)", "source_url": "https://arxiv.org/abs/2312.10620", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper investigates the dynamics of human-AI collaboration in software engineering, focusing on the use of ChatGPT. Through a thematic analysis of a hands-on workshop in which 22 professional soft", "arxiv_id": "2312.10620", "doi": "10.1145/3643690.3648236"}
+{"id": "scoping-software-engineering-2024", "title": "Scoping Software Engineering for AI: The TSE Perspective", "authors": ["Sebastián Uchitel", "Marsha Chechik", "M. D. Penta", "Bram Adams", "Nazareno Aguirre"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/tse.2024.3470368", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/tse.2024.3470368"}
+{"id": "generative-ai-redefining-2024", "title": "Generative AI: Redefining the Future of Software Engineering", "authors": ["Anita D. Carleton", "Davide Falessi", "Hongyu Zhang", "Xin Xia"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ms.2024.3441889", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/ms.2024.3441889"}
+{"id": "systematic-literature-review-2023", "title": "A Systematic Literature Review of Explainable AI for Software Engineering", "authors": ["Ahmad Haji Mohammadkhani", "Nitin Sai Bommi", "M. Daboussi", "Onkar Sabnis", "C. Tantithamthavorn"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2302.06065", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: In recent years, leveraging machine learning (ML) techniques has become one of the main solutions to tackle many software engineering (SE) tasks, in research studies (ML4SE). This has been ac", "arxiv_id": "2302.06065", "doi": "10.48550/arXiv.2302.06065"}
+{"id": "ai-software-engineering-2023", "title": "AI in Software Engineering: A Survey on Project Management Applications", "authors": ["Talia Crawford", "Scott Duong", "Richard Fueston", "Ayorinde Lawani", "S. Owoade"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2307.15224", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) refers to the intelligence demonstrated by machines, and within the realm of AI, Machine Learning (ML) stands as a notable subset. ML employs algorithms that undergo train", "arxiv_id": "2307.15224", "doi": "10.48550/arXiv.2307.15224"}
+{"id": "unlocking-aipowered-conversations-2024", "title": "Unlocking AI-Powered Conversations and Code Excellence: Exploring Prompt Patterns in Conversational AI and Software Engineering", "authors": ["Unknown"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.56726/irjmets47950", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.56726/irjmets47950"}
+{"id": "aidriven-software-engineering-2024", "title": "AI-Driven Software Engineering - The Role of Conceptual Modeling", "authors": ["Hans-Georg Fill", "Jordi Cabot", "Wolfgang Maass", "M. V. Sinderen"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.18417/emisa.19.1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.18417/emisa.19.1"}
+{"id": "future-software-engineering-2024-2", "title": "The Future of Software Engineering Education and Training in the Age of AI", "authors": ["B. Tenbergen", "Stephan Krusche"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ms.2023.3345960", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/ms.2023.3345960"}
+{"id": "analysing-role-generative-2024", "title": "Analysing the Role of Generative AI in Software Engineering - Results from an MLR", "authors": ["Tuomas Bazzan", "Benjamin Olojo", "Przemyslaw Majda", "T. Kelly", "Murat Yılmaz"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/978-3-031-71139-8_11", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/978-3-031-71139-8_11"}
+{"id": "prompt-sapper-llmempowered-2023-2", "title": "Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services", "authors": ["Zhenchang Xing", "Qing Huang", "Yu Cheng", "Liming Zhu", "Qinghua Lu"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2306.02230", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Foundation models, such as GPT-4, DALL-E have brought unprecedented AI \"operating system\" effect and new forms of human-AI interaction, sparking a wave of innovation in AI-native services, where natural", "arxiv_id": "2306.02230", "doi": "10.48550/arXiv.2306.02230"}
+{"id": "safety-performance-why-2024", "title": "Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression Against Heterogeneous Attacks Toward AI Software Deployment", "authors": ["Jie Zhu", "Leye Wang", "Xiao Han", "Anmin Liu", "Tao Xie"], "year": 2024, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2401.00996", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, hindering the large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate th", "arxiv_id": "2401.00996", "doi": "10.1109/TSE.2023.3348515"}
+{"id": "diversity-empowers-intelligence-2024", "title": "Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents", "authors": ["Kexun Zhang", "Weiran Yao", "Zuxin Liu", "Yihao Feng", "Zhiwei Liu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.07060", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issu", "arxiv_id": "2408.07060", "doi": "10.48550/arXiv.2408.07060"}
+{"id": "quantum-artificial-intelligence-2025", "title": "Quantum Artificial Intelligence for Software Engineering: the Road Ahead", "authors": ["Xinyi Wang", "Shaukat Ali", "Paolo Arcaini"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.04797", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In order to handle the increasing complexity of software systems, Artificial Intelligence (AI) has been applied to various areas of software engineering, including requirements engineering, coding, te", "arxiv_id": "2505.04797", "doi": "10.48550/arXiv.2505.04797"}
+{"id": "roadmap-software-engineering-2022", "title": "Towards a Roadmap on Software Engineering for Responsible AI", "authors": ["Q. Lu", "Liming Zhu", "Xiwei Xu", "J. Whittle", "Zhenchang Xing"], "year": 2022, "venue": "2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN)", "source_url": "https://arxiv.org/abs/2203.08594", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Although AI is transforming the world, there are serious concerns about its ability to behave and make decisions responsibly. Many ethical regulations, principles, and frameworks for responsible AI ha", "arxiv_id": "2203.08594", "doi": "10.1145/3522664.3528607"}
+{"id": "get-train-be-2025", "title": "Get on the Train or be Left on the Station: Using LLMs for Software Engineering Research", "authors": ["Bianca Trinkenreich", "Fabio Calefato", "G. Hanssen", "Kelly Blincoe", "Marcos Kalinowski"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2506.12691", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The adoption of Large Language Models (LLMs) is not only transforming software engineering (SE) practice but is also poised to fundamentally disrupt how research is conducted in the field. While persp", "arxiv_id": "2506.12691", "doi": "10.1145/3696630.3731666"}
+{"id": "aidriven-software-engineering-2023", "title": "AI-driven software engineering", "authors": ["Josh Mahmood Ali"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.54254/2977-3903/3/2023030", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The intersection of artificial intelligence (AI) and software engineering marks a transformative phase in the technology industry. This paper delves into AI-driven software engineering, exploring its ", "doi": "10.54254/2977-3903/3/2023030"}
+{"id": "software-engineering-responsible-2023", "title": "Software Engineering for Responsible AI", "authors": ["Qinghua Lu", "Liming Zhu", "Jon Whittle", "J. Michael"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/mc.2023.3242055", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/mc.2023.3242055"}
+{"id": "empirical-study-usage-2024", "title": "An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project", "authors": ["Sanka Rasnayaka", "Guanlin Wang", "Ridwan Shariffdeen", "Ganesh Neelakanta Iyer"], "year": 2024, "venue": "2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)", "source_url": "https://arxiv.org/abs/2401.16186", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) represent a leap in artificial intelligence, excelling in tasks using human language(s). Although the main focus of general-purpose LLMs is not code generation, they have ", "arxiv_id": "2401.16186", "doi": "10.1145/3643795.3648379"}
+{"id": "software-engineering-as-2023", "title": "Software Engineering as the Linchpin of Responsible AI", "authors": ["Liming Zhu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE48619.2023.00012", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: From humanity's existential risks to safety risks in critical systems to ethical risks, responsible AI, as the saviour, has become a major research challenge with significant real-world consequences. ", "doi": "10.1109/ICSE48619.2023.00012"}
+{"id": "ethical-requirements-stack-2023", "title": "Ethical Requirements Stack: A framework for implementing ethical requirements of AI in software engineering practices", "authors": ["M. Agbese", "Rahul Mohanani", "A. Khan", "P. Abrahamsson"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3593434.3593489", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1145/3593434.3593489"}
+{"id": "ai-safety-subproblems-2023", "title": "AI Safety Subproblems for Software Engineering Researchers", "authors": ["David Gros", "P. Devanbu", "Zhou Yu"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2304.14597", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this 4-page manuscript we discuss the problem of long-term AI Safety from a Software Engineering (SE) research viewpoint. We briefly summarize long-term AI Safety, and the challenge of avoiding har", "arxiv_id": "2304.14597", "doi": "10.48550/arXiv.2304.14597"}
+{"id": "generative-ai-modeldriven-2023", "title": "Generative AI in Model-Driven Software Engineering Education: Friend or Foe?", "authors": ["Sergio Morales", "Elena Planas", "R. Clarisó", "Martin Gogolla"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MODELS-C59198.2023.00034", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The availability and effectiveness of generative AI tools challenge the currently established methods for learning, teaching and assessment. In this paper, we discuss their potential impact for model-", "doi": "10.1109/MODELS-C59198.2023.00034"}
+{"id": "landscape-source-code-2023", "title": "The Landscape of Source Code Representation Learning in AI-Driven Software Engineering Tasks", "authors": ["S. Chimalakonda", "Debeshee Das", "Alex Mathai", "Srikanth G. Tamilselvam", "Atul Kumar"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSE-Companion58688.2023.00098", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Appropriate representation of source code and its relevant properties form the backbone of Artificial Intelligence (AI)/ Machine Learning (ML) pipelines for various software engineering (SE) tasks suc", "doi": "10.1109/ICSE-Companion58688.2023.00098"}
+{"id": "aiassisted-software-engineering-2023", "title": "AI-assisted Software Engineering: a tertiary study", "authors": ["Orges Cico", "B. Çiço", "Andja Cico"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MECO58584.2023.10154972", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The research in Artificial Intelligence (AI) and its applications across the software engineering (SE) domain has progressed significantly in the last decade, evidenced by an increase in systematic li", "doi": "10.1109/MECO58584.2023.10154972"}
+{"id": "review-trending-crowdsourcing-2023", "title": "A Review of Trending Crowdsourcing Topics in Software Engineering Highlighting Mobile Crowdsourcing and AI Utilization", "authors": ["Mohammed Alghasham", "Mousa Alzakan", "Mohammed Al-Hagery"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.14569/ijacsa.2023.0140486", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Today’s modern technologies and requirements make the utilization of crowdsourcing more viable and applicable. It is one of the problem-solving models that can be used in various domains to reduce co", "doi": "10.14569/ijacsa.2023.0140486"}
+{"id": "ai-software-engineering-2023-2", "title": "AI in Software Engineering: Case Studies and Prospects", "authors": ["Lei Wang"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2309.15768", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial intelligence (AI) and software engineering (SE) are two important areas in computer science. In recent years, researchers are trying to apply AI techniques in various stages of software dev", "arxiv_id": "2309.15768", "doi": "10.48550/arXiv.2309.15768"}
+{"id": "aiaugmented-devops-paradigm-2023", "title": "Ai-Augmented DevOps: A paradigm shifts in scalable software engineering and IT operations", "authors": ["Mahipal Reddy Yalla"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.30574/wjaets.2023.10.2.0293", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With respect to scalable software engineering in addition to IT operations, AI-enabled DevOps, also known as AI-augmented DevOps or AIOps, is a phenomenal transformation catalyst. Last but certainly n", "doi": "10.30574/wjaets.2023.10.2.0293"}
+{"id": "ai-driven-strategies-2023", "title": "AI Driven Strategies for Efficient Project Tracking and Delivery in Software Engineering Management", "authors": ["Srikanth Reddy Keshireddy"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.69978/rebicte.v9i.202", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The effective monitoring of progress and the timely execution of deliverables continue to present problems in the field of software engineering management owing to shifting requirements, resource limi", "doi": "10.69978/rebicte.v9i.202"}
+{"id": "trustworthy-sentiment-analysis-2025", "title": "Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection", "authors": ["Martin Obaidi", "Marc Herrmann", "J. Klünder", "Kurt Schneider"], "year": 2025, "venue": "2025 IEEE 33rd International Requirements Engineering Conference Workshops (REW)", "source_url": "https://arxiv.org/abs/2507.02137", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software development relies heavily on text-based communication, making sentiment analysis a valuable tool for understanding team dynamics and supporting trustworthy AI-driven analytics in requirement", "arxiv_id": "2507.02137", "doi": "10.1109/REW66121.2025.00080"}
+{"id": "early-formalization-aitools-2023", "title": "Early Formalization of AI-tools Usage in Software Engineering in Europe: Study of 2023", "authors": ["D. Pashchenko"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.5815/ijitcs.2023.06.03", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This scientific article presents the results of a study focused on the current practices and future prospects of AI-tools usage, specifically large language models (LLMs), in software development (SD)", "doi": "10.5815/ijitcs.2023.06.03"}
+{"id": "exploring-individual-factors-2025", "title": "Exploring Individual Factors in the Adoption of LLMs for Specific Software Engineering Tasks", "authors": ["Stefano Lambiase", "Gemma Catolino", "Fabio Palomba", "F. Ferrucci", "Daniel Russo"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.02553", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The advent of Large Language Models (LLMs) is transforming software development, significantly enhancing software engineering processes. Research has explored their role within development teams, focu", "arxiv_id": "2504.02553", "doi": "10.48550/arXiv.2504.02553"}
+{"id": "trustworthy-ai-software-2023", "title": "Towards Trustworthy AI Software Development Assistance", "authors": ["Daniel Maninger", "Krishna Narasimhan", "Mira Mezini"], "year": 2023, "venue": "2024 IEEE/ACM 46th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)", "source_url": "https://arxiv.org/abs/2312.09126", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: It is expected that in the near future, AI software development assistants will play an important role in the software industry. However, current software development assistants tend to be unreliable,", "arxiv_id": "2312.09126", "doi": "10.1145/3639476.3639770"}
+{"id": "curious-critical-thinker-2025", "title": "Curious, Critical Thinker, Empathetic, and Ethically Responsible: Essential Soft Skills for Data Scientists in Software Engineering", "authors": ["Matheus de Morais Leça", "Ronnie de Souza Santos"], "year": 2025, "venue": "2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS)", "source_url": "https://arxiv.org/abs/2501.02088", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Background. As artificial intelligence and AI-powered systems continue to grow, the role of data scientists has become essential in software development environments. Data scientists face challenges r", "arxiv_id": "2501.02088", "doi": "10.1109/ICSE-SEIS66351.2025.00021"}
+{"id": "software-engineering-education-2024-2", "title": "Software Engineering Education Must Adapt and Evolve for an LLM Environment", "authors": ["V. Kirova", "Cyril S. Ku", "Joseph R. Laracy", "Thomas J. Marlowe"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3626252.3630927", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the era of artificial intelligence (AI), generative AI, and Large Language Models (LLMs) in particular, have become increasingly significant in various sectors. LLMs such as GPT expand their applic", "doi": "10.1145/3626252.3630927"}
+{"id": "trailer-acm-2030-2024", "title": "The Trailer of the ACM 2030 Roadmap for Software Engineering", "authors": ["Mauro Pezzè", "Matteo Ciniselli", "L. Grazia", "Niccolò Puccinelli", "Ketai Qiu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3696117.3696126", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The landscape of software engineering has dramatically changed. The recent advances in AI, the new opportunities of quantum computing, and the new challenges of sustainability and cyber security upset", "doi": "10.1145/3696117.3696126"}
+{"id": "code-ownership-opensource-2023", "title": "Code Ownership in Open-Source AI Software Security", "authors": ["Jiawen Wen", "Dong Yuan", "Lei Ma", "Huaming Chen"], "year": 2023, "venue": "2024 IEEE/ACM International Workshop on Responsible AI Engineering (RAIE)", "source_url": "https://arxiv.org/abs/2312.10861", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: As open-source AI software projects become an integral component in the AI software development, it is critical to develop a novel measurement method to ensure the security of the open-source AI proje", "arxiv_id": "2312.10861", "doi": "10.1145/3643691.3648586"}
+{"id": "current-challenges-software-2024", "title": "The Current Challenges of Software Engineering in the Era of Large Language Models", "authors": ["Cuiyun Gao", "Xing Hu", "Shan Gao", "X. Xia", "Zhi Jin"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2412.14554", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the advent of large language models (LLMs) in the AI area, the field of software engineering (SE) has also witnessed a paradigm shift. These models, by leveraging the power of deep learning and m", "arxiv_id": "2412.14554", "doi": "10.1145/3712005"}
+{"id": "syncmind-measuring-agent-2025", "title": "SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering", "authors": ["Xuehang Guo", "Xingyao Wang", "Yangyi Chen", "Sha Li", "Chi Han"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2502.06994", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software engineering (SE) is increasingly collaborative, with developers working together on shared complex codebases. Effective collaboration in shared environments requires participants -- whether h", "arxiv_id": "2502.06994", "doi": "10.48550/arXiv.2502.06994"}
+{"id": "facilitating-trustworthy-humanagent-2025", "title": "Facilitating Trustworthy Human-Agent Collaboration in LLM-based Multi-Agent System oriented Software Engineering", "authors": ["Krishna Ronanki"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2505.04251", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multi-agent autonomous systems (MAS) are better at addressing challenges that spans across multiple domains than singular autonomous agents. This holds true within the field of software engineering (S", "arxiv_id": "2505.04251", "doi": "10.1145/3696630.3728717"}
+{"id": "engineered-prompts-chatgpt-2025", "title": "Engineered Prompts in ChatGPT for Educational Assessment in Software Engineering and Computer Science", "authors": ["Ayman Diyab", "R. Frost", "Benjamin David Fedoruk", "Ahmad Diyab"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3390/educsci15020156", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI Assess, a ChatGPT-based assessment system utilizing the ChatGPT platform by OpenAI, composed of four components, is proposed herein. The components are tested on the GPT model to determine to what ", "doi": "10.3390/educsci15020156"}
+{"id": "artificial-intelligence-system-2024", "title": "Artificial Intelligence in System and Software Engineering for Auto Code Generation", "authors": ["Anupriya Sharma Ghai", "Vandna Rawat", "V. Gupta", "Kapil Ghai"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICEECT61758.2024.10738945", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence has profoundly impacted system and software engineering by facilitating the automation of complicated tasks, reducing errors, and accelerating the development process. AI-drive", "doi": "10.1109/ICEECT61758.2024.10738945"}
+{"id": "software-engineering-large-2025", "title": "Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead", "authors": ["Hongzhou Rao", "Yanjie Zhao", "Xinyi Hou", "Shenao Wang", "Haoyu Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.23762", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models (LLMs) has redefined artificial intelligence (AI), pushing the boundaries of AI research and enabling unbounded possibilities for both academia and the i", "arxiv_id": "2506.23762", "doi": "10.48550/arXiv.2506.23762"}
+{"id": "systematic-literature-review-2025-3", "title": "A Systematic Literature Review on Explainability for ML/DL-based Software Engineering", "authors": ["Sicong Cao", "Xiaobing Sun", "Ratnadira Widyasari", "David Lo", "Xiaoxue Wu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3763230", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, in", "doi": "10.1145/3763230"}
+{"id": "encouraging-students-responsible-2025", "title": "Encouraging Students' Responsible Use of GenAI in Software Engineering Education: A Causal Model and Two Institutional Applications", "authors": ["Vahid Garousi", "Zafar Jafarov", "Aytan Movsumova", "Atif Namazov", "Huseyn Mirzayev"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.00682", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Context: As generative AI (GenAI) tools such as ChatGPT and GitHub Copilot become pervasive in education, concerns are rising about students using them to complete rather than learn from coursework-ri", "arxiv_id": "2506.00682", "doi": "10.48550/arXiv.2506.00682"}
+{"id": "leveraging-open-source-2024", "title": "Leveraging Open Source LLMs for Software Engineering Education and Training", "authors": ["Juanan Pereira", "J. López", "Xabier Garmendia", "Maider Azanza"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CSEET62301.2024.10663055", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI, particularly Large Language Models (LLMs), presents innovative opportunities to enhance software engineering education. Open source LLMs such as LLaMA and Mistral leverage the potential", "doi": "10.1109/CSEET62301.2024.10663055"}
+{"id": "examining-utilization-artificial-2024", "title": "Examining the Utilization of Artificial Intelligence Tools by Students in Software Engineering Projects", "authors": ["A. Dirin", "T. Laine"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.5220/0012729400003693", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the popularity of AI-based tools, the landscape of learning and teaching software engineering has shifted to a new era, which has left both educators and students confused regarding the extent ", "doi": "10.5220/0012729400003693"}
+{"id": "software-engineering-ai-2023", "title": "Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things - SEA4DQ'22 Report", "authors": ["P. Nguyen", "Sagar Sen", "Beatriz Bretones-Cassoli", "Nicolas Jourdan", "M. C. Magnanini"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3573074.3573103", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Cyber-physical systems (CPS)/Internet of Things (IoT) are omnipresent in many industrial sectors and application domains in which the quality of the data acquired and used for decision support in comm", "doi": "10.1145/3573074.3573103"}
+{"id": "study-aibased-techniques-2023", "title": "A study of AI-based techniques for requirement analysis in software engineering", "authors": ["R. Budake", "S. Bhoite", "Kabir Kharade"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1063/5.0178114", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1063/5.0178114"}
+{"id": "evidencebased-software-engineering-2025", "title": "Evidence-Based Software Engineering Guidelines Revisited", "authors": ["S. Pfleeger", "Barbara Kitchenham"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2025.3526730", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In 2002, the authors and their colleagues proposed some preliminary guidelines for empirical software engineering research. In this paper, we revisit them. We believe that for the purpose of supportin", "doi": "10.1109/TSE.2025.3526730"}
+{"id": "agentic-software-engineering-2025", "title": "Toward Agentic Software Engineering Beyond Code: Framing Vision, Values, and Vocabulary", "authors": ["Rashina Hoda"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.19692", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Agentic AI is poised to usher in a seismic paradigm shift in Software Engineering (SE). As technologists rush head-along to make agentic AI a reality, SE researchers are driven to establish agentic SE", "arxiv_id": "2510.19692", "doi": "10.48550/arXiv.2510.19692"}
+{"id": "federated-learning-software-2024", "title": "Federated Learning for Software Engineering: A Case Study of Code Clone Detection and Defect Prediction", "authors": ["Yanming Yang", "Xing Hu", "Zhipeng Gao", "Jinfu Chen", "Chao Ni"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2023.3347898", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In various research domains, artificial intelligence (AI) has gained significant prominence, leading to the development of numerous learning-based models in research laboratories, which are evaluated ", "doi": "10.1109/TSE.2023.3347898"}
+{"id": "automatic-grading-short-2024", "title": "Automatic Grading of Short Answers Using Large Language Models in Software Engineering Courses", "authors": ["Ta Nguyen Binh Duong", "Chai Yi Meng"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/EDUCON60312.2024.10578839", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Short-answer based questions have been used widely due to their effectiveness in assessing whether the desired learning outcomes have been attained by students. However, due to their open-ended nature", "doi": "10.1109/EDUCON60312.2024.10578839"}
+{"id": "opportunities-challenges-software-2025", "title": "Opportunities and Challenges of Software Engineering Bots: A Forward-Looking Analysis", "authors": ["Glaucia Melo"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/BotSE67031.2025.00014", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The landscape of software engineering bots has evolved dramatically with the emergence of sophisticated AI-powered development tools and foundation models. This paper examines software engineering bot", "doi": "10.1109/BotSE67031.2025.00014"}
+{"id": "laws-ethics-fairness-2025", "title": "Laws, Ethics, and Fairness in Software Engineering", "authors": ["Miroslaw Staron", "S. Abrahão", "Alexander Serebrenik", "Birgit Penzenstadler", "J. Horkoff"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MS.2024.3469488", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software engineering in the era of generative AI, large data sets and superfast pace of software development often tends to focus on technology, tools and methods, putting aside us, software engineers", "doi": "10.1109/MS.2024.3469488"}
+{"id": "designing-reusable-llmenhanced-2025", "title": "Designing Reusable LLM-Enhanced Assignments: A Quality-Oriented Framework for Software Engineering Education", "authors": ["Olga Manakina", "Chung-Horng Lung"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/COMPSAC65507.2025.00310", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are increasingly prevalent in software engineering (SE) practice, yet their integration into education remains improvised and lacks theoretical grounding. This paper prese", "doi": "10.1109/COMPSAC65507.2025.00310"}
+{"id": "analysis-studentllm-interaction-2025", "title": "Analysis of Student-LLM Interaction in a Software Engineering Project", "authors": ["Naman Agrawal", "Ridwan Shariffdeen", "Guanlin Wang", "Sanka Rasnayaka", "Ganesh Neelakanta Iyer"], "year": 2025, "venue": "2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)", "source_url": "https://arxiv.org/abs/2502.01273", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are becoming increasingly competent across various domains, educators are showing a growing interest in integrating these LLMs into the learning process. Especially in sof", "arxiv_id": "2502.01273", "doi": "10.1109/LLM4Code66737.2025.00019"}
+{"id": "impact-artificial-intelligence-2025-2", "title": "Impact of Artificial Intelligence on Software Engineering Phases and Activities (2013–2024): A Quantitative Analysis Using Zero-Truncated Poisson Model", "authors": ["U. Durrani", "Mustafa Akpınar", "Hakan Bektaş", "Mohammed Saleh"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2025.3574462", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents the results of a quantitative analysis derived from data collected in our earlier systematic literature review, focusing on integrating Artificial Intelligence (AI) techniques acro", "doi": "10.1109/ACCESS.2025.3574462"}
+{"id": "supporting-brainstorming-activities-2025", "title": "Supporting Brainstorming Activities with Bots in Software Engineering Education", "authors": ["Juan Carlos Farah", "Jérémy La Scala", "Sandy Ingram", "Denis Gillet"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/BotSE67031.2025.00013", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent rise in the performance and availability of large language models (LLMs) has fueled the adoption of generative artificial intelligence (AI) to support software engineering. Technologies suc", "doi": "10.1109/BotSE67031.2025.00013"}
+{"id": "senai-software-engineering-2025", "title": "SENAI: Towards Software Engineering Native Generative Artificial Intelligence", "authors": ["M. Saad", "José Antonio Hernández López", "Boqi Chen", "Neil A. Ernst", "Dániel Varró"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2503.15282", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models have significantly advanced the field of code generation, demonstrating the ability to produce functionally correct code snippets. However, advancements in generative AI for code", "arxiv_id": "2503.15282", "doi": "10.48550/arXiv.2503.15282"}
+{"id": "green-prompt-engineering-2025", "title": "Green Prompt Engineering: Investigating the Energy Impact of Prompt Design in Software Engineering", "authors": ["Vincenzo De Martino", "Mohammad Amin Zadenoori", "Xavier Franch", "Alessio Ferrari"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.22320", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Language Models are increasingly applied in software engineering, yet their inference raises growing environmental concerns. Prior work has examined hardware choices and prompt length, but little atte", "arxiv_id": "2509.22320", "doi": "10.48550/arXiv.2509.22320"}
+{"id": "drinking-chai-your-2022", "title": "Drinking Chai with Your (AI) Programming Partner: A Design Fiction about Generative AI for Software Engineering", "authors": ["Michael J. Muller", "Steven I. Ross", "Stephanie Houde", "Mayank Agarwal", "Fernando Martinez"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/bd47757c72178d4657b618cd55514b6e1c36dd42", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "software-engineering-ai-2022", "title": "Software Engineering and AI for Data Quality in Cyber-Physical Systems - SEA4DQ'21 Workshop Report", "authors": ["P. Nguyen", "Sagar Sen", "Nicolas Jourdan", "Beatriz Cassoli", "Per Myrseth"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3502771.3502781", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Cyber-physical systems (CPS) have been developed in many industrial sectors and application domains in which the quality requirements of data acquired are a common factor. Data quality in CPS can dete", "doi": "10.1145/3502771.3502781"}
+{"id": "sustainable-energy-solutions-2022", "title": "Sustainable energy solutions through AI and software engineering: Optimizing resource management in renewable energy systems", "authors": ["Olusegun Gbenga Odunaiya", "Oluwatobi Timothy Soyombo", "Olakojo Yusuff Ogunsola"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.54660/.jaes.2022.2.1.26-37", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Sustainable energy solutions are increasingly critical in addressing global energy demands while minimizing environmental impact. The integration of artificial intelligence (AI) and software engineeri", "doi": "10.54660/.jaes.2022.2.1.26-37"}
+{"id": "ai-software-reliability-2023", "title": "AI Software Reliability: Concepts and Related Domains", "authors": ["Cong Pan", "Jun You", "Yan Gao"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AIIIP61647.2023.00061", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software reliability, a cornerstone in software engineering, has gained renewed attention with the emergence of artificial intelligence (AI), spotlighting the significance of AI software reliability. ", "doi": "10.1109/AIIIP61647.2023.00061"}
+{"id": "quantum-software-engineering-2024", "title": "Quantum software engineering and quantum software development lifecycle: a survey", "authors": ["Kanishk Dwivedi", "Majid Haghparast", "T. Mikkonen"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10586-024-04362-1", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Quantum software engineering is advancing in the domain of quantum computing research and application, yet the documentation is scattered. The slow transition from Von-Neumann based computation system", "doi": "10.1007/s10586-024-04362-1"}
+{"id": "aiguided-modeldriven-embedded-2022", "title": "AI-guided Model-Driven Embedded Software Engineering", "authors": ["Padma Iyenghar", "Friedrich Otte", "Elke Pulvermueller"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.5220/0011006200003119", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, an use case of Artificial Intelligence (AI) empowered Model Driven Engineering (MDE) in the field of Embedded Software Engineering (ESE) is introduced. In this context, we propose to qu", "doi": "10.5220/0011006200003119"}
+{"id": "digital-sovereignty-software-2022", "title": "Digital Sovereignty and Software Engineering for the IoT-laden, AI/ML-driven Era", "authors": ["C. Berger"], "year": 2022, "venue": "IEEE International Conference on Services Computing", "source_url": "https://arxiv.org/abs/2205.14137", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Today’s software engineering already needs to deal with challenges originating from the multidisciplinarity that is required to realize IoT products. Many variants consist of sensor/actuator-powered s", "arxiv_id": "2205.14137", "doi": "10.1109/SCC55611.2022.00059"}
+{"id": "some-aspects-software-2022", "title": "Some aspects of software engineering for AI-based systems", "authors": ["V. Liubchenko"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.15407/pp2022.03-04.099", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI-based software systems are rapidly spreading in various business areas. In this context, the unavoidable convergence of the Software Engineering and Artificial Intelligence and Machine Learning (AI", "doi": "10.15407/pp2022.03-04.099"}
+{"id": "you-real-software-2024", "title": "Are You a Real Software Engineer? Best Practices in Online Recruitment for Software Engineering Studies", "authors": ["Adam Alami", "Mansooreh Zahedi", "Neil A. Ernst"], "year": 2024, "venue": "2024 IEEE/ACM International Workshop on Methodological Issues with Empirical Studies in Software Engineering (WSESE)", "source_url": "https://arxiv.org/abs/2402.01925", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Online research platforms, such as Prolific, offer rapid access to diverse participant pools but also pose unique challenges in participant qualification and skill verification. Previous studies repor", "arxiv_id": "2402.01925", "doi": "10.1145/3643664.3648207"}
+{"id": "evaluating-quality-genai-2025", "title": "Evaluating the quality of GenAI applications in software engineering: a multi-case study", "authors": ["Liang Yu", "Emil Alégroth", "Panagiota Chatzipetrou", "Tony Gorschek"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10664-025-10759-2", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10664-025-10759-2"}
+{"id": "from-natural-language-2024", "title": "From Natural Language to Web Applications: Using Large Language Models for Model-Driven Software Engineering", "authors": ["Lukas Netz", "Judith Michael", "Bernhard Rumpe"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.18420/modellierung2024_018", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.18420/modellierung2024_018"}
+{"id": "review-new-challenges-2022", "title": "A Review on New Challenges in AI and Software Engineering", "authors": ["I. Venkata Dwaraka Srihith", "R. Varaprasad", "Y. Rama Mohan", "T. Aditya Sai Srinivas", "Y. Sravanthi"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.48175/ijarsct-7137", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Artificial Intelligence (AI) has been around for a long time, but it's only recently become a mainstream concern. When it comes to cutting-edge research and development, At the moment, AI is at the to", "doi": "10.48175/ijarsct-7137"}
+{"id": "systematic-literature-review-2024-3", "title": "Systematic Literature Review of Prompt Engineering Patterns in Software Engineering", "authors": ["Yuya Sasaki", "H. Washizaki", "Jialong Li", "Dominik Sander", "Nobukazu Yoshioka"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/COMPSAC61105.2024.00096", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Advancements in large language models (LLMs) are transforming software engineering through innovative prompt engineering strategies. By analyzing prompt-driven enhancements across key software enginee", "doi": "10.1109/COMPSAC61105.2024.00096"}
+{"id": "context-engineering-ai-2025", "title": "Context Engineering for AI Agents in Open-Source Software", "authors": ["Seyedmoein Mohsenimofidi", "Matthias Galster", "Christoph Treude", "Sebastian Baltes"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.21413", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: GenAI-based coding assistants have disrupted software development. The next generation of these tools is agent-based, operating with more autonomy and potentially without human oversight. Like human d", "arxiv_id": "2510.21413", "doi": "10.48550/arXiv.2510.21413"}
+{"id": "engineering-hyperpersonalization-software-2025", "title": "Engineering hyper-personalization: Software challenges and brand performance in AI-driven digital marketing management: An empirical study", "authors": ["Raiyan Haider", "Md Farhan Abrar Ibne Bari", "Md. Farhan Israk Shaif", "Mushfiqur Rahman"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.30574/ijsra.2025.15.2.1525", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this empirical study, we delve into engineering hyper-personalization within AI-driven digital marketing management. We focus specifically on the software challenges encountered and their impact on", "doi": "10.30574/ijsra.2025.15.2.1525"}
+{"id": "trends-intelligent-aibased-2022", "title": "Trends in Intelligent and AI-Based Software Engineering Processes: A Deep Learning-Based Software Process Model Recommendation Method", "authors": ["Fahad H. Alshammari"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1155/2022/1960684", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, numerous studies have successfully implemented machine learning strategies in a wide range of application areas. Therefore, several different deep learning models exist, each one tail", "doi": "10.1155/2022/1960684"}
+{"id": "systematic-literature-review-2024-3-2", "title": "A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research", "authors": ["Sicong Cao", "Xiaobing Sun", "Ratnadira Widyasari", "David Lo", "Xiaoxue Wu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2401.14617", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, in", "arxiv_id": "2401.14617", "doi": "10.48550/arXiv.2401.14617"}
+{"id": "aidriven-requirements-engineering-2025", "title": "An AI-driven Requirements Engineering Framework Tailored for Evaluating AI-Based Software", "authors": ["Hamed Barzamini", "Fatemeh Nazaritiji", "A. Brockmann", "Hasan Ferdowsi", "Mona Rahimi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CAIN66642.2025.00025", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Requirements Engineering (RE) has been extensively refined for traditional software systems, but AI-based software (AIS) [footnote 1: In this work, AI-based software (AIS) refers to software that relies exclusive", "doi": "10.1109/CAIN66642.2025.00025"}
+{"id": "survey-application-ai-2025", "title": "A Survey on Application of AI on Reverse Engineering for Software Analysis and Security", "authors": ["Ashutosh Ghimire", "Sahasra Rao Lingala", "Junjie Zhang", "Faris Alsulami", "Fathi H. Amsaad"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2025.3593456", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reverse engineering process serves essential functions in software analysis and security auditing and malware detection but requires significant time and effort. Researchers and practitioners now inve", "doi": "10.1109/ACCESS.2025.3593456"}
+{"id": "llms-imperfect-then-2024", "title": "LLMs are Imperfect, Then What? An Empirical Study on LLM Failures in Software Engineering", "authors": ["Jiessie Tie", "Bingsheng Yao", "Tianshi Li", "Syed Ishtiaque Ahmed", "Dakuo Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.09916", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software engineers are integrating AI assistants into their workflows to enhance productivity and reduce cognitive strain. However, experiences vary significantly, with some engineers finding large la", "arxiv_id": "2411.09916", "doi": "10.48550/arXiv.2411.09916"}
+{"id": "motivations-challenges-best-2024", "title": "Motivations, Challenges, Best Practices, and Benefits for Bots and Conversational Agents in Software Engineering: A Multivocal Literature Review", "authors": ["Stefano Lambiase", "Gemma Catolino", "Fabio Palomba", "F. Ferrucci"], "year": 2024, "venue": "ACM Computing Surveys", "source_url": "https://arxiv.org/abs/2409.11864", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Bots are software systems designed to support users by automating specific processes, tasks, or activities. When these systems implement a conversational component to interact with users, they are als", "arxiv_id": "2409.11864", "doi": "10.1145/3704806"}
+{"id": "requirements-all-you-2024", "title": "Requirements Are All You Need: The Final Frontier for End-User Software Engineering", "authors": ["Diana Robinson", "Christian Cabrera", "Andrew D. Gordon", "Neil D. Lawrence", "Lars Mennen"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2405.13708", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: What if end-users could own the software development lifecycle from conception to deployment using only requirements expressed in language, images, video or audio? We explore this idea, building on th", "arxiv_id": "2405.13708", "doi": "10.1145/3708524"}
+{"id": "insights-from-frontline-2024", "title": "Insights from the Frontline: GenAI Utilization Among Software Engineering Students", "authors": ["Rudrajit Choudhuri", "Ambareesh Ramakrishnan", "Amreeta Chatterjee", "Bianca Trinkenreich", "Igor Steinmacher"], "year": 2024, "venue": "Conference on Software Engineering Education and Training", "source_url": "https://arxiv.org/abs/2412.15624", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI (genAI) tools (e.g., ChatGPT, Copilot) have become ubiquitous in software engineering (SE). As SE educators, it behooves us to understand the consequences of genAI usage among SE student", "arxiv_id": "2412.15624", "doi": "10.1109/CSEET66350.2025.00007"}
+{"id": "secure-qa-aidriven-2025", "title": "Secure QA: AI-driven security testing and privacy-preserving frameworks in modern software quality engineering", "authors": ["Jyotheeswara Reddy Gottam"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.30574/wjaets.2025.15.2.0531", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This article presents a comprehensive analysis of emerging approaches to integrate security and privacy measures throughout the software quality lifecycle. The article examines how AI-driven security ", "doi": "10.30574/wjaets.2025.15.2.0531"}
+{"id": "impact-chatgpt-teaching-2024", "title": "On the Impact of ChatGPT on Teaching and Studying Software Engineering", "authors": ["Benedikt Zönnchen", "Veronika Thurner", "Axel Böttcher"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/EDUCON60312.2024.10578680", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: AI-systems that are based on large language models, such as ChatGPT, have quickly increased their prowess over the last year, and at the same time became readily available. As of now, many discipline", "doi": "10.1109/EDUCON60312.2024.10578680"}
+{"id": "generative-artificial-intelligence-2024", "title": "Generative Artificial Intelligence Use in Optimising Software Engineering Process: A Systematic Literature Review", "authors": ["Uldis Karlovs-Karlovskis"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.2478/acss-2024-0009", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative AI is only a few years old but already being applied in Software Engineering (SE). This literature review examines the most popular SE sub-fields of such cases and research methods", "doi": "10.2478/acss-2024-0009"}
+{"id": "chatgpt-undergraduate-computer-2024", "title": "Using ChatGPT in Undergraduate Computer Science and Software Engineering Courses: A Students' Perspective", "authors": ["Noah Andersen-Kiel", "P. P. Linos"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FIE61694.2024.10892934", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This innovative practice full paper presents an empirical study aimed at evaluating the potential of ChatGPT, an advanced AI-driven chatbot, as a supplementary educational tool in undergraduate Comput", "doi": "10.1109/FIE61694.2024.10892934"}
+{"id": "morescient-gai-software-2024", "title": "Morescient GAI for Software Engineering", "authors": ["Marcus Kessel", "Colin Atkinson"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2406.04710", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The ability of Generative AI (GAI) technology to automatically check, synthesize, and modify software engineering artifacts promises to revolutionize all aspects of software engineering. Using GAI for", "arxiv_id": "2406.04710", "doi": "10.1145/3709354"}
+{"id": "when-neural-code-2024", "title": "When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference", "authors": ["Zhensu Sun", "Xiaoning Du", "Fu Song", "Shangwen Wang", "Li Li"], "year": 2024, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2401.09964", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Leveraging recent advancements in large language models, modern neural code completion models have demonstrated the capability to generate highly accurate code suggestions. However, their massive size", "arxiv_id": "2401.09964", "doi": "10.1145/3597503.3639120"}
+{"id": "does-your-neural-2024", "title": "Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach", "authors": ["Yao Wan", "Guang-Xiao Wan", "Shijie Zhang", "Hongyu Zhang", "Yulei Sui"], "year": 2024, "venue": "ACM Transactions on Software Engineering and Methodology", "source_url": "https://arxiv.org/abs/2404.14296", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent years have witnessed significant progress in developing deep learning-based models for automated code completion. Examples of such models include CodeGPT and StarCoder. These models are typical", "arxiv_id": "2404.14296", "doi": "10.1145/3742785"}
+{"id": "productivity-assessment-neural-2022", "title": "Productivity assessment of neural code completion", "authors": ["Albert Ziegler", "Eirini Kalliamvakou", "Shawn Simister", "Ganesh Sittampalam", "Alice Li"], "year": 2022, "venue": "MAPS@PLDI", "source_url": "https://arxiv.org/abs/2205.06537", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural code synthesis has reached a point where snippet generation is accurate enough to be considered for integration into human software development workflows. Commercial products aim to increase pr", "arxiv_id": "2205.06537", "doi": "10.1145/3520312.3534864"}
+{"id": "codemark-imperceptible-watermarking-2023", "title": "CodeMark: Imperceptible Watermarking for Code Datasets against Neural Code Completion Models", "authors": ["Zhensu Sun", "Xiaoning Du", "Fu Song", "Li Li"], "year": 2023, "venue": "ESEC/SIGSOFT FSE", "source_url": "https://arxiv.org/abs/2308.14401", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code datasets are of immense value for training neural-network-based code completion models, where companies or organizations have made substantial investments to establish and process these datasets.", "arxiv_id": "2308.14401", "doi": "10.1145/3611643.3616297"}
+{"id": "your-code-secret-2023", "title": "Your Code Secret Belongs to Me: Neural Code Completion Tools Can Memorize Hard-Coded Credentials", "authors": ["Yizhan Huang", "Yichen Li", "Weibin Wu", "Jianping Zhang", "Michael R. Lyu"], "year": 2023, "venue": "Proc. ACM Softw. Eng.", "source_url": "https://arxiv.org/abs/2309.07639", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural Code Completion Tools (NCCTs) have reshaped the field of software engineering, which are built upon the language modeling technique and can accurately suggest contextually relevant code snippet", "arxiv_id": "2309.07639", "doi": "10.1145/3660818"}
+{"id": "do-not-give-2023", "title": "Do Not Give Away My Secrets: Uncovering the Privacy Issue of Neural Code Completion Tools", "authors": ["Yizhan Huang", "Yichen Li", "Weibin Wu", "Jianping Zhang", "Michael R. Lyu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2309.07639", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2309.07639"}
+{"id": "multiplepath-learning-neural-2023", "title": "A Multiple-Path Learning Neural Network Model for Code Completion", "authors": ["Yi Liu", "Jianxun Liu", "Xiangping Zhang", "Haize Hu"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICWS60048.2023.00042", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion, which can accelerate the software development process and improve the quality of software products, is an essential part of today’s integrated development environments. It has become ", "doi": "10.1109/ICWS60048.2023.00042"}
+{"id": "repofusionindecoder-efficient-crossfile-2025", "title": "RepoFusion-in-Decoder: Efficient Cross-File Code Completion via Lightweight Encoder Fusion", "authors": ["Zhuoqing Zhong", "Wei Liu", "Changye Yang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AANN66429.2025.11257694", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language model (LLM) based code completion tools have greatly enhanced software development productivity by providing repository-aware suggestions. However, existing state-of-the-art decoder-onl", "doi": "10.1109/AANN66429.2025.11257694"}
+{"id": "specializing-neural-networks-2023", "title": "Specializing Neural Networks for Cryptographic Code Completion Applications", "authors": ["Ya Xiao", "Wen-Kai Song", "Jingyuan Qi", "Bimal Viswanath", "P. McDaniel"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2023.3265362", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Similarities between natural languages and programming languages have prompted researchers to apply neural network models to software problems, such as code generation and repair. However, program-spe", "doi": "10.1109/TSE.2023.3265362"}
+{"id": "review-statistical-language-2023", "title": "A review on statistical language and neural network based code completion", "authors": ["Ze Gao"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.54254/2755-2721/22/20231222", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion, also referred to as intellisense, is a prevalent feature of Integrated Development Environments (IDEs) and code editors. It aids developers by automatically recommending and inserting", "doi": "10.54254/2755-2721/22/20231222"}
+{"id": "grace-graphguided-repositoryaware-2025", "title": "GRACE: Graph-Guided Repository-Aware Code Completion through Hierarchical Code Fusion", "authors": ["Xingliang Wang", "Baoyi Wang", "Chen Zhi", "Junxiao Han", "Xinkui Zhao"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.05980", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLMs excel in localized code completion but struggle with repository-level tasks due to limited context windows and complex semantic and structural dependencies across codebases. While Retrieval-Augme", "arxiv_id": "2509.05980", "doi": "10.48550/arXiv.2509.05980"}
+{"id": "dont-complete-it-2022", "title": "Don't Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems", "authors": ["Zhensu Sun", "Xiaoning Du", "Fu Song", "Shangwen Wang", "Mingze Ni"], "year": 2022, "venue": "2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)", "source_url": "https://arxiv.org/abs/2209.05948", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Currently, large pre-trained language models are widely applied in neural code completion systems. Though large code models significantly outperform their smaller counterparts, around 70% of displayed", "arxiv_id": "2209.05948", "doi": "10.1145/3688831"}
+{"id": "challenge-optimization-context-2025", "title": "Challenge on Optimization of Context Collection for Code Completion", "authors": ["Dmitry Ustalov", "Egor Bogomolov", "A. Bezzubov", "Yaroslav Golubev", "Evgeniy Glukhov"], "year": 2025, "venue": "2025 40th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)", "source_url": "https://arxiv.org/abs/2510.04349", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of workflows and methods for software engineering using AI emphasizes the need for a systematic evaluation and analysis of their ability to leverage information from entire proje", "arxiv_id": "2510.04349", "doi": "10.1109/ASEW67777.2025.00072"}
+{"id": "comprehensive-study-code-2025", "title": "A Comprehensive Study on Code Completion for Large Language Models", "authors": ["Yunzhen Cai"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICAACE65325.2025.11020128", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have demonstrated remarkable capabilities in code generation, exhibiting a profound understanding of code semantics and functionality. Code completion is a critical task within c", "doi": "10.1109/ICAACE65325.2025.11020128"}
+{"id": "learning-prevent-profitless-2022", "title": "Learning to Prevent Profitless Neural Code Completion", "authors": ["Zhensu Sun", "Xiaoning Du", "Fu Song", "Shangwen Wang", "Mingze Ni"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2209.05948", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2209.05948"}
+{"id": "methodology-refined-evaluation-2022", "title": "A methodology for refined evaluation of neural code completion approaches", "authors": ["K. T. Le", "Gabriel Rashidi", "A. Andrzejak"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10618-022-00866-9", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion has become an indispensable feature of modern Integrated Development Environments. In recent years, many approaches have been proposed to tackle this task. However, it is hard to compa", "doi": "10.1007/s10618-022-00866-9"}
+{"id": "model-cascading-code-2024-2", "title": "Model Cascading for Code: A Cascaded Black-Box Multi-Model Framework for Cost-Efficient Code Completion with Self-Testing", "authors": ["Boyuan Chen", "Mingzhi Zhu", "Brendan Dolan-Gavitt", "Muhammad Shafique", "Siddharth Garg"], "year": 2024, "venue": "IEEE International Joint Conference on Neural Network", "source_url": "https://arxiv.org/abs/2405.15842", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid advancement of large language models (LLMs) has significantly improved code completion tasks, yet the trade-off between accuracy and computational cost remains a critical challenge. While us", "arxiv_id": "2405.15842", "doi": "10.1109/IJCNN64981.2025.11227916"}
+{"id": "how-robust-neural-2022", "title": "How Robust are Neural Code Completion Models to Source Code Transformation?", "authors": ["Unknown"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/f01e316d3b28ccecda25b4d57926f496a9b17d3d", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "context-composing-full-2024", "title": "Context Composing for Full Line Code Completion", "authors": ["Anton Semenkin", "Yaroslav Sokolov", "Evgeniia Vu"], "year": 2024, "venue": "IDE", "source_url": "https://arxiv.org/abs/2402.09230", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code Completion is one of the most used Integrated Development Environment (IDE) features, which affects the everyday life of a software developer. Modern code completion approaches moved from the com", "arxiv_id": "2402.09230", "doi": "10.1145/3643796.3648446"}
+{"id": "neural-models-source-2024", "title": "Neural Models for Source Code Synthesis and Completion", "authors": ["Mitodru Niyogi"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.06690", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippet. The current approaches mainl", "arxiv_id": "2402.06690", "doi": "10.48550/arXiv.2402.06690"}
+{"id": "improving-astlevel-code-2024", "title": "Improving AST-Level Code Completion with Graph Retrieval and Multi-Field Attention", "authors": ["Yu Xia", "Tian Liang", "Weihuan Min", "Li Kuang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3643916.3644420", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion, which provides code suggestions by generating code snippets or structures, has become an essential feature of integrated development environments (IDEs). Recently, some studies have b", "doi": "10.1145/3643916.3644420"}
+{"id": "combined-approach-program-2024", "title": "A Combined Approach of Program Analysis and Deep Learning for Code Completion", "authors": ["Yi Liu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.54691/hkyc3a89", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion, a critical feature in integrated development environments, significantly reduces the coding workload for developers. Traditional code completion techniques often focus on the natural ", "doi": "10.54691/hkyc3a89"}
+{"id": "ognidc-robust-depth-2024", "title": "OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations", "authors": ["Yiming Zuo", "Jia Deng"], "year": 2024, "venue": "European Conference on Computer Vision", "source_url": "https://arxiv.org/abs/2406.11711", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Depth completion is the task of generating a dense depth map given an image and a sparse depth map as inputs. It has important applications in various downstream tasks. In this paper, we present OGNI-", "arxiv_id": "2406.11711", "doi": "10.48550/arXiv.2406.11711"}
+{"id": "empirical-investigation-performance-2023", "title": "An Empirical Investigation on the Performance of Domain Adaptation for T5 Code Completion", "authors": ["Daisuke Fukumoto", "Yutaro Kashiwa", "Toshiki Hirao", "Kenji Fujiwara", "Hajimu Iida"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SANER56733.2023.00073", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion has the benefit of improving coding speed and reducing the chance of inducing bugs. In recent years, DL-based code completion techniques have been proposed. In particular, pre-trained ", "doi": "10.1109/SANER56733.2023.00073"}
+{"id": "rwkvbased-encoderdecoder-model-2023", "title": "RWKV-based Encoder-Decoder Model for Code Completion", "authors": ["Lu Zhou", "Zhonglin Xiao", "Zhipeng Ning"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/EIECT60552.2023.10442108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Intelligent code completion techniques, which offer code suggestions, are essential for assisting programmers in reducing errors and improving programming efficiency. Traditional Recurrent Neural Netw", "doi": "10.1109/EIECT60552.2023.10442108"}
+{"id": "serenity-library-based-2023", "title": "Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning", "authors": ["Wenting Zhao", "I. Abdelaziz", "Julian Dolby", "Kavitha Srinivas", "M. Helali"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2301.05108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Dynamically typed languages such as Python have become very popular. Among other strengths, Python’s dynamic nature and its straightforward linking to native code have made it the de-facto language", "arxiv_id": "2301.05108", "doi": "10.48550/arXiv.2301.05108"}
+{"id": "unified-multitask-learning-2022", "title": "A unified multi-task learning model for AST-level and token-level code completion", "authors": ["F. Liu", "Ge Li", "Bolin Wei", "Xin Xia", "Zhiyi Fu"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10664-022-10140-7", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10664-022-10140-7"}
+{"id": "framing-program-repair-2022", "title": "Framing Program Repair as Code Completion", "authors": ["Francisco Ribeiro", "Rui Abreu", "João Saraiva"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3524459.3527347", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many techniques have contributed to the advancement of automated program repair, such as: generate and validate approaches, constraint-based solvers and even neural machine translation. Simultaneous", "doi": "10.1145/3524459.3527347"}
+{"id": "improving-code-completion-2022", "title": "Improving Code Completion by Sequence Features and Structural Features", "authors": ["Ya-Ping Liu", "Zhiqiu Huang", "Yaoshen Yu", "Yasir Hussain", "Lile Lin"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3568364.3568373", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion is essential in integrated development environments (IDEs). It has also shown intelligence in helping developers to product. Recently, neural network-based models have helped improve c", "doi": "10.1145/3568364.3568373"}
+{"id": "improved-methods-pointer-2022", "title": "Improved Methods of Pointer Mixture Network for Code Completion", "authors": ["Cheng Wei", "Zhiqiu Huang", "Yaoshen Yu"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/QRS57517.2022.00095", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion is an efficient software development technique in modern integrated development environments (IDEs), which can predict the most likely code token(s) based on the context of the code to", "doi": "10.1109/QRS57517.2022.00095"}
+{"id": "novel-code-completion-2022", "title": "A Novel Code Completion Strategy", "authors": ["Hayatou Oumarou", "Ousmanou Dahirou"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.14569/ijacsa.2022.0130598", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Programmers rely on a multitude of techniques to speed up the development process. Among these techniques is code completion, a productivity improvement technique widely used by developers to explore", "doi": "10.14569/ijacsa.2022.0130598"}
+{"id": "grammart5-grammarintegrated-pretrained-2024", "title": "GrammarT5: Grammar-Integrated Pretrained Encoder-Decoder Neural Model for Code", "authors": ["Qihao Zhu", "Qing-Lin Liang", "Zeyu Sun", "Yingfei Xiong", "Lu Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3597503.3639125", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Pretrained models for code have exhibited promising performance across various code-related tasks, such as code summarization, code completion, code translation, and bug detection. However, despite th", "doi": "10.1145/3597503.3639125"}
+{"id": "parsing-auditory-neural-2025", "title": "Parsing auditory neural code into maximum-entropy packets", "authors": ["Huanqiu Zhang", "Israel Nelken", "Tatyana O. Sharpee"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2025.11.09.687481", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deciphering the neural code requires identifying its fundamental symbols or code-words. Neural activity is usually interpreted either as a rate code – based on average spike counts – or as a temporal ", "doi": "10.1101/2025.11.09.687481"}
+{"id": "modelling-particle-flow-2024", "title": "Modelling of particle flow code geotechnical material parameter relationships based on orthogonal design and back propagation neural network", "authors": ["Yaodong Ni", "Ruirui Wang", "Xianlun Leng", "Fengmin Xia", "Feng Wang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s40571-024-00806-y", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s40571-024-00806-y"}
+{"id": "poster-comparing-neural-2022", "title": "Poster: Comparing Neural Network Solutions in Cryptographic API Completion", "authors": ["Ya Xiao", "Salman Ahmed", "Wen-Kai Song", "Bimal", "Na Meng"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/9e8a98aa7781f591281bba279d9fa3570a144892", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "compilable-neural-code-2022", "title": "Compilable Neural Code Generation with Compiler Feedback", "authors": ["Xin Wang", "Yasheng Wang", "Yao Wan", "Fei Mi", "Yitong Li"], "year": 2022, "venue": "Findings", "source_url": "https://arxiv.org/abs/2203.05132", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Exis", "arxiv_id": "2203.05132", "doi": "10.48550/arXiv.2203.05132"}
+{"id": "sann-programming-code-2023", "title": "SANN: Programming Code Representation Using Attention Neural Network with Optimized Subtree Extraction", "authors": ["Muntasir Hoq", "Sushanth Reddy Chilla", "Melika Ahmadi Ranjbar", "Peter Brusilovsky", "Bita Akram"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3583780.3615047", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated analysis of programming data using code representation methods offers valuable services for programmers, from code completion to clone detection to bug detection. Recent studies show the eff", "doi": "10.1145/3583780.3615047"}
+{"id": "neural-machine-translation-2023", "title": "Neural Machine Translation for Code Generation", "authors": ["K. Dharma", "Clayton T. Morrison"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2305.13504", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural machine translation (NMT) methods developed for natural language processing have been shown to be highly successful in automating translation from one natural language to another. Recently, the", "arxiv_id": "2305.13504", "doi": "10.48550/arXiv.2305.13504"}
+{"id": "3dshape2vecset-3d-shape-2023", "title": "3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models", "authors": ["Biao Zhang", "Jiapeng Tang", "M. Nießner", "Peter Wonka"], "year": 2023, "venue": "ACM Transactions on Graphics", "source_url": "https://arxiv.org/abs/2301.11445", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce 3DShape2VecSet, a novel shape representation for neural fields designed for generative diffusion models. Our shape representation can encode 3D shapes given as surface models or point clo", "arxiv_id": "2301.11445", "doi": "10.1145/3592442"}
+{"id": "neural-language-models-2022", "title": "Neural language models for code quality identification", "authors": ["Srinivasan H. Sengamedu", "Hangqi Zhao"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3549034.3561175", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1145/3549034.3561175"}
+{"id": "semcoder-training-code-2024", "title": "SemCoder: Training Code Language Models with Comprehensive Semantics", "authors": ["Yangruibo Ding", "Jinjun Peng", "Marcus J. Min", "Gail E. Kaiser", "Junfeng Yang"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2406.01006", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap betwe", "arxiv_id": "2406.01006", "doi": "10.48550/arXiv.2406.01006"}
+{"id": "enhancing-vulnerability-detection-2025", "title": "Enhancing Vulnerability Detection via Inter-procedural Semantic Completion", "authors": ["Bozhi Wu", "Chengjie Liu", "Zhiming Li", "Yushi Cao", "Jun Sun"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3728912", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inspired by advances in deep learning, numerous learning-based approaches for vulnerability detection have emerged, primarily operating at the function level for scalability. However, this design choi", "doi": "10.1145/3728912"}
+{"id": "how-effective-neural-2023", "title": "How Effective Are Neural Networks for Fixing Security Vulnerabilities", "authors": ["Yi Wu", "Nan Jiang", "H. Pham", "Thibaud Lutellier", "Jordan Davis"], "year": 2023, "venue": "International Symposium on Software Testing and Analysis", "source_url": "https://arxiv.org/abs/2305.18607", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on s", "arxiv_id": "2305.18607", "doi": "10.1145/3597926.3598135"}
+{"id": "deeplearningbased-optimization-large-2025", "title": "Deep-learning-based optimization of large language models for code generation", "authors": ["Shanqi Zhan", "Ying Lin", "Junlin Zhu", "Yao Yao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1117/12.3071470", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In order to improve the performance of the code generation system in semantic modeling and structural dependency construction, a deep learning-based multi-layer Transformer encoding and decoding struc", "doi": "10.1117/12.3071470"}
+{"id": "unsupervised-point-cloud-2024", "title": "Unsupervised Point Cloud Completion through Unbalanced Optimal Transport", "authors": ["Taekyung Lee", "Jaemoo Choi", "Jaewoong Choi", "Myungjoo Kang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.02671", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unpaired point cloud completion is crucial for real-world applications, where ground-truth data for complete point clouds are often unavailable. By learning a completion map from unpaired incomplete a", "arxiv_id": "2410.02671", "doi": "10.48550/arXiv.2410.02671"}
+{"id": "infocd-contrastive-chamfer-2023", "title": "InfoCD: A Contrastive Chamfer Distance Loss for Point Cloud Completion", "authors": ["Fangzhou Lin", "Yun Yue", "Ziming Zhang", "Songlin Hou", "Kazunori D. Yamada"], "year": 2023, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/4827262c9fb1ac41569af6f76944eb8db80ab05d", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "pmpnet-point-cloud-2022", "title": "PMP-Net++: Point Cloud Completion by Transformer-Enhanced Multi-Step Point Moving Paths", "authors": ["Xin Wen", "Peng Xiang", "Yaru Cao", "Pengfei Wan", "Wen Zheng"], "year": 2022, "venue": "IEEE Transactions on Pattern Analysis and Machine Intelligence", "source_url": "https://arxiv.org/abs/2202.09507", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Point cloud completion concerns to predict missing part for incomplete 3D shapes. A common strategy is to generate complete shape according to incomplete input. However, unordered nature of point clou", "arxiv_id": "2202.09507", "doi": "10.1109/TPAMI.2022.3159003"}
+{"id": "simcopilot-evaluating-large-2025", "title": "SIMCOPILOT: Evaluating Large Language Models for Copilot-Style Code Generation", "authors": ["Ming Jiang", "Abhinav Jain", "Sophia Zorek", "Christopher Jermaine"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.21514", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce SIMCOPILOT, a benchmark that simulates the role of large language models (LLMs) as interactive, \"copilot\"-style coding assistants. Targeting both completion (finishing incomplete methods o", "arxiv_id": "2505.21514", "doi": "10.48550/arXiv.2505.21514"}
+{"id": "hierarchical-neural-coding-2023", "title": "Hierarchical Neural Coding for Controllable CAD Model Generation", "authors": ["Xiang Xu", "P. Jayaraman", "J. Lambourne", "Karl D. D. Willis", "Yasutaka Furukawa"], "year": 2023, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2307.00149", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from globa", "arxiv_id": "2307.00149", "doi": "10.48550/arXiv.2307.00149"}
+{"id": "large-language-models-2023-3", "title": "Large Language Models of Code Fail at Completing Code with Potential Bugs", "authors": ["Tuan Dinh", "Jinman Zhao", "Samson Tan", "Renato M. P. Negrinho", "Leonard Lausen"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2306.03438", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing wo", "arxiv_id": "2306.03438", "doi": "10.48550/arXiv.2306.03438"}
+{"id": "pattern-completion-disruption-2023", "title": "Pattern completion and disruption characterize contextual modulation in mouse visual cortex", "authors": ["Jiakun Fu", "Suhas Shrinivasan", "Kayla Ponder", "Taliah Muhammad", "Zhuokun Ding"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2023.03.13.532473", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Vision is fundamentally context-dependent, with neuronal responses influenced not just by local features but also by surrounding contextual information. In the visual cortex, studies using simple grat", "doi": "10.1101/2023.03.13.532473"}
+{"id": "alanca-active-learning-2024", "title": "ALANCA: Active Learning Guided Adversarial Attacks for Code Comprehension on Diverse Pre-trained and Large Language Models", "authors": ["Dexin Liu", "Shikun Zhang"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/SANER60148.2024.00067", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural code models have demonstrated their efficacy across a range of code comprehension tasks, including vulnerability detection, code classification, automatic code summarization, completion, clone ", "doi": "10.1109/SANER60148.2024.00067"}
+{"id": "ddit-semantic-scene-2023", "title": "DDIT: Semantic Scene Completion via Deformable Deep Implicit Templates", "authors": ["Haoang Li", "Jinhui Dong", "Binghui Wen", "Ming Gao", "Tianyu Huang"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICCV51070.2023.02001", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scene reconstructions are often incomplete due to occlusions and limited viewpoints. There have been efforts to use semantic information for scene completion. However, the completed shapes may be roug", "doi": "10.1109/ICCV51070.2023.02001"}
+{"id": "probing-numeracy-logic-2023", "title": "Probing Numeracy and Logic of Language Models of Code", "authors": ["Razan Baltaji", "Parth Thakkar"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/InteNSE59150.2023.00006", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Machine learning techniques have found a widespread use in the software engineering community. In particular, language models (LMs) trained on code form the backbone of a majority of these application", "doi": "10.1109/InteNSE59150.2023.00006"}
+{"id": "unleashing-power-compiler-2022", "title": "Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings", "authors": ["Zongjie Li", "Pingchuan Ma", "Huaijin Wang", "Shuai Wang", "Qiyi Tang"], "year": 2022, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2204.09191", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, ", "arxiv_id": "2204.09191", "doi": "10.1145/3510003.3510217"}
+{"id": "graph-generation-recurrent-2023", "title": "Graph Generation with Recurrent and Graph Neural Networks", "authors": ["Xikun Huang", "Yangyang Li", "Chaoqun Fei", "Chuanqing Wang"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1109/MedAI59581.2023.00033", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Graph generation has applications as diverse as drug discovery, materials design, and code completion. In this paper, we propose a novel auto-regressive graph generation model, where graph generation ", "doi": "10.1109/MedAI59581.2023.00033"}
+{"id": "implicit-shape-completion-2022", "title": "Implicit Shape Completion via Adversarial Shape Priors", "authors": ["Abhishek Saroha", "Marvin Eisenberger", "Tarun Yenamandra", "Daniel Cremers"], "year": 2022, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2204.10060", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a novel neural implicit shape method for partial point cloud completion. To that end, we combine a conditional Deep-SDF architecture with learned, adversarial shape priors. More specificall", "arxiv_id": "2204.10060", "doi": "10.48550/arXiv.2204.10060"}
+{"id": "interpretable-linear-models-2022", "title": "Interpretable linear models for predicting security vulnerabilities in source code", "authors": ["T. Hocking", "Joseph R. Barr", "Tyler Thatcher"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TransAI54797.2022.00032", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In our increasingly digital and networked society, computer code is responsible for many essential tasks. There are an increasing number of attacks on such code using unpatched security vulnerabilitie", "doi": "10.1109/TransAI54797.2022.00032"}
+{"id": "learning-represent-programs-2022", "title": "Learning to Represent Programs with Code Hierarchies", "authors": ["Minh Huynh Nguyen", "Nghi D. Q. Bui"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2205.15479", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2205.15479"}
+{"id": "ssa-data-flow-2022", "title": "SSA Data Flow Information for Semantic Code Tasks", "authors": ["T. Stocker", "Peter Belcák", "Florian Grötschla", "Roger Wattenhofer"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/766a49ba271d8b07b23ca74c39313a5f908c0fff", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "drf-llmagent-dynamic-2025", "title": "DRF: LLM-AGENT Dynamic Reputation Filtering Framework", "authors": ["Yuwei Lou", "Hao Hu", "Shaocong Ma", "Zongfei Zhang", "Liang Wang"], "year": 2025, "venue": "International Conference on Neural Information Processing", "source_url": "https://arxiv.org/abs/2509.05764", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the evolution of generative AI, multi-agent systems leveraging large-language models (LLMs) have emerged as a powerful tool for complex tasks. However, these systems face challenges in quantif", "arxiv_id": "2509.05764", "doi": "10.48550/arXiv.2509.05764"}
+{"id": "diffusionsdf-conditional-generative-2022", "title": "Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions", "authors": ["Gene Chou", "Yuval Bahat", "Felix Heide"], "year": 2022, "venue": "IEEE International Conference on Computer Vision", "source_url": "https://arxiv.org/abs/2211.13757", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Probabilistic diffusion models have achieved state-of-the-art results for image synthesis, inpainting, and text-to-image tasks. However, they are still in the early stages of generating complex 3D sha", "arxiv_id": "2211.13757", "doi": "10.1109/ICCV51070.2023.00215"}
+{"id": "automated-bug-detection-2025", "title": "Automated Bug Detection and Correction in Software Development using Machine Learning", "authors": ["Isabella Hoffman", "Nathaniel Brooks"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.65521/ijacte.v12i1.108", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software quality assurance is a critical aspect of modern software development, where timely detection and correction of bugs can significantly enhance reliability, security, and efficiency. Tradition", "doi": "10.65521/ijacte.v12i1.108"}
+{"id": "policybased-input-space-2025", "title": "Policy-Based Input Space Exploration to Find Worst-Case Inputs in Machine-Learning Based ARINC653 Applications", "authors": ["Bastian Luettig", "Yousif M. Elsheikh", "Yassine Akhiat", "Bjoern Annighoefer"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/DASC66011.2025.11257228", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern programming assistants have made their way into developers' daily workflows, offering features such as code completion, automated test and document generation. However, for safety-critical doma", "doi": "10.1109/DASC66011.2025.11257228"}
+{"id": "conco-optimizing-compilation-2025", "title": "ConCo: Optimizing Compilation of Concurrent Tensor Programs on Shared GPU", "authors": ["Jiamin Lu", "Jingwei Sun", "Yunlong Xu", "Peng Sun", "Guangzhong Sun"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3721145.3735113", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Serving multiple inference tasks of deep neural networks (DNNs) concurrently on a shared GPU is an established method for maximizing hardware resource. Although DNN compilers effectively generate opti", "doi": "10.1145/3721145.3735113"}
+{"id": "comback-versatile-dataset-2024", "title": "ComBack: A Versatile Dataset for Enhancing Compiler Backend Development Efficiency", "authors": ["Ming Zhong", "Fang Lyu", "Lulin Wang", "Hongna Geng", "Lei Qiu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.52202/079017-3567", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Compiler backends are tasked with generating executable machine code for processors. With the proliferation of diverse processors, it is imperative for programmers to tailor specific compiler backends", "doi": "10.52202/079017-3567"}
+{"id": "superposed-decoding-multiple-2024", "title": "Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass", "authors": ["Ethan Shen", "Alan Fan", "Sarah M. Pratt", "J. Park", "Matthew Wallingford"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2405.18400", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood", "arxiv_id": "2405.18400", "doi": "10.48550/arXiv.2405.18400"}
+{"id": "understanding-impact-data-2024", "title": "Towards understanding the impact of data bugs on deep learning models in software engineering", "authors": ["Mehil B. Shah", "M. Rahman", "Foutse Khomh"], "year": 2024, "venue": "Empirical Software Engineering", "source_url": "https://arxiv.org/abs/2411.12137", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Deep learning (DL) techniques have achieved significant success in various software engineering tasks (e.g., code completion by Copilot). However, DL systems are prone to bugs from many sources, inclu", "arxiv_id": "2411.12137", "doi": "10.1007/s10664-025-10717-y"}
+{"id": "osiris-systolic-approach-2024", "title": "Osiris: A Systolic Approach to Accelerating Fully Homomorphic Encryption", "authors": ["Austin Ebel", "Brandon Reagen"], "year": 2024, "venue": "ACM Transactions on Architecture and Code Optimization (TACO)", "source_url": "https://arxiv.org/abs/2408.09593", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper we demonstrate how fully homomorphic encryption (FHE) can be accelerated using a systolic-inspired architecture. We start by analyzing FHE algorithms and then design dedicated systolic o", "arxiv_id": "2408.09593", "doi": "10.1145/3788287"}
+{"id": "jarvis-aienhanced-desktop-2024", "title": "Jarvis: AI-Enhanced Desktop Virtual Assistant", "authors": ["Abhay Gupta", "Ekta Bhardwaj", "Neeraj Sirawag", "Suvrat Pandey", "Riya Mehta"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICAIQSA64000.2024.10882426", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents the development and evaluation of “Jarvis: AI-Enhanced Desktop Virtual Assistant,” a multi-functional system designed to automate daily tasks through voice commands and natural la", "doi": "10.1109/ICAIQSA64000.2024.10882426"}
+{"id": "aiaugmented-framework-enable-2024", "title": "AI-augmented Framework to Enable Process Awareness in Collaborative Teams", "authors": ["Minh Khoi Nguyen", "H. Tran", "Ileana Ober", "Razan Abualsaud"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IJCNN60899.2024.10650888", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Process Management Systems (PMS) offer effective means to coordinate tasks for various teams involved in complex projects. However, in practice, participants execute their tasks using applications wit", "doi": "10.1109/IJCNN60899.2024.10650888"}
+{"id": "high-performance-qnns-2024", "title": "Towards High Performance QNNs via Distribution-Based CNOT Gate Reduction", "authors": ["Manojna Sistla", "Yiding Liu", "Xin Fu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3695872", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Quantum Neural Networks (QNNs) are one of the most promising applications that can be implemented on NISQ-era quantum computers. In this study, we observe that QNNs often suffer from gate redundancy, ", "doi": "10.1145/3695872"}
+{"id": "new-weighted-bert-2023", "title": "New weighted BERT features and multi-CNN models to enhance the performance of MOOC posts classification", "authors": ["Mohamed A. El-Rashidy", "A. Farouk", "Nawal A. El-Fishawy", "H. Aslan", "Nabila A. Khodeir"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s00521-023-08673-z", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Learning is an essential requirement for humans, and its means have evolved. Ten years ago, Massive Open Online Courses (MOOCs) were introduced, attracting many interests and learners. MOOCs provide f", "doi": "10.1007/s00521-023-08673-z"}
+{"id": "parametric-surface-constrained-2023", "title": "Parametric Surface Constrained Upsampler Network for Point Cloud", "authors": ["Pingping Cai", "Zhenyao Wu", "Xinyi Wu", "Song Wang"], "year": 2023, "venue": "AAAI Conference on Artificial Intelligence", "source_url": "https://arxiv.org/abs/2303.08240", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Designing a point cloud upsampler, which aims to generate a clean and dense point cloud given a sparse point representation, is a fundamental and challenging problem in computer vision. A line of atte", "arxiv_id": "2303.08240", "doi": "10.1609/aaai.v37i1.25097"}
+{"id": "heat-hyperedge-attention-2022", "title": "HEAT: Hyperedge Attention Networks", "authors": ["Dobrik Georgiev", "Marc Brockschmidt", "Miltiadis Allamanis"], "year": 2022, "venue": "Trans. Mach. Learn. Res.", "source_url": "https://arxiv.org/abs/2201.12113", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Learning from structured data is a core machine learning task. Commonly, such data is represented as graphs, which normally only consider (typed) binary relationships between pairs of nodes. This is a", "arxiv_id": "2201.12113"}
+{"id": "can-we-automatically-2022", "title": "Can We Automatically Fix Bugs by Learning Edit Operations?", "authors": ["Aidan Connor", "Aaron Harris", "Nathan Cooper", "D. Poshyvanyk"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1109/saner53432.2022.00096", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: There has been much work done in the area of automated program repair, specifically through using machine learning methods to correct buggy code. Whereas some degree of success has been attained by th", "doi": "10.1109/saner53432.2022.00096"}
+{"id": "l-everaging-n-2022", "title": "Leveraging Neural Language Model for Automated Code Quality Issue Identification", "authors": ["Unknown"], "year": 2022, "venue": "Unknown", "source_url": "https://www.semanticscholar.org/paper/62af36fa3202d998a895187b439d54ce4c2f0559", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar."}
+{"id": "lofi-sketch-large-2022", "title": "LoFi Sketch: A Large Scale Dataset of Smartphone Low Fidelity Sketches", "authors": ["Vinoth Pandian Sermuga Pandian", "Abdullah Shams", "Sarah Suleri", "M. Jarke"], "year": 2022, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3491101.3519624", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent research on automating the transformation of low fidelity (LoFi) sketches to code using Deep Neural Networks require a large-scale dataset for generalizable results. This paper introduces the L", "doi": "10.1145/3491101.3519624"}
+{"id": "language-models-code-2024", "title": "Language Models for Code Completion: A Practical Evaluation", "authors": ["M. Izadi", "J. Katzy", "Tim van Dam", "Marc Otten", "R. Popescu"], "year": 2024, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2402.16197", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Transformer-based language models for automatic code completion have shown great promise so far, yet the evaluation of these models rarely uses real data. This study provides both quantitative and qua", "arxiv_id": "2402.16197", "doi": "10.1145/3597503.3639138"}
+{"id": "rlcoder-reinforcement-learning-2024", "title": "RLCoder: Reinforcement Learning for Repository-Level Code Completion", "authors": ["Yanlin Wang", "Yanli Wang", "Daya Guo", "Jiachi Chen", "Ruikai Zhang"], "year": 2024, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2407.19487", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strat", "arxiv_id": "2407.19487", "doi": "10.1109/ICSE55347.2025.00014"}
+{"id": "repocoder-repositorylevel-code-2023", "title": "RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation", "authors": ["Fengji Zhang", "B. Chen", "Yue Zhang", "Jin Liu", "Daoguang Zan"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2303.12570", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository. While for automated code completion tools, it is difficult to util", "arxiv_id": "2303.12570", "doi": "10.48550/arXiv.2303.12570"}
+{"id": "evaluating-language-models-2025", "title": "Evaluating Language Models for Computer Graphics Code Completion", "authors": ["Jan Kels", "Abdelhalim Hafedh Dahou", "Brigitte Mathiak"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/LLM4Code66737.2025.00017", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluation benchmarks are essential for developing and training language models, providing both comparison and optimization targets. Existing code completion benchmarks, often based on standalone Pyth", "doi": "10.1109/LLM4Code66737.2025.00017"}
+{"id": "nonautoregressive-linelevel-code-2024", "title": "Non-Autoregressive Line-Level Code Completion", "authors": ["Fang Liu", "Zhiyi Fu", "Ge Li", "Zhi Jin", "Hui Liu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3649594", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software developers frequently use code completion tools to accelerate software development by suggesting the following code elements. Researchers usually employ AutoRegressive (AR) decoders to comple", "doi": "10.1145/3649594"}
+{"id": "longcoder-longrange-pretrained-2023", "title": "LongCoder: A Long-Range Pre-trained Language Model for Code Completion", "authors": ["Daya Guo", "Canwen Xu", "Nan Duan", "Jian Yin", "Julian McAuley"], "year": 2023, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2306.14893", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. LongCoder employs a ", "arxiv_id": "2306.14893", "doi": "10.48550/arXiv.2306.14893"}
+{"id": "reacc-retrievalaugmented-code-2022", "title": "ReACC: A Retrieval-Augmented Code Completion Framework", "authors": ["Shuai Lu", "Nan Duan", "Hojae Han", "Daya Guo", "Seung-won Hwang"], "year": 2022, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2203.07722", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development. Recent work has proved that statistical language", "arxiv_id": "2203.07722", "doi": "10.48550/arXiv.2203.07722"}
+{"id": "cocomic-code-completion-2022", "title": "CoCoMIC: Code Completion by Jointly Modeling In-file and Cross-file Context", "authors": ["Yangruibo Ding", "Zijian Wang", "Wasi Uddin Ahmad", "M. Ramanathan", "Ramesh Nallapati"], "year": 2022, "venue": "International Conference on Language Resources and Evaluation", "source_url": "https://arxiv.org/abs/2212.10007", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While pre-trained language models (LM) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., in-file context, but ignore t", "arxiv_id": "2212.10007", "doi": "10.48550/arXiv.2212.10007"}
+{"id": "codefill-multitoken-code-2022", "title": "CodeFill: Multi-token Code Completion by Jointly learning from Structure and Naming Sequences", "authors": ["M. Izadi", "Roberta Gismondi", "Georgios Gousios"], "year": 2022, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2202.06689", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion is an essential feature of IDEs, yet current auto-completers are restricted to either grammar-based or NLP-based single token completions. Both approaches have significant draw-backs: ", "arxiv_id": "2202.06689", "doi": "10.1145/3510003.3510172"}
+{"id": "cctest-testing-repairing-2022", "title": "CCTEST: Testing and Repairing Code Completion Systems", "authors": ["Zongjie Li", "Chaozheng Wang", "Zhibo Liu", "Hao Wang", "Shuai Wang"], "year": 2022, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2208.08289", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion, a highly valuable topic in the software development domain, has been increasingly promoted for use by recent advances in large language models (LLMs). To date, visible LLM-based code ", "arxiv_id": "2208.08289", "doi": "10.1109/ICSE48619.2023.00110"}
+{"id": "simplified-multiview-graph-2024", "title": "Simplified multi-view graph neural network for multilingual knowledge graph completion", "authors": ["Bingbing Dong", "Chenyang Bu", "Yi Zhu", "Shengwei Ji", "Xindong Wu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s11704-024-3577-3", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s11704-024-3577-3"}
+{"id": "knowledge-graph-completion-2024", "title": "Knowledge graph completion assisted graph neural network for spectrum resource optimization of UAV swarm", "authors": ["Y. Wang", "X. Liao", "G. Ye", "X. Zhu"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1088/1742-6596/2717/1/012012", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unmanned aerial vehicles (UAVs) are playing an increasing critical role in industrial, agricultural and military Scenario. However, the energy consuming cannot be ignored, which constrains the flying ", "doi": "10.1088/1742-6596/2717/1/012012"}
+{"id": "3dantc-3d-channel-2024", "title": "3DA-NTC: 3D Channel Attention Aided Neural Tensor Completion for Crowdsensing Data Inference", "authors": ["Xu Kang", "Zhiyang Jia", "Jia Jia", "Jiadong Ren"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IJCNN60899.2024.10650963", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Mobile crowdsensing is a promising scheme for performing large-scale urban monitoring, but it always faces the issue of unstable spatiotemporal coverage, which results in the incompletion of data coll", "doi": "10.1109/IJCNN60899.2024.10650963"}
+{"id": "convntc-convolutional-neural-2024", "title": "ConvNTC: Convolutional neural tensor completion for predicting the disease-related miRNA pairs and cell-related drug pairs", "authors": ["Pei Liu", "Xiao Liang", "Yue Li", "Jiawei Luo"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1101/2024.10.21.619432", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1101/2024.10.21.619432"}
+{"id": "shallowbkgc-bertenhanced-shallow-2024", "title": "ShallowBKGC: a BERT-enhanced shallow neural network model for knowledge graph completion", "authors": ["Ningning Jia", "Cuiyou Yao"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.7717/peerj-cs.2058", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Knowledge graph completion aims to predict missing relations between entities in a knowledge graph. One of the effective ways for knowledge graph completion is knowledge graph embedding. However, exis", "doi": "10.7717/peerj-cs.2058"}
+{"id": "neural-common-neighbor-2023", "title": "Neural Common Neighbor with Completion for Link Prediction", "authors": ["Xiyuan Wang", "Hao-Ting Yang", "Muhan Zhang"], "year": 2023, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2302.00890", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this work, we propose a novel link prediction model and further boost it by studying graph incompleteness. First, we introduce MPNN-then-SF, an innovative architecture leveraging structural feature", "arxiv_id": "2302.00890", "doi": "10.48550/arXiv.2302.00890"}
+{"id": "syntaxaware-onthefly-code-2022", "title": "Syntax-Aware On-the-Fly Code Completion", "authors": ["Wannita Takerngsaksiri", "C. Tantithamthavorn", "Yuan-Fang Li"], "year": 2022, "venue": "Information and Software Technology", "source_url": "https://arxiv.org/abs/2211.04673", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion aims to help improve developers' productivity by suggesting the next code tokens from a given context. Various approaches have been proposed to incorporate abstract syntax tree (AST) i", "arxiv_id": "2211.04673", "doi": "10.48550/arXiv.2211.04673"}
+{"id": "citationgrounded-code-comprehension-2025", "title": "Citation-Grounded Code Comprehension: Preventing LLM Hallucination Through Hybrid Retrieval and Graph-Augmented Context", "authors": ["Jahidul Arafat"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.12117", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models have become essential tools for code comprehension, enabling developers to query unfamiliar codebases through natural language interfaces. However, LLM hallucination, generating ", "arxiv_id": "2512.12117", "doi": "10.48550/arXiv.2512.12117"}
+{"id": "llm-hallucination-detection-2025", "title": "LLM Hallucination Detection: HSAD", "authors": ["Jinxin Li", "Gang Tu", "Junjie Hu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.23580", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Although Large Language Models have demonstrated powerful capabilities in a wide range of tasks such as language understanding and code generation, the frequent occurrence of hallucinations during the", "arxiv_id": "2509.23580", "doi": "10.48550/arXiv.2509.23580"}
+{"id": "treecut-synthetic-unanswerable-2025", "title": "TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation", "authors": ["Jialin Ouyang"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2502.13442", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) now achieve near-human performance on standard math word problem benchmarks (e.g., GSM8K), yet their true reasoning ability remains disputed. A key concern is that models ", "arxiv_id": "2502.13442", "doi": "10.48550/arXiv.2502.13442"}
+{"id": "eliminating-hallucinationinduced-errors-2025", "title": "Eliminating Hallucination-Induced Errors in LLM Code Generation with Functional Clustering", "authors": ["C. Ravuri", "Saman Amarasinghe"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.11021", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern code-generation LLMs can already solve a large fraction of programming problems, yet they still hallucinate subtle bugs that make their outputs unsafe for autonomous deployment. We present func", "arxiv_id": "2506.11021", "doi": "10.48550/arXiv.2506.11021"}
+{"id": "hallucination-llmbased-code-2025", "title": "Hallucination in LLM-Based Code Generation: An Automotive Case Study", "authors": ["Marc Pavel", "Nenad Petrovic", "Lukasz Mazur", "Vahid Zolfaghari", "F. Pan"], "year": 2025, "venue": "2025 3rd International Conference on Foundation and Large Language Models (FLLM)", "source_url": "https://arxiv.org/abs/2508.11257", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown significant potential in automating code generation tasks offering new opportunities across software engineering domains. However, their practical application r", "arxiv_id": "2508.11257", "doi": "10.1109/FLLM67465.2025.11391125"}
+{"id": "beyond-functional-correctness-2024", "title": "Beyond Functional Correctness: Exploring Hallucinations in LLM-Generated Code", "authors": ["Fang Liu", "Yang Liu", "Lin Shi", "Zhen Yang", "Li Zhang"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2404.00971", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rise of Large Language Models (LLMs) has significantly advanced various applications on software engineering tasks, particularly in code generation. Despite the promising performance, LLMs are pro", "arxiv_id": "2404.00971"}
+{"id": "advancing-llmgenerated-code-2026", "title": "Advancing LLM-Generated Code Reliability: A Hybrid Approach for Hallucination Detection", "authors": ["Bo Yang", "Jiayi Dang", "Huai Liu", "Zhi Jin"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/TSE.2025.3640641", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The increasing use of Large Language Models (LLMs) for writing code has raised important concerns about “code hallucinations.” These occur when the generated code looks correct in terms of its structu", "doi": "10.1109/TSE.2025.3640641"}
+{"id": "importing-phantoms-measuring-2025", "title": "Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities", "authors": ["Arjun Krishna", "Erick Galinkin", "Leon Derczynski", "Jeffrey Martin"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2501.19012", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have become an essential tool in the programmer's toolkit, but their tendency to hallucinate code can be used by malicious actors to introduce vulnerabilities to broad swa", "arxiv_id": "2501.19012", "doi": "10.48550/arXiv.2501.19012"}
+{"id": "llmfree-multidimensional-benchmark-2023", "title": "An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation", "authors": ["Junyang Wang", "Yuhang Wang", "Guohai Xu", "Jing Zhang", "Yu Gu"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2311.07397", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequence", "arxiv_id": "2311.07397", "doi": "10.48550/arXiv.2311.07397"}
+{"id": "leveraging-llms-legacy-2025", "title": "Leveraging LLMs for Legacy Code Modernization: Evaluation of LLM-Generated Documentation", "authors": ["Colin Diggs", "Michael Doyle", "Amit Madan", "Eric O. Scott", "Emily Escamilla"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/LLM4Code66737.2025.00027", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Legacy software systems, written in outdated languages like MUMPS and mainframe assembly, pose challenges in efficiency, maintenance, staffing, and security. While LLMs offer promise for modernizing t", "doi": "10.1109/LLM4Code66737.2025.00027"}
+{"id": "fools-certain-wise-2025", "title": "The Fools are Certain; the Wise are Doubtful: Exploring LLM Confidence in Code Completion", "authors": ["Zoe Kotti", "Konstantina Dritsa", "D. Spinellis", "Panos Louridas"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.16131", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code completion entails the task of providing missing tokens given a surrounding context. It can boost developer productivity while providing a powerful code discovery tool. Following the Large Langua", "arxiv_id": "2508.16131", "doi": "10.48550/arXiv.2508.16131"}
+{"id": "ucsc-at-semeval2025-2025", "title": "UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output", "authors": ["Sicong Huang", "Jin He", "Shiyu Huang", "Karthik Raja Anandan", "Arkajyoti Chakraborty"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.03030", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucinations pose a significant challenge for large language models when answering knowledge-intensive queries. As LLMs become more widely adopted, it is crucial not only to detect if hallucinations", "arxiv_id": "2505.03030", "doi": "10.48550/arXiv.2505.03030"}
+{"id": "mitigating-code-llm-2024", "title": "On Mitigating Code LLM Hallucinations with API Documentation", "authors": ["Nihal Jain", "Robert Kwiatkowski", "Baishakhi Ray", "M. K. Ramanathan", "Varun Kumar"], "year": 2024, "venue": "2025 IEEE/ACM 47th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)", "source_url": "https://arxiv.org/abs/2407.09726", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this study, we address the issue of API hallucinations in various software engineering contexts. We introduce CloudAPIBench, a new benchmark designed to measure API hallucination occurrences. Cloud", "arxiv_id": "2407.09726", "doi": "10.1109/ICSE-SEIP66354.2025.00027"}
+{"id": "quantifying-rag-advantage-2025", "title": "Quantifying the RAG Advantage: A Multi-Metric Benchmark for LLM-based Code Generation", "authors": ["Gabriel Souza Baggio", "G. M. Lunardi", "G. M. Machado", "José Palazzo Moreira de Oliveira"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.5753/sbbd.2025.247760", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The recent advancement of Large Language Models (LLMs) has demonstrated remarkable capabilities in solving programming challenges. However, despite their proficiency, LLMs often suffer from hallucinat", "doi": "10.5753/sbbd.2025.247760"}
+{"id": "code-hallucination-2024", "title": "Code Hallucination", "authors": ["Mirza Masfiqur Rahman", "Ashish Kundu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.04831", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative models such as large language models are extensively used as code copilots and for whole program generation. However, the programs they generate often have questionable correctness, authent", "arxiv_id": "2407.04831", "doi": "10.48550/arXiv.2407.04831"}
+{"id": "leveraging-llms-legacy-2024", "title": "Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation", "authors": ["Colin Diggs", "Michael Doyle", "Amit Madan", "Siggy Scott", "Emily Escamilla"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.14971", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Legacy software systems, written in outdated languages like MUMPS and mainframe assembly, pose challenges in efficiency, maintenance, staffing, and security. While LLMs offer promise for modernizing t", "arxiv_id": "2411.14971", "doi": "10.48550/arXiv.2411.14971"}
+{"id": "hallujudge-referencefree-hallucination-2026", "title": "HalluJudge: A Reference-Free Hallucination Detection for Context Misalignment in Code Review Automation", "authors": ["C. Tantithamthavorn", "H. Lin", "Patanamon Thongtanunam", "Wachiraphan Charoenwet", "Minwoo Jeong"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.19072", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language models (LLMs) have shown strong capabilities in code review automation, such as review comment generation, yet they suffer from hallucinations -- where the generated review comments are", "arxiv_id": "2601.19072", "doi": "10.48550/arXiv.2601.19072"}
+{"id": "insights-from-verification-2025", "title": "Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback", "authors": ["Ning Wang", "Bingkun Yao", "Jie Zhou", "Yuchen Hu", "Xi Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.15804", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have shown strong performance in Verilog generation from natural language description. However, ensuring the functional correctness of the generated code remains a signifi", "arxiv_id": "2504.15804", "doi": "10.48550/arXiv.2504.15804"}
+{"id": "testart-improving-llmbased-2024", "title": "TestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and Repair Iteration", "authors": ["Siqi Gu", "Quanjun Zhang", "Kecheng Li", "Chunrong Fang", "Fangyuan Tian"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2408.03095", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Unit testing is crucial for detecting bugs in individual program units but consumes time and effort. Recently, large language models (LLMs) have demonstrated remarkable capabilities in generating unit", "arxiv_id": "2408.03095"}
+{"id": "small-agent-can-2024", "title": "Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector", "authors": ["Xiaoxue Cheng", "Junyi Li", "Wayne Xin Zhao", "Hongzhi Zhang", "Fuzheng Zhang"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2406.11277", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4. In this paper, we propose an autonomous ", "arxiv_id": "2406.11277", "doi": "10.48550/arXiv.2406.11277"}
+{"id": "hallucination-consensus-multiagent-2025-2", "title": "Hallucination to Consensus: Multi-Agent LLMs for End-to-End Test Generation with Accurate Oracles", "authors": ["Qinghua Xu", "Guancheng Wang", "Lionel C. Briand", "Kui Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2506.02943", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2506.02943"}
+{"id": "aixamine-simplified-llm-2025", "title": "aiXamine: Simplified LLM Safety and Security", "authors": ["Fatih Deniz", "Dorde Popovic", "Yazan Boshmaf", "Euisuh Jeong", "M. Ahmad"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.14985", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Evaluating Large Language Models (LLMs) for safety and security remains a complex task, often requiring users to navigate a fragmented landscape of ad hoc benchmarks, datasets, metrics, and reporting ", "arxiv_id": "2504.14985", "doi": "10.48550/arXiv.2504.14985"}
+{"id": "vulsolver-vulnerability-detection-2025", "title": "VulSolver: Vulnerability Detection via LLM-Driven Constraint Solving", "authors": ["Xiang Li", "Yue Su", "Jiahao Liu", "Zhiwei Lin", "Yunqing Hou"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.00882", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Traditional vulnerability detection methods rely heavily on predefined rule matching, which often fails to capture vulnerabilities accurately. With the rise of large language models (LLMs), leveraging", "arxiv_id": "2509.00882", "doi": "10.48550/arXiv.2509.00882"}
+{"id": "osiris-lightweight-opensource-2025", "title": "Osiris: A Lightweight Open-Source Hallucination Detection System", "authors": ["Alexander Shan", "John Bauer", "Christopher D. Manning"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.04844", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-Augmented Generation (RAG) systems have gained widespread adoption by application builders because they leverage sources of truth to enable Large Language Models (LLMs) to generate more fact", "arxiv_id": "2505.04844", "doi": "10.48550/arXiv.2505.04844"}
+{"id": "landscape-llmpowered-geoanalysis-2025", "title": "The Landscape of LLM-Powered Geoanalysis", "authors": ["Thamir M. Qadah", "Muhammad Hammad", "Emad Felemban"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2025.3627398", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of large language models (LLMs) has opened up new possibilities for geoanalysis, spanning geospatial and geospatio-temporal data analysis, making it easily accessible to non-GIS experts ", "doi": "10.1109/ACCESS.2025.3627398"}
+{"id": "llm-agents-implement-2025", "title": "LLM Agents Implement an NLG System from Scratch: Building Interpretable Rule-Based RDF-to-Text Generators", "authors": ["Mateusz Lango", "Ondrej Dusek"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2512.18360", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We present a novel neurosymbolic framework for RDF-to-text generation, in which the model is \"trained\" through collaborative interactions among multiple LLM agents rather than traditional backpropagatio", "arxiv_id": "2512.18360", "doi": "10.18653/v1/2025.emnlp-industry.142"}
+{"id": "a2hcoder-llmdriven-coding-2025", "title": "A2HCoder: An LLM-Driven Coding Agent for Hierarchical Algorithm-to-HDL Translation", "authors": ["Jie Lei", "Ruofan Jia", "J. A. Zhang", "Hao Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2508.10904", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2508.10904"}
+{"id": "pelican-correcting-hallucination-2024", "title": "Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification", "authors": ["Pritish Sahu", "Karan Sikka", "Ajay Divakaran"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2407.02352", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Visual Language Models (LVLMs) struggle with hallucinations in visual instruction following task(s). These issues hinder their trustworthiness and real-world applicability. We propose Pelican – ", "arxiv_id": "2407.02352", "doi": "10.48550/arXiv.2407.02352"}
+{"id": "distinguishing-ignorance-from-2024", "title": "Distinguishing Ignorance from Error in LLM Hallucinations", "authors": ["Adi Simhi", "Jonathan Herzig", "Idan Szpektor", "Yonatan Belinkov"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2410.22071", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are susceptible to hallucinations -- factually incorrect outputs -- leading to a large body of work on detecting and mitigating such cases. We argue that it is important t", "arxiv_id": "2410.22071", "doi": "10.48550/arXiv.2410.22071"}
+{"id": "itergen-iterative-semanticaware-2024", "title": "IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking", "authors": ["Shubham Ugare", "Rohan Gumaste", "Tarun Suresh", "Gagandeep Singh", "Sasa Misailovic"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2410.07295", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are widely used for tasks such as natural language and code generation, but their outputs often suffer from issues like hallucination, toxicity, and incorrect results. Cur", "arxiv_id": "2410.07295"}
+{"id": "llmenhanced-learning-environments-2024", "title": "LLM-Enhanced Learning Environments for CS: Exploring Data Structures and Algorithms with Gurukul", "authors": ["Ashwin Rachha", "Mohammed Seyam"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1109/FIE61694.2024.10893211", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In this Innovative Practice full paper, we introduce Gurukul, an innovative coding platform designed to support teaching Data Structures and Algorithm (DSA) course by integrating advanced Large Langua", "doi": "10.1109/FIE61694.2024.10893211"}
+{"id": "tokenguard-tokenlevel-hallucination-2026", "title": "Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding", "authors": ["Yifan Zhu", "Huiqiang Rong", "Hao Luo"], "year": 2026, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2601.21969", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) often hallucinate, generating content inconsistent with the input. Retrieval-Augmented Generation (RAG) and Reinforcement Learning with Human Feedback (RLHF) can mitigate ", "arxiv_id": "2601.21969", "doi": "10.48550/arXiv.2601.21969"}
+{"id": "understanding-llm-responses-2026", "title": "Understanding LLM Responses in Programming Tasks: A Study on Prompt Quality, Personalization, and RAG Limitation", "authors": ["Meilyna Hutajulu", "Ioka Purba", "M. Siahaan", "Samuel Situmeang", "Mario Simaremare"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.32664/icobits.v1.107", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Programming is a fundamental skill in the field of technology and education in the digital era. However, students’ understanding of programming varies significantly between senior and junior learners.", "doi": "10.32664/icobits.v1.107"}
+{"id": "jarvis-multiagent-code-2025", "title": "JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation", "authors": ["G. Pasandi", "K. Kunal", "Varun Tej", "Kunjal Shan", "Hanfei Sun"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.14978", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents JARVIS, a novel multi-agent framework that leverages Large Language Models (LLMs) and domain expertise to generate high-quality scripts for specialized Electronic Design Automation", "arxiv_id": "2505.14978", "doi": "10.48550/arXiv.2505.14978"}
+{"id": "deepvulhunter-enhancing-code-2025", "title": "DeepVulHunter: enhancing the code vulnerability detection capability of LLMs through multi-round analysis", "authors": ["Yutong Jiao", "Jiaxuan Han", "Cheng Huang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1007/s10844-025-00982-0", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/s10844-025-00982-0"}
+{"id": "codemirage-hallucinations-code-2024", "title": "CodeMirage: Hallucinations in Code Generated by Large Language Models", "authors": ["Vibhor Agarwal", "Yulong Pei", "Salwa Alamir", "Xiaomo Liu"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2408.08333", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have shown promising potentials in program generation and no-code automation. However, LLMs are prone to generate hallucinations, i.e., they generate text which sounds pla", "arxiv_id": "2408.08333", "doi": "10.48550/arXiv.2408.08333"}
+{"id": "novel-framework-educational-2025", "title": "A novel framework for educational Q&A: Leveraging RAG and Code Interpreters for knowledge retrieval and logical computation", "authors": ["Jin Lu", "Ji Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1371/journal.pone.0337361", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper presents a novel approach to enhancing educational question-answering (Q&A) systems by combining Retrieval-Augmented Generation (RAG) with Large Language Model (LLM) Code Interpreters. Trad", "doi": "10.1371/journal.pone.0337361"}
+{"id": "grounded-ai-code-2025", "title": "Grounded AI for Code Review: Resource-Efficient Large-Model Serving in Enterprise Pipelines", "authors": ["Sayan Mandal", "Hua Jiang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.10290", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automated code review adoption lags in compliance-heavy settings, where static analyzers produce high-volume, low-rationale outputs, and naive LLM use risks hallucination and incurring cost overhead. ", "arxiv_id": "2510.10290", "doi": "10.48550/arXiv.2510.10290"}
+{"id": "applying-rlaif-code-2024", "title": "Applying RLAIF for Code Generation with API-usage in Lightweight LLMs", "authors": ["Sujan Dutta", "Sayantan Mahinder", "R. Anantha", "Bortik Bandyopadhyay"], "year": 2024, "venue": "NLRSE", "source_url": "https://arxiv.org/abs/2406.20060", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains, including mitigating harm in LLM outputs, enhancing text summarization, and mathematical ", "arxiv_id": "2406.20060", "doi": "10.48550/arXiv.2406.20060"}
+{"id": "distilling-desired-comments-2024", "title": "Distilling Desired Comments for Enhanced Code Review with Large Language Models", "authors": ["Yongda Yu", "Lei Zhang", "G. Rong", "Haifeng Shen", "Jiahao Zhang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.20340", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: There has been a growing interest in using Large Language Models (LLMs) for code review thanks to their proven proficiency in code comprehension. The primary objective of most review scenarios is to g", "arxiv_id": "2412.20340", "doi": "10.48550/arXiv.2412.20340"}
+{"id": "automated-unit-test-2024", "title": "Automated Unit Test Improvement using Large Language Models at Meta", "authors": ["N. Alshahwan", "Jubin Chheda", "Anastasia Finogenova", "Beliz Gokkaya", "Mark Harman"], "year": 2024, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2402.09171", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper describes Meta’s TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of fi", "arxiv_id": "2402.09171", "doi": "10.1145/3663529.3663839"}
+{"id": "adaplanner-adaptive-planning-2023", "title": "AdaPlanner: Adaptive Planning from Feedback with Language Models", "authors": ["Haotian Sun", "Yuchen Zhuang", "Lingkai Kong", "Bo Dai", "Chao Zhang"], "year": 2023, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2305.16653", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks. However, most existing methods either take actions greedily w", "arxiv_id": "2305.16653", "doi": "10.48550/arXiv.2305.16653"}
+{"id": "comprehensive-taxonomy-hallucinations-2025", "title": "A comprehensive taxonomy of hallucinations in Large Language Models", "authors": ["Manuel Cossio"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.01781", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have revolutionized natural language processing, yet their propensity for hallucination, generating plausible but factually incorrect or fabricated content, remains a crit", "arxiv_id": "2508.01781", "doi": "10.48550/arXiv.2508.01781"}
+{"id": "grapharena-evaluating-exploring-2024", "title": "GraphArena: Evaluating and Exploring Large Language Models on Graph Computation", "authors": ["Jianheng Tang", "Qifan Zhang", "Yuhan Li", "Nuo Chen", "Jia Li"], "year": 2024, "venue": "International Conference on Learning Representations", "source_url": "https://arxiv.org/abs/2407.00379", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The \"arms race\" of Large Language Models (LLMs) demands new benchmarks to examine their progresses. In this paper, we introduce GraphArena, a benchmarking tool designed to evaluate LLMs on real-worl", "arxiv_id": "2407.00379"}
+{"id": "uprise-universal-prompt-2023", "title": "UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation", "authors": ["Daixuan Cheng", "Shaohan Huang", "Junyu Bi", "Yu-Wei Zhan", "Jianfeng Liu"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2303.08518", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are popular for their impressive abilities, but the need for model-specific fine-tuning or task-specific prompt engineering can hinder their generalization. We propose UPR", "arxiv_id": "2303.08518", "doi": "10.48550/arXiv.2303.08518"}
+{"id": "schemaguided-scenegraph-reasoning-2025", "title": "Schema-Guided Scene-Graph Reasoning based on Multi-Agent Large Language Model System", "authors": ["Yiye Chen", "Harpreet Sawhney", "Nicholas Gyd'e", "Yanan Jian", "Jack Saunders"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.03450", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Scene graphs have emerged as a structured and serializable environment representation for grounded spatial reasoning with Large Language Models (LLMs). In this work, we propose SG^2, an iterative Sche", "arxiv_id": "2502.03450"}
+{"id": "openroad-agent-intelligent-2025", "title": "OpenROAD Agent: An Intelligent Self-Correcting Script Generator for OpenROAD", "authors": ["Bing-Yue Wu", "Utsav Sharma", "Austin Rovinski", "V. Chhabria"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICLAD65226.2025.00039", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are increasingly being used in various domains, including chip design. Recent works have demonstrated the effectiveness of LLMs in EDA tool script generation. However, the", "doi": "10.1109/ICLAD65226.2025.00039"}
+{"id": "beyond-static-pattern-2025", "title": "Beyond Static Pattern Matching? Rethinking Automatic Cryptographic API Misuse Detection in the Era of LLMs", "authors": ["Yifan Xia", "Zichen Xie", "Peiyu Liu", "Kangjie Lu", "Yan Liu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3728875", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While the automated detection of cryptographic API misuses has progressed significantly, its precision diminishes for intricate targets due to the reliance on manually defined patterns. Large Language", "doi": "10.1145/3728875"}
+{"id": "funasr-technical-report-2025", "title": "Fun-ASR Technical Report", "authors": ["Keyu An", "Yanni Chen", "Chong Deng", "Changfeng Gao", "Zhifu Gao"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2509.12508", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large", "arxiv_id": "2509.12508"}
+{"id": "identifying-mitigating-api-2025", "title": "Identifying and Mitigating API Misuse in Large Language Models", "authors": ["Terry Yue Zhuo", "Junda He", "Jiamou Sun", "Zhenchang Xing", "David Lo"], "year": 2025, "venue": "IEEE Transactions on Software Engineering", "source_url": "https://arxiv.org/abs/2503.22821", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: API misuse in code generated by large language models (LLMs) presents a serious and growing challenge in software development, as although LLMs demonstrate impressive code generation capabilities, the", "arxiv_id": "2503.22821", "doi": "10.48550/arXiv.2503.22821"}
+{"id": "tablezoomer-collaborative-agent-2025", "title": "TableZoomer: a collaborative agent framework for large-scale table question answering", "authors": ["Sishi Xiong", "Ziyang He", "Zhongjiang He", "Yu Zhao", "Changzai Pan"], "year": 2025, "venue": "Vicinagearth", "source_url": "https://arxiv.org/abs/2509.01312", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models (LLMs) have shown promise in the table question answering (TQA) task through prompt engineering, they face challenges in industrial applications, including structural heter", "arxiv_id": "2509.01312", "doi": "10.1007/s44336-025-00016-x"}
+{"id": "shield-suppressing-hallucinations-2025", "title": "SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense", "authors": ["Yiyang Huang", "Liang Shi", "Yitian Zhang", "Yi Xu", "Yun Fu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.16596", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Vision-Language Models (LVLMs) excel in diverse cross-modal tasks. However, object hallucination, where models produce plausible but inaccurate object descriptions, remains a significant challen", "arxiv_id": "2510.16596", "doi": "10.48550/arXiv.2510.16596"}
+{"id": "hallucinate-memorize-two-2025", "title": "Hallucinate or Memorize? The Two Sides of Probabilistic Learning in Large Language Models", "authors": ["Junichiro Niimi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.08877", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in citation recom", "arxiv_id": "2511.08877", "doi": "10.48550/arXiv.2511.08877"}
+{"id": "evolutionary-thoughts-integration-2025", "title": "Evolutionary thoughts: integration of large language models and evolutionary algorithms", "authors": ["Antonio Jimeno-Yepes", "Pieter Barnard"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.05756", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have unveiled remarkable capabilities in understanding and generating both natural language and code, but LLM reasoning is prone to hallucination and struggle with complex", "arxiv_id": "2505.05756", "doi": "10.48550/arXiv.2505.05756"}
+{"id": "enhancing-interpretability-ocular-2025", "title": "Enhancing Interpretability of Ocular Disease Diagnosis: A Zero-Shot Study of Multimodal Large Language Models", "authors": ["Yating Pan", "Janna Hastings"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.3233/SHTI250910", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.3233/SHTI250910"}
+{"id": "funaudioasr-technical-report-2025", "title": "FunAudio-ASR Technical Report", "authors": ["Keyu An", "Yanni Chen", "Chong Deng", "Changfeng Gao", "Zhifu Gao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2509.12508", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2509.12508"}
+{"id": "hallucinations-bibliographic-recommendation-2025", "title": "Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy", "authors": ["Junichiro Niimi"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.25378", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in bibliographic ", "arxiv_id": "2510.25378", "doi": "10.48550/arXiv.2510.25378"}
+{"id": "large-language-models-2025-6", "title": "Large Language Models for Thematic Analysis in Healthcare Research: A Blinded Mixed-Methods Comparison with Human Analysts", "authors": ["C. Hill", "A. Dahil", "G. Simpson", "D. Hardisty", "J. Keast"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.64898/2025.12.25.25343031", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.64898/2025.12.25.25343031"}
+{"id": "large-language-modelenhanced-2025", "title": "Large Language Model-Enhanced Test Case Generation", "authors": ["Qi Zhang", "Rui Kang", "Yitian Liu", "Xiayu Cao"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/CBASE67452.2025.11335495", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Software testing is essential for ensuring software quality, with test case generation being a core yet labor-intensive task. Traditional methods often suffer from low efficiency, inadequate coverage,", "doi": "10.1109/CBASE67452.2025.11335495"}
+{"id": "survey-automated-data-2025", "title": "A Survey on Automated Data Analysis Techniques Powered by Large Language Models", "authors": ["Guosong Zhan", "Ge Shi", "Xiaoguo Liu", "Ke Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICNC-FSKD67701.2025.11198129", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This survey provides a detailed examination of automated data analysis techniques empowered by Large Language Models (LLMs). We first categorize core data analysis tasks across descriptive, diagnostic", "doi": "10.1109/ICNC-FSKD67701.2025.11198129"}
+{"id": "llms-all-you-2025", "title": "LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models", "authors": ["Linghan Huang", "Peizhou Zhao", "Huaming Chen"], "year": 2025, "venue": "Asia-Pacific Software Engineering Conference", "source_url": "https://arxiv.org/abs/2510.10179", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid development of large language models (LLMs) has revolutionized software testing, particularly fuzz testing, by automating the generation of diverse and effective test inputs. This advancemen", "arxiv_id": "2510.10179", "doi": "10.1109/APSEC66846.2025.00059"}
+{"id": "secure-suspect-investigating-2025", "title": "Secure or Suspect? Investigating Package Hallucinations of Shell Command in Original and Quantized LLMs", "authors": ["Md. Nazmul Haque", "Elizabeth Lin", "Lawrence Arkoh", "B. Tadesse", "Bowen Xu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.08213", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models for code (LLMs4Code) are increasingly used to generate software artifacts, including library and package recommendations in languages such as Go. However, recent evidence shows t", "arxiv_id": "2512.08213", "doi": "10.48550/arXiv.2512.08213"}
+{"id": "clearagent-agentic-binary-2025", "title": "ClearAgent: Agentic Binary Analysis for Effective Vulnerability Detection", "authors": ["Xiang Chen", "Anshunkang Zhou", "Chengfeng Ye", "Charles Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3759425.3763397", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Statically detecting vulnerabilities at the binary level is crucial for the security of Commercial-Off-The-Shelf (COTS) software when source code is not available. However, traditional methods suffer ", "doi": "10.1145/3759425.3763397"}
+{"id": "gamegpt-multiagent-collaborative-2023", "title": "GameGPT: Multi-agent Collaborative Framework for Game Development", "authors": ["Da Chen", "Hanbin Wang", "Yunhao Huo", "Yuzhao Li", "Haoyang Zhang"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.08067", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The large language model (LLM) based agents have demonstrated their capacity to automate and expedite software development processes. In this paper, we focus on game development and propose a multi-ag", "arxiv_id": "2310.08067", "doi": "10.48550/arXiv.2310.08067"}
+{"id": "scalable-model-editing-2024", "title": "Scalable Model Editing via Customized Expert Networks", "authors": ["Zihan Yao", "Yu He", "T. Qi", "Ming Li"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.02699", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Addressing the issues of hallucinations and outdated knowledge in large language models is critical for their reliable application. Model Editing presents a promising avenue for mitigating these chall", "arxiv_id": "2404.02699", "doi": "10.48550/arXiv.2404.02699"}
+{"id": "mactg-multiagent-collaborative-2024", "title": "MaCTG: Multi-Agent Collaborative Thought Graph for Automatic Programming", "authors": ["Zixiao Zhao", "Jing Sun", "Zhe Hou", "Zhiyuan Wei", "Chenghao Cai"], "year": 2024, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2410.19245", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the rapid advancement of Large Language Models (LLMs), LLM-based approaches have demonstrated strong problem-solving capabilities across various domains. However, in automatic programming, a sing", "arxiv_id": "2410.19245"}
+{"id": "dgot-dynamic-graph-2024", "title": "DGoT: Dynamic Graph of Thoughts for Scientific Abstract Generation", "authors": ["Xinyu Ning", "Yutong Zhao", "Yitong Liu", "Hongwen Yang"], "year": 2024, "venue": "International Conference on Language Resources and Evaluation", "source_url": "https://arxiv.org/abs/2403.17491", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The method of training language models based on domain datasets has obtained significant achievements in the task of generating scientific paper abstracts. However, such models face problems of genera", "arxiv_id": "2403.17491", "doi": "10.48550/arXiv.2403.17491"}
+{"id": "when-dataflow-analysis-2024", "title": "When Dataflow Analysis Meets Large Language Models", "authors": ["Chengpeng Wang", "Wuqi Zhang", "Zian Su", "Xiangzhe Xu", "Xiaoheng Xie"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2402.10754", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2402.10754"}
+{"id": "grapheval-knowledgegraph-based-2024", "title": "GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework", "authors": ["Hannah Sansford", "N. Richardson", "Hermina Petric Maretic", "Juba Nait Saada"], "year": 2024, "venue": "KiL@KDD", "source_url": "https://arxiv.org/abs/2407.10793", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Methods to evaluate Large Language Model (LLM) responses and detect inconsistencies, also known as hallucinations, with respect to the provided knowledge, are becoming increasingly important for LLM a", "arxiv_id": "2407.10793", "doi": "10.48550/arXiv.2407.10793"}
+{"id": "measuring-reducing-llm-2024", "title": "Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting", "authors": ["Jiaheng Wei", "Yuanshun Yao", "Jean-François Ton", "Hongyi Guo", "Andrew Estornell"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.10412", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM hallucination, i.e. generating factually incorrect yet seemingly convincing answers, is currently a major threat to the trustworthiness and reliability of LLMs. The first step towards solving this", "arxiv_id": "2402.10412", "doi": "10.48550/arXiv.2402.10412"}
+{"id": "probabilistic-framework-llm-2024", "title": "A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation", "authors": ["Bairu Hou", "Yang Zhang", "Jacob Andreas", "Shiyu Chang"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2406.06950", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper focuses on the task of hallucination detection, which aims to determine the truthfulness of LLM-generated statements. To address this problem, a popular class of methods utilize the LLM's s", "arxiv_id": "2406.06950", "doi": "10.48550/arXiv.2406.06950"}
+{"id": "quantifying-uncertainty-llm-2024", "title": "Quantifying the uncertainty of LLM hallucination spreading in complex adaptive social networks", "authors": ["Guozhi Hao", "Jun Wu", "Qianqian Pan", "Rosario Morello"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1038/s41598-024-66708-4", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are becoming a significant source of content generation in social networks, which is a typical complex adaptive system (CAS). However, due to their hallucinatory nature, L", "doi": "10.1038/s41598-024-66708-4"}
+{"id": "mitigating-llm-hallucination-2023", "title": "Towards Mitigating LLM Hallucination via Self Reflection", "authors": ["Ziwei Ji", "Tiezheng Yu", "Yan Xu", "Nayeon Lee", "Etsuko Ishii"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2023.findings-emnlp.123", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.18653/v1/2023.findings-emnlp.123"}
+{"id": "chainpoll-high-efficacy-2023", "title": "Chainpoll: A high efficacy method for LLM hallucination detection", "authors": ["R. Friel", "Atindriyo Sanyal"], "year": 2023, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2310.18344", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have experienced notable advancements in generating coherent and contextually relevant responses. However, hallucinations - incorrect or unfounded claims - are still preva", "arxiv_id": "2310.18344", "doi": "10.48550/arXiv.2310.18344"}
+{"id": "knowledge-injection-counter-2023", "title": "Knowledge Injection to Counter Large Language Model (LLM) Hallucination", "authors": ["A. Martino", "Michael Iannelli", "Coleen Truong"], "year": 2023, "venue": "Unknown", "source_url": "https://doi.org/10.1007/978-3-031-43458-7_34", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1007/978-3-031-43458-7_34"}
+{"id": "repoaudit-autonomous-llmagent-2025", "title": "RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing", "authors": ["Jinyao Guo", "Chengpeng Wang", "Xiangzhe Xu", "Zian Su", "Xiangyu Zhang"], "year": 2025, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2501.18160", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code auditing is the process of reviewing code with the aim of identifying bugs. Large Language Models (LLMs) have demonstrated promising capabilities for this task without requiring compilation, whil", "arxiv_id": "2501.18160", "doi": "10.48550/arXiv.2501.18160"}
+{"id": "head-predict-head-2025", "title": "A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs", "authors": ["Artem Shelmanov", "Ekaterina Fadeeva", "A. Tsvigun", "Ivan Tsvigun", "Zhuohan Xie"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2505.08200", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have the tendency to hallucinate, i.e., to sporadically generate false or fabricated information. This presents a major challenge, as hallucinations often appear highly co", "arxiv_id": "2505.08200", "doi": "10.48550/arXiv.2505.08200"}
+{"id": "executable-code-actions-2024", "title": "Executable Code Actions Elicit Better LLM Agents", "authors": ["Xingyao Wang", "Yangyi Chen", "Lifan Yuan", "Yizhe Zhang", "Yunzhu Li"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2402.01030", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are", "arxiv_id": "2402.01030", "doi": "10.48550/arXiv.2402.01030"}
+{"id": "haloscope-harnessing-unlabeled-2024", "title": "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection", "authors": ["Xuefeng Du", "Chaowei Xiao", "Yixuan Li"], "year": 2024, "venue": "Neural Information Processing Systems", "source_url": "https://arxiv.org/abs/2409.17504", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The surge in applications of large language models (LLMs) has prompted concerns about the generation of misleading or fabricated information, known as hallucinations. Therefore, detecting hallucinatio", "arxiv_id": "2409.17504", "doi": "10.48550/arXiv.2409.17504"}
+{"id": "llm-internal-states-2024", "title": "LLM Internal States Reveal Hallucination Risk Faced With a Query", "authors": ["Ziwei Ji", "Delong Chen", "Etsuko Ishii", "Samuel Cahyawijaya", "Yejin Bang"], "year": 2024, "venue": "BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP", "source_url": "https://arxiv.org/abs/2407.03282", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don’t kno", "arxiv_id": "2407.03282", "doi": "10.48550/arXiv.2407.03282"}
+{"id": "do-llms-know-2024", "title": "Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States", "authors": ["Hanyu Duan", "Yi Yang", "K. Tam"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.09733", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) can make up answers that are not real, and this is known as hallucination. This research aims to see if, how, and to what extent LLMs are aware of hallucination. More spec", "arxiv_id": "2402.09733", "doi": "10.48550/arXiv.2402.09733"}
+{"id": "thinking-before-looking-2024", "title": "Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination", "authors": ["Haojie Zheng", "Tianyang Xu", "Hanchi Sun", "Shu Pu", "Ruoxi Chen"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2411.12591", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Multimodal large language models (MLLMs) have advanced the integration of visual and linguistic modalities, establishing themselves as the dominant paradigm for visual-language tasks. Current approach", "arxiv_id": "2411.12591", "doi": "10.48550/arXiv.2411.12591"}
+{"id": "askeda-design-assistant-2024", "title": "Ask-EDA: A Design Assistant Empowered by LLM, Hybrid RAG and Abbreviation De-hallucination", "authors": ["Luyao Shi", "Michael A. Kazda", "Bradley Sears", "Nick Shropshire", "Ruchir Puri"], "year": 2024, "venue": "2024 IEEE LLM Aided Design Workshop (LAD)", "source_url": "https://arxiv.org/abs/2406.06575", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Electronic design engineers are challenged to find relevant information efficiently for a myriad of tasks within design construction, verification and technology development. Large language models (LL", "arxiv_id": "2406.06575", "doi": "10.1109/LAD62341.2024.10691824"}
+{"id": "mitigating-api-hallucination-2025", "title": "Towards Mitigating API Hallucination in Code Generated by LLMs with Hierarchical Dependency Aware", "authors": ["Yujia Chen", "Mingyu Chen", "Cuiyun Gao", "Zhihan Jiang", "Zhongqi Li"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2505.05057", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Application Programming Interfaces (APIs) are crucial in modern software development. Large Language Models (LLMs) assist in automated code generation but often struggle with API hallucination, includ", "arxiv_id": "2505.05057", "doi": "10.1145/3696630.3728569"}
+{"id": "hallucination-by-code-2025", "title": "Hallucination by Code Generation LLMs: Taxonomy, Benchmarks, Mitigation, and Challenges", "authors": ["Yunseo Lee", "John Youngeun Song", "Dongsun Kim", "Jindae Kim", "Mijung Kim"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.20799", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent technical breakthroughs in large language models (LLMs) have enabled them to fluently generate source code. Software developers often leverage both general-purpose and code-specialized LLMs to ", "arxiv_id": "2504.20799", "doi": "10.48550/arXiv.2504.20799"}
+{"id": "vectrans-enhancing-compiler-2025", "title": "VecTrans: Enhancing Compiler Auto-Vectorization through LLM-Assisted Code Transformations", "authors": ["Zhongchun Zheng", "Kan Wu", "Long Cheng", "Lu Li", "Rodrigo C. O. Rocha"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2503.19449", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Auto-vectorization is a fundamental optimization for modern compilers to exploit SIMD parallelism. However, state-of-the-art approaches still struggle to handle intricate code patterns, often requirin", "arxiv_id": "2503.19449"}
+{"id": "fast-reliable-secure-2025", "title": "A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions", "authors": ["Stephen Mell", "Botong Zhang", "David Mell", "Shuo Li", "Ramya Ramalingam"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2506.12202", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Modern large language models (LLMs) are often deployed as agents, calling external tools adaptively to solve tasks. Rather than directly calling tools, it can be more effective for LLMs to write code ", "arxiv_id": "2506.12202", "doi": "10.48550/arXiv.2506.12202"}
+{"id": "llmdriven-code-refactoring-2025", "title": "LLM-Driven Code Refactoring: Opportunities and Limitations", "authors": ["Jonathan Cordeiro", "Shayan Noei", "Ying Zou"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/IDE66625.2025.00011", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.1109/IDE66625.2025.00011"}
+{"id": "healthdial-nocode-llmassisted-2025", "title": "HealthDial: A No-Code LLM-Assisted Dialogue Authoring Tool for Healthcare Virtual Agents", "authors": ["Farnaz Nouraei", "Zhuorui Yong", "Timothy Bickmore"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.15898", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We introduce HealthDial, a dialogue authoring tool that helps healthcare providers and educators create virtual agents that deliver health education and counseling to patients over multiple conversati", "arxiv_id": "2510.15898", "doi": "10.48550/arXiv.2510.15898"}
+{"id": "static-analysis-as-2025", "title": "Static Analysis as a Feedback Loop: Enhancing LLM-Generated Code Beyond Correctness", "authors": ["Scott Blyth", "Sherlock A. Licorish", "Christoph Treude", "Markus Wagner"], "year": 2025, "venue": "IEEE Working Conference on Source Code Analysis and Manipulation", "source_url": "https://arxiv.org/abs/2508.14419", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated impressive capabilities in code generation, achieving high scores on benchmarks such as HumanEval and MBPP. However, these benchmarks primarily assess fu", "arxiv_id": "2508.14419", "doi": "10.1109/SCAM67354.2025.00017"}
+{"id": "retrieval-augmented-generation-2025-2", "title": "Retrieval Augmented Generation Fine-Tuned LLM Model for Code Recommendations to Mitigate Lock Contention", "authors": ["Ashadullah Shawon", "Ramiro Liscano", "Akramul Azim", "Vijay Sundaresan", "Yee-Kang Chang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3680256.3721324", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Lock contention performance faults can lead to degradation in the performance of software applications. Unlike software bugs, performance faults do not lead to failures and application crashes but s", "doi": "10.1145/3680256.3721324"}
+{"id": "llmbased-code-generation-2025", "title": "LLM-Based Code Generation: A Systematic Literature Review With Technical and Demographic Insights", "authors": [".. Umama", "Kamaluddeen Usman Danyaro", "Maged Nasser", "A. Zakari", "Shamsu Abdullahi"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ACCESS.2025.3631952", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid emergence of Large Language Models (LLMs) has significantly advanced the field of code generation, sparking growing research interest across both academia and industry. While existing review", "doi": "10.1109/ACCESS.2025.3631952"}
+{"id": "collaboration-all-you-2025", "title": "Collaboration is all you need: LLM Assisted Safe Code Translation", "authors": ["Rabimba Karanjai", "Sam Blackshear", "Lei Xu", "Weidong Shi"], "year": 2025, "venue": "SIGSOFT FSE Companion", "source_url": "https://arxiv.org/abs/2503.11237", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces UniTranslator, a framework, that uses multiple, compact LLMs working together for code translation. By orchestrating the interaction of specialized agents, each focused on differ", "arxiv_id": "2503.11237", "doi": "10.1145/3696630.3728521"}
+{"id": "bridging-llmgenerated-code-2025", "title": "Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights", "authors": ["Ahilan Ayyachamy Nadar Ponnusamy"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.07835", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rise of Large Language Models (LLMs) in software engineering, particularly in code generation, has garnered significant attention. However, assessing the quality of AI-generated code remains a cha", "arxiv_id": "2502.07835", "doi": "10.48550/arXiv.2502.07835"}
+{"id": "refining-critical-thinking-2025", "title": "Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework", "authors": ["Jialin Li", "Jinzhe Li", "Gengxu Li", "Yi Chang", "Yuan Wu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2508.03622", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: With the advancement of code generation capabilities in large language models (LLMs), their reliance on input premises has intensified. When users provide inputs containing faulty premises, the probab", "arxiv_id": "2508.03622", "doi": "10.48550/arXiv.2508.03622"}
+{"id": "cracking-code-hallucination-2024", "title": "Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence", "authors": ["Jinghan He", "Kuan Zhu", "Haiyun Guo", "Junfeng Fang", "Zhenglin Hua"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2412.13949", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large vision-language models (LVLMs) have made substantial progress in integrating large language models (LLMs) with visual inputs, enabling advanced multimodal reasoning. Despite their success, a per", "arxiv_id": "2412.13949", "doi": "10.48550/arXiv.2412.13949"}
+{"id": "llm-help-code-2023", "title": "Using an LLM to Help with Code Understanding", "authors": ["Daye Nam", "A. Macvean", "Vincent J. Hellendoorn", "Bogdan Vasilescu", "B. Myers"], "year": 2023, "venue": "International Conference on Software Engineering", "source_url": "https://arxiv.org/abs/2307.08177", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Understanding code is challenging, especially when working in new and complex development environments. Code comments and documentation can help, but are typically scarce or hard to navigate. Large la", "arxiv_id": "2307.08177", "doi": "10.1145/3597503.3639187"}
+{"id": "etf-entity-tracing-2024", "title": "ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries", "authors": ["Kishan Maharaj", "Vitobha Munigala", "Srikanth G. Tamilselvam", "Prince Kumar", "Sayandeep Sen"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2410.14748", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their ability to understand both natural language and code, driving their use in tasks like natural language-to-code (NL", "arxiv_id": "2410.14748", "doi": "10.48550/arXiv.2410.14748"}
+{"id": "tamigo-empowering-teaching-2024", "title": "TAMIGO: Empowering Teaching Assistants using LLM-assisted viva and code assessment in an Advanced Computing Class", "authors": ["Anishka Iiitd", "Diksha Sethi", "Nipun Gupta", "Shikhar Sharma", "Srishti Jain"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.16805", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have significantly transformed the educational landscape, offering new tools for students, instructors, and teaching assistants. This paper investigates the application of", "arxiv_id": "2407.16805", "doi": "10.48550/arXiv.2407.16805"}
+{"id": "llmdriven-framework-dynamic-2024", "title": "An LLM-driven Framework for Dynamic Infrastructure as Code Generation", "authors": ["Junhee Lee", "Sungjoo Kang", "In-Young Ko"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3704440.3704778", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper proposes a Large Language Model (LLM)-driven framework for generation of Infrastructure as Code (IaC) in dynamic environments. While IaC simplifies infrastructure management, static templat", "doi": "10.1145/3704440.3704778"}
+{"id": "humanlike-code-quality-2024", "title": "Human-Like Code Quality Evaluation through LLM-based Recursive Semantic Comprehension", "authors": ["Fangzhou Xu", "Sai Zhang", "Zhenchang Xing", "Xiaowang Zhang", "Yahong Han"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2412.00314", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Code quality evaluation involves scoring generated code quality based on a reference code for a specific problem statement. Currently, there are two main forms of evaluating code quality: match-based ", "arxiv_id": "2412.00314", "doi": "10.48550/arXiv.2412.00314"}
+{"id": "autogenics-automated-generation-2024", "title": "AUTOGENICS: Automated Generation of Context-Aware Inline Comments for Code Snippets on Programming Q&A Sites Using LLM", "authors": ["Suborno Deb Bappon", "Saikat Mondal", "Banani Roy"], "year": 2024, "venue": "IEEE Working Conference on Source Code Analysis and Manipulation", "source_url": "https://arxiv.org/abs/2408.15411", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inline comments in the source code facilitate easy comprehension, reusability, and enhanced readability. However, code snippets in answers on Q&A sites like Stack Overflow (SO) often lack comments bec", "arxiv_id": "2408.15411", "doi": "10.1109/SCAM63643.2024.00013"}
+{"id": "zsllmcode-effective-approach-2024", "title": "zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning", "authors": ["Zixiang Xian", "Chenhui Cui", "Rubing Huang", "Chunrong Fang", "Zhenyu Chen"], "year": 2024, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2409.14644", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2409.14644"}
+{"id": "framework-assess-clinical-2025", "title": "A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation", "authors": ["Elham Asgari", "N. Brown", "Magda Dubois", "Saleh Khalil", "J. Balloch"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1038/s41746-025-01670-7", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Integrating large language models (LLMs) into healthcare can enhance workflow efficiency and patient care by automating tasks such as summarising consultations. However, the fidelity between LLM outpu", "doi": "10.1038/s41746-025-01670-7"}
+{"id": "llmbased-control-code-2023", "title": "LLM-based Control Code Generation using Image Recognition", "authors": ["Heiko Koziolek", "Anne Koziolek"], "year": 2023, "venue": "2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)", "source_url": "https://arxiv.org/abs/2311.10401", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: LLM-based code generation could save significant manual efforts in industrial automation, where control engineers manually produce control logic for sophisticated production processes. Previous attemp", "arxiv_id": "2311.10401", "doi": "10.1145/3643795.3648385"}
+{"id": "llmassisted-codebook-development-2026", "title": "LLM-Assisted Codebook Development for Cybersecurity Interviews with Enhanced Accuracy and Reduced Hallucination", "authors": ["Aisvarya Adeseye", "J. Isoaho", "Seppo Virtanen", "Mohammad Tahir"], "year": 2026, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICAIC67076.2026.11395872", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Beyond what numerical data captures, qualitative cybersecurity interviews reveal human behaviors, lived experiences, trust perceptions and decision-making patterns. However, today’s current manual and", "doi": "10.1109/ICAIC67076.2026.11395872"}
+{"id": "survey-hallucination-large-2023-2", "title": "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions", "authors": ["Lei Huang", "Weijiang Yu", "Weitao Ma", "Weihong Zhong", "Zhangyin Feng"], "year": 2023, "venue": "ACM Trans. Inf. Syst.", "source_url": "https://arxiv.org/abs/2311.05232", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are pr", "arxiv_id": "2311.05232", "doi": "10.1145/3703155"}
+{"id": "evaluating-object-hallucination-2023", "title": "Evaluating Object Hallucination in Large Vision-Language Models", "authors": ["Yifan Li", "Yifan Du", "Kun Zhou", "Jinpeng Wang", "Wayne Xin Zhao"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2305.10355", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Inspired by the superior language abilities of large language models (LLM), large vision-language models (LVLM) have been recently explored by integrating powerful LLMs for improving the performance o", "arxiv_id": "2305.10355", "doi": "10.48550/arXiv.2305.10355"}
+{"id": "hallucination-inevitable-innate-2024", "title": "Hallucination is Inevitable: An Innate Limitation of Large Language Models", "authors": ["Ziwei Xu", "Sanjay Jain", "Mohan Kankanhalli"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2401.11817", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts hav", "arxiv_id": "2401.11817", "doi": "10.48550/arXiv.2401.11817"}
+{"id": "multitask-multilingual-multimodal-2023", "title": "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity", "authors": ["Yejin Bang", "Samuel Cahyawijaya", "Nayeon Lee", "Wenliang Dai", "Dan Su"], "year": 2023, "venue": "International Joint Conference on Natural Language Processing", "source_url": "https://arxiv.org/abs/2302.04023", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 da", "arxiv_id": "2302.04023", "doi": "10.18653/v1/2023.ijcnlp-main.45"}
+{"id": "halogen-fantastic-llm-2025", "title": "HALoGEN: Fantastic LLM Hallucinations and Where to Find Them", "authors": ["Abhilasha Ravichander", "Shrusti Ghela", "David Wadden", "Yejin Choi"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2501.08292", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Despite their impressive ability to generate high-quality and fluent text, generative large language models (LLMs) also produce hallucinations: statements that are misaligned with established world kn", "arxiv_id": "2501.08292", "doi": "10.48550/arXiv.2501.08292"}
+{"id": "who-validates-validators-2024", "title": "Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences", "authors": ["Shreya Shankar", "J.D. Zamfirescu-Pereira", "Bjorn Hartmann", "Aditya G. Parameswaran", "Ian Arawjo"], "year": 2024, "venue": "ACM Symposium on User Interface Software and Technology", "source_url": "https://arxiv.org/abs/2404.12272", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-", "arxiv_id": "2404.12272", "doi": "10.1145/3654777.3676450"}
+{"id": "dawn-after-dark-2024", "title": "The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models", "authors": ["Junyi Li", "Jie Chen", "Ruiyang Ren", "Xiaoxue Cheng", "Wayne Xin Zhao"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2401.03205", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In the era of large language models (LLMs), hallucination (i.e., the tendency to generate factually incorrect content) poses great challenge to trustworthy and reliable deployment of LLMs in real-worl", "arxiv_id": "2401.03205", "doi": "10.48550/arXiv.2401.03205"}
+{"id": "hdlcore-trainingfree-framework-2025", "title": "HDLCoRe: A Training-Free Framework for Mitigating Hallucinations in LLM-Generated HDL", "authors": ["Heng Ping", "Shixuan Li", "Peiyu Zhang", "Anzhe Cheng", "Shukai Duan"], "year": 2025, "venue": "2025 IEEE International Conference on LLM-Aided Design (ICLAD)", "source_url": "https://arxiv.org/abs/2503.16528", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, when applied to hardware description languages (HDL), these models exhibit ", "arxiv_id": "2503.16528", "doi": "10.1109/ICLAD65226.2025.00034"}
+{"id": "llm-critics-help-2024", "title": "LLM Critics Help Catch LLM Bugs", "authors": ["Nat McAleese", "Rai Michael Pokorny", "Juan Felipe Cerón Uribe", "Evgenia Nitishinskaya", "Maja Trebacz"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2407.00215", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Reinforcement learning from human feedback (RLHF) is fundamentally limited by the capacity of humans to correctly evaluate model output. To improve human evaluation ability and overcome that limitatio", "arxiv_id": "2407.00215", "doi": "10.48550/arXiv.2407.00215"}
+{"id": "unsupervised-realtime-hallucination-2024", "title": "Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models", "authors": ["Weihang Su", "Changyue Wang", "Qingyao Ai", "Hu YiRan", "Zhijing Wu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2403.06448", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucinations in large language models (LLMs) refer to the phenomenon of LLMs producing responses that are coherent yet factually inaccurate. This issue undermines the effectiveness of LLMs in practi", "arxiv_id": "2403.06448", "doi": "10.48550/arXiv.2403.06448"}
+{"id": "reducing-hallucination-structured-2024", "title": "Reducing hallucination in structured outputs via Retrieval-Augmented Generation", "authors": ["Patrice Bechard", "Orlando Marquez Ayala"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2404.08189", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: A common and fundamental limitation of Generative AI (GenAI) is its propensity to hallucinate. While large language models (LLM) have taken the world by storm, without eliminating or at least reducing", "arxiv_id": "2404.08189", "doi": "10.18653/v1/2024.naacl-industry.19"}
+{"id": "chainofthought-prompting-obscures-2025", "title": "Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation", "authors": ["Jiahao Cheng", "Tiancheng Su", "Jia Yuan", "Guoxiu He", "Jiawei Liu"], "year": 2025, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2506.17088", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) often exhibit hallucinations, generating factually incorrect or semantically irrelevant content in response to prompts. Chain-of-Thought (CoT) prompting can mitig", "arxiv_id": "2506.17088", "doi": "10.48550/arXiv.2506.17088"}
+{"id": "survey-large-language-2024", "title": "A Survey on Large Language Model Hallucination via a Creativity Perspective", "authors": ["Xuhui Jiang", "Yuxing Tian", "Fengrui Hua", "Chengjin Xu", "Yuanzhuo Wang"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2402.06647", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucinations in large language models (LLMs) are always seen as limitations. However, could they also be a source of creativity? This survey explores this possibility, suggesting that hallucinations", "arxiv_id": "2402.06647", "doi": "10.48550/arXiv.2402.06647"}
+{"id": "knowagent-knowledgeaugmented-planning-2024", "title": "KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents", "authors": ["Yuqi Zhu", "Shuofei Qiao", "Yixin Ou", "Shumin Deng", "Ningyu Zhang"], "year": 2024, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2403.03101", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environment", "arxiv_id": "2403.03101", "doi": "10.48550/arXiv.2403.03101"}
+{"id": "selfcheckgpt-zeroresource-blackbox-2023", "title": "SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models", "authors": ["Potsawee Manakul", "Adian Liusie", "M. Gales"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2303.08896", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-fac", "arxiv_id": "2303.08896", "doi": "10.48550/arXiv.2303.08896"}
+{"id": "benchmarking-hallucination-large-2024", "title": "Benchmarking Hallucination in Large Language Models Based on Unanswerable Math Word Problem", "authors": ["Yuhong Sun", "Zhangyue Yin", "Qipeng Guo", "Jiawen Wu", "Xipeng Qiu"], "year": 2024, "venue": "International Conference on Language Resources and Evaluation", "source_url": "https://arxiv.org/abs/2403.03558", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks. However, they are susceptible to producing unreliable conjectures in ambiguous contexts called hal", "arxiv_id": "2403.03558", "doi": "10.48550/arXiv.2403.03558"}
+{"id": "sirens-song-ai-2023", "title": "Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models", "authors": ["Yue Zhang", "Yafu Li", "Leyang Cui", "Deng Cai", "Lemao Liu"], "year": 2023, "venue": "Computational Linguistics", "source_url": "https://arxiv.org/abs/2309.01219", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: \n While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLM", "arxiv_id": "2309.01219", "doi": "10.1162/coli.a.16"}
+{"id": "vdpo-mitigating-hallucination-2024", "title": "V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization", "authors": ["Yuxi Xie", "Guanzhen Li", "Xiao Xu", "Min-Yen Kan"], "year": 2024, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2411.02712", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large vision-language models (LVLMs) suffer from hallucination, resulting in misalignment between the output textual response and the input visual content. Recent research indicates that the over-reli", "arxiv_id": "2411.02712", "doi": "10.18653/v1/2024.findings-emnlp.775"}
+{"id": "incontext-sharpness-as-2024", "title": "In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation", "authors": ["Shiqi Chen", "Miao Xiong", "Junteng Liu", "Zhengxuan Wu", "Teng Xiao"], "year": 2024, "venue": "International Conference on Machine Learning", "source_url": "https://arxiv.org/abs/2403.01548", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we delve into the underlying mechani", "arxiv_id": "2403.01548", "doi": "10.48550/arXiv.2403.01548"}
+{"id": "mgverilog-multigrained-dataset-2024", "title": "MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation", "authors": ["Yongan Zhang", "Zhongzhi Yu", "Yonggan Fu", "Cheng Wan", "Y. Lin"], "year": 2024, "venue": "2024 IEEE LLM Aided Design Workshop (LAD)", "source_url": "https://arxiv.org/abs/2407.01910", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data. In addition, they allow users to interact with", "arxiv_id": "2407.01910", "doi": "10.1109/LAD62341.2024.10691738"}
+{"id": "mitigating-llm-hallucinations-2024", "title": "Mitigating LLM Hallucinations via Conformal Abstention", "authors": ["Yasin Abbasi-Yadkori", "Ilja Kuzborskij", "David Stutz", "András György", "Adam Fisch"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2405.01563", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying\"I don't know\") in a general domain, instead of resorting to possibly", "arxiv_id": "2405.01563", "doi": "10.48550/arXiv.2405.01563"}
+{"id": "paradigmbased-automatic-hdl-2025", "title": "Paradigm-Based Automatic HDL Code Generation Using LLMs", "authors": ["Wenhao Sun", "Bing Li", "Grace Li Zhang", "Xunzhao Yin", "Cheng Zhuo"], "year": 2025, "venue": "IEEE International Symposium on Quality Electronic Design", "source_url": "https://arxiv.org/abs/2501.12702", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While large language models (LLMs) have demonstrated the ability to generate hardware description language (HDL) code for digital circuits, they still face the hallucination problem, which can result ", "arxiv_id": "2501.12702", "doi": "10.1109/ISQED65160.2025.11014391"}
+{"id": "haleval-universal-finegrained-2024", "title": "Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models", "authors": ["Chaoya Jiang", "Wei Ye", "Mengfan Dong", "Hongrui Jia", "Haiyang Xu"], "year": 2024, "venue": "ACM Multimedia", "source_url": "https://arxiv.org/abs/2402.15721", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Vision-Language Models (LVLMs) exhibit remarkable capabilities but struggle with ''hallucinations''-inconsistencies between images and their descriptions. Previous hallucination evaluation studi", "arxiv_id": "2402.15721", "doi": "10.1145/3664647.3680576"}
+{"id": "hallucinot-hallucination-detection-2025", "title": "HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification", "authors": ["Bibek Paudel", "Alexander Lyzhov", "Preetam Joshi", "Puneet Anand"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2504.07069", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: This paper introduces a comprehensive system for detecting hallucinations in large language model (LLM) outputs in enterprise settings. We present a novel taxonomy of LLM responses specific to halluci", "arxiv_id": "2504.07069", "doi": "10.48550/arXiv.2504.07069"}
+{"id": "ragtruth-hallucination-corpus-2023", "title": "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models", "authors": ["Yuanhao Wu", "Juno Zhu", "Siliang Xu", "Kashun Shum", "Cheng Niu"], "year": 2023, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2401.00396", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs). Despite the integration of RAG, LLMs may still present unsupported or c", "arxiv_id": "2401.00396", "doi": "10.48550/arXiv.2401.00396"}
+{"id": "guardian-safeguarding-llm-2025", "title": "GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling", "authors": ["Jialong Zhou", "Lichao Wang", "Xiao Yang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2505.19234", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration faces critical s", "arxiv_id": "2505.19234", "doi": "10.48550/arXiv.2505.19234"}
+{"id": "citationenhanced-generation-llmbased-2024", "title": "Citation-Enhanced Generation for LLM-based Chatbots", "authors": ["Weitao Li", "Junkai Li", "Weizhi Ma", "Yang Liu"], "year": 2024, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2402.16063", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) exhibit powerful general intelligence across diverse scenarios, including their integration into chatbots. However, a vital challenge of LLM-based chatbots is that they ma", "arxiv_id": "2402.16063", "doi": "10.18653/v1/2024.acl-long.79"}
+{"id": "grait-gradientdriven-refusalaware-2025", "title": "GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation", "authors": ["Runchuan Zhu", "Zinco Jiang", "Jiang Wu", "Zhipeng Ma", "Jiahe Song"], "year": 2025, "venue": "North American Chapter of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2502.05911", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Refusal-Aware Instruction Tuning (RAIT) aims to enhance Large Language Models (LLMs) by improving their ability to refuse responses to questions beyond their knowledge, thereby reducing hallucinations", "arxiv_id": "2502.05911", "doi": "10.48550/arXiv.2502.05911"}
+{"id": "largepig-hallucinationfree-query-2025", "title": "LargePiG for Hallucination-Free Query Generation: Your Large Language Model is Secretly a Pointer Generator", "authors": ["ZhongXiang Sun", "Zihua Si", "Xiaoxue Zang", "Kai Zheng", "Yang Song"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3696410.3714800", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Recent research on query generation has focused on using Large Language Models (LLMs), which, despite achieving state-of-the-art performance, also introduce hallucination issues in generated queries. ", "doi": "10.1145/3696410.3714800"}
+{"id": "paint-paying-attention-2025", "title": "PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model", "authors": ["Kazi Hasan Ibn Arif", "Sajib Acharjee Dip", "Khizar Hussain", "Lang Zhang", "Chris Thomas"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2501.12206", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities in understanding and describing visual content, achieving state-of-the-art performance across various vision-language tas", "arxiv_id": "2501.12206"}
+{"id": "cracking-sql-barriers-2025", "title": "Cracking SQL Barriers: An LLM-based Dialect Translation System", "authors": ["Wei Zhou", "Yuyang Gao", "Xuanhe Zhou", "Guoliang Li"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1145/3725278", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Automatic dialect translation reduces the complexity of database migration, which is crucial for applications interacting with multiple database systems. However, rule-based translation tools (e.g., S", "doi": "10.1145/3725278"}
+{"id": "how-llms-react-2025", "title": "How LLMs React to Industrial Spatio-Temporal Data? Assessing Hallucination with a Novel Traffic Incident Benchmark Dataset", "authors": ["Qiang Li", "Mingkun Tan", "Xun Zhao", "Dan Zhang", "Daoan Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.18653/v1/2025.naacl-industry.4", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) hold revolutionary potential to digitize and enhance the Health & Public Services (H&PS) industry. Despite their advanced linguistic abilities, concerns about accuracy, st", "doi": "10.18653/v1/2025.naacl-industry.4"}
+{"id": "refind-at-semeval2025-2025", "title": "REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models", "authors": ["DongGeon Lee", "Hwanjo Yu"], "year": 2025, "venue": "arXiv", "source_url": "https://arxiv.org/abs/2502.13622", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucinations in large language model (LLM) outputs severely limit their reliability in knowledge-intensive tasks such as question answering. To address this challenge, we introduce REFIND (Retrieval", "arxiv_id": "2502.13622"}
+{"id": "beyond-semantic-entropy-2025", "title": "Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity", "authors": ["Dang Nguyen", "Ali Payani", "Baharan Mirzasoleiman"], "year": 2025, "venue": "Annual Meeting of the Association for Computational Linguistics", "source_url": "https://arxiv.org/abs/2506.00245", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Hallucination in large language models (LLMs) can be detected by assessing the uncertainty of model outputs, typically measured using entropy. Semantic entropy (SE) enhances traditional entropy estima", "arxiv_id": "2506.00245", "doi": "10.48550/arXiv.2506.00245"}
+{"id": "llmqe-improving-query-2025", "title": "LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences", "authors": ["Sijia Yao", "Pengcheng Huang", "Zhenghao Liu", "Yu Gu", "Yukun Yan"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.48550/arXiv.2502.17057", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar.", "doi": "10.48550/arXiv.2502.17057"}
+{"id": "consistency-key-detecting-2025", "title": "Consistency Is the Key: Detecting Hallucinations in LLM Generated Text By Checking Inconsistencies About Key Facts", "authors": ["Raavi Gupta", "P. Panicker", "Sumit Bhatia", "Ganesh Ramakrishnan"], "year": 2025, "venue": "IJCNLP-AACL", "source_url": "https://arxiv.org/abs/2511.12236", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs), despite their remarkable text generation capabilities, often hallucinate and generate text that is factually incorrect and not grounded in real-world knowledge. This pose", "arxiv_id": "2511.12236", "doi": "10.48550/arXiv.2511.12236"}
+{"id": "fakes-varying-shades-2024", "title": "Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations", "authors": ["Mahjabin Nahar", "H. Seo", "Eun-Ju Lee", "Aiping Xiong", "Dongwon Lee"], "year": 2024, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2404.03745", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The widespread adoption and transformative effects of large language models (LLMs) have sparked concerns regarding their capacity to produce inaccurate and fictitious content, referred to as `hallucin", "arxiv_id": "2404.03745", "doi": "10.48550/arXiv.2404.03745"}
+{"id": "commcot-standardized-chainofthought-2025", "title": "Comm-CoT: Standardized Chain-of-Thought Communication Framework for Efficient LLM based Multi-Agent Decision-Making in Real-Time Strategy Games", "authors": ["Runnan Qi", "Yuming Quan", "Yanan Ni", "Zongyuan Li", "Xiaojie Xu"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ECIS65594.2025.11087008", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: In recent years, Large Language Models (LLMs) have become a hotspot in AI research due to their remarkable success in natural language processing. LLM-based Multi-Agent Systems have also achieved sign", "doi": "10.1109/ECIS65594.2025.11087008"}
+{"id": "zofia-zeroshot-fake-2025", "title": "ZoFia: Zero-Shot Fake News Detection with Entity-Guided Retrieval and Multi-LLM Interaction", "authors": ["Lvhua Wu", "Xue Jiang", "Sheng Sun", "Tian Wen", "Yuwei Wang"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2511.01188", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The rapid spread of fake news threatens social stability and public trust, rendering its detection an imperative research priority. Although large language models (LLMs) excel at numerous natural lang", "arxiv_id": "2511.01188", "doi": "10.48550/arXiv.2511.01188"}
+{"id": "d2dllm-unified-translation-2025", "title": "D2D-LLM+: Unified Translation Between Design Rules/Manuals and DRC - Bridging Inconsistencies for Accurate Implementation", "authors": ["Ruoyu Tang", "Chao Wang", "Jiajun Yap", "Zixian Guo", "Yuhang Zhang"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/AICAS64808.2025.11173119", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The accurate and consistent translation of design rules/manuals into machine-readable Design Rule Check (DRC) formats is critical for ensuring the manufacturability and reliability of integrated circu", "doi": "10.1109/AICAS64808.2025.11173119"}
+{"id": "beyond-textual-context-2025", "title": "Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs", "authors": ["Yifang Zhang", "Pengfei Duan", "Yiwen Yang", "Shengwu Xiong"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.22251", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Currently, the main approach for Large Language Models (LLMs) to tackle the hallucination issue is incorporating Knowledge Graphs(KGs).However, LLMs typically treat KGs as plain text, extracting only ", "arxiv_id": "2509.22251", "doi": "10.48550/arXiv.2509.22251"}
+{"id": "look-closer-adversarial-2025", "title": "Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs", "authors": ["Jiayu Hu", "Beibei Li", "Ji Xia", "Yanjun Qin", "Bing Ji"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.21999", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: While Vision-Language Models (VLMs) have garnered increasing attention in the AI community due to their promising practical applications, they exhibit persistent hallucination issues, generating outpu", "arxiv_id": "2512.21999", "doi": "10.48550/arXiv.2512.21999"}
+{"id": "unification-hallucination-detection-2025", "title": "Towards Unification of Hallucination Detection and Fact Verification for Large Language Models", "authors": ["Weihang Su", "Jianming Long", "Changyue Wang", "Shiyu Lin", "Jingyang Xu"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2512.02772", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) frequently exhibit hallucinations, generating content that appears fluent and coherent but is factually incorrect. Such errors undermine trust and hinder their adoption in", "arxiv_id": "2512.02772", "doi": "10.48550/arXiv.2512.02772"}
+{"id": "catdb-datacatalogguided-llmbased-2025", "title": "CatDB: Data-catalog-guided, LLM-based Generation of Data-centric ML Pipelines", "authors": ["Saeed Fathollahzadeh", "Essam Mansour", "Matthias Boehm"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.14778/3742728.3742754", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Data-centric machine learning (ML) pipelines extend traditional ML pipelines—of feature transformations, hyper-parameter tuning, and model training—by additional pre-processing steps for data cleaning", "doi": "10.14778/3742728.3742754"}
+{"id": "knowpath-knowledgeenhanced-reasoning-2025", "title": "KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs", "authors": ["Qi Zhao", "Hongyu Yang", "Qi Song", "Xin Yao", "Xiangyang Li"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2502.12029", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in various complex tasks, yet they still suffer from hallucinations. By incorporating and exploring external knowledge, such as k", "arxiv_id": "2502.12029", "doi": "10.48550/arXiv.2502.12029"}
+{"id": "together-we-better-2025", "title": "Together We are Better: LLM, IDE and Semantic Embedding to Assist Move Method Refactoring", "authors": ["Abhiram Bellur", "Fraol Batole", "Mohammed Raihan Ullah", "Malinda Dilhara", "Yaroslav Zharov"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/ICSME64153.2025.00046", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: MoveMethod is a hallmark refactoring. Despite a plethora of research tools that recommend which methods to move and where, these recommendations do not align with how expert developers perform Movemet", "doi": "10.1109/ICSME64153.2025.00046"}
+{"id": "aes-cryptography-enabled-2025", "title": "AES Cryptography Enabled Responsible Federated Foundation Model Using Transformer LLM and LSTM for Smart Grid IIoT Networks", "authors": ["M. Hasan", "S. Rayhan Kabir", "Shayla Islam", "Salwani Abdullah", "Huda Saleh Abbas"], "year": 2025, "venue": "Unknown", "source_url": "https://doi.org/10.1109/JIOT.2025.3608807", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: The use of supervisory control and data acquisition (SCADA) and advanced metering infrastructure (AMI) systems in smart grid-based Industrial Internet of Things (SG-IIoT) networks for proper energy su", "doi": "10.1109/JIOT.2025.3608807"}
+{"id": "beyond-token-probes-2025", "title": "Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT", "authors": ["Guy Bar-Shalom", "Fabrizio Frasca", "Yaniv Galron", "Yftah Ziser", "Haggai Maron"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2510.00296", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Detecting hallucinations in Large Language Model-generated text is crucial for their safe deployment. While probing classifiers show promise, they operate on isolated layer-token pairs and are LLM-spe", "arxiv_id": "2510.00296", "doi": "10.48550/arXiv.2510.00296"}
+{"id": "sources-hallucination-by-2023", "title": "Sources of Hallucination by Large Language Models on Inference Tasks", "authors": ["Nick McKenna", "Tianyi Li", "Liang Cheng", "Mohammad Javad Hosseini", "Mark Johnson"], "year": 2023, "venue": "Conference on Empirical Methods in Natural Language Processing", "source_url": "https://arxiv.org/abs/2305.14552", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral stu", "arxiv_id": "2305.14552", "doi": "10.48550/arXiv.2305.14552"}
+{"id": "llm-agentic-approach-2025", "title": "An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software", "authors": ["Sina Gogani-Khiabani", "Ashutosh Trivedi", "Diptikalyan Saha", "Saeid Tizpaz-Niari"], "year": 2025, "venue": "arXiv.org", "source_url": "https://arxiv.org/abs/2509.13471", "source": "semantic_scholar", "status": "queued", "tags": [], "added": "2026-02-27", "notes": "Found via Semantic Scholar. Abstract: Large language models (LLMs) show promise for translating natural-language statutes into executable logic, but reliability in legally critical settings remains challenging due to ambiguity and halluci", "arxiv_id": "2509.13471", "doi": "10.1145/3744916.3764575"}
