ACM Conference on AI and Agentic Systems

April 30, 2026, 7:38 p.m.

ACM CAIS Accepted papers & demos posted!


The initial ACM CAIS accepted papers, demos, and workshops are posted!

We have five co-located workshops with exciting invited speakers:

  1. AI Agents for Discovery in the Wild – Joseph Gonzalez (UC Berkeley), Azalia Mirhoseini (Stanford/Ricursive Intelligence), Graham Neubig (CMU/OpenHands), Mohammad Alizadeh (MIT/Glia AI), James Zou (Stanford), Martin Maas (Google DeepMind), Aditya Akella (UT Austin), Sagar Karandikar (UC Berkeley)

  2. RLEval: Methods and RL Environments for Evaluating AI Agents

  3. Supporting Our AI Overlords – Andy Pavlo (CMU), Aaron Katz (ClickHouse), with a panel featuring Ashish Kumar (MongoDB) and Anant Jhingran (IBM)

  4. Agent Skills 2026 – Dawn Song (UC Berkeley), Manling Li (Northwestern), Ross Taylor (General Reasoning), Kanav Garg (Google DeepMind), Yu Su (Ohio State)

  5. Agentic Software Engineering


We have 61 accepted research papers across 5 pillars (Architectural Patterns & Composition, System Optimization & Efficiency, Engineering & Operations, Evaluation & Benchmarking, Security & Privacy).

ACM CAIS 2026 Accepted papers: https://www.caisconf.org/program/2026/papers/

We have 46 accepted demos across the same 5 pillars:

ACM CAIS 2026 Accepted demos: https://www.caisconf.org/program/2026/demos/

We still have a few registration spots left, and we will soon announce the keynote speakers. To register, head over to: https://cvent.me/aDamv3


Here’s a quick taste of the papers that will be presented next month:

Research Papers

ViBench: A Benchmark on Vibe Coding – Peter Zhong (Replit & CMU), Pashootan Vaezipoor (Georgian AI Lab), Fuyang Cui (Georgian AI Lab), Vaibhav Kumar (Replit), James Austin (Replit), Azin Asgarian (Georgian AI Lab), Toby Ho (Replit), Paul Inder (Georgian AI Lab), Imen Kedir (Replit), Zhen Li (Replit), Nicholas Ondo (Replit), Asna Shafiq (Georgian AI Lab), Ibrahim Sheikh (Replit), Edouard Sioufi (Replit), Setareh Soltanieh (Georgian AI Lab), Ben Wilde (Georgian AI Lab), Jacky Zhao (Replit), Ryan Carelli (Replit), Heather Miller (CMU), Michele Catasta (Replit)
What is it? The first open-source benchmark for evaluating AI agents on realistic end-to-end vibe coding, derived from production traces across 15 applications.

Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain – Léo Boisvert (ServiceNow Research, Mila, Polytechnique Montréal), Abhay Puri (ServiceNow Research), Chandra Kiran Reddy Evuru (ServiceNow), Nazanin Mohammadi Sepahvand (ServiceNow Research), Nicolas Chapados (Mila, Polytechnique Montréal), Quentin Cappart (Polytechnique Montréal), Alexandre Lacoste (ServiceNow Research), Krishnamurthy Dvijotham (ServiceNow Research), Alexandre Drouin (ServiceNow Research)
What is it? Demonstrates that AI agent supply chains are vulnerable to backdoor attacks at three distinct layers: finetuning data poisoning, pre-backdoored base models, and a novel environment poisoning vector.

Why Johnny Can't Use Agents: Industry Aspirations vs. User Realities with AI Agents – Pradyumna Shome (CMU), Sashreek Krishnan (CMU), Sauvik Das (CMU)
What is it? A study of 102 commercial AI agents and 31 end-user participants that quantifies the gap between what the industry markets and what real users can actually accomplish.

Securing Agents With Tracked Capabilities – Martin Odersky (EPFL), Yaoyu Zhao (EPFL), Yichen Xu (EPFL), Oliver Bračevac (EPFL), Cao Nguyen Pham (EPFL)
What is it? A type-system-based safety harness that uses Scala 3's capture checking to statically prevent prompt injection, data leakage, and unintended side effects at the language level.

Willful Disobedience: Automatically Detecting Failures in Agentic Traces – Reshabh K Sharma (University of Washington), Shraddha Barke (Microsoft Research), Benjamin Zorn (Microsoft Research)
What is it? AgentPex automatically detects procedural failures in agentic execution traces, catching critical failures that outcome-only benchmarks miss.
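The paper itself describes AgentPex's approach in detail; as a purely illustrative sketch of what "procedural failure detection" can mean (all rule names and the checking logic below are hypothetical, not taken from the paper), one might flag any action in a trace whose required earlier step never ran:

```python
# Toy procedural-failure checker. The rule set and logic are invented
# for illustration; they do not describe AgentPex's actual method.
from typing import List, Tuple

# Hypothetical rules: action -> step that must appear earlier in the trace.
PRECONDITIONS = {
    "write_file": "read_file",
    "deploy": "run_tests",
}

def find_procedural_failures(trace: List[str]) -> List[Tuple[int, str]]:
    """Return (index, action) pairs whose precondition never ran before them."""
    failures = []
    seen = set()
    for i, action in enumerate(trace):
        required = PRECONDITIONS.get(action)
        if required is not None and required not in seen:
            failures.append((i, action))
        seen.add(action)
    return failures

trace = ["read_file", "write_file", "deploy"]  # 'deploy' without 'run_tests'
print(find_procedural_failures(trace))  # [(2, 'deploy')]
```

The point such checkers make is that this trace could still end in a passing outcome, so an outcome-only benchmark would never notice the skipped step.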

Open Agent Specification: A Unified Representation for AI Agents – Soufiane Amini, Yassine Benajiba, Cesare Bernardis, Paul Cayet, Hassan Chafi, Abderrahim Fathan, Louis Faucon, Damien Hilloulin, Sungpack Hong, Ingo Kossyk, Tirthankar Lahiri, Tran Minh Son Le, Rhicheek Patra, Sujith Ravi, Jonas Schweizer, Jyotika Singh, Shailender Singh, Weiyi Sun, Kartik Talamadupula, Jerry Xu (Oracle)
What is it? A framework-agnostic declarative language for defining AI agents and multi-agent workflows, enabling portability and interoperability across agent frameworks.


And a taste of the demos that will be presented next month:

System Demos

Parallel Environments for Agents – Shangyin Tan (UC Berkeley), Jialin Zhang (UC Berkeley), Matei Zaharia (UC Berkeley)
What is it? A framework that lets agents branch execution across parallel isolated environment instances, achieving 48% on SWE-bench Pro with a 15-point gain over single-path baselines.
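The core idea of branching an agent across isolated environment copies can be sketched generically (this is not the demo's actual framework or API; the actions, scoring, and thread-based parallelism below are all stand-ins for illustration):

```python
# Generic sketch of branched exploration over isolated environment copies.
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy

def explore(env: dict, action: str) -> tuple:
    """Apply one candidate action to a private copy of the environment
    and return (action, score); the parent env is never mutated."""
    branch = deepcopy(env)          # isolation: each branch owns its state
    branch["log"].append(action)
    score = len(action)             # stand-in for a real evaluation signal
    return action, score

env = {"log": []}
candidates = ["fix_a", "refactor_b", "patch"]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda a: explore(env, a), candidates))

best = max(results, key=lambda r: r[1])  # commit only the winning branch
print(best[0])  # refactor_b
```

Because every branch works on its own copy, a failed exploration leaves the parent environment untouched, which is what makes single-path baselines comparatively brittle.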

TRACE: A Multi-Agent System for Natural Language-Driven Social Graph Investigation – Arunachaleshwar Ravichandran (Meta), Nicole Chen (Meta), Ankitesh Gupta (Meta), Antonios Broumas (Meta), Ioannis Konstantakopoulos (Meta), Seyoung Park (Meta)
What is it? A multi-agent system for social graph forensics that uses natural-language behavior detection and LLM-driven graph exploration, achieving 91.9% discovery of unknown suspicious entities.

cotomi Act: Learning to Automate Work by Watching You – Masafumi Oyamada (NEC), Kunihiro Takeoka (NEC), Kosuke Akimoto (NEC), Ryoma Obara (NEC), Masafumi Enomoto (NEC), Haochen Zhang (NEC), Daichi Haraguchi (NEC), Takuya Tamura (NEC)
What is it? A browser agent that learns organizational work patterns by passively observing user browsing, achieving 80.4% on WebArena while building a shared knowledge workspace.

Agent-Aided Design for Dynamic CAD Models – Mitch Adler (Unaffiliated), Matthew Russo (MIT), Michael Cafarella (MIT)
What is it? An agentic system for generating dynamic 3D CAD assemblies with moving parts, using external constraint solvers and visual feedback to overcome LLM spatial reasoning limitations.

Genflow Ad Studio: A Compound AI Architecture for Brand-Aligned, Self-Correcting Video Generation – Debanshu Das (Google), Lavi Nigam (Google), Sunil Kumar Jang Bahadur (Google), Gopala Dhar (Google)
What is it? A compound AI system that enforces brand consistency in generative video production through retrieval-based brand DNA extraction and an adversarial multi-agent QC loop.

SREGym: A Live Training Ground for AI SRE Agents with High-Fidelity Failure Drills – Jackson Clark (UIUC), Yiming Su (UIUC), Saad Mohammad Rafid Pial (BUET), Lily Gniedziejko (UIUC), Tianyin Xu (UIUC)
What is it? A live benchmark for AI SRE agents featuring high-fidelity failure drills with fault injection across OS kernels, hardware, and compound multi-event scenarios.

Agent 4: Teamwork and Collaboration for Vibe-Coding – Peter Zhong (Replit & CMU), Jacky Zhao (Replit), Edouard Sioufi (Replit), James Austin (Replit), Bri Pool (Replit), Luis Héctor Chávez (Replit), Adi Dahiya (Replit), Will Ernst (Replit), Dawei Feng (Replit), Devin Halladay (Replit), Toby Ho (Replit), Zade Kaylani (Replit), Imen Kedir (Replit), Vaibhav Kumar (Replit), Zhen Li (Replit), Haya Ode (Replit), Nicholas Ondo (Replit), Darsh Patel (Replit), Alec Wang (Replit), Jordan Walke (Replit), Ibrahim Sheikh (Replit), Poorva Potnis (Replit), Michele Catasta (Replit)
What is it? A multi-agent coding architecture that decomposes tasks into a DAG, executes them on isolated forked environments, and merges via incremental rebasing.
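The DAG-decomposition idea underlying this demo is easy to sketch in generic terms (the task names and dependency graph below are hypothetical, and this says nothing about Replit's actual scheduler; it only illustrates ordering tasks by their dependencies):

```python
# Generic sketch: schedule decomposed subtasks in dependency order.
# Tasks with no edge between them could run on separate forked
# environments in parallel; here we just compute a valid ordering.
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the tasks it depends on.
deps = {
    "write_tests": {"scaffold"},
    "implement":   {"scaffold"},
    "merge":       {"write_tests", "implement"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # 'scaffold' first, 'merge' last
```

In a real system, "merge" would correspond to incrementally rebasing the results of the independent branches back into one tree once their subtasks complete.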

You just read issue #4 of ACM Conference on AI and Agentic Systems. You can also browse the full archives of this newsletter.
