Presentation Information

[2E1-GS-5b-06] Experimental Protocol Design for Desktop AI Agents: Auditability via Grounded Perception-Decision-Action Evidence Chains and Error Injection

〇Yuya Sasaki1,3, Akifumi Ito2, Satoshi Kurihara2 (1. ITOCHU Techno-Solutions Corporation, 2. Keio University, 3. Keio AI Center)

Keywords:

AI Agent, Proactive Support, Human-AI cooperation

Desktop AI agents that operate GUIs across browser, mail, documents, and files are rapidly emerging, but evaluation remains fragile because of model nondeterminism, environment drift, and poor instrumentation. We propose a protocol template for controlled studies that combines (i) task cards with risk tags and executable or semi-executable success checks, (ii) an environment manifest with reset procedures, and (iii) an auditability-oriented evidence chain that links grounded observation traces to decisions and UI actions through explicit IDs. We define tiered traceability metrics (T1-T4) and a comparable error-injection procedure with detection definitions and stop rules, enabling computable detection-latency analysis. As reusable artifacts, we release a checklist, JSON templates, an event schema, and reference scripts for validation and metric computation, providing scaffolding for reproducible comparisons of desktop agent designs.
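
To illustrate the ID-linked evidence chain and the detection-latency analysis described above, the following is a minimal sketch and not the released event schema or reference scripts. The `Event` fields, the event kinds, the parent-linkage rule, and the choice to measure latency in agent steps are all assumptions made for this example; the abstract does not specify them, and the T1-T4 metric definitions are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One logged event. All field names are hypothetical."""
    event_id: str
    kind: str  # "observation", "decision", "action", "injection", or "detection"
    step: int  # agent step counter, used here as the latency unit (assumption)
    parent_id: Optional[str] = None  # explicit ID of the event this one is grounded in


# Assumed linkage rule: a decision must cite an observation,
# and a UI action must cite a decision.
REQUIRED_PARENT = {"decision": "observation", "action": "decision"}


def broken_links(events):
    """Return IDs of decisions/actions whose parent is missing or of the wrong kind."""
    by_id = {e.event_id: e for e in events}
    broken = []
    for e in events:
        expected = REQUIRED_PARENT.get(e.kind)
        if expected is None:
            continue  # observations, injections, detections need no parent here
        parent = by_id.get(e.parent_id)
        if parent is None or parent.kind != expected:
            broken.append(e.event_id)
    return broken


def detection_latency(events):
    """Steps between the first injected error and the detection that cites it.

    Returns None if no error was injected or it was never detected.
    """
    injected = next((e for e in events if e.kind == "injection"), None)
    if injected is None:
        return None
    detected = next(
        (e for e in events
         if e.kind == "detection" and e.parent_id == injected.event_id),
        None,
    )
    if detected is None:
        return None
    return detected.step - injected.step


# Illustrative trace (fabricated for the example, not experimental data).
TRACE = [
    Event("obs1", "observation", 0),
    Event("dec1", "decision", 1, parent_id="obs1"),
    Event("act1", "action", 2, parent_id="dec1"),
    Event("inj1", "injection", 3),
    Event("obs2", "observation", 4),
    Event("dec2", "decision", 5, parent_id="obs2"),
    Event("act2", "action", 6, parent_id="dec9"),  # dangling reference
    Event("det1", "detection", 7, parent_id="inj1"),
]

if __name__ == "__main__":
    print("broken links:", broken_links(TRACE))
    print("detection latency (steps):", detection_latency(TRACE))
```

In this sketch, an action whose `parent_id` points to no logged decision is reported as a broken link, which is the kind of gap an auditability check would need to surface. Wall-clock timestamps could replace the step counter if latency in seconds is preferred.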