AI agents can make decisions and act autonomously, which makes them much more powerful than AI copilots and similar tools. The x402 protocol expands these capabilities by giving agents a framework for making micropayments to access paid and gated resources. This removes much of the friction associated with traditional methods of monetizing web resources, such as the need to create a paid account before accessing an API.
By eliminating friction, the x402 protocol unlocks new capabilities, but it also creates new risks. AI agents aren’t perfect and can be vulnerable to intent forgery attacks, where they’re tricked into performing micropayments that their principals never approved.
What are AP2 Mandates?
Agent Payments Protocol (AP2) mandates are signed, scoped authorizations for an AI agent to perform certain transactions on behalf of a human (“principal”). These mandates support delegation, where an AI agent can allow child agents to perform certain actions within the scope of the authorization.
An AP2 mandate is signed by the principal using their private key, recording their intent on-chain. This mandate is provided to an orchestrator agent, which is responsible for interpreting the intent, delegating tasks to sub-agents, and enforcing the restrictions associated with the mandate. These sub-agents can perform various tasks, including making payments directly.
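The scoping rule for delegation can be illustrated with a minimal sketch. It assumes a hypothetical `Scope` type with a spending cap and a payee allowlist; these names are illustrative and are not taken from the AP2 specification. The point is only that a child agent's scope should never be wider than its parent's.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical mandate scope: how much an agent may spend, and where."""
    max_total: int                  # spending cap in the smallest currency unit
    allowed_payees: frozenset[str]  # payees the agent is allowed to pay


def delegate(parent: Scope, max_total: int, payees: set[str]) -> Scope:
    """Create a child scope, refusing anything wider than the parent scope."""
    if max_total > parent.max_total:
        raise ValueError("child cap exceeds parent cap")
    if not set(payees) <= parent.allowed_payees:
        raise ValueError("child payees are not covered by the parent scope")
    return Scope(max_total, frozenset(payees))


# The orchestrator holds the root scope and hands a narrower one to a sub-agent.
root = Scope(max_total=500, allowed_payees=frozenset({"api.example", "data.example"}))
child = delegate(root, 100, {"api.example"})
```

A request such as `delegate(root, 600, {"api.example"})` raises `ValueError`, since the child would be able to spend more than the principal authorized.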
How Intent Forgery Attacks Work
LLMs can make mistakes. These tools, which underlie AI agents, can hallucinate or be tricked into giving answers or taking actions that deviate from their intended design.
Intent forgery attacks take advantage of this to trick AI agents into performing payments that their principals haven’t authorized. Some potential attack vectors include:
- Prompt Injection: Prompt injection attacks use carefully crafted prompts to trick an LLM or AI agent into providing restricted information or taking unauthorized actions. In this case, an attacker could target the orchestrator’s context to trick it into misinterpreting the intent or issuing instructions to sub-agents that fall outside of the allowed scope.
- Poisoned Tool Outputs: AI agents can use various external tools to define transaction parameters, such as price oracles, address books, and routing APIs. An attacker who spoofs or compromises one of these tools can return values that look legitimate to the agent but lead to transactions that violate the principal’s intent.
- Cross-Agent Impersonation: A malicious agent may masquerade as a trusted peer within the agent hierarchy. Without strong identity verification, the orchestrator may accept instructions coming from this agent as originating from a verified, trusted counterpart.
- Context Window Manipulation: AI agents, like any LLM, have limited context windows and often rely on summarization to support long-running sessions. An attacker may be able to use prompt injection to manipulate these context windows to bury or override mandate constraints shared early in the session.
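For the cross-agent impersonation vector, the missing identity check might look something like the sketch below. It is an assumption-laden illustration: it uses a shared-secret HMAC from the Python standard library as a stand-in for whatever identity scheme a real deployment uses (most likely asymmetric signatures), and the agent name, key, and message fields are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def sign_message(key: bytes, sender: str, body: dict) -> str:
    """Compute an authentication tag over the sender name and message body."""
    payload = json.dumps({"sender": sender, "body": body}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_message(keys: dict[str, bytes], sender: str, body: dict, tag: str) -> bool:
    """Accept a message only if the claimed sender is known and the tag matches."""
    key = keys.get(sender)
    if key is None:
        return False  # unknown peer: reject rather than trust the claimed name
    expected = sign_message(key, sender, body)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


# Hypothetical registry of trusted peers held by the orchestrator.
KEYS = {"pricing-agent": b"demo-key-not-for-production"}
body = {"instruction": "pay", "payee": "api.example", "amount": 5}

good_tag = sign_message(KEYS["pricing-agent"], "pricing-agent", body)
forged_tag = sign_message(b"attacker-key", "pricing-agent", body)

print(verify_message(KEYS, "pricing-agent", body, good_tag))    # True
print(verify_message(KEYS, "pricing-agent", body, forged_tag))  # False
```

An agent that merely claims to be `pricing-agent` cannot produce a valid tag without the key, so the orchestrator has grounds to reject its instructions.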
Why AP2 Mandates Are Vulnerable to Attack
AP2 mandates use natural language to express complex financial goals. These high-level intents are then passed to the orchestrator agent, which is responsible for interpreting the principal’s intent and issuing instructions to sub-agents that carry out these goals.
The large degree of flexibility and latitude granted to the orchestrator agent introduces the risk of misunderstandings. By controlling the context in which the orchestrator interprets a vague intent, an attacker can potentially trick it into issuing instructions that fulfill the “letter of the law” while violating the “spirit of the law.”
Additionally, orchestrators delegate actions and elements of the intent mandate to sub-agents, which also have a degree of latitude in how they perform these actions. Without access to the reasoning behind a mandate, sub-agents may have a distorted view of its goals and act accordingly. Transactions submitted on-chain are checked for adherence to stated parameters, such as spending caps, but are not otherwise verified against the principal’s intent.
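The gap between parameter checks and intent can be shown with a small sketch. The function and field names here are hypothetical, and the check is deliberately simplified to a single spending cap.

```python
def passes_cap_check(tx: dict, cap: int, spent: int) -> bool:
    """Parameter-only check: is the running total still within the cap?"""
    return spent + tx["amount"] <= cap


# The principal's intent, which the cap check never sees.
intent = "Pay for weather API calls, up to 500 units in total"

legit_tx = {"payee": "weather-api.example", "amount": 20}
forged_tx = {"payee": "attacker.example", "amount": 20}

print(passes_cap_check(legit_tx, cap=500, spent=100))   # True
print(passes_cap_check(forged_tx, cap=500, spent=100))  # True
```

Both transactions pass, because the check looks only at the amount. Nothing ties the payment back to the purpose the principal had in mind.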
Managing the Risk of Intent Forgery Attacks
Intent forgery attacks are possible mainly because of the gap between high-level goals and the concrete steps used to achieve them. Delegation widens that gap, and transactions are generally tied to a mandate as a whole rather than to the specific element of it that they are intended to achieve.
Some best practices for protecting against intent forgery attacks include:
- Intent Anchoring: Before an orchestrator or other agent begins decomposing a mandate into tasks, the mandate should be converted from natural language to a structured representation of the intent. A signed object, recorded on-chain, that specifies the action, asset class, value bounds, and other factors can help ensure that actions actually match the original intent.
- Mandate-Transaction Bindings: A high-level intent likely requires multiple transactions to implement, and it can be difficult to identify fraudulent transactions without knowing which element of the intent they were supposed to achieve. Each transaction (and each delegation to a sub-agent) should contain a reference, carried through every level of delegation, that identifies which element of the mandate it is intended to achieve.
- Structured Outputs: AI agents are often allowed to generate transaction parameters independently, which introduces the risk of poisoned outputs. A mandate should include schemas that define the types of transactions that can be used to achieve a particular goal and that specify the parameters that an agent can define to do so.
- Secured High-Value Transactions: Allowing high-value transactions to settle immediately on-chain makes them impossible to reverse. Significant transactions should require multi-party signatures and be subject to a time-lock that delays settlement, so they can be cancelled if deemed anomalous or malicious.
- Agent Reasoning Verification: Agents’ reasoning may be influenced by prompt injection and other means. Trusted execution environments (TEEs) and zero-knowledge proofs (ZKPs) can be used to protect reasoning and prove that transaction parameters follow mandate constraints without revealing sensitive information.
- Contract-Level Enforcement: Often, spending caps and other restrictions on agentic payments are left to the agent to respect and enforce. Instead, smart contracts should be provided with scope constraints that allow them to deny unapproved transactions.
- Anomaly Detection: All transactions should be subject to live monitoring for anomalous and suspicious transactions. This allows rogue agents to be terminated and time-locked high-value transactions to be reversed if needed.
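Several of these practices (intent anchoring, mandate-transaction bindings, and enforcement outside the agent) can be combined in a short sketch. Everything here is an assumption made for illustration: the `IntentElement` fields, the SHA-256 anchor, and the `authorize` function are hypothetical and stand in for logic that would, in practice, live in a signed on-chain object and a smart contract.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IntentElement:
    """One structured element of a mandate (hypothetical schema)."""
    element_id: str
    action: str
    payee: str
    max_amount: int  # cap for this element, in the smallest currency unit


def anchor(elements: list[IntentElement]) -> str:
    """Hash a canonical encoding of the structured intent."""
    canonical = json.dumps([asdict(e) for e in elements], sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def authorize(tx: dict, elements: list[IntentElement],
              mandate_hash: str, spent: dict[str, int]) -> bool:
    """Approve a transaction only if it is bound to a matching intent element."""
    if tx.get("mandate_hash") != mandate_hash:
        return False  # not bound to the anchored mandate
    element = next((e for e in elements if e.element_id == tx.get("element_id")), None)
    if element is None:
        return False  # no binding to a specific element
    if tx.get("action") != element.action or tx.get("payee") != element.payee:
        return False  # parameters fall outside the element's schema
    amount = tx.get("amount")
    if type(amount) is not int or amount <= 0:
        return False
    used = spent.get(element.element_id, 0)
    if used + amount > element.max_amount:
        return False  # per-element cap exceeded
    spent[element.element_id] = used + amount
    return True


elements = [IntentElement("e1", "pay", "weather-api.example", 100)]
mandate_hash = anchor(elements)
spent: dict[str, int] = {}

ok_tx = {"mandate_hash": mandate_hash, "element_id": "e1",
         "action": "pay", "payee": "weather-api.example", "amount": 60}
forged_tx = dict(ok_tx, payee="attacker.example")

print(authorize(ok_tx, elements, mandate_hash, spent))      # True
print(authorize(forged_tx, elements, mandate_hash, spent))  # False
print(authorize(ok_tx, elements, mandate_hash, spent))      # False (cap reached)
```

Because the check runs outside the agent and works only on structured fields, a prompt-injected agent cannot talk its way past it. The worst it can do is submit a transaction that gets rejected.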
Halborn’s security advisory services offer access to expertise in Web3 and AI to help organizations design, implement, and enforce protections against intent forgery and other risks of agentic micropayments. Get in touch to find out more.
