tracingviolet

MCP vs. A2A vs. Tool Calling: Which Protocol Should You Use?

Google's A2A protocol, Anthropic's MCP, and native function calling all let AI agents interact with external services. They solve different problems, they work at different layers, and you probably need more than one. Here's how they compare — with data on what actually happens at the tool-calling layer.

The agent protocol landscape went from "just use function calling" to three competing standards in under a year. If you're building agent-facing services, you're now choosing between:

  • Function calling (OpenAI, Anthropic, Google) — the model natively calls functions you define
  • MCP (Model Context Protocol) — Anthropic's standard for connecting agents to tools and data
  • A2A (Agent-to-Agent) — Google's standard for agents delegating tasks to other agents

The question everyone asks: "Which one should I use?" The answer is that they operate at different layers, and understanding where each one fits matters more than picking a winner.

// key findings
  • They're layers, not alternatives — function calling is the foundation, MCP connects agents to tools, A2A coordinates agents with other agents
  • The tool surface, not the protocol, determines success: description quality, tool count, and error handling matter identically across all three
  • OpenAI models cap at 128 tools regardless of protocol — this applies whether tools arrive via MCP, function calling, or A2A
  • Models skip tools 13% of the time — ranging from 1% (Claude Sonnet) to 27% (GPT-4o), regardless of delivery protocol
  • Error message quality is the strongest predictor of recovery — when an error tells the agent how to fix the problem, it recovers far more often than when the error is opaque

// finding 01

What do MCP, A2A, and function calling actually do?

Function calling: the foundation layer

Every major model provider supports function calling natively. You define tools as JSON schemas, send them with your prompt, and the model returns structured calls with parameters.

User prompt → Model → tool_call(name, parameters) → Your code executes → Result back to model

This is the base layer. Both MCP and A2A ultimately result in function calls at the model level. When an MCP client presents tools to Claude, Claude sees them as function call schemas. When an A2A agent receives a task, the executing agent uses function calls to do the actual work.

Function calling is not a protocol — it's a model capability. You can use it directly without MCP or A2A.
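A minimal sketch of that loop, using no protocol at all. The `get_weather` tool, its handler, and `fake_model` are hypothetical stand-ins (a real application would call a provider SDK where `fake_model` is), and the exact wrapper keys around the schema differ slightly between providers.

```python
import json

# Tool definition as a JSON schema. The tool itself is made up for illustration.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get the current temperature for a city. "
                   "Use when the user asks about weather.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Oslo'"},
        },
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    # Hypothetical implementation; a real one would call a weather API.
    return {"city": city, "temp_c": 7}


# Maps tool names to the code that executes them.
HANDLERS = {"get_weather": get_weather}


def fake_model(prompt: str, tools: list) -> dict:
    # Stand-in for the provider call: returns a structured tool call
    # with JSON-encoded arguments, which is the general shape models emit.
    return {"name": "get_weather", "arguments": json.dumps({"city": "Oslo"})}


def run_tool_call(call: dict) -> dict:
    # Your code executes the call the model asked for.
    handler = HANDLERS[call["name"]]
    args = json.loads(call["arguments"])
    return handler(**args)


call = fake_model("What's the weather in Oslo?", [GET_WEATHER_TOOL])
result = run_tool_call(call)  # this result would be sent back to the model
print(result)
```

Everything MCP and A2A add sits around this loop; the model's side of the exchange stays the same.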

MCP: agent-to-tool communication

MCP standardizes how an agent connects to tools, data, and context. Instead of hardcoding tool definitions in your application, you connect to MCP servers that expose tools dynamically.

Agent ←→ MCP Client ←→ MCP Server ←→ Your API / database / filesystem

MCP adds:

  • Discovery: the agent can list available tools at runtime
  • Standardized transport: stdio for local, Streamable HTTP for remote (replacing the earlier HTTP+SSE transport)
  • Three primitives: tools (functions), resources (data), prompts (templates)
  • Multi-server support: one agent connects to many MCP servers simultaneously

The key insight: MCP is about the connection between an agent and its tools. It standardizes the plumbing so that any MCP-compatible agent can use any MCP server without custom integration code.
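To make the plumbing concrete, here is a rough sketch of the JSON-RPC messages behind discovery and invocation, plus the step where a discovered tool becomes an ordinary function schema. The `tools/list` and `tools/call` method names and the `inputSchema` field come from the MCP specification; the `search_assets` tool and the sample response are invented for illustration, and the target wrapper assumes an OpenAI Chat Completions-style `tools` entry.

```python
def list_tools_request(request_id: int) -> dict:
    # Ask an MCP server which tools it exposes.
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(request_id: int, name: str, arguments: dict) -> dict:
    # Invoke one of the discovered tools.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Illustrative server response to tools/list (hypothetical tool).
SAMPLE_LIST_RESPONSE = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "search_assets",
                "description": "Search assets by keyword and return matching IDs.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    },
}


def to_function_schema(mcp_tool: dict) -> dict:
    # The model never sees MCP itself, only a function schema like this one.
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool["description"],
            "parameters": mcp_tool["inputSchema"],
        },
    }


schemas = [to_function_schema(t) for t in SAMPLE_LIST_RESPONSE["result"]["tools"]]
request = call_tool_request(2, "search_assets", {"query": "logo"})
```

In practice an MCP client library handles these messages for you; the sketch only shows why the description and schema you publish are what the model ends up reasoning about.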

A2A: agent-to-agent communication

A2A standardizes how agents delegate tasks to other agents. Instead of one agent doing everything, a coordinator agent can send tasks to specialized agents.

Coordinator Agent → A2A → Specialist Agent → (uses tools via function calling or MCP) → Result

A2A adds:

  • Agent discovery: agents publish "Agent Cards" describing their capabilities
  • Task lifecycle: submit, track, complete, cancel tasks between agents
  • Streaming: real-time progress updates during long tasks
  • Multi-modal: tasks can include text, files, structured data

The key insight: A2A is about coordination between agents. The specialist agent still needs tools to do actual work — it just receives instructions from another agent instead of directly from a user.
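The task lifecycle is easiest to picture as a small state machine. The sketch below is a simplification, not the A2A specification: the state names follow our reading of the spec, but the allowed-transition table, the `Task` class, and the example Agent Card fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Illustrative Agent Card; the field names and values are assumptions.
AGENT_CARD = {
    "name": "invoice-specialist",
    "description": "Creates and reconciles invoices.",
    "url": "https://agents.example.com/invoice",
    "skills": [{"id": "create_invoice", "description": "Create an invoice"}],
}

# States after which a task accepts no further transitions.
TERMINAL = {"completed", "canceled", "failed"}

# Assumed transition rules for this sketch.
ALLOWED = {
    "submitted": {"working", "canceled", "failed"},
    "working": {"input-required", "completed", "canceled", "failed"},
    "input-required": {"working", "canceled", "failed"},
}


@dataclass
class Task:
    task_id: str
    state: str = "submitted"
    history: list = field(default_factory=list)

    def transition(self, new_state: str) -> None:
        # Reject changes to finished tasks and any move not in the table.
        if self.state in TERMINAL:
            raise ValueError(f"task {self.task_id} is already {self.state}")
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.history.append(self.state)
        self.state = new_state


task = Task("t1")
task.transition("working")    # specialist picks the task up
task.transition("completed")  # specialist reports the result
print(task.state, task.history)
```

The coordinator only sees this lifecycle. Whatever tools the specialist used to reach `completed` stay behind the agent boundary.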

// finding 02

They're layers, not alternatives

The three protocols operate at different levels:

Three Protocols, Three Layers
┌─────────────────────────────────────────┐
│              A2A                          │  Agent ↔ Agent
│  (coordination, delegation, discovery)   │
├─────────────────────────────────────────┤
│              MCP                          │  Agent ↔ Tools
│  (tool connection, data, context)        │
├─────────────────────────────────────────┤
│         Function Calling                  │  Model ↔ Functions
│  (native model capability)               │
└─────────────────────────────────────────┘

A production system might use all three:

  1. A coordinator agent receives a user request via A2A
  2. It delegates subtasks to specialist agents via A2A
  3. Each specialist connects to tools via MCP
  4. Each tool interaction happens through function calling at the model level

Asking "MCP or A2A?" is like asking "HTTP or TCP?" — they work at different layers and serve different purposes.

// finding 03

When should you use MCP vs A2A vs function calling?

Use function calling directly when:

  • You're building a single-purpose application with a fixed set of tools
  • Your tools don't change at runtime
  • You control both the agent and the tools
  • You don't need interoperability with other agents or tool providers

This is the simplest path. Define your tools as JSON schemas, pass them to the model, handle the calls. No protocol overhead, no servers to run. Most AI applications today work this way.

Use MCP when:

  • You want your tools to work with any MCP-compatible agent (Claude, Cursor, Windsurf, etc.)
  • Your tool surface is dynamic or configurable
  • You're publishing tools for others to use
  • You want agents to discover your tools at runtime
  • You need to connect to multiple tool providers simultaneously

MCP is the right choice if you're a tool publisher — if you want agents other than your own to use your service. It's also the right choice if you're building an agent that needs to connect to tools you don't control.

Use A2A when:

  • You have multiple specialized agents that need to coordinate
  • Tasks are complex enough to benefit from delegation to experts
  • You want agents from different providers to collaborate
  • You're building an orchestration layer above individual tool use

A2A is the right choice if you're building multi-agent systems — where the complexity warrants splitting work across specialized agents rather than loading all tools into one context.

// finding 04

What does cross-model data show about tool calling performance?

Regardless of which protocol you use, the tool-calling layer is where success or failure happens. And that layer has consistent, measurable patterns.

We've tested 4,914 real tool interactions across 5 models from 3 providers (Anthropic, OpenAI, Google). The findings apply whether the tool call arrives via MCP, A2A, or direct function calling — because all three ultimately produce the same thing: a model generating a structured call with parameters.

Finding 1: The model chooses based on description, not protocol

In our testing, tool selection is driven almost entirely by description quality and task-shape matching. The protocol that delivered the tool definition is irrelevant to the model's choice — it sees a JSON schema either way.

This means: whether you expose your API via MCP, wrap it in an A2A agent, or define it as a direct function call, the same description quality rules apply. A poorly described tool loses to a well-described tool regardless of transport layer.

Finding 2: Tool count matters at every layer

OpenAI models reject tool surfaces exceeding 128 tools (as of early 2026). This limit applies whether those tools arrive via MCP, function calling, or embedded in an A2A agent's capabilities. In our testing with 184 dev tools loaded simultaneously, GPT-5.4 refused the entire request.

For A2A architectures, this has an important implication (based on protocol specification analysis, not empirical A2A testing): if a coordinator agent needs to discover and reason about specialist agents, the number of available agents functions like tool count. An A2A discovery registry with 200 agent cards may face similar presentability constraints.
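One way to stay under a tool-count limit is to filter the surface before it reaches the model. The sketch below is a naive illustration, not something we tested: it ranks tools by word overlap between the task and each tool's name and description, then keeps the top entries. The scoring function and tool names are hypothetical, and `MAX_TOOLS` should be treated as a configurable value, not a constant of the API.

```python
MAX_TOOLS = 128  # limit reported above for OpenAI models; may change


def score(tool: dict, task: str) -> int:
    # Naive relevance: count words shared by the task and the tool text.
    task_words = set(task.lower().split())
    tool_text = tool["name"].replace("_", " ") + " " + tool["description"]
    return len(task_words & set(tool_text.lower().split()))


def select_tools(tools: list, task: str, limit: int = MAX_TOOLS) -> list:
    # Small surfaces pass through untouched.
    if len(tools) <= limit:
        return list(tools)
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(tools, key=lambda t: score(t, task), reverse=True)
    return ranked[:limit]


# 184 tools: 183 generic ones plus one that matches the task.
tools = [
    {"name": f"tool_{i}", "description": f"generic helper {i}"}
    for i in range(183)
]
tools.append(
    {"name": "create_invoice", "description": "Create an invoice for a customer"}
)

selected = select_tools(tools, "create an invoice for customer 42")
print(len(selected), selected[0]["name"])
```

A production version would likely use embeddings or explicit tool groups instead of word overlap, but the constraint is the same: the model can only choose among the tools that survive the cut.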

Finding 3: Models refuse tools 13% of the time — regardless of protocol

Across our testing, models answered from training data instead of using available tools 13% of the time. This ranged from 1% (Claude Sonnet 4.6) to 27% (GPT-4o). The refusal rate was consistent whether tools were presented via MCP-style schemas or direct function definitions.

For A2A systems (extrapolating from our tool-calling data, not from A2A testing): if the coordinator agent decides it can handle a task itself rather than delegating, the delegation never happens. This would be the A2A equivalent of tool refusal — the coordinator's training data competing with the specialist agent's capabilities, just as a tool's value competes with the model's existing knowledge.

Finding 4: Error handling matters more than transport

In our corpus, error message quality was the strongest predictor of whether models recovered from failures. Opaque errors ("not found") leave the model with nowhere to go, so a failed call sharply lowers the odds the task still gets done. Actionable errors ("asset not found, use /assets?search= to find valid IDs") give the model a concrete recovery path, and recovery rates climb accordingly.

This is protocol-agnostic. Whether the error comes back through an MCP server, an A2A task failure response, or a direct function call exception, the model's ability to recover depends on the error's content, not its transport.
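The difference is small in code. Below is a sketch of the same failed lookup returned two ways; the field names (`error`, `message`, `hint`) and the in-memory `ASSETS` table are hypothetical, while the `/assets?search=` hint mirrors the example above.

```python
# Toy in-memory data standing in for a real backend.
ASSETS = {"a-100": {"id": "a-100", "name": "Logo"}}


def lookup_asset_opaque(asset_id: str) -> dict:
    # Opaque: tells the model that something failed, not what to do next.
    if asset_id not in ASSETS:
        return {"error": "not found"}
    return ASSETS[asset_id]


def lookup_asset_actionable(asset_id: str) -> dict:
    # Actionable: names the problem and points at a recovery path.
    if asset_id not in ASSETS:
        return {
            "error": "asset_not_found",
            "message": f"Asset '{asset_id}' not found.",
            "hint": "Use /assets?search= to find valid IDs.",
        }
    return ASSETS[asset_id]


print(lookup_asset_opaque("a-999"))
print(lookup_asset_actionable("a-999"))
```

Either payload can travel back through an MCP tool result, an A2A task failure, or a plain function return; only the second gives the model a next step.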

// finding 05

The practical decision tree

Which Protocol Should You Use?
Are you building tools for others to use?
  → Yes: Use MCP. It's the interoperability standard.
  → No: Are you coordinating multiple agents?
    → Yes: Use A2A for coordination + MCP (or function calling) for tools.
    → No: Use direct function calling. It's simpler.
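The same tree written as a function, for readers who prefer code. The function and parameter names are ours and purely illustrative; the branches mirror the tree above.

```python
def choose_protocols(publishing_tools_for_others: bool,
                     coordinating_multiple_agents: bool) -> list:
    # Branch 1: tool publishers want interoperability.
    if publishing_tools_for_others:
        return ["MCP"]
    # Branch 2: multi-agent systems need coordination plus a tool layer.
    if coordinating_multiple_agents:
        return ["A2A", "MCP or function calling"]
    # Branch 3: everything else takes the simplest path.
    return ["function calling"]


print(choose_protocols(True, False))
print(choose_protocols(False, True))
print(choose_protocols(False, False))
```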

For most API publishers reading this, the answer is MCP. You want agents — Claude, GPT, Gemini, Cursor, custom frameworks — to discover and use your service. MCP is the standard that makes that possible without per-agent custom integration.

A2A becomes relevant when you're building the orchestration layer above tools — when the question isn't "can agents use my API?" but "can agents delegate tasks to my agent?"

// finding 06

Where is the agent protocol ecosystem heading?

The protocols are converging toward a layered standard:

  • Function calling is the universal substrate — every model supports it
  • MCP is becoming the standard tool-connection layer: created by Anthropic, supported in Claude, Cursor, and Windsurf, with client support still growing
  • A2A is emerging as the coordination layer — backed by Google, designed for multi-agent workflows

They're complementary. An API publisher should implement MCP today — it's where the agent traffic is. A2A support becomes relevant when multi-agent orchestration frameworks mature and coordinator agents start looking for specialist agents to delegate to.

The critical insight from our testing: the protocol doesn't determine success. The tool surface does. Task-shape matching, description quality, input friction, error handling, and tool architecture drive outcomes at every layer. A well-described MCP tool and a well-described function call should perform equivalently — both present the same JSON schemas to the model. A poorly described tool fails regardless of which protocol delivers it.

Get the tool surface right first. The protocol is plumbing.


// appendix

Methodology and disclosure

Tool interaction data is based on 4,914 observations across Claude Sonnet 4.6, Claude Opus, GPT-4o, GPT-5.4, Gemini 3.1 Pro, and Gemini Flash, tested against real production API endpoints. All tools were presented via native model function calling interfaces (Anthropic tool use, OpenAI function calling, Google function calling). MCP-specific transport behavior was not tested — findings describe the model's interaction with tool schemas, which is protocol-agnostic.

A2A observations are based on protocol specification analysis and ecosystem assessment, not empirical testing of A2A agent interactions. A2A is early-stage; real-world behavioral data is limited.

Results reflect observed behavior during controlled testing. Protocol ecosystems evolve rapidly — specific capabilities and adoption levels may change.
