// the audit
An Agent Readiness Audit is a live, multi-model diagnostic of your MCP server or API — traced from user intent to task outcome, delivered as fixes specific enough to ship.
// the report
Four excerpts from a sample diagnostic audit, recreated below.
03 · executive summary
AGENT READINESS SCORE
05 · tracing the agent's journey
07 · model breakdown
Agent readiness isn't one number — it varies by which model is driving. Each dimension is scored per model; cells use the same four-band scale as the rest of this report.
| DIMENSION | claude-opus-4.7 | claude-sonnet-4.6 | gpt-5 | gemini-3.1 | Overall |
|---|---|---|---|---|---|
| Invocation | 95 | 97 | 91 | 93 | 94 |
| Reliability | 93 | 92 | 89 | 90 | 91 |
| Utility | 62 | 59 | 54 | 57 | 58 |
| Efficiency | 83 | 85 | 79 | 81 | 82 |
| Recoverability | 72 | 66 | 76 | 70 | 71 |
| OVERALL | 71 | 69 | 66 | 67 | 68 |
22 · engineering tickets
Each fix is written as an engineering ticket — the problem, the evidence behind it, the change, and how to confirm it shipped. "Done when" confirms the change is live.
docs.create confirms creation in prose. The agent can't parse an artifact id from a sentence, so it can't verify the document exists or cite it — task completion rests on trusting the prose. Agents either re-query for the document or hand the user an unverifiable answer.
31 of 87 docs.create observations across 4 models.
{
"artifact": {
"id": "doc_8f3k2",
"url": "https://arvossi.dev/d/8f3k2"
},
"status": "created"
}
// how we test
Real models from multiple providers receive your tool surface, mirrored faithfully from your real schemas. We then send them realistic prompts built around what your users actually ask. The API calls they make hit your real endpoints, so the failures we find are the same ones real agents run into. And because we trace each task from intent → tool choice → call sequence → outcome, every finding comes with the exact step where things went wrong and a fix that targets it.
// what we need from you
// pricing
One engagement, one price, no tiers, no surprises.
// questions
// book an audit