tracingviolet

// about

The independent measurement layer for agent readiness

We got curious about a simple question publishers couldn't answer: when an AI agent uses your service, does the user actually get their answer? So we built the harness to measure it.

Thousands of live, traced agent interactions across real production services later, the pattern is consistent. Models send well-formed calls almost every time. The failures live in tool descriptions, input requirements, error messages, and API behavior — publisher-side, and fixable.

We publish the research openly and sell the diagnosis.

Our guiding principles

01

Measure what matters, not what's easy.

A green API call isn't a served user. We measure the outcome your logs can't — whether the person got their answer.

02

Say only what's true.

We'd rather under-claim than over-promise. Our numbers are point-in-time snapshots, not trends; what we observe, we label as observed.

03

Scientific and data-driven.

We don't lint schemas or score docs from the outside. We run controlled tests across many models and real endpoints, repeat them for stability, and report what the data shows — built on a large, growing dataset of real agent behavior.

04

Specific enough to ship.

Every finding comes with the exact fix — exact schema, exact text, exact error format. Not a category of change; the change itself.

05

Independent by design.

We don't host, generate, or broker tools. We only measure them.

Curious whether agents can use your service?

We test your tools live across multiple models and deliver a report with specific fixes you can ship.

or hello@tracingviolet.dev