PROWL // GATEWAY LIVE
sample rate 0.05 · overhead <30ms p99 · credits/$ 100 · prowl cut 10% · milestone M2→M6
M2 → M6 · gateway live

A score that updates with every real call.

Prowl Gateway is a pass-through proxy. A fraction of real agent traffic flows through it — chosen by the agent's SDK using a salt that Prowl signs and the vendor never sees. Latency, errors, and response shape are measured on actual production calls. Vendors can't game the sample, because the sample isn't theirs to pick.

proxy.prowl.world/{slug}/...
0 vendor cooperation
<30ms p99 overhead
01 · The problem

One-shot benchmarks are announced. They rot.

Vendor knows when the test runs · score expires in days

Every other API benchmark in the agent ecosystem is a snapshot. Someone pays $1, an LLM runs a battery of canned tests, a number gets pinned to the service for a week or a month, and then the number rots. The vendor knows exactly when the test is happening because they wrote the guide. They can cram for the exam, so the score never reflects what an agent actually feels at 3am on a Tuesday.

Three things are wrong with that model:

The fix is not to run benchmarks more often. The fix is to measure traffic that's happening anyway, and to do it on the vendor's actual production endpoints — not a sandbox they prepared.

02 · The anchor

The sample decision is rolled, not requested.

sha256 over (service, agent, salt) · we re-verify on arrival

The trick is that the vendor doesn't decide which calls are observed, and neither do we. The agent's SDK rolls the dice locally, using a salt that Prowl publishes and rotates daily. Here's the exact computation:

# The agent's SDK does this before every outbound call.
roll   = sha256(service_id | agent_id | salt_id | salt)
bucket = first_32_bits(roll) / 2**32      # → float in [0, 1)

if bucket < service.sampling_rate:
    # route through Prowl — observable
    POST proxy.prowl.world/{slug}/{path}
        X-Prowl-Salt-Id: {salt.id}
        X-Prowl-Sample-Decision: {roll.hex()}
else:
    # call vendor directly — invisible to Prowl
    request(vendor_url, ...)

When the call hits the gateway, we have the salt too — so we recompute the same hash with the agent's ID and service ID and reject the call if the headers don't match. An agent that wants to skip being observed (or always be observed) can't, because the hash is deterministic and we re-verify it.
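A minimal Python sketch of the roll and the gateway-side re-verify described above. The field separator (a literal "|"), the UTF-8 encoding, and the big-endian reading of the first 32 bits are assumptions made for illustration, and the function names are hypothetical; the real SDK may encode things differently.

```python
import hashlib
import hmac


def sample_roll(service_id: str, agent_id: str, salt_id: str, salt: str) -> bytes:
    """sha256 over the four fields.

    Assumption: fields are joined with a literal "|" and UTF-8 encoded.
    """
    material = "|".join([service_id, agent_id, salt_id, salt]).encode("utf-8")
    return hashlib.sha256(material).digest()


def bucket(roll: bytes) -> float:
    """Map the first 32 bits of the hash to a float in [0, 1).

    Assumption: the 32 bits are read big-endian.
    """
    return int.from_bytes(roll[:4], "big") / 2**32


def is_sampled(roll: bytes, sampling_rate: float) -> bool:
    """Agent side: route through the gateway only when the bucket is below the rate."""
    return bucket(roll) < sampling_rate


def verify_decision(claimed_hex: str, service_id: str, agent_id: str,
                    salt_id: str, salt: str) -> bool:
    """Gateway side: recompute the hash and compare it to the claimed header value."""
    expected = sample_roll(service_id, agent_id, salt_id, salt).hex()
    # Constant-time comparison, so timing does not reveal how much of the hash matched.
    return hmac.compare_digest(expected.encode(), claimed_hex.encode())
```

Because the hash is deterministic, the same four inputs always land in the same bucket, which is what makes the decision checkable after the fact.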

The salt rotates every 24 hours with a 60-minute overlap so calls in flight don't break. Anyone can audit a past sample decision via POST /v1/sampling/verify — the salt is public after rotation.
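One way to read the rotation rule, sketched in Python. It assumes "overlap" means the outgoing salt stays acceptable for 60 minutes after its 24-hour window ends; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ROTATION = timedelta(hours=24)   # a new salt is issued every 24 hours
OVERLAP = timedelta(minutes=60)  # assumption: the old salt is still honored this long


def salt_is_acceptable(issued_at: datetime, now: datetime) -> bool:
    """True from issuance until 24h + 60min later, so in-flight calls don't break."""
    return issued_at <= now < issued_at + ROTATION + OVERLAP
```

Under this reading, a call signed with yesterday's salt at 24h59m after issuance still verifies, and one at 25h does not.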

Why this matters

The vendor can't bias the sample because they never see the salt. The agent can't cheat because we re-verify the hash. The score is a function of real traffic, not a function of "the call the vendor wanted us to see."
03 · Four modes, one route

Pick how the gateway behaves for your service.

Service.proxy_modes · settable per service via vendor JWT
DEFAULT

sampled

The main mode. Agent's SDK rolls local dice; only a fraction of calls pass through. Vendor gets continuous quality measurement, paid in monitoring credits (1 credit / observed call, refilled by paid benchmarks at 100/$1).

MONETIZE

x402_only

Every call requires an x402 payment proof from the agent at $0.01 each. Prowl takes 10%, vendor gets 90%. Monetization-as-a-service for vendors that want pay-per-call usage without building billing.

RESERVED

vault_only

Reserved for scoped vault tokens — agents present short-lived credentials that Prowl translates into the vendor's real API key. M3+ territory; route reachable, policy conservative.

PILOT

full

Every call is forwarded, every call is logged. Useful for early-vendor pilots and for debugging the auth translation. No sampling guarantees and no payment enforcement.

Across every mode, the gateway strips Prowl-internal headers (X-Agent-Key, X-Prowl-*, the agent's Authorization) before forwarding. The vendor sees its own injected credential and the request body. It never sees who the agent is.
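The stripping rule above can be sketched in a few lines of Python. This is an illustration, not the gateway's code: the function names are hypothetical, and the case-insensitive matching is an assumption (HTTP header names are case-insensitive, so it is the conservative choice). header_name and header_prefix mirror the proxy_auth_translation fields a vendor configures.

```python
from typing import Dict

INTERNAL_EXACT = {"x-agent-key", "authorization"}
INTERNAL_PREFIX = "x-prowl-"


def strip_internal_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop X-Agent-Key, the agent's Authorization, and every X-Prowl-* header."""
    def is_internal(name: str) -> bool:
        lower = name.lower()
        return lower in INTERNAL_EXACT or lower.startswith(INTERNAL_PREFIX)

    return {k: v for k, v in headers.items() if not is_internal(k)}


def inject_credential(headers: Dict[str, str], header_name: str,
                      header_prefix: str, secret: str) -> Dict[str, str]:
    """Return a copy of the headers with the vendor's own credential added."""
    out = dict(headers)
    out[header_name] = f"{header_prefix}{secret}"
    return out
```

Stripping first and injecting second means a vendor credential header can never be overridden by something the agent sent.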

04 · One call, end to end

What actually happens between dice roll and response.

src/api/gateway.py · budget <30ms p99 overhead
01
Agent rolls the dice with the SDK's local sample function. Bucket lands below the rate. Agent sends the call to proxy.prowl.world/{slug}/v1/... with the X-Prowl-Salt-Id and X-Prowl-Sample-Decision headers.
02
Gateway recomputes sha256(service|agent|salt-id|salt) and rejects if mismatched (400). Cheat path closed.
03
Reputation gate. If service.min_reputation is set and the agent's score is below it, the call returns 403 with X-Prowl-Reason: below-min-reputation.
04
Monitoring credits. Atomic decrement of service.gateway_credits. Out of credits → 503 + X-Prowl-Reason: monitoring-credits-exhausted. Vendor refills via a paid benchmark.
05
Auth translation. Fetch the vendor's Fernet-encrypted credential, decrypt it, inject as the configured header or query param. Internal Prowl headers stripped on the way out.
06
Forward upstream with httpx (60s timeout, no follow-redirects). The vendor's response is captured byte-for-byte.
07
ProxyCall row written: method, path, request_bytes, response_status, response_bytes, latency_ms, mode, salt-id, decision-hash. ~1–2 ms.
08
Response returned with two added headers: X-Prowl-Proxy-Mode and X-Prowl-Proxy-Latency-Ms. Budget: <30 ms p99 of Prowl-attributable overhead.
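Steps 02 through 04 are a chain of gates, and the order matters: a forged decision is rejected before any credit is spent. A rough Python sketch of that ordering for sampled mode follows. The Service class and admit function are hypothetical stand-ins, None for min_reputation is assumed to mean "gate disabled", and the in-memory decrement stands in for the atomic one the gateway performs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Service:
    gateway_credits: int
    min_reputation: Optional[float] = None  # assumption: None means the gate is off


def admit(service: Service, agent_score: float,
          decision_ok: bool) -> Tuple[Optional[int], Optional[str]]:
    """Return (status, X-Prowl-Reason). (None, None) means: continue to forwarding."""
    if not decision_ok:
        # Step 02: the recomputed hash did not match the headers.
        return 400, None
    if service.min_reputation is not None and agent_score < service.min_reputation:
        # Step 03: reputation gate.
        return 403, "below-min-reputation"
    if service.gateway_credits <= 0:
        # Step 04: nothing left to pay for the observation.
        return 503, "monitoring-credits-exhausted"
    # One credit per observed call. Not atomic here; the real gateway's decrement is.
    service.gateway_credits -= 1
    return None, None
```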
05 · From rows to signal

A row per call isn't a score. The pipeline is.

probe-health overlay · cheat audit · reputation

Every observed call becomes one ProxyCall row. Three downstream pipelines turn those rows into the public score you see on Prowl:

Receipts (POST /v1/receipts/submit, M1) close the loop on multi-step tasks: agent and counterparty co-sign that "the delivery happened, here's how it went," feeding the same aggregation. Single-sig weighted 0.3, dual-sig weighted 1.0.
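To make the 0.3 / 1.0 weighting concrete, here is a small Python sketch. Only the two weights come from the text above. That each receipt carries a numeric outcome in [0, 1], and that aggregation is a weighted mean, are assumptions for illustration; the actual aggregation may differ.

```python
from typing import List, Optional, Tuple

# Weights from the text: single-signature 0.3, dual-signature 1.0.
SIGNATURE_WEIGHT = {1: 0.3, 2: 1.0}


def weighted_outcome(receipts: List[Tuple[float, int]]) -> Optional[float]:
    """Weighted mean of (outcome, signature_count) pairs; None if there are none.

    Assumption: outcome is a number in [0, 1] and signature_count is 1 or 2.
    """
    total_weight = sum(SIGNATURE_WEIGHT[sigs] for _, sigs in receipts)
    if total_weight == 0:
        return None
    weighted_sum = sum(outcome * SIGNATURE_WEIGHT[sigs] for outcome, sigs in receipts)
    return weighted_sum / total_weight
```

Under these assumptions, one co-signed success and one single-signed failure average to 1.0 / 1.3, roughly 0.77: the unilateral claim pulls the result down, but far less than a co-signed one would.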

06 · Vs. canned benchmarks

Same domain. Different surface being measured.

we're not Datadog · we're third-party-witnessed
                            Canned benchmarks         Prowl Gateway
Vendor knows it's measured  Yes (announced)           No (per-call dice roll)
Update cadence              Weeks (re-bench = $$$)    Per call
Cost to refresh             ~$0.05 LLM tokens         0 (already in flight)
Surface tested              Sandbox / cherry-picked   Production endpoint
Audit trail                 One bench log             Per-call ProxyCall row

We're not a replacement for Datadog, Honeycomb, or Sentry. Those live inside the vendor and watch the vendor's own requests. Prowl Gateway lives between agents and vendors and produces a public, third-party-witnessed signal. The two are complementary — vendors use one to improve, agents use the other to decide whether to call.

07 · Use it

One side calls. The other side enables.

SDK rolls dice on agent side · vendor flips proxy_modes
FOR AGENTS · free · ak_ key required

The SDK rolls the dice for you.

If you're calling a Prowl-registered service, the sampling protocol is ~15 lines. Body, query, headers pass through unchanged. We strip Prowl-internal headers and the agent's own Authorization on the way out (we substitute the vendor's stored credential).

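import httpx
from prowl_client import ProwlClient, sample_decision

cli  = ProwlClient(agent_key="ak_...")
salt = await cli.current_sampling_salt()    # rotates daily

# httpx.request() is synchronous; awaiting a call needs an AsyncClient.
async with httpx.AsyncClient() as http:
    if sample_decision(service_id, agent_id, salt) < rate:
        # Sampled: route through the gateway with the proof headers.
        resp = await http.request(
            method,
            f"https://proxy.prowl.world/{slug}/{path}",
            headers={
                "X-Agent-Key":               "ak_...",
                "X-Prowl-Salt-Id":           salt.id,
                "X-Prowl-Sample-Decision":   salt.decision(...),
                **your_headers,
            },
            content=body,
        )
    else:
        # Not sampled: call the vendor directly, invisible to Prowl.
        resp = await http.request(method, vendor_url, ...)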
FOR VENDORS · free to enable · credits refill via $1 bench

One POST turns the gateway on.

The credential you upload via POST /v1/credentials is Fernet-encrypted at rest, decrypted only in the hot path of an observed call, and never returned in any API response. The gateway's job is to shield your real key from agents, not expose it.

POST /v1/services/{id}/proxy
Authorization: Bearer <vendor_jwt>

{
  "proxy_modes":        "sampled",
  "sampling_rate":      0.05,
  "proxy_target_url":   "https://api.your-service.com",
  "proxy_auth_translation": {
    "header_name":   "X-API-Key",
    "header_prefix": ""
  },
  "min_reputation":     0
}

At sampling_rate=0.05, one paid benchmark (~$1 → 100 credits) covers roughly 2,000 real calls of traffic: about 100 of them are observed, at one credit each. x402_only is credit-exempt, because agents already pay per call.
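The arithmetic behind that figure, as a short Python sketch. The 100 credits per dollar and one credit per observed call come from the text; the function name is hypothetical, and the result is an expected value, since the number of calls actually sampled varies.

```python
CREDITS_PER_DOLLAR = 100      # from the text: paid benchmarks refill at 100/$1
CREDITS_PER_OBSERVED_CALL = 1


def expected_calls_covered(dollars: float, sampling_rate: float) -> int:
    """Expected total traffic (observed plus unobserved) a refill pays to monitor."""
    observed_calls = dollars * CREDITS_PER_DOLLAR / CREDITS_PER_OBSERVED_CALL
    # Each observed call stands in for 1 / sampling_rate calls of real traffic.
    return round(observed_calls / sampling_rate)


print(expected_calls_covered(1, 0.05))  # 2000
```

Dropping the rate to 1% stretches the same dollar across 10,000 calls, at the cost of a sparser signal.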

08 · Where we honestly are

What's shipped, and what's not yet.

M2 → M6 shipped · M7+ open

The gateway is shipped through M6 of the gateway+reputation plan. The route is live, the sampling protocol is enforced, the cheat audit runs every 24h, and a per-call ProxyCall is written for every request. But:

Not yet · being honest about it

Things this isn't today.

What's solid right now: the sampling crypto, the auth translation, the credit accounting, the ProxyCall capture, the min-reputation gate, the cheat-audit ratio path. Those have tests, those have been hit in prod, those work.

The bet: the long tail of agent traffic is going to need a neutral observability layer that neither the vendor nor the agent controls. The gateway is our attempt at building that layer in a way that doesn't depend on the vendor cooperating.

09 · What we need from you

Three asks. Pick the one that fits.

VENDOR · WITH A PUBLIC API
Turn on proxy_modes=sampled at 1–5% rate. The continuous score is real, you can disable it any time, and the data is yours via GET /v1/services/{id}/gateway.
Enable →
AGENT RUNTIME · BUILDER
The sampling protocol is ~15 lines. Implement it, get continuous quality signal on every service you call, and your agent's reputation starts accruing automatically.
SDK →
SKEPTIC · YOU THINK THE THREAT MODEL IS WRONG
Open an issue. "Your cheat-audit thresholds are nonsense because…" is more useful than a polite nod.
Push back →

The score should age like milk, not wine.

Continuous, third-party-witnessed, vendor-unbiasable. A signal you can build agent routing on without praying the last benchmark is still true.

$ curl -X POST https://proxy.prowl.world/{slug}/v1/...
Get agent key /docs