Managed multi-provider LLM gateway

One endpoint.
Every model. Full flow.

Penstock unifies Claude and OpenAI behind one secure endpoint — with intelligent, TPM-aware routing, managed capacity, and zero data retention. Ship on AI without babysitting rate limits, juggling accounts, or wiring up failover.

Zero data retention on the proxy path. We never use your prompts to train models, and never share them across customers.

Request beta access | See how it works

Private, invite-gated beta.

Drop-in. SDKs unchanged.

# Point your existing client at Penstock
export OPENAI_BASE_URL="https://api.penstock.run"
export ANTHROPIC_BASE_URL="https://api.penstock.run"

# One call, routed and smoothed across providers
curl "https://api.penstock.run/messages" \
  -H "Authorization: Bearer $PENSTOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-5","max_tokens":256,
       "messages":[{"role":"user","content":"Hello from Penstock!"}]}'

How it works

Channel your traffic in three steps.

1. Point at one endpoint

Set base_url to your Penstock endpoint. Your existing OpenAI and Anthropic SDKs work unchanged: no rewrite, no new client.

2. We route and smooth the flow

Requests are routed across providers with TPM-aware logic that respects rate limits, retries with exponential backoff, and fails over gracefully. Spikes get queued and smoothed, not dropped.

3. You get reliable capacity

Managed, pooled capacity sized to your tier; we run the provider accounts so you don't. Watch your own throughput, latency, and limits in one place.
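
To make step 2 concrete, here is a minimal sketch of retrying with exponential backoff and then failing over to the next provider. It is an illustration under stated assumptions, not Penstock's actual routing code: `ProviderError`, `call_with_failover`, and the injected `sleep` parameter are hypothetical names, and each provider is modeled as a plain callable.

```python
import random
import time
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error standing in for a rate-limit or server error."""


def call_with_failover(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try providers in order; retry each one with exponential backoff."""
    last_error: Optional[Exception] = None
    for send in providers:
        for attempt in range(max_retries):
            try:
                return send(prompt)
            except ProviderError as err:
                last_error = err
                if attempt < max_retries - 1:
                    # Delay doubles per attempt (base, 2x, 4x) plus up to 10% jitter.
                    delay = base_delay * (2 ** attempt)
                    sleep(delay + random.uniform(0, delay / 10))
        # Retries exhausted for this provider: fail over to the next one.
    raise RuntimeError("all providers failed") from last_error
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; a caller that only wants the default behavior can ignore it.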

Features

Infrastructure you trust with your traffic.

One unified endpoint

Claude and OpenAI behind a single secure base URL. Existing SDKs work unchanged — switch providers without switching clients.
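
As a rough illustration of the single base URL, the sketch below builds (but does not send) the same request as the curl example, using only the Python standard library. The helper name `build_messages_request` is hypothetical, and the idea that changing only the `model` field is enough to switch providers is an assumption the page does not spell out.

```python
import json
import urllib.request

# Base URL and /messages path are taken from the curl example on this page.
PENSTOCK_BASE_URL = "https://api.penstock.run"


def build_messages_request(api_key: str, model: str, prompt: str,
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the /messages path without sending it."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{PENSTOCK_BASE_URL}/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same client code for both calls; only the model name differs.
# "example-openai-model" is a placeholder, not a real model identifier.
claude_req = build_messages_request("YOUR_KEY", "claude-sonnet-5", "Hello from Penstock!")
other_req = build_messages_request("YOUR_KEY", "example-openai-model", "Hello from Penstock!")
```

To actually send a request you would pass it to `urllib.request.urlopen`, which requires a valid key and beta access.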

Intelligent routing & failover

TPM-aware routing that respects per-provider rate limits, with automatic retries, exponential backoff, and graceful failover when a provider degrades.
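
One way to picture TPM-aware routing is a sliding 60-second window of token usage per provider, with each request sent to the provider that has the most headroom. This is a sketch under assumptions, not a description of Penstock internals: `TpmWindow`, `pick_provider`, and the limits used in the usage note are hypothetical.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class TpmWindow:
    """Tracks tokens used in the trailing 60 seconds for one provider."""

    def __init__(self, tpm_limit: int) -> None:
        self.tpm_limit = tpm_limit
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._used = 0

    def remaining(self, now: float) -> int:
        # Drop usage records that are 60 seconds old or older.
        while self._events and now - self._events[0][0] >= 60.0:
            _, tokens = self._events.popleft()
            self._used -= tokens
        return self.tpm_limit - self._used

    def record(self, tokens: int, now: float) -> None:
        self._events.append((now, tokens))
        self._used += tokens


def pick_provider(windows: Dict[str, TpmWindow], tokens_needed: int,
                  now: Optional[float] = None) -> Optional[str]:
    """Return the provider with the most headroom that fits the request, or None."""
    now = time.monotonic() if now is None else now
    best: Optional[str] = None
    best_room = -1
    for name, window in windows.items():
        room = window.remaining(now)
        if room >= tokens_needed and room > best_room:
            best, best_room = name, room
    if best is not None:
        windows[best].record(tokens_needed, now)
    return best
```

With windows of 1000 and 500 tokens per minute, an 800-token request goes to the larger provider, a following 300-token request goes to the smaller one, and a third 300-token request gets `None`, which is the signal to queue or back off rather than exceed a limit.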

Managed, pooled capacity

We run the provider accounts so you don't. Your tier gets a managed allocation of throughput; traffic spikes are smoothed against pooled capacity, not dropped.
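
"Smoothed, not dropped" can be sketched as a token bucket in front of a queue: a burst is accepted immediately, then released at a steady rate. The class below is a simplified, single-threaded illustration with hypothetical names and parameters (`SmoothingQueue`, `rate_per_sec`, `burst`); the page does not say how Penstock implements smoothing.

```python
from collections import deque
from typing import Deque, List


class SmoothingQueue:
    """Queues requests and releases them at a steady rate (token bucket)."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous refill
        self.pending: Deque[str] = deque()

    def submit(self, request_id: str) -> None:
        # Requests are always queued, never rejected.
        self.pending.append(request_id)

    def release(self, now: float) -> List[str]:
        """Refill the bucket for the elapsed time, then release what it allows."""
        self.tokens = min(float(self.burst), self.tokens + (now - self.last) * self.rate)
        self.last = now
        released: List[str] = []
        while self.pending and self.tokens >= 1.0:
            self.tokens -= 1.0
            released.append(self.pending.popleft())
        return released
```

For example, with `rate_per_sec=2.0` and `burst=3`, six requests submitted at once are released as three immediately, two more after one second, and the last one half a second later.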

Zero data retention

Zero data retention on the proxy path. No model substitution, no prompt harvesting, and a full audit trail. We source capacity only from verified first-party providers. Custody of your data is our commitment, and for beta partners it is contractual.

Penstock Context (private beta)

An opt-in managed memory layer that enriches each request with your team's own knowledge — no retrieval pipeline to build or operate. Write shorter prompts, get sharper, project-aware answers.

Observability built in

See your throughput, latency (p50/p95), error rates, and limits in one dashboard. Your usage and your data only — clear signal when something needs attention.
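
For readers unfamiliar with p50/p95, the sketch below computes them with the nearest-rank method: p95 is the latency that 95% of requests met or beat. The function name and sample values are illustrative, and the dashboard may use a different percentile method.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative latencies in milliseconds: 10, 20, ..., 200.
latencies_ms = [10.0 * i for i in range(1, 21)]
p50 = percentile(latencies_ms, 50)  # 100.0
p95 = percentile(latencies_ms, 95)  # 190.0
```

The gap between p50 and p95 is usually the useful signal: a stable median with a rising p95 points to a slow tail rather than a general slowdown.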

Pricing

Usage-based pricing. Pay for what you use.

Three tiers scale with your throughput and the capabilities you need. You're billed on usage, with volume discounts — and you skip the cost of building and operating failover, governance, and a context layer yourself.

Starter

For builders shipping a first AI feature.

One unified endpoint for Claude & OpenAI
Baseline throughput allocation
Intelligent routing, retries & failover
Usage dashboard
Community support

Request access

Pro

For teams running AI in production.

Everything in Starter, plus:
Higher throughput allocation & priority routing
Team virtual keys with per-key budgets
Cost limiting, not just rate limiting
Penstock Context beta
Email support
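
The difference between cost limiting and rate limiting can be sketched as a per-key budget check: a request is admitted based on what it would cost, not on how many requests came before it. Everything here is hypothetical, including `VirtualKey`, `authorize`, and the per-token prices, which are placeholders rather than real provider or Penstock pricing.

```python
from dataclasses import dataclass


@dataclass
class VirtualKey:
    name: str
    budget_usd: float
    spent_usd: float = 0.0


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    """Estimate cost from token counts and per-1k-token prices (placeholders)."""
    return input_tokens / 1000 * usd_per_1k_in + output_tokens / 1000 * usd_per_1k_out


def authorize(key: VirtualKey, cost_usd: float) -> bool:
    """Admit the request only if it fits in the key's remaining budget."""
    if key.spent_usd + cost_usd > key.budget_usd:
        return False
    key.spent_usd += cost_usd
    return True


# Placeholder prices: 0.50 USD per 1k input tokens, 1.50 USD per 1k output tokens.
team_key = VirtualKey(name="search-team", budget_usd=6.0)
cost = estimate_cost_usd(2000, 1000, usd_per_1k_in=0.5, usd_per_1k_out=1.5)  # 2.5
```

With a 6.00 USD budget and a 2.50 USD request, the first two calls to `authorize(team_key, cost)` succeed and the third is refused, regardless of how quickly the requests arrive.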

Request access

Enterprise

For compliance-sensitive organizations.

Everything in Pro, plus:
Custom throughput allocation
SSO & org/role controls
Data-handling review & audit trail
Custom commercial terms
Dedicated support

Reliability target: 99.5% uptime, with graceful failover by design. The tier capabilities shown describe structure only; final pricing is set with you during the beta.
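
To put the 99.5% target in concrete terms, the short calculation below converts an uptime percentage into a downtime allowance. It assumes a 30-day month and treats the figure as a target, as stated above, not a contractual guarantee; the function name is illustrative.

```python
def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_pct) / 100


monthly = allowed_downtime_minutes(99.5)           # 216.0 minutes, i.e. 3.6 hours
yearly = allowed_downtime_minutes(99.5, days=365)  # 2628.0 minutes, i.e. 43.8 hours
```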

Ready to channel your traffic?

Penstock is in private, invite-gated beta. Tell us what you're building and we'll get you an endpoint and a key.