One unified endpoint
Claude and OpenAI behind a single secure base URL. Existing SDKs work unchanged — switch providers without switching clients.
Managed multi-provider LLM gateway
Penstock unifies Claude and OpenAI behind one secure endpoint — with intelligent, TPM-aware routing, managed capacity, and zero data retention. Ship on AI without babysitting rate limits, juggling accounts, or wiring up failover.
Zero data retention on the proxy path. We never use your prompts to train models, and never share them across customers.
Private, invite-gated beta.
How it works
Set base_url to your Penstock endpoint. Your existing
OpenAI and Anthropic SDKs work unchanged — no rewrite, no new client.
Requests are routed across providers with TPM-aware logic that respects rate limits, retries with exponential backoff, and fails over gracefully. Spikes get queued and smoothed, not dropped.
Managed, pooled capacity sized to your tier — we run the provider accounts so you don't. Watch your own throughput, latency, and limits in one place.
Features
Claude and OpenAI behind a single secure base URL. Existing SDKs work unchanged — switch providers without switching clients.
TPM-aware routing that respects per-provider rate limits, with automatic retries, exponential backoff, and graceful failover when a provider degrades.
We run the provider accounts so you don't. Your tier gets a managed allocation of throughput; traffic spikes are smoothed against pooled capacity, not dropped.
Zero data retention on the proxy path. No model substitution, a full audit trail, and we never harvest your prompts. We source capacity only from verified first-party providers — your data's custody is our commitment, contractual for beta partners.
An opt-in managed memory layer that enriches each request with your team's own knowledge — no retrieval pipeline to build or operate. Write shorter prompts, get sharper, project-aware answers.
See your throughput, latency (p50/p95), error rates, and limits in one dashboard. Your usage and your data only — clear signal when something needs attention.
Pricing
Three tiers scale with your throughput and the capabilities you need. You're billed on usage, with volume discounts — and you skip the cost of building and operating failover, governance, and a context layer yourself.
For builders shipping a first AI feature.
Most popular
For teams running AI in production.
For compliance-sensitive organizations.
Reliability target: 99.5% uptime, designed to fail over gracefully. Tier capabilities shown describe structure; final pricing is set with you during beta.
Penstock is in private, invite-gated beta. Tell us what you're building and we'll get you an endpoint and a key.