Unified AI gateway with intelligent fallback
One interface. Multiple providers. SSE streaming, honest telemetry and server-side orchestration without leaking API keys into the frontend.
Real-time observability
Every core panel is backed by real status and usage endpoints, not placeholder data.
Provider health
A live snapshot of the routing chain.
Live stream output
A preview of the real transport layer, with no made-up performance metrics.
Built for resilient AI delivery
Each card below describes only behavior the current system actually implements.
One unified API surface
One public contract hides provider differences without exposing secrets in the browser.
Real-time streaming
The Worker emits normalized SSE events, and the interface renders tokens, metadata and errors from that single stream.
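A minimal sketch of how one normalized event could be framed on the wire, assuming a hypothetical `StreamEvent` union with `token`, `meta` and `error` variants (the project's real shared types may use different names and fields). Each frame follows the standard SSE layout: an `event:` line, a `data:` line and a terminating blank line.

```typescript
// Hypothetical normalized event union; the real shared contract may differ.
type StreamEvent =
  | { type: "token"; text: string }
  | { type: "meta"; provider: string; latencyMs: number }
  | { type: "error"; code: string; message: string };

// Serialize one event into an SSE frame:
// "event: <type>\n" + "data: <json payload>\n" + blank line.
function encodeSseEvent(event: StreamEvent): string {
  const { type, ...payload } = event;
  return `event: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

Because `JSON.stringify` escapes newlines inside strings, every payload fits on a single `data:` line, so frames like these can be written directly to a response served as `text/event-stream`.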
Orchestrated fallback
The Groq -> SambaNova -> Cerebras -> Gemini -> OpenRouter -> Mock chain activates only when the current provider cannot serve a request cleanly, for example after a rate limit, a timeout or a malformed response.
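A minimal sketch of an ordered fallback loop over that chain, assuming each provider is wrapped in a hypothetical `ProviderCall` function that resolves with text or throws. For brevity, this sketch treats every thrown error as a reason to try the next provider; the real Worker may separate recoverable failures from fatal ones.

```typescript
// Hypothetical provider identifiers and adapter signature (not taken from the project).
type ProviderName = "groq" | "sambanova" | "cerebras" | "gemini" | "openrouter" | "mock";
type ProviderCall = (prompt: string) => Promise<string>;

interface ChatResult {
  provider: ProviderName; // provider that finally served the request
  text: string;
  fallbackCount: number;  // how many providers failed before this one
}

// Order matches the chain described above.
const CHAIN: ProviderName[] = ["groq", "sambanova", "cerebras", "gemini", "openrouter", "mock"];

async function chatWithFallback(
  prompt: string,
  providers: Record<ProviderName, ProviderCall>,
): Promise<ChatResult> {
  let lastError: unknown;
  for (let i = 0; i < CHAIN.length; i++) {
    const name = CHAIN[i];
    try {
      const text = await providers[name](prompt);
      return { provider: name, text, fallbackCount: i };
    } catch (err) {
      lastError = err; // simplification: any failure moves on to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```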
Normalized error handling
A consistent error shape across validation failures, timeouts, rate limits and interrupted streams simplifies UI handling and debugging.
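One way to express such a shared shape, assuming a hypothetical `GatewayError` type whose codes and fields are invented for illustration. The mapper turns an upstream HTTP status, or its absence for network-level failures, into that shape; which errors count as retryable is also an assumption here.

```typescript
// Hypothetical shared error contract; codes and field names are illustrative only.
type ErrorCode = "validation_error" | "timeout" | "rate_limited" | "stream_interrupted" | "upstream_error";

interface GatewayError {
  code: ErrorCode;
  message: string;
  status: number;     // HTTP status sent to the client
  retryable: boolean; // whether another provider may succeed (assumption)
}

// Validation failures never reach a provider and are not retryable.
function validationError(message: string): GatewayError {
  return { code: "validation_error", message, status: 400, retryable: false };
}

// Map an upstream status (undefined for network-level failures) onto the shared shape.
// "stream_interrupted" would be produced by the stream reader, not by this mapper.
function normalizeUpstreamError(status: number | undefined, message: string): GatewayError {
  if (status === 429) return { code: "rate_limited", message, status: 429, retryable: true };
  if (status === 408 || status === 504) return { code: "timeout", message, status: 504, retryable: true };
  return { code: "upstream_error", message, status: 502, retryable: true };
}
```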
Routing visibility
Status, logs and usage are wired to live Worker endpoints instead of decorative placeholder numbers.
Free-tier-first architecture
The frontend stays public while secrets and orchestration remain on the Worker.
How a request moves through the system
Each step below describes the real Worker orchestration rather than abstract marketing.
Validate request
The contract is validated before any provider call.
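A minimal validation sketch, assuming a hypothetical `ChatRequest` shape with `mode`, an optional `provider`, `messages` and `stream`; the shared package's real contract may differ. The point it illustrates is that an untrusted body is checked and typed before any provider is contacted.

```typescript
// Hypothetical request contract for illustration only.
type Role = "system" | "user" | "assistant";

interface ChatRequest {
  mode: "auto" | "manual";
  provider?: string; // assumption: required only when mode is "manual"
  messages: { role: Role; content: string }[];
  stream: boolean;
}

type Validation = { ok: true; value: ChatRequest } | { ok: false; error: string };

const ROLES: readonly string[] = ["system", "user", "assistant"];

// Check an untrusted JSON body before any provider call is made.
function validateChatRequest(input: unknown): Validation {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const body = input as Record<string, unknown>;
  const mode = body.mode;
  if (mode !== "auto" && mode !== "manual") {
    return { ok: false, error: "mode must be 'auto' or 'manual'" };
  }
  const provider = body.provider;
  if (mode === "manual" && typeof provider !== "string") {
    return { ok: false, error: "manual mode requires a provider" };
  }
  const messages = body.messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    return { ok: false, error: "messages must be a non-empty array" };
  }
  for (const m of messages) {
    if (typeof m !== "object" || m === null || !ROLES.includes(m.role) || typeof m.content !== "string") {
      return { ok: false, error: "each message needs a valid role and string content" };
    }
  }
  return {
    ok: true,
    value: {
      mode,
      provider: typeof provider === "string" ? provider : undefined,
      messages: messages as ChatRequest["messages"],
      stream: body.stream === true,
    },
  };
}
```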
Select strategy
Auto and manual modes use the same request payload.
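A sketch of how one payload could drive both strategies. The source does not say how manual mode treats fallback, so this assumes, purely for illustration, that auto mode walks the full chain while manual mode pins the requested provider; `resolveChain` and `RoutingInput` are hypothetical names.

```typescript
// Order matches the fallback chain described on this page.
const DEFAULT_CHAIN = ["groq", "sambanova", "cerebras", "gemini", "openrouter", "mock"] as const;
type ProviderName = (typeof DEFAULT_CHAIN)[number];

// Hypothetical routing fields taken from the shared request payload.
interface RoutingInput {
  mode: "auto" | "manual";
  provider?: ProviderName;
}

// Assumption: auto uses the whole chain; manual tries only the chosen provider.
function resolveChain(input: RoutingInput): ProviderName[] {
  if (input.mode === "manual") {
    if (!input.provider) throw new Error("manual mode requires a provider");
    return [input.provider];
  }
  return [...DEFAULT_CHAIN];
}
```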
Open upstream path
The Worker opens a chat or streaming request to the selected provider.
Normalize events
Token and metadata events arrive in a predictable, normalized shape.
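A sketch of per-provider normalization into one token shape. It assumes the OpenAI-compatible streaming chunk (`choices[0].delta.content`) for the non-Gemini providers and Gemini's `candidates[0].content.parts[].text` chunk; the source does not say which endpoints the Worker calls, both interfaces are partial, and `TokenEvent` is a hypothetical name.

```typescript
// Hypothetical normalized token event.
interface TokenEvent {
  type: "token";
  text: string;
}

// Partial OpenAI-compatible streaming chunk: choices[].delta.content
interface OpenAiChunk {
  choices?: { delta?: { content?: string | null } }[];
}

// Partial Gemini streaming chunk: candidates[].content.parts[].text
interface GeminiChunk {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Convert one parsed provider chunk into a TokenEvent, or null if it carries no text.
function toTokenEvent(provider: string, chunk: unknown): TokenEvent | null {
  let text: string | undefined;
  if (provider === "gemini") {
    const parts = (chunk as GeminiChunk).candidates?.[0]?.content?.parts ?? [];
    text = parts.map((p) => p.text ?? "").join("");
  } else {
    text = (chunk as OpenAiChunk).choices?.[0]?.delta?.content ?? undefined;
  }
  return text ? { type: "token", text } : null;
}
```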
Fallback on failure
Rate limits, timeouts and malformed responses pass the request to the next provider in the chain.
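A sketch of how those three failure classes could be detected, using a hypothetical `FallbackError` whose `reason` tells the orchestrator to move on. The timeout is a plain `Promise.race`, and the expected body shape `{ text }` is an assumption for illustration.

```typescript
// Hypothetical failure classes that promote a request to the next provider.
type FallbackReason = "rate_limit" | "timeout" | "malformed";

class FallbackError extends Error {
  constructor(public reason: FallbackReason, message: string) {
    super(message);
  }
}

// HTTP 429 means the provider is rate limiting us.
function assertNotRateLimited(status: number): void {
  if (status === 429) throw new FallbackError("rate_limit", "provider returned HTTP 429");
}

// Reject with a timeout FallbackError if `promise` does not settle within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new FallbackError("timeout", `no response within ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Assumed body shape { text: string }; anything else counts as malformed.
function parseProviderBody(body: string): string {
  try {
    const parsed = JSON.parse(body) as { text?: unknown };
    if (typeof parsed.text === "string") return parsed.text;
  } catch {
    // invalid JSON (or null) falls through to the malformed error below
  }
  throw new FallbackError("malformed", "provider returned an unexpected body");
}
```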
Return final metadata
The UI receives final provider, latency, mode and fallback metadata.
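A sketch of assembling that final metadata, assuming a hypothetical `FinalMeta` shape built from the ordered list of providers the Worker attempted, where the last entry is the one that succeeded. Field names are illustrative, not the project's contract.

```typescript
// Hypothetical final metadata event.
interface FinalMeta {
  provider: string;      // provider that actually served the response
  latencyMs: number;     // total time, including failed attempts
  mode: "auto" | "manual";
  fallbackUsed: boolean;
  fallbackCount: number; // providers skipped before success
  attempted: string[];   // providers tried, in order
}

function buildFinalMeta(
  mode: "auto" | "manual",
  attempted: string[],
  startedAtMs: number,
  finishedAtMs: number,
): FinalMeta {
  if (attempted.length === 0) throw new Error("at least one provider must have been attempted");
  const fallbackCount = attempted.length - 1;
  return {
    provider: attempted[attempted.length - 1],
    latencyMs: finishedAtMs - startedAtMs,
    mode,
    fallbackUsed: fallbackCount > 0,
    fallbackCount,
    attempted: [...attempted],
  };
}
```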
Serverless-first without false promises
The architecture reflects the project's real layers: web client, Worker and shared contracts. It makes no claims about KV storage and no durable-state promises where none exist.
Web client
The Next.js client handles routes, rendering and client-side data fetching without access to secrets.
Worker orchestration
The Cloudflare Worker owns the chat API, SSE streaming, fallback orchestration and error normalization.
Shared contracts and env
Request, response and error types live in a shared package while public and server env values stay separated.
System flow
Client and server layers remain separated by responsibility.
Live telemetry, not decorative charts
These charts are built on the real usage contract and show how requests are distributed across final providers after fallback.
Requests and latency
Time series from the usage endpoint.
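A sketch of turning raw usage records into the requests-and-latency series, assuming a hypothetical `UsageRecord` with a millisecond `timestamp` and `latencyMs`. Records are grouped into fixed-width buckets aligned to multiples of `bucketMs`.

```typescript
// Hypothetical usage record; the real usage contract may carry more fields.
interface UsageRecord {
  timestamp: number; // epoch milliseconds
  latencyMs: number;
}

interface Bucket {
  start: number;        // bucket start, epoch milliseconds
  requests: number;
  avgLatencyMs: number;
}

// Group records into fixed-width time buckets, sorted by start time.
function toTimeseries(records: UsageRecord[], bucketMs: number): Bucket[] {
  const acc = new Map<number, { count: number; totalLatency: number }>();
  for (const r of records) {
    const start = Math.floor(r.timestamp / bucketMs) * bucketMs;
    const b = acc.get(start) ?? { count: 0, totalLatency: 0 };
    b.count += 1;
    b.totalLatency += r.latencyMs;
    acc.set(start, b);
  }
  return Array.from(acc.entries())
    .sort(([a], [b]) => a - b)
    .map(([start, b]) => ({ start, requests: b.count, avgLatencyMs: b.totalLatency / b.count }));
}
```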
Provider distribution
The final route after fallback.
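A sketch of the provider-distribution aggregation, assuming each hypothetical usage record carries a `finalProvider` field naming the provider that served the request after fallback. Counting only the final provider is what makes this chart reflect routing outcomes rather than attempts.

```typescript
// Hypothetical usage record: only the final provider matters for this chart.
interface UsageRecord {
  finalProvider: string;
}

interface ProviderShare {
  provider: string;
  requests: number;
  share: number; // fraction of all requests, 0..1
}

// Count requests per final provider; sort by count, then by name for stable ties.
function providerDistribution(records: UsageRecord[]): ProviderShare[] {
  const counts = new Map<string, number>();
  for (const r of records) counts.set(r.finalProvider, (counts.get(r.finalProvider) ?? 0) + 1);
  const total = records.length;
  return Array.from(counts, ([provider, requests]) => ({ provider, requests, share: requests / total }))
    .sort((a, b) => b.requests - a.requests || a.provider.localeCompare(b.provider));
}
```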