A live AI gateway with real fallback

Unified AI gateway with intelligent fallback

One interface. Multiple providers. SSE streaming, honest telemetry and server-side orchestration without leaking API keys into the frontend.


Live streaming

Multi-provider routing

Server-side secrets only

Response stream active

~544ms

Groq Free

SambaNova Cloud

Cerebras Inference

Gemini API

OpenRouter Free

Mock / Demo

[meta] provider: groq

[meta] model: llama-3.1-8b-instant

[token] RelayForge keeps the response moving...

[meta] latency: 544ms

Platform dashboard

Real-time observability

Every core panel is backed by the live status and usage endpoints.

Total requests

128

Live gateway traffic from the usage endpoint.

Average latency

544 ms

Calculated from completed Worker responses.

Fallback activations

Success rate: 94.5%

Provider health

A live snapshot of the routing chain.

Live snapshot

Groq Free

420ms avg

SambaNova Cloud

610ms avg

Cerebras Inference

250ms avg

Gemini API

930ms avg

OpenRouter Free

860ms avg

Mock / Demo

Always ready

The fallback chain is confirmed by the status endpoint.

Live stream output

A preview of the real transport layer, with no invented performance metrics.

Streaming

[meta] request_count: 128

[meta] provider: groq

[meta] model: llama-3.1-8b-instant

[meta] mode: degraded

[token] The response flows through the active provider chain without losing metadata.

[token] If the primary path degrades, RelayForge promotes the request before the user sees a dead stream.

[meta] latency: 544ms

[meta] fallbacks: 19

[done] stream_complete

SSE

121 successful

544ms
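The preview above shows three kinds of events: meta, token and done. A minimal sketch of how a Worker could frame them as Server-Sent Events follows. The `GatewayEvent` union and `toSseFrame` helper are hypothetical names; only the `event:` / `data:` / blank-line framing comes from the SSE format itself.

```typescript
// Hypothetical event union mirroring the [meta], [token] and [done] lines in the preview.
type GatewayEvent =
  | { type: "meta"; data: Record<string, string | number> }
  | { type: "token"; data: { text: string } }
  | { type: "done"; data: { reason: string } };

// Serialize one event as an SSE frame: an `event:` line naming the type,
// a `data:` line carrying JSON, and a blank line that terminates the frame.
// JSON.stringify escapes newlines inside strings, so the payload stays on one data line.
function toSseFrame(event: GatewayEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Example: the opening and closing frames of a stream like the one previewed.
const frames = [
  toSseFrame({ type: "meta", data: { provider: "groq", model: "llama-3.1-8b-instant" } }),
  toSseFrame({ type: "token", data: { text: "RelayForge keeps the response moving..." } }),
  toSseFrame({ type: "done", data: { reason: "stream_complete" } }),
];
```

In a Worker, strings like these would be encoded with `TextEncoder` and enqueued on a `ReadableStream` returned with a `Content-Type: text/event-stream` header.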

Built for resilient AI delivery

Each card describes only behavior the current system actually implements.

One unified API surface

One public contract hides provider differences without exposing secrets in the browser.

Real-time streaming

The Worker emits normalized SSE events while the interface renders tokens, metadata and errors in one stream.
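The browser's built-in `EventSource` only issues GET requests, so a client that POSTs a chat payload to a stream route typically reads the response body with `fetch` and parses the frames itself. A minimal incremental parser is sketched below, assuming LF line endings; `SseParser` and `parseFrame` are illustrative names, not part of the project.

```typescript
interface ParsedEvent { event: string; data: string }

// Parse one complete frame (the text between blank lines) into an event.
function parseFrame(frame: string): ParsedEvent | null {
  let event = "message"; // SSE default event name when no `event:` line is present
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    if (line.startsWith(":")) continue; // comment line, often used as a keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // one leading space is not part of the value
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  // A frame without data lines is not dispatched.
  return dataLines.length > 0 ? { event, data: dataLines.join("\n") } : null;
}

// Feed it decoded text chunks; it returns every complete event and buffers the rest.
class SseParser {
  private buffer = "";

  push(chunk: string): ParsedEvent[] {
    this.buffer += chunk;
    const events: ParsedEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const parsed = parseFrame(this.buffer.slice(0, boundary));
      this.buffer = this.buffer.slice(boundary + 2);
      if (parsed) events.push(parsed);
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

Buffering matters because network chunks do not respect frame boundaries: a token event can arrive split across two reads.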

Orchestrated fallback

The Groq -> SambaNova -> Cerebras -> Gemini -> OpenRouter -> Mock chain activates only when the upstream path cannot serve a request cleanly.
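The chain above can be sketched as an ordered loop that stops at the first provider that answers. This is an illustration under stated assumptions: the lowercase provider ids (other than `groq`, which appears in the stream preview) and the `runWithFallback` helper are hypothetical, and this version treats any thrown error as a failed hop.

```typescript
// Hypothetical provider ids; only "groq" appears verbatim in the stream preview.
type ProviderId = "groq" | "sambanova" | "cerebras" | "gemini" | "openrouter" | "mock";

// Order taken from the documented chain.
const FALLBACK_CHAIN: readonly ProviderId[] = ["groq", "sambanova", "cerebras", "gemini", "openrouter", "mock"];

interface FailedHop { provider: ProviderId; error: string }

// Try each provider in order; return the first success plus the hops that failed before it.
async function runWithFallback<T>(
  call: (provider: ProviderId) => Promise<T>,
  chain: readonly ProviderId[] = FALLBACK_CHAIN,
): Promise<{ provider: ProviderId; result: T; failures: FailedHop[] }> {
  const failures: FailedHop[] = [];
  for (const provider of chain) {
    try {
      const result = await call(provider);
      return { provider, result, failures };
    } catch (err) {
      failures.push({ provider, error: err instanceof Error ? err.message : String(err) });
    }
  }
  throw new Error(`All providers failed: ${failures.map((f) => f.provider).join(", ")}`);
}
```

For streaming, switching providers after tokens have already reached the client would splice two different answers together, so a streaming variant would typically only fall back before the first token is forwarded.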

Normalized error handling

A consistent error shape across validation failures, timeouts, rate limits and interrupted streams simplifies both UI handling and debugging.
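One way to get a single error shape is to funnel every failure through a small mapping function. The sketch below assumes hypothetical error codes and a `retryable` flag; it is not the project's published error contract.

```typescript
// Hypothetical error codes for the cases named above.
type GatewayErrorCode = "validation_error" | "timeout" | "rate_limited" | "stream_interrupted" | "upstream_error";

interface GatewayError {
  code: GatewayErrorCode;
  message: string;
  provider?: string;
  retryable: boolean; // whether trying the next provider in the chain makes sense
}

// Raw failure modes a provider call can produce (assumed categories).
type UpstreamFailure =
  | { kind: "timeout" }
  | { kind: "interrupted" }
  | { kind: "http"; status: number };

function normalizeUpstreamError(failure: UpstreamFailure, provider: string): GatewayError {
  if (failure.kind === "timeout") {
    return { code: "timeout", message: "Upstream request timed out", provider, retryable: true };
  }
  if (failure.kind === "interrupted") {
    return { code: "stream_interrupted", message: "Upstream stream closed early", provider, retryable: true };
  }
  // Remaining case: the provider answered with an HTTP error status.
  if (failure.status === 429) {
    return { code: "rate_limited", message: "Upstream rate limit reached", provider, retryable: true };
  }
  return {
    code: "upstream_error",
    message: `Upstream returned HTTP ${failure.status}`,
    provider,
    retryable: failure.status >= 500, // assumption: 5xx is transient, other 4xx is not
  };
}

// Validation failures originate in the gateway itself, before any provider is called.
function validationError(message: string): GatewayError {
  return { code: "validation_error", message, retryable: false };
}
```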

Routing visibility

Status, logs and usage are wired to live Worker endpoints instead of decorative placeholder numbers.

Free-tier-first architecture

The routing chain starts with Groq Free, the frontend stays public, and secrets and orchestration remain on the Worker.

How a request moves through the system

The steps below describe the actual Worker orchestration rather than abstract marketing.

Validate request

The contract is validated before any provider call.
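Validating before any provider call means a malformed body never consumes upstream quota. A minimal validator is sketched below; the field names (`messages`, `mode`, `provider`) and the defaulting of `mode` to `"auto"` are assumptions for illustration, not the project's actual shared types.

```typescript
// Hypothetical request contract; field names are illustrative.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }
interface ChatRequest { messages: ChatMessage[]; mode: "auto" | "manual"; provider?: string }

type Validated =
  | { ok: true; value: ChatRequest }
  | { ok: false; error: { code: "validation_error"; message: string } };

const ROLES = new Set(["system", "user", "assistant"]);

// Check an untrusted JSON body before any provider is called.
function validateChatRequest(body: unknown): Validated {
  const fail = (message: string): Validated => ({ ok: false, error: { code: "validation_error", message } });
  if (typeof body !== "object" || body === null) return fail("body must be a JSON object");
  const input = body as Record<string, unknown>;

  const messages = input.messages;
  if (!Array.isArray(messages) || messages.length === 0) return fail("messages must be a non-empty array");
  for (const m of messages) {
    if (typeof m !== "object" || m === null) return fail("each message must be an object");
    if (typeof m.role !== "string" || !ROLES.has(m.role)) return fail("unknown message role");
    if (typeof m.content !== "string" || m.content.trim() === "") return fail("message content must be non-empty text");
  }

  const mode = input.mode ?? "auto"; // assumption: auto routing is the default
  if (mode !== "auto" && mode !== "manual") return fail("mode must be 'auto' or 'manual'");
  const provider = input.provider;
  if (provider !== undefined && typeof provider !== "string") return fail("provider must be a string");
  if (mode === "manual" && provider === undefined) return fail("manual mode requires a provider");

  return {
    ok: true,
    // The casts reflect what the checks above have already established.
    value: { messages: messages as ChatMessage[], mode: mode as "auto" | "manual", provider: provider as string | undefined },
  };
}
```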

Select strategy

Auto and manual modes share the same payload.
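Because both modes share one payload, strategy selection can reduce to turning that payload into an ordered provider list. The sketch assumes hypothetical lowercase provider ids and assumes that manual mode starts at the chosen provider while keeping the rest of the chain as fallback; the real Worker may instead pin manual requests to a single provider.

```typescript
// Hypothetical provider ids in the documented chain order.
type ProviderId = "groq" | "sambanova" | "cerebras" | "gemini" | "openrouter" | "mock";
const CHAIN: readonly ProviderId[] = ["groq", "sambanova", "cerebras", "gemini", "openrouter", "mock"];

// The routing fields of the shared payload; both modes send the same shape.
interface RoutingFields { mode: "auto" | "manual"; provider?: ProviderId }

// Auto mode walks the full chain. Manual mode (assumed behavior) moves the
// chosen provider to the front and keeps the others, in order, behind it.
function resolveProviderOrder(fields: RoutingFields): ProviderId[] {
  const chosen = fields.provider;
  if (fields.mode === "manual" && chosen !== undefined) {
    return [chosen, ...CHAIN.filter((p) => p !== chosen)];
  }
  return [...CHAIN];
}
```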

Open upstream path

The Worker opens a chat or streaming request to the selected provider.

Normalize events

Tokens and meta events arrive in a predictable shape.

Fallback on failure

Rate limits, timeouts and malformed responses promote the request to the next tier.
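The promotion rule can be expressed as a single predicate over the outcome of one hop. This sketch assumes an OpenAI-compatible response body (`choices[0].message.content`) as the definition of "well-formed"; that shape, the `HopOutcome` type and the decision to also promote on 5xx are assumptions, not documented behavior.

```typescript
// Outcome of one provider hop (hypothetical shape).
type HopOutcome =
  | { kind: "ok"; body: unknown }
  | { kind: "http_error"; status: number }
  | { kind: "timeout" };

// Assumption: a usable completion has string content at choices[0].message.content.
function isWellFormed(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const choices = (body as { choices?: unknown }).choices;
  if (!Array.isArray(choices) || choices.length === 0) return false;
  return typeof choices[0]?.message?.content === "string";
}

// Decide whether this hop should promote the request to the next tier.
function shouldPromote(outcome: HopOutcome): boolean {
  if (outcome.kind === "timeout") return true;
  if (outcome.kind === "http_error") {
    // 429 is the rate-limit case; promoting on 5xx is an extra assumption.
    return outcome.status === 429 || outcome.status >= 500;
  }
  // A 2xx answer still promotes if its body is malformed.
  return !isWellFormed(outcome.body);
}
```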

Return final metadata

The UI receives final provider, latency, mode and fallback metadata.
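The final metadata can be derived from an ordered log of provider attempts plus two timestamps. In this sketch `AttemptRecord`, `finalizeMetadata` and the field names are hypothetical; `mode` is passed through unchanged because its possible values are not specified here.

```typescript
// Hypothetical record of one provider hop, in the order the Worker tried them.
interface AttemptRecord { provider: string; ok: boolean }

interface FinalMetadata {
  provider: string;      // the provider that actually served the response
  latencyMs: number;     // time from request start to completion
  mode: string;          // passed through unchanged
  fallbackCount: number; // failed hops before the successful one
}

function finalizeMetadata(
  attempts: AttemptRecord[],
  mode: string,
  startedAtMs: number,
  finishedAtMs: number,
): FinalMetadata {
  const winnerIndex = attempts.findIndex((a) => a.ok);
  if (winnerIndex === -1) throw new Error("No provider completed the request");
  return {
    provider: attempts[winnerIndex].provider,
    latencyMs: Math.round(finishedAtMs - startedAtMs),
    mode,
    fallbackCount: winnerIndex, // every attempt before the winner failed
  };
}
```

For example, a request that failed on groq and completed on sambanova 544 ms after it started yields `provider: "sambanova"`, `latencyMs: 544` and `fallbackCount: 1`.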

Average latency is currently holding at 544 ms.

Engineering architecture

Serverless-first without false promises

The architecture reflects the real project layers: web client, Worker and shared contracts. It makes no claims about KV storage or durable state that the system does not have.

Web client

The Next.js client handles routes, rendering and client-side data fetching without access to secrets.

App Router with a shared workspace shell

Client-side language and theme switching

Worker orchestration

The Cloudflare Worker owns the chat API, SSE streaming, fallback orchestration and error normalization.

/chat, /stream, /status, /logs and /usage routes
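The five routes can be dispatched from a small lookup table. This is a sketch: the HTTP method assigned to each route is an assumption, and `matchRoute` is a hypothetical helper kept free of Worker types so it can run anywhere.

```typescript
type RouteName = "chat" | "stream" | "status" | "logs" | "usage";
type RouteMatch =
  | { kind: "ok"; route: RouteName }
  | { kind: "not_found" }
  | { kind: "method_not_allowed"; allow: string };

// Assumed methods: POST for the two generation routes, GET for read-only telemetry.
const ROUTES = new Map<string, { method: "GET" | "POST"; route: RouteName }>([
  ["/chat", { method: "POST", route: "chat" }],
  ["/stream", { method: "POST", route: "stream" }],
  ["/status", { method: "GET", route: "status" }],
  ["/logs", { method: "GET", route: "logs" }],
  ["/usage", { method: "GET", route: "usage" }],
]);

function matchRoute(method: string, pathname: string): RouteMatch {
  const entry = ROUTES.get(pathname);
  if (!entry) return { kind: "not_found" };
  if (method.toUpperCase() !== entry.method) return { kind: "method_not_allowed", allow: entry.method };
  return { kind: "ok", route: entry.route };
}
```

Inside a Worker `fetch(request, env)` handler this would be called as `matchRoute(request.method, new URL(request.url).pathname)`, with `not_found` and `method_not_allowed` mapped to 404 and 405 responses in the normalized error shape.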

Groq -> SambaNova -> Cerebras -> Gemini -> OpenRouter -> Mock fallback

Shared contracts and env

Request, response and error types live in a shared package while public and server env values stay separated.

NEXT_PUBLIC_API_BASE_URL stays client-safe

Provider keys remain Worker-only
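In a Cloudflare Worker, secrets set with `wrangler secret put` arrive on the `env` object passed to the `fetch` handler, while Next.js inlines `NEXT_PUBLIC_` variables into the browser bundle at build time; that is why only the API base URL carries the public prefix. The sketch below derives which providers are routable from the Worker env. The secret names and the `availableProviders` helper are hypothetical.

```typescript
// Hypothetical secret names; the real Worker may bind its keys differently.
interface WorkerEnv {
  GROQ_API_KEY?: string;
  SAMBANOVA_API_KEY?: string;
  CEREBRAS_API_KEY?: string;
  GEMINI_API_KEY?: string;
  OPENROUTER_API_KEY?: string;
}

// Chain order is preserved because object keys keep insertion order.
const KEY_BY_PROVIDER = {
  groq: "GROQ_API_KEY",
  sambanova: "SAMBANOVA_API_KEY",
  cerebras: "CEREBRAS_API_KEY",
  gemini: "GEMINI_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
} as const;

type KeyedProvider = keyof typeof KEY_BY_PROVIDER;

// Providers whose key is configured, plus the keyless mock tier.
// Only provider names leave this function; the keys themselves stay in the Worker.
function availableProviders(env: WorkerEnv): string[] {
  const keyed = (Object.keys(KEY_BY_PROVIDER) as KeyedProvider[]).filter(
    (provider) => Boolean(env[KEY_BY_PROVIDER[provider]]),
  );
  return [...keyed, "mock"];
}
```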

System flow

Client and server layers remain separated by responsibility.

Live data path

User client

Browser + UI shell

Next.js web

Routes and data fetching

Cloudflare Worker

Chat, streaming, fallback and secrets

Groq Free

SambaNova Cloud

Cerebras Inference

Gemini API

OpenRouter Free

Mock / Demo

6

providers in the chain

544 ms

current average latency

94.5%

successful completions

Live telemetry, not decorative charts

These charts are built from the real usage contract and show how requests are distributed across final providers after fallback.

Requests and latency

Timeseries from the usage endpoint.

Provider distribution

The final route after fallback.

Success rate

94.5%

121/128

Latest request volume

last timeseries point

Providers online

6/6

available for routing

Streaming paths

support SSE
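The headline figures can be reproduced from per-request usage records: 121 successful requests out of 128 gives 121 / 128 = 0.9453125, shown as 94.5%. The sketch below assumes a hypothetical `UsageRecord` shape, averages latency over successful requests only (matching "calculated from completed Worker responses"), and counts the final provider after fallback for the distribution chart.

```typescript
// Hypothetical per-request usage record; the real usage contract may carry more fields.
interface UsageRecord { ok: boolean; latencyMs: number; finalProvider: string }

interface UsageSummary {
  total: number;
  successful: number;
  successRatePct: number;             // rounded to one decimal place
  avgLatencyMs: number;               // averaged over successful requests only
  byProvider: Record<string, number>; // final provider after fallback, successful requests only
}

function summarizeUsage(records: UsageRecord[]): UsageSummary {
  const completed = records.filter((r) => r.ok);
  const byProvider: Record<string, number> = {};
  for (const r of completed) byProvider[r.finalProvider] = (byProvider[r.finalProvider] ?? 0) + 1;
  const latencySum = completed.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    total: records.length,
    successful: completed.length,
    successRatePct: records.length ? Math.round((completed.length / records.length) * 1000) / 10 : 0,
    avgLatencyMs: completed.length ? Math.round(latencySum / completed.length) : 0,
    byProvider,
  };
}
```

With 121 successful records out of 128, `successRatePct` comes out as 94.5; the 7 failed requests affect the rate but not the latency average or the provider distribution.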
