RelayForge AI
Unified AI Gateway
Features · Architecture · Docs · Status
Get Started
A live AI gateway with real fallback

Unified AI gateway with intelligent fallback

One interface. Multiple providers. SSE streaming, honest telemetry and server-side orchestration without leaking API keys into the frontend.

Open workspace · Open documentation
Live streaming
Multi-provider routing
Server-side secrets only
Response stream active
~544ms
Groq Free
SambaNova Cloud
Cerebras Inference
Gemini API
OpenRouter Free
Mock / Demo
[meta] provider: groq
[meta] model: llama-3.1-8b-instant
[token] RelayForge keeps the response moving...
[meta] latency: 544ms
Platform dashboard

Real-time observability

Every core panel below is backed by the live status and usage endpoints.

Total requests
128
Live gateway traffic from the usage endpoint.
Average latency
544 ms
Calculated from completed Worker responses.
Fallback activations
19
Success rate
94.5%

Provider health

A live snapshot of the routing chain.

Live snapshot
Groq Free
420ms avg
SambaNova Cloud
610ms avg
Cerebras Inference
250ms avg
Gemini API
930ms avg
OpenRouter Free
860ms avg
Mock / Demo
Always ready
The fallback chain shown here is reported by the status endpoint.

Live stream output

A preview of the real transport layer, with no invented performance metrics.

Streaming
[meta] request_count: 128
[meta] provider: groq
[meta] model: llama-3.1-8b-instant
[meta] mode: degraded
[token] The response flows through the active provider chain without losing metadata.
[token] If the primary path degrades, RelayForge promotes the request before the user sees a dead stream.
[meta] latency: 544ms
[meta] fallbacks: 19
[done] stream_complete
SSE
121 successful
544ms

Built for resilient AI delivery

Each capability below describes behavior the current system actually implements.

One unified API surface

One public contract hides provider differences without exposing secrets in the browser.
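One way to hide provider differences behind a single contract is to write a small normalizer for each upstream response shape. The sketch below assumes two shapes: Groq, SambaNova, Cerebras and OpenRouter return the OpenAI-compatible chat completions body, and Gemini returns its `generateContent` body. `ChatResponse`, `fromOpenAICompatible` and `fromGemini` are hypothetical names and are not the actual `@relayforge/shared` types.

```typescript
// Hypothetical public response contract (not the real @relayforge/shared type).
interface ChatResponse {
  provider: string;
  model: string;
  text: string;
}

// OpenAI-compatible chat completions body, reduced to the fields used here.
interface OpenAICompatibleResponse {
  model: string;
  choices: { message: { role: string; content: string | null } }[];
}

// Gemini generateContent body, reduced to the fields used here.
interface GeminiResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Groq, SambaNova, Cerebras and OpenRouter (assumed OpenAI-compatible).
function fromOpenAICompatible(provider: string, body: OpenAICompatibleResponse): ChatResponse {
  const text = body.choices[0]?.message.content ?? "";
  return { provider, model: body.model, text };
}

// Gemini returns the reply as a list of parts; join their text.
function fromGemini(model: string, body: GeminiResponse): ChatResponse {
  const parts = body.candidates?.[0]?.content?.parts ?? [];
  return { provider: "gemini", model, text: parts.map((p) => p.text ?? "").join("") };
}
```

The UI only ever sees `ChatResponse`, so adding a provider means adding one normalizer rather than changing the client.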

Real-time streaming

The Worker emits normalized SSE events while the interface renders tokens, metadata and errors in one stream.
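The sketch below shows what one normalized SSE event could look like on the wire. It uses the `meta`, `token`, `error` and `done` event names from the stream preview above. The payload shapes and the helper names `formatSseEvent` and `parseSseEvents` are assumptions. Each event consists of an `event:` line, a `data:` line and a terminating blank line, as the `text/event-stream` format requires.

```typescript
// Event names from the stream preview; payload shapes are assumptions.
type StreamEventType = "meta" | "token" | "error" | "done";

interface StreamEvent {
  type: StreamEventType;
  data: unknown;
}

// Worker side: serialize one event. JSON.stringify escapes newlines,
// so the payload always fits on a single "data:" line.
function formatSseEvent(event: StreamEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Client side: parse a buffer that contains only complete events.
// Handles just the single-line "event:" and "data:" fields produced above.
function parseSseEvents(buffer: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const block of buffer.split("\n\n")) {
    let type: string | undefined;
    let data: string | undefined;
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) type = line.slice("event:".length).trim();
      else if (line.startsWith("data:")) data = line.slice("data:".length).trim();
    }
    if (type !== undefined && data !== undefined) {
      events.push({ type: type as StreamEventType, data: JSON.parse(data) });
    }
  }
  return events;
}
```

In a real stream the client receives arbitrary chunks. It would therefore buffer text until it sees the blank-line terminator before handing complete events to a parser like this one.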

Orchestrated fallback

The Groq -> SambaNova -> Cerebras -> Gemini -> OpenRouter -> Mock chain activates only when the upstream path cannot serve a request cleanly.
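The chain can be sketched as a loop that tries each tier in order and moves on only for failures that warrant promotion. This is a simplified illustration, not the Worker's actual code. `ProviderAttempt`, `UpstreamError`, `shouldFallback` and `runWithFallback` are hypothetical. The rule that rate limits, timeouts and malformed responses trigger fallback while other errors do not is an assumption based on the request flow described on this page.

```typescript
// Failure categories; which ones trigger fallback is an assumption.
type FailureKind = "rate_limited" | "timeout" | "malformed" | "fatal";

class UpstreamError extends Error {
  readonly kind: FailureKind;
  constructor(kind: FailureKind, message: string) {
    super(message);
    this.kind = kind;
  }
}

// Hypothetical adapter: one entry per tier, in chain order.
interface ProviderAttempt {
  id: string;
  call: (prompt: string) => Promise<string>;
}

interface FallbackResult {
  provider: string;  // the tier that finally served the request
  text: string;
  fallbacks: number; // tiers skipped before that one
}

// Read the kind from the error object rather than relying on instanceof.
function shouldFallback(err: unknown): boolean {
  const kind = (err as { kind?: FailureKind } | null)?.kind;
  return kind === "rate_limited" || kind === "timeout" || kind === "malformed";
}

async function runWithFallback(chain: ProviderAttempt[], prompt: string): Promise<FallbackResult> {
  let lastError: unknown = new Error("empty provider chain");
  for (let i = 0; i < chain.length; i++) {
    try {
      const text = await chain[i].call(prompt);
      return { provider: chain[i].id, text, fallbacks: i };
    } catch (err) {
      lastError = err;
      // Errors outside the fallback set stop the chain immediately.
      if (!shouldFallback(err)) break;
    }
  }
  throw lastError;
}
```

Because the mock tier never fails, placing it last guarantees that a request through the full chain always has a working demo path.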

Normalized error handling

A consistent error shape across validation, timeouts, rate limits and interrupted streams simplifies UI and debugging.
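Below is one possible normalized error shape, built from a discriminated union of failure sources. The codes, field names and the `toGatewayError` helper are hypothetical. HTTP 429 is the standard rate-limit status; the other status mappings are example choices.

```typescript
type GatewayErrorCode =
  | "validation_error"
  | "rate_limited"
  | "timeout"
  | "stream_interrupted"
  | "upstream_error";

// The single error body every route returns (hypothetical field names).
interface GatewayError {
  error: { code: GatewayErrorCode; message: string; provider?: string };
}

// Where a failure came from, before normalization.
type Failure =
  | { kind: "validation"; message: string }
  | { kind: "upstream"; provider: string; status: number }
  | { kind: "timeout"; provider: string; afterMs: number }
  | { kind: "stream_cut"; provider: string };

// Example HTTP status per code for non-streaming responses.
const HTTP_STATUS: Record<GatewayErrorCode, number> = {
  validation_error: 400,
  rate_limited: 429,
  timeout: 504,
  stream_interrupted: 502,
  upstream_error: 502,
};

function toGatewayError(f: Failure): GatewayError {
  switch (f.kind) {
    case "validation":
      return { error: { code: "validation_error", message: f.message } };
    case "timeout":
      return { error: { code: "timeout", message: `no response within ${f.afterMs} ms`, provider: f.provider } };
    case "stream_cut":
      return { error: { code: "stream_interrupted", message: "stream closed before completion", provider: f.provider } };
    case "upstream":
      // 429 Too Many Requests is the standard rate-limit status.
      if (f.status === 429) {
        return { error: { code: "rate_limited", message: "provider rate limit reached", provider: f.provider } };
      }
      return { error: { code: "upstream_error", message: `provider returned HTTP ${f.status}`, provider: f.provider } };
    default: {
      // Compile-time exhaustiveness check over the Failure union.
      const unreachable: never = f;
      throw new Error(`unknown failure: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

A `stream_interrupted` failure usually happens after the HTTP status has already been sent. In that case the same body would travel as an SSE `error` event instead of an HTTP error response.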

Routing visibility

Status, logs and usage are wired to live Worker endpoints instead of decorative placeholder numbers.
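Dashboard figures such as success rate, average latency and fallback activations can be derived from per-request records. The record shape and `summarize` are hypothetical. Counting one fallback activation per request that skipped at least one tier is an assumption. Average latency uses completed requests only, which matches the dashboard note.

```typescript
// Hypothetical per-request record kept by the Worker.
interface RequestRecord {
  ok: boolean;
  latencyMs: number;
  fallbacks: number; // tiers skipped before the final provider
  finalProvider: string;
}

interface UsageSummary {
  totalRequests: number;
  successful: number;
  successRatePct: string;      // formatted to one decimal place
  avgLatencyMs: number;        // over successful (completed) requests only
  fallbackActivations: number; // requests that needed at least one fallback
}

function summarize(records: RequestRecord[]): UsageSummary {
  const completed = records.filter((r) => r.ok);
  const latencySum = completed.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    totalRequests: records.length,
    successful: completed.length,
    successRatePct: records.length === 0 ? "0.0" : ((completed.length / records.length) * 100).toFixed(1),
    avgLatencyMs: completed.length === 0 ? 0 : Math.round(latencySum / completed.length),
    fallbackActivations: records.filter((r) => r.fallbacks > 0).length,
  };
}
```

With 121 successful requests out of 128, `successRatePct` comes out as `"94.5"`, the figure shown on the dashboard.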

Free-tier-first architecture

The frontend stays public while secrets and orchestration remain on the Worker.

How a request moves through the system

Each step below describes the real Worker orchestration rather than an abstract marketing flow.

01

Validate request

The contract is validated before any provider call.

02

Select strategy

Auto and manual modes share the same payload.

03

Open upstream path

The Worker starts chat or stream on the selected provider.

04

Normalize events

Tokens and meta events arrive in a predictable shape.

05

Fallback on failure

Rate limits, timeouts and malformed responses promote the request to the next tier.

06

Return final metadata

The UI receives final provider, latency, mode and fallback metadata.
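Steps 01 and 02 can be illustrated with a validator that runs before any provider call and accepts both modes through one payload. The field names (`mode`, `provider`, `messages`), the lowercase provider ids and the rule that manual mode must name a provider are assumptions made for this sketch.

```typescript
// Hypothetical payload: auto and manual modes share one shape.
type Mode = "auto" | "manual";

interface ChatRequest {
  mode: Mode;
  provider?: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

// Assumed provider ids, mirroring the fallback chain.
const KNOWN_PROVIDERS = ["groq", "sambanova", "cerebras", "gemini", "openrouter", "mock"];

type Validation = { ok: true; value: ChatRequest } | { ok: false; message: string };

function validateChatRequest(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, message: "body must be a JSON object" };
  }
  const { mode, provider, messages } = body as Record<string, unknown>;
  if (mode !== "auto" && mode !== "manual") {
    return { ok: false, message: "mode must be 'auto' or 'manual'" };
  }
  // Manual mode pins a provider; auto mode lets the Worker walk the chain.
  if (mode === "manual" && (typeof provider !== "string" || !KNOWN_PROVIDERS.includes(provider))) {
    return { ok: false, message: "manual mode requires a known provider" };
  }
  if (!Array.isArray(messages) || messages.length === 0) {
    return { ok: false, message: "messages must be a non-empty array" };
  }
  for (const m of messages as unknown[]) {
    if (typeof m !== "object" || m === null) {
      return { ok: false, message: "each message must be an object" };
    }
    const { role, content } = m as { role?: unknown; content?: unknown };
    if ((role !== "system" && role !== "user" && role !== "assistant") || typeof content !== "string") {
      return { ok: false, message: "each message needs a valid role and string content" };
    }
  }
  return {
    ok: true,
    value: {
      mode: mode as Mode,
      provider: typeof provider === "string" ? provider : undefined,
      messages: messages as ChatRequest["messages"],
    },
  };
}
```

Rejecting bad input here means a malformed request never consumes free-tier quota on any provider.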

Average latency is currently holding at 544 ms.
Engineering architecture

Serverless-first without false promises

The architecture reflects the real project layers: web client, Worker and shared contracts. It makes no claims about KV storage or durable state that the current build does not provide.

Web client

The Next.js client handles routes, rendering and client-side data fetching without access to secrets.

App Router with a shared workspace shell
Client-side language and theme switching

Worker orchestration

The Cloudflare Worker owns the chat API, SSE streaming, fallback orchestration and error normalization.

/chat, /stream, /status, /logs and /usage routes
Groq -> SambaNova -> Cerebras -> Gemini -> OpenRouter -> Mock fallback
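The sketch below shows how the Worker could dispatch its five routes before calling any handler. The route paths come from this page. The HTTP methods (POST for `/chat` and `/stream`, GET for the rest) and the `matchRoute` helper are assumptions.

```typescript
type RouteName = "chat" | "stream" | "status" | "logs" | "usage";

interface Route {
  method: "GET" | "POST";
  name: RouteName;
}

// Paths from this page; methods are assumptions. A Map avoids matching
// inherited object keys such as "constructor".
const ROUTES = new Map<string, Route>([
  ["/chat", { method: "POST", name: "chat" }],
  ["/stream", { method: "POST", name: "stream" }],
  ["/status", { method: "GET", name: "status" }],
  ["/logs", { method: "GET", name: "logs" }],
  ["/usage", { method: "GET", name: "usage" }],
]);

type RouteMatch = { kind: "ok"; name: RouteName } | { kind: "error"; status: 404 | 405 };

function matchRoute(method: string, pathname: string): RouteMatch {
  const route = ROUTES.get(pathname);
  if (route === undefined) return { kind: "error", status: 404 }; // unknown path
  if (route.method !== method.toUpperCase()) return { kind: "error", status: 405 }; // wrong method
  return { kind: "ok", name: route.name };
}
```

Inside a Worker `fetch` handler, this would be called with `request.method` and `new URL(request.url).pathname`. A browser client calling the Worker cross-origin would also need `OPTIONS` preflight and CORS headers, which this sketch omits.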

Shared contracts and env

Request, response and error types live in a shared package while public and server env values stay separated.

NEXT_PUBLIC_API_BASE_URL stays client-safe
Provider keys remain Worker-only
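Next.js only inlines environment variables prefixed with `NEXT_PUBLIC_` into the browser bundle, which is what keeps `NEXT_PUBLIC_API_BASE_URL` client-safe. The helpers below mirror that split so it can be checked in a test. `publicEnv`, `requireSecret` and the `GROQ_API_KEY` / `GEMINI_API_KEY` names are hypothetical.

```typescript
const PUBLIC_PREFIX = "NEXT_PUBLIC_";

// Mirrors the Next.js rule: only NEXT_PUBLIC_* values may reach the browser.
function publicEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (key.startsWith(PUBLIC_PREFIX) && value !== undefined) out[key] = value;
  }
  return out;
}

// Worker side: read a provider key and fail loudly if it is missing.
function requireSecret(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`missing Worker secret: ${name}`);
  return value;
}
```

On Cloudflare, provider keys would typically be stored with `wrangler secret put` and read from the Worker's `env` binding, never from the web app's environment.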

System flow

Client and server layers remain separated by responsibility.

Live data path
User client (browser + UI shell) -> Next.js web (routes and data fetching) -> Cloudflare Worker (chat, streaming, fallback and secrets) -> Groq Free -> SambaNova Cloud -> Cerebras Inference -> Gemini API -> OpenRouter Free -> Mock / Demo

6 providers in the chain
544 ms current average latency
94.5% successful completions

Live telemetry, not decorative charts

These charts are driven by the real usage contract and show the final-provider distribution after fallback.

Requests and latency

Timeseries from the usage endpoint.

Provider distribution

The final route after fallback.
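Because the chart shows the final route after fallback, each request should be counted under the provider that actually served it, not the first one attempted. The sketch below assumes each request record keeps the ordered list of providers tried. The record shape and `finalProviderDistribution` are hypothetical, and failed requests are excluded.

```typescript
// Hypothetical request record: providers tried, in chain order.
interface RoutedRequest {
  attempted: string[];
  ok: boolean;
}

function finalProviderDistribution(requests: RoutedRequest[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of requests) {
    // Failed requests have no final route, so they are not counted.
    if (!r.ok || r.attempted.length === 0) continue;
    const final = r.attempted[r.attempted.length - 1];
    counts.set(final, (counts.get(final) ?? 0) + 1);
  }
  return counts;
}
```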

Success rate
94.5%
121/128
Latest request volume
25
last timeseries point
Providers online
6/6
available for routing
Streaming paths
6
support SSE
Next step

Move from the landing page into the live workspace

The interface is already connected to the Worker, so you can send requests and inspect provider status, usage and fallback history live.

Open workspace · API documentation
RelayForge AI
Unified AI Gateway

A unified AI gateway with streaming, fallback orchestration and real observability on top of the Worker API.

SSE
Worker-held secrets

Product

Features
Architecture
Documentation
Workspace

Runtime

Next.js App Router
Cloudflare Worker
TanStack Query
@relayforge/shared

System boundaries

Public env only for the API base URL
Logs and usage are in-memory in the current build
The mock provider preserves a working demo path
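Below is a bounded in-memory log consistent with the note that logs and usage are kept in memory. `BoundedLog` and the capacity used in any instance are illustrative assumptions. In a Cloudflare Worker, module-level state like this lives in a single isolate, so it is neither shared across isolates nor durable.

```typescript
interface LogEntry {
  at: number; // epoch milliseconds
  provider: string;
  ok: boolean;
  latencyMs: number;
}

class BoundedLog {
  private readonly entries: LogEntry[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(entry: LogEntry): void {
    this.entries.push(entry);
    // Drop the oldest entry once the cap is exceeded (O(n) shift, fine for small caps).
    if (this.entries.length > this.capacity) this.entries.shift();
  }

  // Newest entries first; slice copies, so reversing does not reorder the stored log.
  recent(limit: number): LogEntry[] {
    if (limit <= 0) return [];
    return this.entries.slice(-limit).reverse();
  }
}
```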
© 2026 RelayForge AI
The interface is bound to the live product layer: routing, streaming, status, logs and usage.