A single request header — x-smartflow-provider: anthropic (or openai, gemini, ollama) — now routes any request to any provider regardless of the model name in the body. Smartflow automatically translates the model name to the closest capability-tier equivalent on the target provider.
```bash
# Send an OpenAI model name → routed to Anthropic
curl https://YOUR_HOST/v1/chat/completions \
  -H "Authorization: Bearer sk-sf-your-key" \
  -H "x-smartflow-provider: anthropic" \
  -d '{"model":"gpt-5.4","messages":[...]}'
# → Smartflow sends claude-opus-4-6 to Anthropic
# → Response echoes back "gpt-5.4" in OpenAI format

# Route a Claude model to OpenAI
curl https://YOUR_HOST/v1/chat/completions \
  -H "Authorization: Bearer sk-sf-your-key" \
  -H "x-smartflow-provider: openai" \
  -d '{"model":"claude-sonnet-4-6","messages":[...]}'
# → Smartflow sends gpt-4.1 to OpenAI
```
This enables A/B testing across providers, automatic failover to a different provider, and cost arbitrage, all without changing any client code. One header, any destination.
Full bidirectional tier mappings for all current flagship, mid, and fast models:
| OpenAI Model | Tier | → Anthropic |
|---|---|---|
| gpt-5.4-pro | Flagship Pro | claude-opus-4-6 |
| gpt-5.4 | Flagship | claude-opus-4-6 |
| gpt-4.1 | Upper Mid | claude-sonnet-4-6 |
| gpt-4.1-mini / nano | Fast / Cheap | claude-haiku-4-5-20251001 |
| o3 / o4 | Reasoning | claude-opus-4-6 |
| Anthropic Model | Tier | → OpenAI |
|---|---|---|
| claude-opus-4-6 | Flagship | gpt-5.4 |
| claude-sonnet-4-6 | Upper Mid | gpt-4.1 |
| claude-haiku-4-5-20251001 | Fast | gpt-4.1-mini |
Ollama local models are also mapped (llama3.3, phi4, deepseek-r1, qwen2.5:14b) via x-smartflow-provider: ollama.
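The tables above amount to a capability-tier lookup keyed by model and target provider. A minimal sketch of how such a map could be expressed; the dict and function names are illustrative, not Smartflow's actual implementation:

```python
# Hypothetical sketch of tier translation; names and structure are illustrative.
TIER_MAP = {
    ("gpt-5.4-pro", "anthropic"): "claude-opus-4-6",
    ("gpt-5.4", "anthropic"): "claude-opus-4-6",
    ("gpt-4.1", "anthropic"): "claude-sonnet-4-6",
    ("gpt-4.1-mini", "anthropic"): "claude-haiku-4-5-20251001",
    ("claude-opus-4-6", "openai"): "gpt-5.4",
    ("claude-sonnet-4-6", "openai"): "gpt-4.1",
    ("claude-haiku-4-5-20251001", "openai"): "gpt-4.1-mini",
}

def translate_model(model: str, target_provider: str) -> str:
    # Fall back to the original name when no tier mapping exists,
    # e.g. the model is already native to the target provider.
    return TIER_MAP.get((model, target_provider), model)
```

With this shape, `translate_model("gpt-5.4", "anthropic")` yields `"claude-opus-4-6"`, matching the flagship-tier row above.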
New dashboard page (/dashboard/traces.html) surfaces every proxied request as a clickable trace row — timestamp, request ID, provider / model, latency, token count, cache outcome, and compliance status at a glance.
Clicking any row expands a five-tab detail panel: Overview (provider, model, latency breakdown, token usage), Prompt / Response (full content), Stage Timeline (per-stage latency bars), Guardrails (policy names, PII flags, risk score), and Routing (chain evaluated, fallback used). Toolbar supports search, provider filter, cache-hit filter, and sortable columns. Auto-refreshes every 30 seconds.
Debug individual requests end-to-end without touching server logs. Identify slow stages, unexpected routing decisions, compliance false positives, and cache misses — all from the dashboard.
GET /metrics on the management port (7778) now returns standard Prometheus text exposition format (v0.0.4). Scrape it directly with Prometheus, Grafana Agent, OTel Collector, Datadog Agent, or Victoria Metrics.
Metrics exported: smartflow_requests_total, smartflow_cache_hits_total, smartflow_cache_misses_total, smartflow_latency_p50/p90/p99_seconds, smartflow_tokens_input/output_total, smartflow_provider_errors_total, smartflow_cache_hit_rate, smartflow_cache_size_bytes.
Computed from live MetaCache Redis stats and the last 1,000 VAS log entries. No external dependency required. Caddy route already added on deployed instances: handle /metrics → localhost:7778.
Drop Smartflow into any existing Prometheus + Grafana observability stack. Wire latency SLOs, cache hit rate alerts, and token budget dashboards without any custom integration work.
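For a quick sanity check without a full Prometheus deployment, the text exposition format is easy to parse by hand. A minimal sketch, sufficient for spot checks only; the sample lines mirror the metric names above, and the values are made up:

```python
def parse_exposition(text: str) -> dict:
    """Parse Prometheus text exposition (v0.0.4) into {metric: value}.

    Skips HELP/TYPE comment lines; labels, if present, stay part of
    the metric key. Not a full parser, just enough for spot checks.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP smartflow_requests_total Total proxied requests
# TYPE smartflow_requests_total counter
smartflow_requests_total 12345
smartflow_cache_hit_rate 0.82
smartflow_latency_p99_seconds 1.73
"""
print(parse_exposition(sample)["smartflow_cache_hit_rate"])  # → 0.82
```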
GET /api/vas/logs/{id} — new endpoint returns the full VASLog record for a single request by ID: prompt content, response, all stage timings, policy IDs, cache metadata, and provider selection details. Used by the Trace UI detail panel and available for direct API access. Returns {"success":true,"data":VASLog} on success, or 404 if the ID is unknown.

New dashboard page (/dashboard/sso_config.html) lets administrators configure Entra ID / Okta / any OIDC provider entirely from the browser: no CLI, no Redis commands, no SSH access needed.
The page covers: provider selection, Tenant ID, Client ID, JWKS URI (auto-derived for Entra), Redirect URI, auto-provisioned team defaults (budget cap, TPM limit, allowed models), and sync-on-every-signin toggle. Reads from GET /api/auth/sso/config, saves via POST /api/auth/sso/config. Built-in Test Connection button, synced teams table, and CLI reference panel for automation.
SSO setup in under 2 minutes from any browser. The only server-side requirement is setting OIDC_CLIENT_SECRET as a container env var.
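The "auto-derived for Entra" JWKS URI follows Microsoft's well-known per-tenant key endpoint pattern. A sketch of that derivation, assuming the standard Entra ID v2.0 endpoint layout (the function name is illustrative):

```python
def entra_jwks_uri(tenant_id: str) -> str:
    # Microsoft Entra ID publishes token-signing keys at a well-known
    # per-tenant discovery endpoint (v2.0).
    return f"https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys"
```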
The management API now implements the complete OIDC authorization-code flow for browser-based single sign-on.
GET /api/auth/sso/login returns {"auth_url":"..."} for the client redirect. The callback endpoint exchanges the authorization code for an id_token, syncs groups via SsoGroupSync, and issues a Smartflow JWT. Flow: sso_login.html → /api/auth/sso/login → IdP redirect → /api/auth/sso/callback → Smartflow JWT → dashboard. Entra groups are auto-synced as Smartflow teams on sign-in.
Full enterprise SSO for the dashboard. New employees get access the moment they are added to the Entra group — zero admin touch per user.
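The auth_url returned by the login endpoint is a standard OIDC authorization-code request. A minimal sketch of how such a URL could be assembled; endpoint, scopes, and parameter values are illustrative, and the real values come from the configured provider:

```python
from urllib.parse import urlencode

def build_auth_url(authorize_endpoint: str, client_id: str,
                   redirect_uri: str, state: str) -> str:
    # Standard OIDC authorization-code request: the IdP redirects back
    # to redirect_uri with ?code=..., which the callback endpoint then
    # exchanges for an id_token.
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile email",
        "state": state,  # CSRF protection; verified on callback
    }
    return f"{authorize_endpoint}?{urlencode(params)}"
```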
sso_login.html — new dashboard login page that dynamically shows SSO buttons for configured providers (Microsoft, Okta, generic OIDC) based on GET /api/auth/sso/config. It handles the OIDC callback automatically: detects ?code=, exchanges it via /api/auth/sso/callback, and redirects to the dashboard on success. Falls back to API key login for non-SSO deployments or service accounts.
When x-smartflow-provider overrode the target provider, the proxy called get_api_key() for the new provider, but this code path returned None whenever an auth header was present in the original request, producing "no API key" errors for cross-provider routing. Fixed by resolving final_api_key directly from the key store for the target provider, bypassing the auth-header check entirely.
Any request using x-smartflow-provider to route Anthropic→OpenAI (or vice versa) returned a 401/500 error. Cross-provider routing was non-functional.
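In pseudocode terms, the fix amounts to always resolving the key from the key store for the overridden provider instead of consulting the original request's auth header. A hypothetical sketch; the store and function names are illustrative, not Smartflow's actual code:

```python
# Hypothetical sketch of the fix; KEY_STORE and names are illustrative.
KEY_STORE = {"openai": "sk-openai-...", "anthropic": "sk-ant-..."}

def resolve_api_key(target_provider: str, request_headers: dict) -> str:
    # Before the fix: the presence of an incoming Authorization header
    # made the lookup return None, breaking cross-provider routing.
    # After the fix: final_api_key comes straight from the key store
    # for the target provider; the client's auth header is ignored here.
    final_api_key = KEY_STORE.get(target_provider)
    if final_api_key is None:
        raise LookupError(f"no API key configured for {target_provider}")
    return final_api_key
```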
Fix: max_completion_tokens for GPT-5.x and O3/O4 models. The body transformer previously emitted max_tokens for all models, but OpenAI's updated API contract for GPT-5.x and O3/O4 reasoning models requires max_completion_tokens; max_tokens is silently ignored or returns a validation error. The transformer now emits max_completion_tokens for these models, while GPT-4.x and earlier continue to use max_tokens for backward compatibility.
Any cross-provider request targeting GPT-5.4 or GPT-5.4-pro ignored the token limit, potentially returning full untruncated model responses and inflating costs.
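The parameter selection can be sketched as a prefix check on the model name. This is a simplification for illustration; the actual transformer may match models differently:

```python
def apply_token_limit(body: dict, limit: int) -> dict:
    # GPT-5.x and O3/O4 reasoning models require max_completion_tokens;
    # GPT-4.x and earlier keep max_tokens for backward compatibility.
    model = body.get("model", "")
    if model.startswith(("gpt-5", "o3", "o4")):
        body["max_completion_tokens"] = limit
        body.pop("max_tokens", None)  # drop the ignored legacy parameter
    else:
        body["max_tokens"] = limit
    return body
```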
| # | Feature / Fix | Area |
|---|---|---|
| 1 | Any-to-any provider routing — x-smartflow-provider header, automatic model tier translation (OpenAI ↔ Anthropic ↔ Ollama) | Routing |
| 2 | March 2026 model table — GPT-5.4/pro, GPT-4.1/mini/nano, Claude Opus 4.6, Sonnet 4.6, Haiku 4.5, O3/O4 reasoning | Models |
| 3 | Per-request Trace UI — traces.html with 5-tab detail panel, stage timeline, guardrail view, routing drill-down | Dashboard |
| 4 | Prometheus /metrics endpoint — GET /metrics on port 7778, standard text/plain 0.0.4, 8 metrics, Grafana-ready | Observability |
| 5 | VAS log detail API — GET /api/vas/logs/{id} returns full VASLog record by request ID | API |
| 6 | SSO config dashboard — sso_config.html: browser-based OIDC config, test connection, synced teams, CLI reference | Identity |
| 7 | OIDC authorization-code flow — GET /api/auth/sso/login + /api/auth/sso/callback: full browser SSO with group sync and JWT issuance | Identity |
| 8 | SSO login page — sso_login.html: dynamic provider buttons, OIDC callback handler, API key fallback | Dashboard |
| 9 | GET + POST /api/auth/sso/config — read/write SSO config in Redis; idempotent, no restart required | API |
| 10 | Fix: Provider key injection on x-smartflow-provider override — cross-provider routing now injects correct API key | Bug Fix |
| 11 | Fix: max_completion_tokens for GPT-5.x / O3/O4 — body transformer emits correct parameter for new OpenAI models | Bug Fix |