Executive Summary
LiteLLM is a widely adopted open-source Python library and proxy that standardises access to over 100 LLM providers behind an OpenAI-compatible interface. It excels as a developer routing tool and is a natural first choice for engineering teams building internal AI tooling.
Smartflow Enterprise is a purpose-built enterprise AI governance platform that includes a drop-in compatible proxy, a four-phase semantic cache, a real-time policy engine, enterprise identity integration, compliance tooling, and a full management dashboard. Where LiteLLM answers "how do I call any LLM with one API?", Smartflow answers "how do I govern, secure, audit, and optimise every AI call across my entire organisation?"
Recommendation: For engineering teams needing rapid LLM routing with minimal setup, LiteLLM is excellent. For organisations with regulatory obligations, SSO requirements, per-user auditability, cost governance, and semantic caching needs, Smartflow Enterprise is the superior platform.
Product Overview
LiteLLM
LiteLLM was created in 2023 as a Python SDK to unify calls to multiple LLM providers (OpenAI, Anthropic, Cohere, Azure, etc.) under a single interface. It has since evolved to include a proxy server mode, basic authentication via virtual keys, simple budget tracking, and a lightweight dashboard. Its core value proposition is developer convenience: swap providers without changing code.
LiteLLM is open-source (MIT licensed) with a large community, and its broad provider support and Python-native design make it a compelling choice for developer-first teams. Its proxy mode supports basic logging, some Redis-backed caching (exact match), and virtual keys per team.
Smartflow Enterprise
Smartflow Enterprise (by LangSmart) is a Rust-based enterprise AI gateway purpose-built for organisations where AI governance, compliance, and identity are non-negotiable. It is OpenAI and Anthropic drop-in compatible, ships as a Docker/Kubernetes Helm deployment, and exposes a comprehensive management dashboard covering all operational concerns.
Smartflow's architecture is designed around three operational pillars: Governance (policy engine, compliance, per-user identity, audit trail), Efficiency (four-phase BERT semantic cache, prompt compression, intelligent routing), and Observability (VAS trace logs, Prometheus metrics, real-time dashboard).
Feature Comparison Matrix
| Capability | Smartflow Enterprise | LiteLLM |
|---|---|---|
| Semantic Cache (BERT KNN) | ✓ 4-phase: exact, semantic, compressed, predictive | ~ Exact match only (Redis/S3) |
| SSO / Enterprise Identity | ✓ Entra ID, LDAP, SAML, OIDC, proxy headers | ✗ Not supported natively |
| Policy Engine / Guardrails | ✓ Real-time, per-user, semantic + regex + keyword | ~ Basic content moderation hook only |
| Compliance Dashboard | ✓ Built-in, with test sandbox | ✗ Not available |
| Per-User Audit Trail | ✓ Full VAS trace log tied to SSO identity | ~ Team-level logging only |
| MCP Gateway | ✓ Full JSON-RPC, SSE & STDIO, tool caching | ✗ Not supported |
| A2A Agent Orchestration | ✓ Built-in A2A registry and routing | ✗ Not supported |
| Provider Support | ✓ 37+ providers inc. local (Ollama, vLLM) | ✓ 100+ providers (largest selection) |
| Kubernetes / Helm | ✓ Production Helm chart, PDB, HPA, NetworkPolicy | ~ Community Helm chart, less mature |
| Prometheus Metrics | ✓ Native /metrics endpoint | ~ Basic, via third-party integrations |
| Runtime Language | ✓ Rust — <5ms overhead, single binary | ~ Python — 20–80ms overhead |
| Open Source | ~ Enterprise product (source available on request) | ✓ MIT license, fully open |
| Python SDK | ✓ SDK v0.4.0 — native async/sync, dual-mode (with or without gateway) | ✓ Native Python library |
| Works Without a Gateway | ✓ Direct mode — OpenAI, Anthropic, Gemini, Ollama via model prefix | ✓ SDK is the gateway (no separate deployment) |
| Community / Ecosystem | ~ Growing — commercial support available | ✓ Large open-source community |
| Budget / Cost Tracking | ✓ Per-user token usage in VAS logs | ✓ Team/key-level spend tracking |
Note (March 2026): Smartflow SDK v0.4.0 runs SmartflowClient() without any gateway — it calls OpenAI, Anthropic, Gemini, and Ollama directly with the same API surface as LiteLLM. Add a gateway later with smartflow configure for zero code changes. The Python SDK advantage LiteLLM previously held no longer applies.
Semantic Caching: The Critical Differentiator
Caching is frequently the highest-ROI capability in any AI gateway, as repeated or semantically similar queries represent a significant fraction of production traffic. The two platforms take fundamentally different approaches.
LiteLLM Caching
LiteLLM supports exact-match caching backed by Redis or S3. It hashes the request payload and returns a stored response on an identical match. This works well for literally identical repeated queries but captures none of the semantic similarity that exists between paraphrased questions. For typical enterprise workloads, exact-match-only caching achieves hit rates of 5–15%.
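The exact-match approach can be sketched in a few lines of Python. This is an illustration of the principle, not LiteLLM's actual implementation — a toy in-memory dict stands in for Redis, and the key derivation is an assumption:

```python
import hashlib
import json

# Toy in-memory store standing in for Redis/S3.
_cache: dict[str, str] = {}

def _cache_key(model: str, messages: list) -> str:
    # Hash the canonicalised request payload. Any difference in wording
    # produces a different key, so paraphrased queries can never hit.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_cached(model: str, messages: list):
    return _cache.get(_cache_key(model, messages))

def put_cached(model: str, messages: list, response: str) -> None:
    _cache[_cache_key(model, messages)] = response
```

An identical repeated request hits; a paraphrase of the same question misses, which is exactly the limitation described above.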
Smartflow 4-Phase MetaCache
Smartflow's semantic cache operates across four progressive phases:
- Phase 1 — Exact match: Hashed lookup, sub-millisecond.
- Phase 2 — Semantic similarity: BERT embedding with VectorLite KNN index. Queries with cosine similarity above a configurable threshold (default 0.88) are served the cached response. "What is our PTO policy?" and "How many vacation days do employees get?" return the same cached answer.
- Phase 3 — Model compression: Semantically compressed versions of previous responses are matched against incoming queries, extending cache utility across paraphrased contexts.
- Phase 4 — Predictive pre-caching: Based on session context, Smartflow pre-warms the cache for likely follow-up questions before they are asked.
In production deployments, Smartflow's semantic cache achieves hit rates of 55–75%, compared to 5–15% for exact-match-only solutions. On a workload of 10,000 daily requests at $0.01/request average cost, the additional 40–70 percentage points of hit rate represent roughly $40–$70/day in avoided provider cost versus LiteLLM's exact-match approach.
Enterprise Identity & SSO
For enterprises, knowing who made which AI request is not optional — it is foundational to compliance, auditing, and per-user policy enforcement.
LiteLLM Identity Model
LiteLLM uses virtual API keys associated with teams or users. There is no native SSO integration. Developers must build their own identity layer or rely on per-team key issuance. User identity in logs is limited to the key used, not the human behind it. This is acceptable for internal developer tooling but breaks down in regulated enterprise environments where individual accountability is required.
Smartflow Identity Model
Smartflow integrates directly with enterprise identity providers: Microsoft Entra ID (formerly Azure AD) via OIDC, SAML 2.0, on-premise LDAP/Active Directory, and trusted proxy headers for reverse-proxy SSO patterns. Every request is linked to the authenticated user's email, department, and group memberships — all sourced from the corporate directory.
This enables policies to be applied per-user, per-department, or per-group. A contractor's AI requests can be limited to specific topics. An HR team's queries can trigger different compliance rules than a finance team's. An individual user's complete AI interaction history is traceable to their corporate identity for audit purposes.
Policy Engine & Guardrails
Enterprise AI deployments require the ability to define and enforce what employees can and cannot do with AI — at the prompt level, in real-time, with full auditability.
LiteLLM Guardrails
LiteLLM v1.x introduced a basic guardrails interface allowing custom pre- and post-call hooks. These hooks are Python functions the operator must write and maintain. There is no visual policy editor, no pre-built PII detection, and no compliance test sandbox. This is a foundation developers can build on, but it requires significant custom engineering to operationalise.
Smartflow Policy Engine
Smartflow ships a complete policy engine with a visual editor accessible via the management dashboard. Operators can create, test, and deploy policies without writing code. Capabilities include:
- PII detection: Pre-built patterns for SSN, credit card, passport, NHS numbers, and more — with one-click enable.
- Topic restriction: Semantic similarity-based topic guards that block off-topic queries without keyword fragility.
- Jailbreak detection: Pattern and semantic analysis for prompt injection and system override attempts.
- Output moderation: Post-response filtering to redact sensitive information from AI outputs.
- Policy Library: Pre-built templates for HIPAA/PHI, FERPA, SOX, legal privilege, and competitive intelligence protection.
All policies can be tested in the Compliance Sandbox before deployment, scoped to specific users/roles/groups, and toggled without redeployment.
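For a flavour of what a pattern-based check involves, here is a minimal regex PII scan in Python. The pattern names and expressions are simplified illustrations, not Smartflow's shipped rules:

```python
import re

# Simplified example patterns; real PII detection also uses
# checksum validation and semantic analysis.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_prompt(text: str) -> list:
    # Return the names of every PII pattern the prompt trips.
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
```

A pre-call hook would block or redact the request whenever scan_prompt returns a non-empty list; semantic topic guards layer on top of checks like this to catch what regexes cannot.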
Observability & Audit Trail
Operational visibility is the difference between an AI gateway and an AI black box. Both products offer logging, but at very different levels of depth and enterprise utility.
LiteLLM Logging
LiteLLM logs request metadata (model, tokens, latency, cost) to its backend store and offers callback integrations to Langfuse, Helicone, and other third-party observability platforms. The built-in dashboard shows spend and request counts by team/key. There is no trace-level view of individual requests with policy decision context.
Smartflow VAS Trace Logs
Smartflow's VAS (Virtual AI Session) logging captures the complete lifecycle of every request: authenticated user identity, model requested, routing decision, each policy evaluation and outcome, cache layer hit/miss at each phase, provider selected, latency at each processing stage, token usage, and any compliance flags. Every log entry is searchable by user, model, status, and time range through the dashboard's Traces view. Logs are stored in Redis for immediate access and automatically archived to MongoDB for long-term retention — with a cumulative all-time counter that never decreases.
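A single VAS entry might look like the following Python dict. The schema and field names are assumptions for illustration; only the list of captured fields comes from the description above:

```python
# Hypothetical shape of one VAS trace entry (illustrative schema).
vas_entry = {
    "user": "jane.doe@example.com",          # authenticated SSO identity
    "model_requested": "gpt-4o",
    "routing_decision": "provider:openai",
    "policy_evaluations": [
        {"policy": "pii_detection", "outcome": "pass"},
        {"policy": "topic_restriction", "outcome": "pass"},
    ],
    "cache": {"phase1": "miss", "phase2": "hit"},    # per-phase hit/miss
    "latency_ms": {"auth": 1, "policy": 2, "cache": 3},
    "tokens": {"prompt": 42, "completion": 0},       # phase-2 hit: no provider call
    "compliance_flags": [],
}
```

Because every field is structured, the dashboard's Traces view can filter on any combination of user, model, status, and time range without parsing free-text logs.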
Deployment & Operations
Smartflow Enterprise
- Single compiled Rust binary per service
- Production Helm chart with HPA, PDB, NetworkPolicy
- Docker Compose for non-K8s deployments
- Automated orchestrator for cloud deployments
- All services in one image, SERVICE_TYPE routing
- Health endpoints on all services
- Build timestamp baked into binary for verification
LiteLLM
- Python application, pip installable
- Docker image available
- Community Helm chart (less mature)
- Requires Python runtime and dependencies
- Proxy and SDK are separate deployment modes
- Active development — frequent breaking changes
- Large dependency tree, longer cold start
Smartflow's single-binary Rust deployment model means no Python dependency resolution, faster container startup, lower memory footprint, and deterministic performance under load. LiteLLM's Python runtime, while familiar to data science teams, introduces runtime overhead and dependency management complexity in production.
When to Choose Each Platform
Choose LiteLLM when:
- You prefer LiteLLM's open-source Python-native SDK and community ecosystem
- You need access to the widest possible provider selection (100+ models)
- You are in early experimentation — proof of concept or small team deployment
- You have engineering capacity to build identity, caching, and compliance layers yourself
- Open-source licensing and community support are requirements
Choose Smartflow Enterprise when:
- Your organisation must comply with HIPAA, FERPA, SOX, GDPR, or similar regulations
- You require individual user accountability (not just team-key logging) for AI requests
- Your corporate identity is in Azure AD, LDAP, or another enterprise IdP
- Reducing LLM API costs through semantic caching is a priority
- You need a policy engine that non-engineers can operate through a UI
- You are deploying to Kubernetes and need production-grade infrastructure
- You plan to use MCP tool servers or A2A agent orchestration
SDK Parity: March 2026 Update
This whitepaper was originally published in January 2026. A significant development since then warrants an update to the SDK comparison: Smartflow SDK v0.4.0, released in March 2026, introduces dual-mode operation that closes the Python-native SDK gap identified in the original comparison.
What Changed
Smartflow SDK v0.4.0 adds a DirectBackend that allows the Python SDK to call
AI providers (OpenAI, Anthropic, Gemini, Ollama) directly — without a Smartflow gateway —
using the same SmartflowClient() interface:
```shell
# No gateway needed — works like LiteLLM
pip install "smartflow-sdk[all]"
smartflow configure  # first-run wizard: gateway URL or provider keys
```

```python
# Same code works with or without a gateway (inside an async function)
from smartflow import SmartflowClient

sf = SmartflowClient()  # reads ~/.smartflow/config.yaml
response = await sf.chat("Hello", model="gpt-4o")             # OpenAI direct or via gateway
response = await sf.chat("Hello", model="claude-sonnet-4-6")  # Anthropic
response = await sf.chat("Hello", model="gemini-1.5-pro")     # Gemini
response = await sf.chat("Hello", model="ollama/llama3")      # local Ollama
```
The mode is selected automatically: if a Smartflow gateway URL is configured (via
argument, environment variable, or ~/.smartflow/config.yaml), gateway mode
is used with full enterprise features. Otherwise, the SDK operates in direct mode —
routing requests directly to the configured provider.
Updated Comparison: Python SDK
| SDK Capability | Smartflow Enterprise | LiteLLM |
|---|---|---|
| Native Python SDK | ✓ SDK v0.4.0 — async + sync clients | ✓ Native Python library |
| Works without a gateway | ✓ Direct mode — OpenAI, Anthropic, Gemini, Ollama | ✓ SDK is the gateway |
| Adds gateway for enterprise features | ✓ Zero code change — configure once | ~ Deploy separate proxy server |
| Multi-provider routing | ✓ Prefix notation: anthropic/claude-*, ollama/llama3 | ✓ Same prefix convention |
| Semantic BERT cache (gateway) | ✓ 55–75% hit rate, 4-phase | ✗ Exact-match only (5–15%) |
| SSO identity in SDK calls | ✓ Per-user VAS audit trail | ✗ Team key only |
| First-run setup wizard | ✓ smartflow configure CLI | ~ Manual env var setup |
Supply Chain Security: A Structural Advantage
The March 2026 supply chain incident affecting LiteLLM's PyPI distribution demonstrated that pip install litellm was a potential target. This is not a criticism of LiteLLM's engineering — it is a structural risk inherent to any product whose enforcement logic lives inside a pip-installable package.
Why LiteLLM is Structurally Vulnerable
LiteLLM's product IS the Python package. Its routing, policy enforcement, cost controls, and caching all run inside the process created by pip install litellm. This means:
- A compromised PyPI release delivers malicious code directly into the application's trust boundary
- The attacker gains access to every API key passed through the library — OpenAI, Anthropic, and all other providers
- All LLM requests, including prompt content and responses, flow through the compromised code
- No server-side component exists to detect or block the attack — the compromised library is the enforcement layer
- Organisations cannot verify they are running a clean version without auditing every dependency in the full pip dependency tree
Why Smartflow is Structurally Resistant
Smartflow's architecture separates the enforcement plane (server-side Rust binary) from the client library (Python SDK). The SDK on PyPI is a thin HTTP client — it contains no enforcement logic, no policy engine, no caching, and no credential storage beyond what the user explicitly passes to it.
| Attack Surface | Smartflow Enterprise | LiteLLM |
|---|---|---|
| Compromised PyPI package intercepts API keys | ~ SDK holds key only during a single request; gateway auth uses org-issued vkeys, not raw provider keys | ✗ All provider keys pass through the library process |
| Compromised package disables policy enforcement | ✓ Policy runs server-side in Rust binary — SDK cannot bypass it | ✗ Policy IS the library — a compromised version disables all controls |
| Compromised package exfiltrates prompt content | ~ Prompts pass through SDK; gateway still logs server-side copy | ✗ Full prompt and response content accessible to attacker |
| Audit trail survives client compromise | ✓ VAS logs written by gateway regardless of SDK version | ✗ No independent server log — audit trail is in the library |
| Core product delivered as compiled binary | ✓ Rust binary built from controlled source — no runtime pip install | ✗ Product is a pip-installable Python package |
| Client library is optional | ✓ Gateway accepts any HTTP client — curl, raw requests, no SDK required | ✗ Library is required to use the product |
What a Compromised Smartflow SDK Can and Cannot Do
To be precise and honest: a compromised smartflow-sdk PyPI package could still cause harm at the client level — it could intercept the API key passed to SmartflowClient() or read prompt text before it is sent. This is a real risk that no client-side library can fully eliminate.
What a compromised SDK cannot do is bypass server-side enforcement:
- It cannot disable Smartflow's policy engine, guardrails, or compliance scanning
- It cannot erase VAS audit logs — those are written by the gateway, not the SDK
- It cannot impersonate a different user — SSO identity is established at the gateway level
- It cannot access other users' cached responses — the semantic cache is server-side
- It cannot alter what the gateway logs or reports to the compliance dashboard
For enterprise deployments in regulated industries, the server-side enforcement plane remaining intact under a client compromise is a meaningful difference. An organisation can detect the anomaly (via gateway logs), revoke the affected org API key, and re-issue it — without losing the historical audit trail or needing to rotate all underlying provider keys.
For LiteLLM users: pin exact versions in requirements.txt or pyproject.toml, enable hash-pinned installs (pip-compile --generate-hashes), and review your PyPI dependency audit tooling. For Smartflow customers: your gateway policies, audit logs, and compliance posture are unaffected by any SDK supply chain incident. Update your SDK version at your convenience.
Fair Assessment: Where LiteLLM Leads
This analysis aims to be technically honest. LiteLLM has genuine strengths that Smartflow does not match in every dimension:
- Provider breadth: LiteLLM supports 100+ providers including many niche and regional models. Smartflow supports 37+ major providers and all local/self-hosted options, which covers the vast majority of enterprise use cases, but LiteLLM's raw provider count is larger.
- Open-source transparency: LiteLLM's code is fully public. Security-conscious teams can audit every line. Smartflow's source is available under enterprise agreements but is not publicly browsable. (Note: the supply chain attack demonstrates that public code visibility and code integrity at distribution time are separate concerns.)
- Open-source SDK ecosystem: LiteLLM's Python library has a large community and many community-contributed integrations. Smartflow SDK v0.4.0 now provides a comparable Python interface, but LiteLLM's community momentum remains a genuine advantage.
- Community momentum: LiteLLM has a large GitHub following and active Discord. Community-contributed integrations and problem-solving resources are more readily available.
- Cost: LiteLLM is free and open-source. Smartflow Enterprise is a commercial product. For very small teams, the cost differential is material.
Conclusion
LiteLLM and Smartflow Enterprise occupy different positions on the maturity spectrum of enterprise AI infrastructure. LiteLLM is a well-engineered developer tool that solves the routing problem excellently. It is the right starting point for many teams.
Smartflow Enterprise SDK v0.4.0 now meets developers where they are — you can start with direct provider access (no gateway) and add the full enterprise governance stack with a single smartflow configure command. Smartflow is the right platform when an organisation moves from "we want to use AI" to "we need to govern AI at scale." The combination of enterprise SSO, a four-phase semantic cache that meaningfully reduces cost, a no-code policy engine, per-user audit trails tied to corporate identity, and a production-grade Kubernetes Helm chart represents a platform built for the demands of regulated industries and large-scale deployments that LiteLLM was not designed to address.
The March 2026 supply chain incident affecting LiteLLM's PyPI distribution underscores a structural point that was already true before the attack: when a product's enforcement logic lives entirely inside a pip-installable package, its security posture is only as strong as the integrity of that package at every point in the distribution chain. Smartflow's server-side Rust binary architecture means the enforcement plane is not a package you download from a public registry — it is a binary you deploy, control, and verify in your own infrastructure.
For most enterprise procurement evaluations, the question is not which platform has more provider connectors — it is which platform can be deployed in a regulated environment without requiring a custom engineering project to add compliance controls, and which platform's security posture survives a client-side compromise. On both dimensions, Smartflow Enterprise is purpose-built for the job.