Executive Summary
Azure AI Gateway (part of Azure API Management) is a capable AI traffic management layer for organizations already committed to Azure. It handles token rate limiting, load balancing, and basic semantic caching — but only within the Azure ecosystem, only for Azure-native identity, and with compliance enforcement that relies on external cloud API calls.
Smartflow is built for organizations where those constraints are deal-breakers: regulated industries requiring on-premises data residency, multi-cloud or hybrid environments, Splunk-centric security operations, or workloads where an external cloud dependency in the compliance path is unacceptable.
These are not always competing products. Smartflow can operate alongside Azure APIM today — providing the enforcement, compliance, and observability layer that APIM doesn't offer — and can be adopted incrementally as the primary AI governance layer as Azure APIM's limitations become constraints.
What Azure AI Gateway Is
Azure AI Gateway is a feature set within Azure API Management (APIM) that adds AI-specific capabilities on top of APIM's existing API proxy infrastructure. It is not a standalone product. It requires an APIM instance, an Azure subscription, and, for advanced features such as semantic caching, additional Azure services such as Azure Managed Redis with the RediSearch module.
Key Azure AI Gateway features (per Microsoft documentation, April 2026):
- Token rate limiting — XML policy-based TPM limits per subscription key, IP, or expression
- Semantic caching — Vector similarity lookup via Azure Managed Redis + RediSearch; requires a separately provisioned Redis instance
- Load balancing — Round-robin, weighted, priority across Azure OpenAI endpoints and other model backends
- Content safety — Routes prompts through Azure AI Content Safety (a cloud API call with latency and egress implications)
- MCP server passthrough — Preview feature; Azure-hosted only
- Observability — Azure Monitor and Application Insights; no native Splunk integration
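To make the token rate limiting item concrete, the following is a minimal sketch of a sliding-window tokens-per-minute (TPM) limiter keyed by caller. It is not APIM's XML policy and not Smartflow's implementation; the class name `TokenRateLimiter`, its parameters, and the key strings are all hypothetical and exist only to illustrate the idea.

```python
import time
from collections import defaultdict, deque


class TokenRateLimiter:
    """Sliding-window TPM limiter keyed by caller (illustrative only)."""

    def __init__(self, tpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        # key -> deque of (timestamp, tokens) for requests still inside the window
        self._usage = defaultdict(deque)

    def _used(self, key, now):
        window = self._usage[key]
        # Drop entries that have aged out of the window.
        while window and now - window[0][0] >= self.window_seconds:
            window.popleft()
        return sum(tokens for _, tokens in window)

    def allow(self, key, tokens):
        """Return True and record the request if it fits the remaining budget."""
        now = self.clock()
        if self._used(key, now) + tokens > self.tpm_limit:
            return False
        self._usage[key].append((now, tokens))
        return True
```

The key can be a subscription key, an IP address, or any expression-derived string, which mirrors the three scoping options listed above. A rejected request is not recorded, so it does not consume budget.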
Deployment & Portability
This is the most decisive structural difference. Azure AI Gateway exists only in Azure. Smartflow is a compiled Rust binary deployable on any infrastructure.
Smartflow
- On-premises data center (bare metal, VMware)
- Any cloud: AWS, Azure, GCP, DigitalOcean
- Docker container, Kubernetes cluster
- Air-gapped and SCIF environments
- Private cloud and sovereign cloud
- Single binary: identical behavior in all environments
- Zero Python, zero package manager surface area
Azure AI Gateway
- Azure-managed control plane; no fully on-premises option
- Requires Azure subscription and APIM instance
- Additional Azure services for advanced features
- Air-gapped deployment: not possible
- Self-hosted APIM gateway available, but limited and still tied to an Azure-hosted APIM instance
- Managed service: Microsoft controls the runtime
- Policy configuration in XML via Azure portal
Compliance Enforcement
Azure AI Gateway's compliance approach relies on Azure AI Content Safety — an external API call that evaluates prompts for harmful content. This means:
- Every compliance check adds a round-trip network call to a Microsoft cloud endpoint
- Your prompt content is sent to a third-party content moderation service
- The check happens after your gateway receives the request, not before it reaches your network boundary
- There is no concept of pre-flight policy evaluation, information barriers, or department-level access rules
Smartflow's compliance engine is built into the proxy binary and runs inline:
- Pre-flight checks — policy evaluation happens before the request is forwarded to any model; non-compliant requests never reach the AI provider
- MAESTRO orchestration — multi-step enforcement pipeline (PII detection → policy match → information barrier check → audit log) runs in a single in-process pass
- Information barriers — enforces which users, groups, or departments can send or receive which categories of AI output — not a feature Azure APIM offers at all
- Zero egress compliance — no external API call, no added latency on the compliance path, no content leaving your perimeter to be evaluated
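The pre-flight pipeline described above (PII detection, policy match, information barrier check, audit log) can be sketched as a single in-process function. This is an illustration under stated assumptions, not Smartflow's MAESTRO engine: the regular expressions, the `BLOCKED_CATEGORIES` and `BARRIERS` tables, and the `preflight` function are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass, field

# Deliberately simple PII patterns; a production detector would be far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical policy tables (assumptions for this sketch).
BLOCKED_CATEGORIES = {"source-code-export"}  # denied for everyone
BARRIERS = {"research": {"deal-data"}, "trading": {"research-drafts"}}


@dataclass
class Decision:
    allowed: bool
    prompt: str  # redacted prompt that may be forwarded to the model
    reasons: list = field(default_factory=list)


def preflight(prompt, department, category, audit_log):
    """Run redaction, policy match, barrier check, and audit in one pass."""
    reasons = []
    # Step 1: PII detection and redaction.
    redacted = SSN.sub("[SSN]", EMAIL.sub("[EMAIL]", prompt))
    if redacted != prompt:
        reasons.append("pii-redacted")
    # Step 2: global policy match.
    if category in BLOCKED_CATEGORIES:
        reasons.append("policy-denied")
    # Step 3: information barrier check for the caller's department.
    if category in BARRIERS.get(department, set()):
        reasons.append("information-barrier")
    allowed = not ({"policy-denied", "information-barrier"} & set(reasons))
    # Step 4: audit record; the raw prompt is never written to the log.
    audit_log.append({"department": department, "category": category,
                      "allowed": allowed, "reasons": list(reasons)})
    return Decision(allowed, redacted, reasons)
```

Because every step runs in the same process, a denied request returns before any network call is made, which is the property the zero egress claim depends on.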
Semantic Cache Architecture
Both platforms support semantic caching, but the architecture and operational overhead differ significantly.
Azure AI Gateway — Single-layer Redis semantic cache
- Requires a separately provisioned Azure Managed Redis instance with the RediSearch module (cannot be enabled on existing caches)
- Uses an embeddings API call to generate query vectors, then looks up similarity in Redis
- Single lookup layer — no exact match fast path, no behavioral pattern layer
- Cache is scoped to subscription key via the `vary-by` directive; cross-user deduplication requires custom policy
- Configuration via XML policy blocks in APIM; no built-in cache analytics dashboard
Smartflow — L1–L4 Layered Cache
- L1 — Exact match: Hash-based lookup, sub-millisecond response, zero model calls
- L2 — Semantic match: Embedding vector proximity, configurable similarity threshold
- L3 — Behavioral pattern: Detects functionally equivalent prompts across rephrasing
- L4 — Cross-user deduplication: Safe reuse of responses across users where policy permits, dramatically reducing token spend in enterprise deployments
- No external Redis required to start — L1/L2 operate on local or embedded storage; Redis optional for L3/L4 at scale
- Native cache analytics in the dashboard — hit rates, savings, per-model breakdown
- Per-server MCP cache flush via API (`POST /api/mcp/cache/flush/{server_id}`)
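The first two cache layers can be illustrated with a short sketch: an exact-match hash lookup that is tried first, followed by a cosine-similarity search over stored embeddings. This is a simplified model of the L1 and L2 idea only, not Smartflow's code. `LayeredCache`, `toy_embed`, and the default threshold of 0.9 are assumptions made for the example, and `toy_embed` is a bag-of-words stand-in for a real embedding model.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text, vocab=("reset", "password", "vpn", "invoice")):
    """Stand-in embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]


class LayeredCache:
    def __init__(self, embed, threshold=0.9):
        self.embed = embed      # caller-supplied embedding function
        self.threshold = threshold
        self._exact = {}        # L1: sha256(normalized prompt) -> response
        self._semantic = []     # L2: list of (vector, response)

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()

    def put(self, prompt, response):
        self._exact[self._key(prompt)] = response
        self._semantic.append((self.embed(prompt), response))

    def get(self, prompt):
        """Return (layer, response); layer is 'L1', 'L2', or None on a miss."""
        hit = self._exact.get(self._key(prompt))
        if hit is not None:
            return "L1", hit  # fast path: no embedding call needed
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._semantic:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return "L2", best_response
        return None, None
```

The design point is the ordering: the exact-match path answers repeated prompts without computing an embedding at all, and the similarity search only runs on an L1 miss. The linear scan in `get` is fine for a sketch; a real deployment would use a vector index.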
Identity & SSO
Azure AI Gateway's identity model is tightly coupled to Microsoft Entra ID (formerly Azure Active Directory). Organizations using non-Microsoft identity providers face additional integration work or outright incompatibility.
Smartflow's identity layer is provider-agnostic:
- LDAP / Active Directory (on-premises, no Azure required)
- SAML 2.0 — Okta, PingFederate, ADFS, any compliant IdP
- Azure AD / Entra (supported, but not required)
- Cisco Duo MCP SSO integration
- Custom auth via extensible middleware
- Model-level identity mapping — different users get different model access tiers; enforcement at the proxy, not at the model key level
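Model-level identity mapping can be pictured as a lookup from the groups asserted by the identity provider to the set of models those groups may call, checked at the proxy. The sketch below is an assumption-laden illustration: the `MODEL_TIERS` table, the group names, and the model names are hypothetical and do not reflect Smartflow's configuration format.

```python
# Hypothetical tier table: identity group -> models that group may call.
MODEL_TIERS = {
    "engineering": {"large-model", "small-model", "on-prem-model"},
    "support": {"small-model", "on-prem-model"},
    "contractors": {"on-prem-model"},
}


def allowed_models(groups):
    """Union of models across every group the identity provider asserts."""
    models = set()
    for group in groups:
        models |= MODEL_TIERS.get(group, set())
    return models


def authorize(user_groups, requested_model):
    """Proxy-side check; unknown groups grant nothing (default deny)."""
    return requested_model in allowed_models(user_groups)
```

Because the decision is made from group membership at the proxy, callers never need to hold a provider API key, and changing a user's tier is a directory change rather than a key rotation.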
Observability & Splunk Integration
Azure AI Gateway logs to Azure Monitor and Application Insights — both Microsoft cloud services. There is no native Splunk integration, no HEC forwarding, and no CIM field mapping.
Smartflow's VAS (Verified Audit Stream) log and trace system is built for enterprise SIEM environments:
- Native Splunk HEC forwarding — events delivered directly to your Splunk HTTP Event Collector endpoint, no middleware required
- CIM-mapped fields — log schema aligns with Splunk Common Information Model for immediate use in existing dashboards and alerts
- Syslog and SIEM-agnostic output — compatible with Microsoft Sentinel, IBM QRadar, CrowdStrike, and any syslog-capable SIEM
- Prompt and completion logging — full request/response capture with PII redaction applied before logging, not after
- Per-user, per-department trace IDs — correlate AI usage to existing security investigations in Splunk without custom parsing
- OpenTelemetry export — for teams using distributed tracing infrastructure
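As an illustration of HEC forwarding with redaction applied before logging, the sketch below builds the JSON body and headers for a Splunk HTTP Event Collector request. The `/services/collector/event` path and the `Authorization: Splunk <token>` header are standard HEC conventions. Everything else is an assumption for this example: the `smartflow:audit` sourcetype, the field names inside `event`, and the helper functions are hypothetical and are not Smartflow's actual VAS schema.

```python
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text):
    """Mask email addresses before the text is placed in any log event."""
    return EMAIL.sub("[EMAIL]", text)


def build_hec_event(user, department, model, prompt, trace_id, timestamp):
    """Build one HEC event; field names are illustrative, CIM-style."""
    return {
        "time": timestamp,
        "source": "smartflow",
        "sourcetype": "smartflow:audit",  # hypothetical sourcetype
        "event": {
            "user": user,
            "department": department,
            "app": model,
            "action": "allowed",
            "trace_id": trace_id,
            "prompt": redact(prompt),  # redaction happens before logging
        },
    }


def hec_request(url_base, token, event):
    """Return (url, headers, body) for an HTTP POST; sending is left to the caller."""
    url = url_base.rstrip("/") + "/services/collector/event"
    headers = {
        "Authorization": f"Splunk {token}",
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(event)
```

Keeping request construction separate from transport makes the redaction step easy to test in isolation, since no unredacted prompt can reach the serialized body.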
Performance & Scale
Azure AI Gateway scales by adding APIM gateway units. Because it is a managed service, compute cost follows Microsoft's pricing tiers rather than scaling linearly with your actual load.
Smartflow runs as a compiled Rust binary with tokio async I/O — no garbage collection pauses, no Python GIL contention, no interpreter overhead:
- 1,000+ requests per second on a Kubernetes deployment — validated in regulated, on-premises environments
- Horizontal K8s scaling — add pods as traffic grows; cost scales linearly with cloud instance hours, not APIM tier pricing
- <5ms p99 proxy overhead — the gateway does not become the bottleneck even under sustained high load
- Single binary — identical performance characteristics on bare metal, Docker, and K8s; no configuration changes between environments
Full Feature Comparison
| Capability | Smartflow Enterprise | Azure AI Gateway (APIM) |
|---|---|---|
| On-premises deployment | ✓ Bare metal, VMware, Docker, K8s | ✗ Azure only |
| Air-gap / SCIF support | ✓ Fully supported | ✗ Not possible |
| Multi-cloud deployment | ✓ AWS, Azure, GCP, DigitalOcean, on-prem | ~ Self-hosted gateway (limited) |
| Inline compliance enforcement | ✓ Pre-flight, MAESTRO — zero egress | ✗ External Azure AI Content Safety API call |
| Information barriers | ✓ Native, per-user / per-group | ✗ Not a feature |
| PII filtering (inline) | ✓ In-process, configurable per policy | ✗ Via content safety API (cloud egress) |
| Token rate limiting | ✓ API + user + department level | ✓ XML policy, subscription/IP/key |
| Semantic caching | ✓ L1–L4 layered, built-in | ~ Single-layer Redis (requires new Redis instance) |
| Cross-user cache deduplication | ✓ L4 cache layer — policy-governed | ✗ Not available |
| Model routing / load balancing | ✓ Any model, any endpoint | ✓ Azure OpenAI + OpenAI-compatible |
| Identity / SSO | ✓ LDAP, SAML, Okta, Azure AD, Duo, custom | ~ Azure AD / Entra primarily |
| Splunk HEC integration | ✓ Native, CIM-mapped fields | ✗ No native integration |
| SIEM-agnostic logging | ✓ Syslog, OpenTelemetry, HEC | ~ Azure Monitor / App Insights only |
| MCP Gateway | ✓ Native, on-prem, per-server cache flush | ~ Preview, Azure-hosted only |
| Supply chain security | ✓ Single Rust binary, zero Python packages | ✗ Managed service; dependency surface not visible |
| Max throughput (sustained) | ✓ 1,000+ RPS (K8s, on-prem validated) | ~ Scales by APIM tier — cost increases non-linearly |
| Cost model | ✓ Infrastructure cost only; linear K8s scaling | ~ APIM tier + Redis + Azure Monitor + egress fees |
| Azure-native integration | ~ Compatible as downstream; not Azure-native | ✓ Native Entra, Azure Monitor, Foundry, AI Center |
| Microsoft Foundry model import | ~ OpenAI-compatible endpoints supported | ✓ Direct Foundry import wizard |
Legend: ✓ Full support · ~ Partial / requires additional setup · ✗ Not available
The Complementary Adoption Path
Many enterprises evaluating Smartflow already have Azure APIM deployed for general API management. Rather than a forced displacement, Smartflow can be introduced as a compliance and enforcement layer that sits in front of or alongside APIM — adding what APIM cannot do today without disrupting existing integrations.
How Smartflow and Azure APIM Coexist
In the coexistence configuration, Smartflow sits in front of APIM and handles what APIM does not (compliance, on-prem enforcement, SIEM logging, and advanced caching), while APIM continues to manage Azure-specific model endpoints and subscription policies. No existing APIM policies or integrations need to change.
What Each Layer Owns in the Coexistence Model
| Responsibility | Smartflow (new layer) | Azure APIM (existing) |
|---|---|---|
| Pre-flight compliance check | ✓ Inline, zero egress | ✗ Defers to Smartflow |
| Information barrier enforcement | ✓ Per-user, per-group | ✗ Not applicable |
| Semantic cache (L1–L4) | ✓ Intercepts before APIM | ~ Azure Redis cache (bypassed on cache hit) |
| Splunk / SIEM logging | ✓ All requests logged via HEC | ✗ Azure Monitor only |
| Azure Foundry / model routing | ~ Pass-through to APIM | ✓ Continues as-is |
| Existing token rate limits | ~ Enforces additional limits | ✓ Existing APIM policies unchanged |
| Identity federation | ✓ LDAP, SAML, Okta — maps to APIM keys | ✓ Azure AD continues for Azure services |
Phase-Out Roadmap — Transitioning from Azure APIM
For organizations that want to reduce Azure lock-in over time, Smartflow provides a deliberate migration path that avoids big-bang cutover risk.
Phase 1
- Deploy Smartflow proxy (Docker or K8s)
- Route all AI traffic through Smartflow → APIM
- Activate pre-flight compliance, PII filtering
- Enable Splunk HEC logging: first unified AI audit trail
- L1–L2 semantic cache running; APIM Redis bypassed on hits
- Zero change to existing APIM policies or model endpoints
Phase 2
- Add non-Azure model endpoints directly to Smartflow routing
- Anthropic, AWS Bedrock, on-prem models bypass APIM entirely
- APIM retains Azure OpenAI / Foundry routing only
- Enable L3–L4 cache; SSO unified identity activated
- Information barriers configured per department
- APIM Redis cache decommissioned; Smartflow cache replaces it
Phase 3
- Azure OpenAI endpoints moved to Smartflow direct routing
- APIM retained for non-AI API management if needed
- Or APIM decommissioned, with Smartflow handling all AI traffic
- Full MCP Gateway with on-prem tool cache
- MAESTRO compliance at full enforcement depth
- Single pane: Splunk for all AI security events
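Reading the roadmap bullets above as three successive stages (all traffic via APIM, then only Azure-hosted models via APIM, then everything routed directly), the routing change at each stage can be sketched as a small decision function. This is an interpretation for illustration; the phase numbers, the provider labels, and the `next_hop` function are hypothetical and are not part of any Smartflow configuration.

```python
# Providers that stay behind APIM during the middle stage (assumed labels).
AZURE_PROVIDERS = {"azure-openai", "foundry"}


def next_hop(phase, provider):
    """Return 'apim' or 'direct' for a provider at a given migration stage."""
    if phase not in (1, 2, 3):
        raise ValueError(f"unknown phase: {phase}")
    if phase == 1:
        return "apim"  # everything still flows Smartflow -> APIM
    if phase == 2:
        # Non-Azure models bypass APIM; Azure-hosted models do not yet.
        return "apim" if provider in AZURE_PROVIDERS else "direct"
    return "direct"  # final stage: Smartflow routes to every endpoint itself
```

Expressing the cutover as data plus one function keeps each stage reversible: moving a provider back behind APIM is a one-line change rather than a redeployment.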
Verdict
Choose Smartflow when:
- Data residency or air-gap requirements apply
- Compliance enforcement must run on-premises
- Splunk is the security operations platform
- Multiple clouds or non-Azure models are in scope
- Information barriers between departments are required
- Supply chain risk from Python dependencies is a concern
- 1,000+ RPS at regulated-grade compliance is required
Choose Azure AI Gateway for:
- Deep Microsoft Foundry and Azure OpenAI native integration
- Azure-first organizations where it is already provisioned
- Strong developer portal and API catalog experience
- Managed service: no binary deployment or infra ownership
- Token quota management across Azure subscriptions
APERION SmartFlow — Enterprise AI Governance
Request a technical evaluation: aperion.ai · Full documentation: docs.aperion.ai
Published April 2026 · Smartflow v1.7 · Azure APIM AI Gateway (Azure API Management, April 2026 docs)