Resource · Framework
The AI Control Maturity Model
Five levels for the runtime layer that actually sits between your AI and your users. After reading the documented capabilities of 30+ products, our conclusion: the market has converged at Level 3, and Levels 4 and 5, where versioning, backtesting, and safe evolution live, are nearly empty.
Why a runtime maturity model
Existing AI maturity models (CMMI, McKinsey, Gartner, NIST) focus on governance process: committees, documentation, risk registers. Gartner's AI TRiSM names an "AI Runtime Inspection and Enforcement" layer but never defines maturity within it. None answer the practical question: how mature is the technical infrastructure that actually sits between our AI and our users in production? Policy-as-code for AI guardrails is roughly where infrastructure-as-code was in 2012. The model below maps where every major product falls.
The five levels
No centralized control plane. Teams hardcode input validation, regex filters, and prompt checks into application code. No logging, no consistency, no auditability. Where most organizations begin.
A unified proxy sits between applications and providers. Key management, cost tracking, rate limiting, and logging are centralized. Visibility, but not control over what the AI says.
Content filters, PII detection, topic blocking, injection detection, and static rule enforcement run at the gateway. Policies are configured but treated as mutable settings, edited in place, with no formal lifecycle. Where the industry has converged.
Guardrail policies are first-class software artifacts: versioned with immutable snapshots, diffable, rollback-capable, and backtestable against historical traffic. A tamper-evident audit trail records every change with who, when, and why. Deployed through CI/CD.
New policies deploy in shadow mode (observe without blocking) or canary (a fraction of traffic). A/B compares versions on live traffic. Human-in-the-loop case management routes edge cases to reviewers whose decisions feed policy refinement. The system evolves safely.
Where the products sit
We reviewed 30+ products across the category; below are the major platforms, grouped by what they are and mapped to maximum demonstrated maturity by documented capability rather than marketing claim. This is our reading as of June 2026; if we have misjudged your product, tell us and we will correct it.
| Product | Category | Max level | Key evidence |
|---|---|---|---|
| Microsoft (Azure AI Content Safety) | Cloud guardrails | 3 | Broadest detection (multimodal, prompt shields, protected material); on-prem container; in-place edits, no policy versioning |
| Google (Model Armor) | Cloud guardrails | 3-4 | "Inspect only" is genuine shadow mode, the only one among the clouds; floor settings; Terraform; stateless, no template versioning |
| AWS (Bedrock Guardrails) | Cloud guardrails | 3-4 | DRAFT + immutable numbered config versions; CloudTrail SHA-256+RSA log integrity; no backtest, no rollback API, no replay |
| Cloudflare (AI Gateway) | Cloud guardrails | 3 | Edge proxy; Llama Guard, DLP, Firewall for AI; real traffic rate-limiting; per-category flag-vs-block is a primitive shadow mode |
| Check Point (Lakera) | AI security | 3 | Best-in-class injection detection (50K+ patterns); sensitivity-tuning simulator; editable configs, not versioned policy; stateless |
| F5 (CalypsoAI) | AI security | 3-4 | Scanner-lifecycle (scanner versioning/rollback, audit scanners, A/B-like) - not policy-as-code; red-teaming; no replay, backtest, or tamper-evident audit |
| Palo Alto (Prisma AIRS / Protect AI) | AI security | 3 | Runtime AI firewall + model scanning + red-teaming; on-prem/air-gapped scanning; detection-led, no versioned policy, replay, or tamper-evident audit |
| Cisco (AI Defense / Robust Intelligence) | AI security | 3 | Runtime guardrails + validation; Policy Studio turns prose into guardrails; vendor-managed guardrail rollback, not policy-as-code |
| SentinelOne (Prompt Security) | AI security | 3 | Inline coaching, PII redaction, injection; no policy versioning, no shadow mode |
| HiddenLayer (AISec) | AI security | 3 | Model supply-chain scanning (35+ formats: backdoors, trojans, serialization exploits), runtime adversarial and prompt-injection detection, AIBOM, mapped to OWASP/ATLAS/NIST; detection- and security-led, no versioned policy-as-code, replay, or tamper-evident policy audit |
| NVIDIA (NeMo Guardrails) | OSS framework | 3 | Colang code-first dialog control; five-stage rails; free, self-host; versioning only via your own git, no replay/backtest/HITL |
| Microsoft (Agent Governance Toolkit) | OSS framework | 3-4 | OSS; real Merkle-chained audit + Decision BOM, deterministic enforcement; its maintainers list deterministic replay as still-missing; no historical backtest |
| Credo AI | AI governance | 2-3 | AI-native governance: pre-built policy packs (EU AI Act, NIST AI RMF, ISO 42001), agent registry, shadow-AI discovery, audit-ready evidence; defines and documents policy but does not sit in the execution path, leaving runtime enforcement to a separate gateway |
| Holistic AI | AI governance | 3 | Full-lifecycle governance: discovery, bias and fairness auditing, monitoring, compliance frameworks, plus runtime guardrails with enforcement actions; governance- and monitoring-led, not versioned policy-as-code with replay or backtest |
| IBM (watsonx.governance) | AI governance | 2-3 | Model inventory, Factsheets, lifecycle + audit-ready reporting; documents and monitors, does not enforce at runtime; no policy replay |
| ServiceNow (AI governance) | AI governance | 2 | Governance workflows, AI inventory, and risk and compliance management on the ServiceNow platform; documents and manages, does not enforce at runtime |
| OneTrust (AI governance) | AI governance | 2 | AI registry, risk assessments, and compliance workflows (privacy heritage); policy, documentation, and assessment layer, not runtime enforcement |
| OpenAI (Promptfoo) | Testing tool | 3-4 | Pre-deploy red-teaming + regression; YAML config-as-code; supports L4 practices but is not runtime enforcement; OpenAI acquisition announced (Mar 2026) |
| LangChain (LangSmith) | Eval / observability | 2 | Async tracing; strong HITL annotation queues; no gateway, no guardrails |
| Apple (WhyLabs) | Observability | 3-4 | Had policy-as-code YAML, shadow, A/B, runtime blocking - acquired by Apple (Jan 2025) and discontinued |
| Kong (AI Gateway) | Gateway | 3 | Mature API gateway; Prompt Guard, Semantic Guard, PII Sanitizer; declarative config via decK for Git workflows |
| LiteLLM | Gateway | 3 | Full gateway + guardrail ecosystem (Presidio PII, topic blocking, 100+ provider integrations); no policy versioning |
Levels 4 and 5 are nearly empty, and that is the insight
The closest anyone comes is scanner-lifecycle tooling: CalypsoAI/F5 versions, rolls back, and compares its detection scanners - useful, but that is managing detectors, not versioned policy-as-code, and it still has no backtesting against historical traffic and no tamper-evident audit. Among the clouds, AWS Bedrock reaches furthest with native config versioning and CloudTrail log integrity, but no backtesting and no automated rollback. The one product with a real hash-chained audit trail is Microsoft's open-source Agent Governance Toolkit - and its own maintainers list deterministic replay on the policy version that was live as a still-missing requirement. WhyLabs had the most complete Level 3-4 feature set and was acquired by Apple and shut down. No product we reviewed reaches Level 5; the pieces - Google's shadow mode, LangSmith's human-in-the-loop annotation queues, Microsoft's audit chain - exist only in isolation, never as one versioned, replayable, provable system.
What Level 4 and 5 actually require
- Regulatory defensibility. When a regulator asks "what policy was in effect on March 15, and who approved it," an in-place-edited config cannot answer. A versioned policy with an immutable audit trail can.
- Safe change management. Without backtesting, every change is a blind deployment. You cannot answer "if we tighten this filter, how many legitimate requests from the last 30 days would it have blocked?"
- Incident response. Without one-click rollback, reverting a bad change is manual reconfiguration under pressure, the antipattern infrastructure-as-code solved a decade ago.
Where Swiftward fits
The market has no on-prem, policy-as-code engine that delivers Levels 4 and 5 as one product. That gap is what Swiftward is built for: versioned policy, backtest and shadow, replay, and human-in-the-loop on an enterprise foundation you run yourself. It is exactly the Level 4-5 layer this model shows the market is missing, built as one on-prem product. Versioning, replay, shadow, and human-in-the-loop ship today; breadth of integrations and moderation-at-scale maturity are where we are still building, and we say which is which on our comparison page. See the platform.