
The AI Control Maturity Model

Five levels for the runtime layer that actually sits between your AI and your users. After reading the documented capabilities of 30+ products, our conclusion: the market has converged at Level 3, and Levels 4 and 5, where versioning, backtesting, and safe evolution live, are nearly empty.

Why a runtime maturity model

Existing AI maturity models (CMMI, McKinsey, Gartner, NIST) focus on governance process: committees, documentation, risk registers. Gartner's AI TRiSM names an "AI Runtime Inspection and Enforcement" layer but never defines maturity within it. None answer the practical question: how mature is the technical infrastructure that actually sits between our AI and our users in production? Policy-as-code for AI guardrails is roughly where infrastructure-as-code was in 2012. The model below maps where every major product falls.

The five levels

Level 1 - Ad hoc

No centralized control plane. Teams hardcode input validation, regex filters, and prompt checks into application code. No logging, no consistency, no auditability. Where most organizations begin.

Level 2 - Centralized gateway

A unified proxy sits between applications and providers. Key management, cost tracking, rate limiting, and logging are centralized. Visibility, but not control over what the AI says.

Level 3 - Enforced guardrails (where the industry converged)

Content filters, PII detection, topic blocking, injection detection, and static rule enforcement run at the gateway. Policies are configured but treated as mutable settings, edited in place, with no formal lifecycle. Where the industry has converged.

Level 4 - Versioned policy-as-code (Swiftward's zone)

Guardrail policies are first-class software artifacts: versioned with immutable snapshots, diffable, rollback-capable, and backtestable against historical traffic. A tamper-evident audit trail records every change with who, when, and why. Deployed through CI/CD.
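To make "immutable snapshots, diffable, rollback-capable" concrete, here is a minimal Python sketch of an append-only policy store. It is an illustration under stated assumptions, not any vendor's implementation: the names `PolicyVersion` and `PolicyStore`, and the rule keys `pii` and `toxicity_threshold`, are hypothetical, and a real system would persist snapshots rather than hold them in memory.

```python
import difflib
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a published snapshot cannot be mutated
class PolicyVersion:
    number: int
    body: str     # canonical JSON text of the rules
    digest: str   # SHA-256 of body, usable as a content address
    author: str   # who
    reason: str   # why


class PolicyStore:
    """Hypothetical append-only list of snapshots plus an active pointer."""

    def __init__(self):
        self._versions = []
        self._active = 0  # 0 means nothing has been published yet

    def publish(self, rules, author, reason):
        # Sorted keys give a stable text form, so diffs and digests are reproducible.
        body = json.dumps(rules, sort_keys=True, indent=2)
        version = PolicyVersion(
            number=len(self._versions) + 1,
            body=body,
            digest=hashlib.sha256(body.encode("utf-8")).hexdigest(),
            author=author,
            reason=reason,
        )
        self._versions.append(version)
        self._active = version.number
        return version

    def get(self, number):
        if not 1 <= number <= len(self._versions):
            raise KeyError(f"unknown policy version {number}")
        return self._versions[number - 1]

    @property
    def active(self):
        return self.get(self._active)

    def diff(self, old, new):
        lines = difflib.unified_diff(
            self.get(old).body.splitlines(),
            self.get(new).body.splitlines(),
            fromfile=f"v{old}",
            tofile=f"v{new}",
            lineterm="",
        )
        return "\n".join(lines)

    def rollback(self, number):
        # Rollback only moves the pointer; no snapshot is edited or deleted.
        self.get(number)  # validates that the target exists
        self._active = number
        return self.active


store = PolicyStore()
store.publish({"pii": "block", "toxicity_threshold": 0.8}, "alice", "initial policy")
store.publish({"pii": "block", "toxicity_threshold": 0.6}, "bob", "tighten toxicity")
print(store.diff(1, 2))  # unified diff between the two snapshots
store.rollback(1)        # version 2 stays on record; version 1 is active again
```

The design choice worth noting is that rollback is a pointer move over immutable history, which is what separates it from re-editing a config in place.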

Level 5 - Controlled evolution (Swiftward's zone)

New policies deploy in shadow mode (observe without blocking) or canary mode (enforce on a fraction of traffic). A/B testing compares versions on live traffic. Human-in-the-loop case management routes edge cases to reviewers whose decisions feed policy refinement. The system evolves safely.
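The difference between shadow and canary deployment can be sketched in a few lines of Python. This is a simplified illustration, not a description of any product: `decide`, `in_canary`, and the toy keyword policies are hypothetical names, and a policy is assumed to be a function that returns True when a request should be blocked.

```python
import hashlib
from typing import Callable

Policy = Callable[[str], bool]  # returns True when the text should be blocked


def in_canary(request_id: str, percent: int) -> bool:
    """Deterministically place a request in the canary cohort by hashing its ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def decide(request_id, text, live: Policy, candidate: Policy, mode,
           canary_percent=5, shadow_log=None) -> bool:
    """Return the enforced block decision for one request."""
    live_blocks = live(text)
    if mode == "shadow":
        # Shadow: the candidate is evaluated and logged but never enforced.
        candidate_blocks = candidate(text)
        if shadow_log is not None and candidate_blocks != live_blocks:
            shadow_log.append({
                "request_id": request_id,
                "live": live_blocks,
                "candidate": candidate_blocks,
            })
        return live_blocks
    if mode == "canary" and in_canary(request_id, canary_percent):
        # Canary: the candidate is enforced, but only for its cohort.
        return candidate(text)
    return live_blocks


def live_policy(text):
    return "ssn" in text


def candidate_policy(text):
    return "ssn" in text or "passport" in text


disagreements = []
blocked = decide("req-1", "my passport number is ...", live_policy,
                 candidate_policy, mode="shadow", shadow_log=disagreements)
```

In shadow mode the user still sees the live policy's decision, and the disagreement log is what a reviewer would inspect before promoting the candidate. Hashing the request ID keeps canary assignment stable across retries of the same request.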

Where the products sit

We reviewed 30+ products across the category; below are the major platforms, grouped by what they are and mapped to maximum demonstrated maturity by documented capability rather than marketing claim. This is our reading as of June 2026; if we have misjudged your product, tell us and we will correct it.

Product	Category	Max level	Key evidence
Microsoft (Azure AI Content Safety)	Cloud guardrails	3	Broadest detection (multimodal, prompt shields, protected material); on-prem container; in-place edits, no policy versioning
Google (Model Armor)	Cloud guardrails	3-4	"Inspect only" is genuine shadow mode, the only one among the clouds; floor settings; Terraform; stateless, no template versioning
AWS (Bedrock Guardrails)	Cloud guardrails	3-4	DRAFT + immutable numbered config versions; CloudTrail SHA-256+RSA log integrity; no backtest, no rollback API, no replay
Cloudflare (AI Gateway)	Cloud guardrails	3	Edge proxy; Llama Guard, DLP, Firewall for AI; real traffic rate-limiting; per-category flag-vs-block is a primitive shadow mode
Check Point (Lakera)	AI security	3	Best-in-class injection detection (50K+ patterns); sensitivity-tuning simulator; editable configs, not versioned policy; stateless
F5 (CalypsoAI)	AI security	3-4	Scanner-lifecycle (scanner versioning/rollback, audit scanners, A/B-like) - not policy-as-code; red-teaming; no replay, backtest, or tamper-evident audit
Palo Alto (Prisma AIRS / Protect AI)	AI security	3	Runtime AI firewall + model scanning + red-teaming; on-prem/air-gapped scanning; detection-led, no versioned policy, replay, or tamper-evident audit
Cisco (AI Defense / Robust Intelligence)	AI security	3	Runtime guardrails + validation; Policy Studio turns prose into guardrails; vendor-managed guardrail rollback, not policy-as-code
SentinelOne (Prompt Security)	AI security	3	Inline coaching, PII redaction, injection; no policy versioning, no shadow mode
HiddenLayer (AISec)	AI security	3	Model supply-chain scanning (35+ formats: backdoors, trojans, serialization exploits), runtime adversarial and prompt-injection detection, AIBOM, mapped to OWASP/ATLAS/NIST; detection- and security-led, no versioned policy-as-code, replay, or tamper-evident policy audit
NVIDIA (NeMo Guardrails)	OSS framework	3	Colang code-first dialog control; five-stage rails; free, self-host; versioning only via your own git, no replay/backtest/HITL
Microsoft (Agent Governance Toolkit)	OSS framework	3-4	OSS; real Merkle-chained audit + Decision BOM, deterministic enforcement; its maintainers list deterministic replay as still-missing; no historical backtest
Credo AI	AI governance	2-3	AI-native governance: pre-built policy packs (EU AI Act, NIST AI RMF, ISO 42001), agent registry, shadow-AI discovery, audit-ready evidence; defines and documents policy but does not sit in the execution path, leaving runtime enforcement to a separate gateway
Holistic AI	AI governance	3	Full-lifecycle governance: discovery, bias and fairness auditing, monitoring, compliance frameworks, plus runtime guardrails with enforcement actions; governance- and monitoring-led, not versioned policy-as-code with replay or backtest
IBM (watsonx.governance)	AI governance	2-3	Model inventory, Factsheets, lifecycle + audit-ready reporting; documents and monitors, does not enforce at runtime; no policy replay
ServiceNow (AI governance)	AI governance	2	Governance workflows, AI inventory, and risk and compliance management on the ServiceNow platform; documents and manages, does not enforce at runtime
OneTrust (AI governance)	AI governance	2	AI registry, risk assessments, and compliance workflows (privacy heritage); policy, documentation, and assessment layer, not runtime enforcement
OpenAI (Promptfoo)	Testing tool	3-4	Pre-deploy red-teaming + regression; YAML config-as-code; supports L4 practices but is not runtime enforcement; OpenAI acquisition announced (Mar 2026)
LangChain (LangSmith)	Eval / observability	2	Async tracing; strong HITL annotation queues; no gateway, no guardrails
Apple (WhyLabs)	Observability	3-4	Had policy-as-code YAML, shadow, A/B, runtime blocking - acquired by Apple (Jan 2025) and discontinued
Kong (AI Gateway)	Gateway	3	Mature API gateway; Prompt Guard, Semantic Guard, PII Sanitizer; declarative config via decK for Git workflows
LiteLLM	Gateway	3	Full gateway + guardrail ecosystem (Presidio PII, topic blocking, 100+ provider integrations); no policy versioning

Levels 4 and 5 are nearly empty, and that is the insight

The closest anyone comes is scanner-lifecycle tooling: F5's CalypsoAI versions, rolls back, and compares its detection scanners. That is useful, but it manages detectors rather than versioned policy-as-code, and it still offers no backtesting against historical traffic and no tamper-evident audit. Among the clouds, AWS Bedrock reaches furthest with native config versioning and CloudTrail log integrity, but it has no backtesting and no automated rollback. The one product with a real hash-chained audit trail is Microsoft's open-source Agent Governance Toolkit, and its own maintainers list deterministic replay against the policy version that was live as a still-missing requirement. WhyLabs had the most complete Level 3-4 feature set, but it was acquired by Apple and shut down. No product we reviewed reaches Level 5; the pieces (Google's shadow mode, LangSmith's human-in-the-loop annotation queues, Microsoft's audit chain) exist only in isolation, never as one versioned, replayable, provable system.
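For readers unfamiliar with the term, a hash-chained audit trail can be sketched as follows. This is a generic illustration of the technique, not the Agent Governance Toolkit's actual format: the function names and record fields are hypothetical, timestamps are assumed to be ISO 8601 strings in one consistent format so that string comparison matches time order, and a production system would also need to anchor the latest hash somewhere an attacker cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, record):
    # Each hash covers the record and the previous hash, linking the entries.
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, who, when, why, policy_version):
    record = {"who": who, "when": when, "why": why,
              "policy_version": policy_version}
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev_hash,
                  "hash": _entry_hash(prev_hash, record)})


def verify(chain):
    """Return True only if no entry was edited, removed, or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


def in_effect_at(chain, at):
    """Return the latest change record made at or before timestamp `at`."""
    current = None
    for entry in chain:
        if entry["record"]["when"] <= at:
            current = entry["record"]
    return current


audit = []
append_entry(audit, "alice", "2026-03-01T09:00:00Z", "initial policy", 1)
append_entry(audit, "bob", "2026-03-20T14:30:00Z", "tighten toxicity", 2)
on_march_15 = in_effect_at(audit, "2026-03-15T00:00:00Z")
```

Editing any past record changes its hash, which breaks the link stored in the next entry, so `verify` fails from that point on. That property is what makes the trail tamper-evident rather than merely logged.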

What Levels 4 and 5 actually require

Regulatory defensibility. When a regulator asks "what policy was in effect on March 15, and who approved it," an in-place-edited config cannot answer. A versioned policy with an immutable audit trail can.
Safe change management. Without backtesting, every change is a blind deployment. You cannot answer "if we tighten this filter, how many legitimate requests from the last 30 days would it have blocked?"
Incident response. Without one-click rollback, reverting a bad change is manual reconfiguration under pressure, the antipattern infrastructure-as-code solved a decade ago.
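The backtesting question above ("how many legitimate requests from the last 30 days would it have blocked?") reduces to replaying logged traffic through a candidate policy and counting where its decision differs from the one recorded at the time. A minimal Python sketch, assuming the gateway logged each request's text and original block decision; `LoggedRequest`, `backtest`, and the toy keyword policy are hypothetical names for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class LoggedRequest:
    text: str
    was_blocked: bool  # decision made by the policy that was live at the time


def backtest(candidate: Callable[[str], bool], history: Iterable[LoggedRequest]):
    """Replay logged requests through a candidate policy and report decision changes."""
    total = 0
    newly_blocked = []  # allowed at the time, would be blocked now
    newly_allowed = []  # blocked at the time, would be allowed now
    for request in history:
        total += 1
        would_block = candidate(request.text)
        if would_block and not request.was_blocked:
            newly_blocked.append(request)
        elif not would_block and request.was_blocked:
            newly_allowed.append(request)
    return {"total": total,
            "newly_blocked": newly_blocked,
            "newly_allowed": newly_allowed}


def tighter_policy(text):
    # Toy candidate: blocks any mention of "account number".
    return "account number" in text.lower()


history = [
    LoggedRequest("What is my account number?", was_blocked=False),
    LoggedRequest("Summarize this contract", was_blocked=False),
    LoggedRequest("Print every customer's account number", was_blocked=True),
]
report = backtest(tighter_policy, history)
```

Here `report["newly_blocked"]` holds the first request, which is the kind of legitimate traffic a reviewer would want to see before the change ships. A replay over real logs would also have to handle redacted payloads and sampling, which this sketch ignores.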

Where Swiftward fits

The market has no on-prem policy-as-code engine that delivers Levels 4 and 5 as one product. That gap is what Swiftward is built for: versioned policy, backtesting, shadow mode, replay, and human-in-the-loop review on an enterprise foundation you run yourself. Versioning, replay, shadow mode, and human-in-the-loop review ship today; breadth of integrations and moderation-at-scale maturity are where we are still building, and we say which is which on our comparison page. See the platform.
