Keep your AI on-policy. And the content your users post.
Your support agent should not answer what it is not allowed to, break a rule that binds you, or say something that discredits you. And what your users post has to meet your policy and the law. Swiftward checks the text against your policy and blocks, rewrites, or escalates what is off.
Bring your own policy
You write the policy. Your rules, your tone, the laws you answer to, what your agent must never say, what content you will not host. It is not a fixed list of categories you are stuck with: anything you can state as a rule, from discrimination and harassment to off-topic or off-brand, you can enforce. And it is the same engine for two jobs, what your AI says and what your users post, so a policy you write once covers both your agent's output and your user-generated content.
Cheap gates, expensive judges
Having an LLM read every message to weigh it against your policy is slow and costly. So Swiftward runs cheap, fast classifiers first, as a gate, and only spends an LLM-as-judge when the gate says it is worth it. The judge can be a local model or any model you choose, so nothing has to leave your environment. You get a judge's reading of the hard cases without paying for it on the easy ones.
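To make the gate-then-judge idea concrete, here is a minimal sketch in Python. It is not Swiftward's API: `cheap_gate`, `check`, the keyword list, and the threshold are all hypothetical stand-ins, and a real gate would be a trained classifier rather than a keyword count.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allowed: bool
    decided_by: str  # "gate" or "judge"


def cheap_gate(text: str) -> float:
    """Hypothetical fast classifier: returns a risk score in [0, 1].

    A keyword count stands in for a real lightweight model here.
    """
    risky_terms = ("refund", "lawsuit", "password")
    hits = sum(term in text.lower() for term in risky_terms)
    return min(1.0, hits / 2)


def check(text: str, judge: Callable[[str], bool], threshold: float = 0.5) -> Verdict:
    """Call the expensive judge only when the gate score reaches the threshold.

    `judge` is any callable that returns True when the text is on-policy,
    for example a wrapper around a local LLM (assumed, not shown).
    """
    if cheap_gate(text) < threshold:
        return Verdict(allowed=True, decided_by="gate")
    return Verdict(allowed=judge(text), decided_by="judge")
```

In this sketch an everyday question never reaches the judge, so the expensive call is paid only for messages the gate marks as risky.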
It does not just block, it fixes
When the judge finds an answer off-policy, Swiftward can hand it back to the model with what was wrong and ask for a rewrite, then check the new answer and release it only once it passes. This repeats for as many rounds as your policy allows; if the answer cannot be made compliant, it is blocked or sent to a human. The point is that the user gets a good answer, not an error page, and you still have the record of what was caught and corrected.
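The check, rewrite, and re-check loop described above could look roughly like this. It is a sketch under assumptions: `enforce`, `judge`, `rewrite`, and `max_rounds` are hypothetical names, and both callables would wrap real model calls in practice.

```python
from typing import Callable, List, Optional, Tuple


def enforce(
    answer: str,
    judge: Callable[[str], Optional[str]],
    rewrite: Callable[[str, str], str],
    max_rounds: int = 2,
) -> Tuple[str, Optional[str], List[str]]:
    """Release an answer only once the judge passes it.

    `judge` returns None when the answer is on-policy, otherwise a reason.
    `rewrite` takes the answer and the reason and returns a new attempt.
    Returns (status, released_answer, log_of_reasons).
    """
    log: List[str] = []
    for round_no in range(max_rounds + 1):
        reason = judge(answer)
        if reason is None:
            return "released", answer, log
        log.append(reason)  # keep a record of what was caught
        if round_no == max_rounds:
            break  # out of rewrite rounds
        answer = rewrite(answer, reason)
    # Could not be made compliant: block it or hand it to a human.
    return "escalated", None, log
```

The returned log is what preserves the record of each finding, whether the answer was eventually released or escalated.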
Streaming, without the blind spot
Streaming models emit a few words at a time, too little to judge for a policy breach or to catch a leak. Swiftward keeps streaming working, but it reassembles the stream into whole sentences or paragraphs and checks those, for policy and for leaked personal data, secrets, or code, before they reach the reader. It works the same way for OpenAI and for Anthropic.
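As an illustration of reassembling a stream before checking it, the sketch below buffers chunks until a sentence boundary and releases only whole sentences that pass. The names `checked_stream` and `is_allowed` are hypothetical, the regex boundary rule is a deliberate simplification, and none of this is the OpenAI or Anthropic streaming API.

```python
import re
from typing import Callable, Iterable, Iterator

# Simplified sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def checked_stream(
    chunks: Iterable[str], is_allowed: Callable[[str], bool]
) -> Iterator[str]:
    """Yield whole sentences that pass `is_allowed`; stop at the first failure.

    The whitespace between sentences is dropped by the split, so a caller
    would re-join the yielded sentences with spaces.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        buffer = parts.pop()  # the last part may still be incomplete
        for sentence in parts:
            if not is_allowed(sentence):
                yield "[withheld]"
                return
            yield sentence
    if buffer:  # flush whatever remains when the stream ends
        yield buffer if is_allowed(buffer) else "[withheld]"


# A secret split across two chunks is only visible once reassembled.
chunks = ["Your order ", "shipped. The key ", "is sk-123", "45. Bye."]
safe = list(checked_stream(chunks, lambda s: "sk-" not in s))
```

Here neither `"is sk-123"` nor `"45. Bye."` looks like a complete secret on its own, which is the blind spot that sentence-level checking closes.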
The same engine underneath
Every check here is a versioned policy: you can shadow-test a change against live traffic, A/B two versions, and replay any past decision, all audited like the rest of Swiftward. That lets you change what on-policy means and prove what you enforced. The same engine sits behind the harder side of prompt-injection and social-engineering defense, runs the same checks on inbound personal data and secrets, and is how Trust & Safety moderates user content at scale.
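Shadow-testing a policy change can be pictured as running the candidate version next to the live one and recording only where they disagree. The sketch below assumes a policy is just a function from text to a decision; `shadow_test` and the `Policy` type are hypothetical and do not describe how Swiftward stores or versions policies.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption for this sketch: a policy maps text to "allow" or "block".
Policy = Callable[[str], str]


def shadow_test(
    live: Policy, candidate: Policy, traffic: Iterable[str]
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Enforce `live`; run `candidate` silently and collect disagreements.

    Returns (enforced_decisions, diffs), where each diff is
    (message, live_decision, candidate_decision).
    """
    enforced: List[str] = []
    diffs: List[Tuple[str, str, str]] = []
    for message in traffic:
        decision = live(message)  # this is what the user experiences
        enforced.append(decision)
        shadow = candidate(message)  # recorded, never enforced
        if shadow != decision:
            diffs.append((message, decision, shadow))
    return enforced, diffs
```

Reviewing the disagreements before promoting the candidate shows what the change would have blocked or allowed on real traffic, without affecting any user.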