Trust & Safety Decision System Map
This is a reference decision model, not a framework or product description. It separates decisions, signals, policy logic, enforcement, human operations, change control, and auditability into distinct layers.
It reflects real production systems in fintech, marketplaces, AI platforms, and regulated SaaS. Use it to audit, design, or qualify Trust & Safety systems.
Is this relevant to you?
- Do you run automated allow/deny or ranking/exposure decisions?
- Do you need replay/audit evidence for incidents, customers, or regulators?
- Do you ship ML/LLMs in prod and need control over their influence?
Sections
- Surfaces & Verdicts (what the system decides)
- Signals & Evidence (what decisions use)
- Policy Logic (how evidence becomes verdicts)
- Enforcement Runtime (where/when/how outcomes are applied)
- Human Ops & Governance (authority + workflow)
- Change Control (how it evolves safely)
- Audit, Replay & Privacy (prove, explain, reconstruct)
Surfaces & Verdicts (what the system decides)
| Category | Mechanism | Examples |
|---|---|---|
| Access & eligibility | allow / deny action | deny API call by policy; block LLM tool call; prevent seller from posting |
| | suspend / reinstate subject | freeze wallet; suspend merchant; reinstate account after appeal |
| Risk assessment | score / tier assignment | transaction risk score; user trust tier; API key risk tier |
| | abuse / fraud classification | AML flag; account takeover suspicion; prompt-injection detected |
| Exposure & distribution | visibility control | suppress scam listing; hide unsafe AI output; block ad delivery |
| | ranking adjustment | downrank borderline content; demote low-trust sellers; reduce reach |
| Flow decisions | auto-resolve vs review | auto-approve low-risk payment; hold withdrawal; quarantine AI output |
| | routing to handling path | route to AML vs fraud ops; AI safety vs legal; enterprise escalation queue |
| Volume & velocity decisions | rate limits / quotas | throttle withdrawals; cap model calls per tenant; limit posting frequency |
| | temporary restrictions | 24h cash-out freeze; cooldown after suspicious behavior; DM ban for new users |
| Data access & flow controls | data access constraints | block retrieval from "HR docs"; deny export to external connector; restrict tool scopes |
| | data transformation constraints | redact PII in outputs; block secrets leakage; enforce "no code execution" zone |
Signals & Evidence (what decisions use)
| Category | Mechanism | Examples |
|---|---|---|
| Entity state | identity / verification attributes | KYC tier; MFA enabled; verified business; device trust state |
| | enforcement history | prior chargebacks; past strikes; previous holds/overrides |
| Event context | action / object metadata | amount+currency; tool name+arguments; listing category+price |
| | session / device metadata | device fingerprint; IP reputation; auth method; session age |
| Behavior signals | sequence / velocity features | burst withdrawals; rapid API calls; repeated denied tool attempts |
| | pattern anomalies | payout change then withdraw; login then key creation; prompt spam then tool calls |
| Relationship signals | linkage indicators | shared wallets; shared devices; shared IP ranges |
| | coordination indicators | seller rings; coordinated postings; clustered agent behavior |
| Model outputs | ML scores / labels | fraud probability; anomaly score; toxicity label |
| | LLM classifications (with rationale & confidence level if needed) | intent detection; policy label for prompt; sensitive-data presence tag |
| Human & external signals | human labels / outcomes | "confirmed fraud"; "false positive"; "appeal upheld/overturned" |
| | external intelligence | sanctions hit; high-risk jurisdiction list; consortium fraud score |
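
Sequence and velocity features such as "burst withdrawals" or "rapid API calls" are commonly computed as event counts over a sliding time window. The following is a minimal sketch under stated assumptions: state is held in memory, and the class name `VelocityCounter`, the 60-second window, and the subject ID `wallet-1` are hypothetical choices for illustration, not part of the map.

```python
from collections import defaultdict, deque


class VelocityCounter:
    """Counts events per subject inside a sliding time window (illustrative)."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # subject_id -> event timestamps

    def record(self, subject_id: str, ts: float) -> int:
        """Record an event at time ts and return the count inside the window."""
        q = self._events[subject_id]
        q.append(ts)
        # Drop timestamps that fell out of the window ending at ts.
        while q and q[0] <= ts - self.window_seconds:
            q.popleft()
        return len(q)


# Hypothetical usage: three events in quick succession, then one much later.
counter = VelocityCounter(window_seconds=60)
counts = [counter.record("wallet-1", t) for t in (0, 10, 20, 90)]
# counts == [1, 2, 3, 1]
```

The returned count is only a signal; whether three withdrawals in a minute should trigger a hold is a policy decision made in the next layer.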
Policy Logic (how evidence becomes verdicts)
| Category | Mechanism | Examples |
|---|---|---|
| Rules | conditions & thresholds | block if score > X; deny if jurisdiction restricted; allow if KYC ≥ 2 |
| | exceptions / allowlists | regulated cohort exception; enterprise allowlist; internal test accounts |
| Statistical decisioning | banding / cutoffs | approve < X; review X–Y; block > Y |
| | ensembles / fusion | combine fraud + AML + behavior; blend anomaly + linkage + score |
| Composition & precedence | rules constrain models | sanctions rule overrides model allow; policy blocks tool regardless of LLM judgment |
| | models inform rules | dynamic thresholds from drift; score drives routing and severity |
| Externalized decisions | vendor verdict integration | third-party fraud verdict; device reputation vendor; SaaS moderation API |
| | consistency/fallback | compare vendor vs internal; fallback on vendor outage; confidence gating |
| Control contracts | scope | EU-only policy; per-product policy; per-tenant overrides |
| | determinism contract | same inputs+versions → same verdict; version-pinned feature snapshot |
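
Banding, rule-over-model precedence, and the determinism contract can be combined in one small pure function. This is a sketch only: the cutoffs (0.3 and 0.7), the reason codes, and the version label `ruleset-v12` are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    fraud_score: float   # model output in [0, 1]
    sanctions_hit: bool  # external intelligence signal


# Hypothetical cutoffs: approve below 0.3, review 0.3 to 0.7, block above 0.7.
REVIEW_CUTOFF = 0.3
BLOCK_CUTOFF = 0.7
POLICY_VERSION = "ruleset-v12"  # hypothetical version label


def decide(e: Evidence) -> tuple[str, str, str]:
    """Return (verdict, reason_code, policy_version)."""
    # Rules constrain models: a sanctions hit blocks regardless of the score.
    if e.sanctions_hit:
        return ("block", "SANCTIONS_HIT", POLICY_VERSION)
    # Banding: the model score only selects a band; it is not authoritative.
    if e.fraud_score > BLOCK_CUTOFF:
        return ("block", "SCORE_ABOVE_BLOCK_CUTOFF", POLICY_VERSION)
    if e.fraud_score >= REVIEW_CUTOFF:
        return ("review", "SCORE_IN_REVIEW_BAND", POLICY_VERSION)
    return ("allow", "SCORE_BELOW_REVIEW_CUTOFF", POLICY_VERSION)
```

Because `decide` reads no clock, network, or mutable state, the same evidence under the same policy version always yields the same verdict, which is what makes later replay meaningful.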
Enforcement Runtime (where/when/how outcomes are applied)
| Category | Mechanism | Examples |
|---|---|---|
| Action semantics | hard enforcement | decline payment; block prompt/tool call; revoke session/token |
| | step-up / friction | MFA challenge; re-KYC; CAPTCHA / re-auth |
| Conditional / deferred | allow-with-monitoring | approve with enhanced monitoring; allow tool call with strict logging |
| | holds / quarantines | pending withdrawal review; content hidden until review; output quarantine |
| Timing model | synchronous | checkout decision <50ms; tool-call admission inline |
| | asynchronous | hold then review; batch suspension overnight |
| Enforcement points | edge/gateway | API gateway deny; LLM proxy blocks tool call |
| | service/worker | payment service declines; worker freezes accounts |
| Propagation | cross-system effects | disable in IAM+payments+support; open case in case system |
| | notifications | notify user of restriction; page on-call for critical event |
| Failure posture | fail-closed / fail-open | fail-closed for withdrawals; fail-open for low-risk reads with caps |
| | degraded mode | cached policy snapshot; disable LLM classifier but keep rules |
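
Failure posture is easiest to reason about when it is declared per action rather than left to whatever the calling code does on an exception. A minimal sketch, assuming a hypothetical `FAILURE_POSTURE` mapping and an `evaluate` callable that stands in for the policy engine:

```python
from typing import Callable

# Hypothetical per-action posture, mirroring the examples in the table:
# withdrawals fail closed, low-risk reads fail open.
FAILURE_POSTURE = {"withdrawal": "fail_closed", "read": "fail_open"}


def enforce(action: str, evaluate: Callable[[], str]) -> str:
    """Run the policy evaluation; fall back to the action's posture on error."""
    try:
        return evaluate()
    except Exception:
        # Unknown actions default to fail-closed, the safer assumption.
        posture = FAILURE_POSTURE.get(action, "fail_closed")
        return "deny" if posture == "fail_closed" else "allow"


def broken_evaluator() -> str:
    """Simulates a policy engine outage."""
    raise TimeoutError("policy engine unavailable")
```

With these definitions, `enforce("withdrawal", broken_evaluator)` returns `"deny"` and `enforce("read", broken_evaluator)` returns `"allow"`. A real fail-open path would also apply the caps mentioned in the table, which this sketch omits.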
Human Ops & Governance (authority + workflow)
| Category | Mechanism | Examples |
|---|---|---|
| Review | triage | route large withdrawals to senior queue; route AI safety to specialist queue |
| | adjudication | confirm fraud and freeze; mark false positive and restore capability |
| Approvals | operational approvals | dual approval for large withdrawal; approval for payout address change |
| | policy-change approvals | compliance sign-off for AML rule; security sign-off for tool allowlist |
| Appeals & escalations | user appeals | seller reinstatement; wallet unfreeze request; takedown appeal |
| | enterprise/regulator escalations | customer security escalation; regulator inquiry packet |
| Overrides | override authority | senior ops override; incident commander emergency action |
| | override safeguards | reason required; time-boxed override; ticket link mandatory |
| Quality controls | calibration | disagreement review sessions; policy interpretation alignment |
| | reviewer metrics | overturn rate; false-positive rate; time-to-decision by queue |
| Separation of duties | role boundaries | author cannot deploy; deployer cannot approve; reviewer cannot edit policies |
| | accountability | named approver recorded; signed change record; immutable override log |
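
The override safeguards listed above (reason required, time-boxed, ticket link mandatory) can be enforced at the point where an override is created. The sketch below is illustrative: the `Override` record, the 24-hour maximum, and the function names are assumptions, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Override:
    actor: str
    reason: str
    ticket: str
    expires_at: datetime


MAX_OVERRIDE_DURATION = timedelta(hours=24)  # hypothetical time-box limit


def create_override(actor: str, reason: str, ticket: str,
                    duration: timedelta, now: datetime) -> Override:
    """Create an override only if all safeguards are satisfied."""
    if not reason.strip():
        raise ValueError("reason required")
    if not ticket.strip():
        raise ValueError("ticket link mandatory")
    if duration <= timedelta(0) or duration > MAX_OVERRIDE_DURATION:
        raise ValueError("override must be time-boxed")
    return Override(actor, reason, ticket, now + duration)


def is_active(o: Override, now: datetime) -> bool:
    """An override stops applying once its time-box expires."""
    return now < o.expires_at
```

Passing `now` explicitly keeps the functions deterministic and testable; an override log built from these frozen records would still need append-only storage to count as immutable.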
Change Control (how it evolves safely)
| Category | Mechanism | Examples |
|---|---|---|
| Versioning | policy versions | ruleset v12; rule hash; reason-code taxonomy version |
| | model versions | fraud model v3.2; classifier prompt version; feature schema version |
| Progressive rollout | canary / % rollout | 5%→25%→100%; per-tenant rollout; per-region rollout |
| | shadow mode | run new model without enforcement; log diffs vs baseline |
| Evaluation | offline replay | replay last 30 days; measure precision/recall on labeled cases |
| | online monitoring | drift detection; queue impact; false-positive trend |
| Experimentation | A/B tests | threshold tuning; friction variant testing; ranking demotion strength |
| | guardrails | blast-radius cap; auto-rollback trigger; restricted cohorts only |
| Emergency controls | kill switches | disable auto-block; force review-only; disable one policy group |
| | rollback | revert in minutes; rollback by tenant/product/region |
| Governance workflow | change workflow | proposal→review→approval→deploy; mandatory peer review |
| | post-change validation | watch-window after deploy; incident review if metrics spike |
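
Percentage rollout and shadow mode both depend on small, deterministic building blocks. One common approach, sketched here with hypothetical names and a hypothetical salt `policy-v13`, is to hash each subject into a stable bucket for rollout and to run the candidate policy alongside the baseline while enforcing only the baseline.

```python
import hashlib


def rollout_bucket(subject_id: str, salt: str) -> int:
    """Map a subject to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{subject_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_rollout(subject_id: str, percent: int, salt: str = "policy-v13") -> bool:
    """True if the subject falls inside the current rollout percentage."""
    return rollout_bucket(subject_id, salt) < percent


def shadow_diff(inputs, baseline, candidate):
    """Run candidate without enforcing it; return inputs where verdicts differ."""
    diffs = []
    for x in inputs:
        enforced, shadow = baseline(x), candidate(x)
        if enforced != shadow:
            diffs.append((x, enforced, shadow))
    return diffs


# Hypothetical usage: compare a 0.7 block threshold against a stricter 0.6.
baseline = lambda score: "block" if score > 0.7 else "allow"
candidate = lambda score: "block" if score > 0.6 else "allow"
diffs = shadow_diff([0.5, 0.65, 0.9], baseline, candidate)
# diffs == [(0.65, "allow", "block")]
```

Because a subject's bucket does not change as the percentage grows, anyone included at 5% stays included at 25% and 100%, which keeps the user experience stable during a staged rollout.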
Audit, Replay & Privacy (prove, explain, reconstruct)
| Category | Mechanism | Examples |
|---|---|---|
| Decision ledger | core record | verdict+reason codes+timestamps+actor; subject IDs recorded |
| | correlation | trace ID across services; case ID; request ID |
| Traceability | input snapshot | feature snapshot ID; model output ID; external list version |
| | content snapshot | prompt hash + redacted text; output hash + redacted text |
| Attribution | policy lineage | policy version; rule IDs hit; exception path taken |
| | model lineage | model version; threshold set ID; calibration set ID |
| Replay | reproduce | reproduce disputed decline; reproduce tool-call denial; reproduce suspension |
| | what-if simulation | replay under new threshold; replay under new model; tenant-specific replay |
| Reporting | effectiveness | abuse catch rate; fraud prevented; appeal overturn trend |
| | operations | SLA adherence; backlog by queue; latency distribution |
| Privacy & retention | minimization | store hashes not raw; redact PII; store derived features only |
| | retention | 30d raw retention; 1y decision ledger; tenant-specific retention policy |
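
Replay works only if the ledger entry pins both the policy version and the exact inputs that were used. The sketch below assumes the feature snapshot can be retrieved separately (for example by a snapshot ID) and that the ledger stores only its hash; `policy_v12`, `POLICIES`, and the field names are hypothetical.

```python
import hashlib
import json


def snapshot_hash(inputs: dict) -> str:
    """Hash a canonical JSON encoding so equal inputs give equal hashes."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def policy_v12(inputs: dict) -> str:
    """Hypothetical pinned policy version used for the example."""
    return "block" if inputs["fraud_score"] > 0.7 else "allow"


POLICIES = {"v12": policy_v12}  # registry of pinned, immutable versions


def record_decision(inputs: dict, policy_version: str) -> dict:
    """Build a ledger entry: the verdict plus what is needed to replay it."""
    return {
        "policy_version": policy_version,
        "input_hash": snapshot_hash(inputs),
        "verdict": POLICIES[policy_version](inputs),
    }


def replay(entry: dict, inputs: dict) -> bool:
    """Re-run the pinned policy on the stored snapshot and compare verdicts."""
    if snapshot_hash(inputs) != entry["input_hash"]:
        raise ValueError("input snapshot does not match ledger entry")
    return POLICIES[entry["policy_version"]](inputs) == entry["verdict"]
```

A what-if simulation follows the same shape: look up a different policy version for the same verified snapshot and compare its verdict with the recorded one.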
How to use this map
- Audit an existing Trust & Safety system
- Identify missing control layers
- Separate policy decisions from enforcement mechanics
- Define safe boundaries for ML and LLM usage
- Structure discussions with compliance, security, and regulators
- Use it as a system ownership map (policy vs enforcement vs ops)
Where Swiftward fits
- deterministic policy logic
- explicit decision versioning
- non-authoritative model signals
- replayable decision traces
- on-prem operational control
This map describes the problem space Swiftward is designed for.