Guardrails for Amazon Bedrock is a policy layer that screens both the user prompt and the model completion against rules you define — denied topics, content categories, banned words, sensitive information, and contextual grounding for RAG. A guardrail can be attached to any Bedrock model invocation, or invoked standalone via ApplyGuardrail against any text — including completions from non-Bedrock providers such as OpenAI. The point is to define "what the assistant must never do" once, centrally, instead of re-implementing it in every prompt.
A guardrail bundles five filter families. Each is independently configurable; you can mix and match. All filters apply to both prompts (input) and responses (output) by default — disable per direction when it makes sense.
Block entire conversational topics defined in natural language plus example phrases. The classifier is a small dedicated model — far more flexible than a regex.
"topicPolicyConfig": {
"topicsConfig": [
{
"name": "Investment Advice",
"definition": "Personalized recommendations on what securities, funds, or "
"crypto assets a user should buy, sell, or hold.",
"examples": [
"Should I sell my AAPL shares?",
"What's a good ETF to buy right now?",
"Is bitcoin a good long-term hold?",
],
"type": "DENY",
}
]
}
Six harm categories, each with a strength dial: NONE, LOW, MEDIUM, HIGH. Stronger settings catch more borderline cases at the cost of more false positives.
"contentPolicyConfig": {
"filtersConfig": [
{"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
{"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "MISCONDUCT", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
]
}
Two layers: a managed Profanity list and a custom word list. The custom list is the right place for brand-safety terms (competitor names, internal codenames, product names you don't want hallucinated).
"wordPolicyConfig": {
"wordsConfig": [{"text": "ProjectAtlas"}, {"text": "CompetitorCorp"}],
"managedWordListsConfig": [{"type": "PROFANITY"}],
}
Detects 30+ entity types (SSN, credit card, phone, email, names, addresses, IP addresses, plus AWS-specific ones like access keys). Each entity gets an action: BLOCK (refuse the request outright) or ANONYMIZE (replace the value with a placeholder like {EMAIL} before it reaches the model or the user).
"sensitiveInformationPolicyConfig": {
"piiEntitiesConfig": [
{"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
{"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
{"type": "EMAIL", "action": "ANONYMIZE"},
{"type": "PHONE", "action": "ANONYMIZE"},
{"type": "NAME", "action": "ANONYMIZE"},
{"type": "AWS_ACCESS_KEY", "action": "BLOCK"},
],
"regexesConfig": [{
"name": "InternalTicketId",
"description": "Internal ticket identifiers like TKT-12345.",
"pattern": "TKT-\\d{4,8}",
"action": "ANONYMIZE",
}],
}
Use ANONYMIZE when the model still needs the structure of the message but not the literal value (e.g. "send an email to {EMAIL}" still parses). Use BLOCK for things that should never reach the model at all.
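As a concrete illustration of the difference, here is a minimal sketch using the ApplyGuardrail API (covered in detail later in this section) against a guardrail that anonymizes EMAIL; the guardrail ID and version are placeholders, and the comments describe the expected behavior under that assumption.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

# Hypothetical guardrail with EMAIL -> ANONYMIZE and US_SOCIAL_SECURITY_NUMBER -> BLOCK.
check = runtime.apply_guardrail(
    guardrailIdentifier="a1b2c3d4e5f6",  # placeholder ID
    guardrailVersion="1",
    source="INPUT",
    content=[{"text": {"text": "Please email jane.doe@example.com the updated invoice."}}],
)
print(check["action"])              # "GUARDRAIL_INTERVENED" when a filter fires
print(check["outputs"][0]["text"])  # masked text, e.g. "Please email {EMAIL} the updated invoice."
An SSN in the same message would trip the BLOCK action instead, and outputs would carry the blockedInputMessaging string rather than a masked rewrite.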
For RAG flows: after generation, the guardrail scores how well the answer is grounded in the retrieved context and how relevant it is to the user query. Below either threshold, the response is blocked.
"contextualGroundingPolicyConfig": {
"filtersConfig": [
{"type": "GROUNDING", "threshold": 0.75}, # answer must be supported by context
{"type": "RELEVANCE", "threshold": 0.70}, # answer must address the query
]
}
To use this, pass the retrieved context to the Converse call as guardContent blocks; the guardrail compares the response against those blocks. Pair this with Knowledge Bases for end-to-end hallucination control.
import boto3
bedrock = boto3.client("bedrock", region_name="us-west-2")
resp = bedrock.create_guardrail(
name="support-bot-guardrail",
description="Default guardrail for the customer-support assistant.",
blockedInputMessaging ="I can't help with that request.",
blockedOutputsMessaging="I can't share that information.",
topicPolicyConfig={
"topicsConfig": [{
"name": "Investment Advice",
"definition": "Personalized recommendations on securities, funds, or crypto.",
"examples": ["Should I buy AAPL?", "Is BTC a good long hold?"],
"type": "DENY",
}]
},
contentPolicyConfig={"filtersConfig": [
{"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
{"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
]},
sensitiveInformationPolicyConfig={"piiEntitiesConfig": [
{"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
{"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
{"type": "EMAIL", "action": "ANONYMIZE"},
]},
contextualGroundingPolicyConfig={"filtersConfig": [
{"type": "GROUNDING", "threshold": 0.75},
{"type": "RELEVANCE", "threshold": 0.70},
]},
)
guardrail_id = resp["guardrailId"]
print("Created", guardrail_id, "version", resp["version"])
Pass guardrailConfig to any Bedrock runtime call. The guardrail runs on the input first; if blocked, the model is never called. It runs again on the output before returning.
runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
resp = runtime.converse(
modelId="anthropic.claude-opus-4-7",
messages=[{"role": "user", "content": [{"text": "My SSN is 123-45-6789, can you help with my account?"}]}],
guardrailConfig={
"guardrailIdentifier": "gr-pii-strict",
"guardrailVersion": "3",
"trace": "enabled",
},
)
print("Stop:", resp["stopReason"]) # 'guardrail_intervened' on a block
print(resp["output"]["message"]["content"][0]["text"])
Wrap the retrieved context in guardContent blocks so the contextual-grounding filter knows what the answer is supposed to be grounded in:
runtime.converse(
modelId="anthropic.claude-opus-4-7",
messages=[{"role": "user", "content": [
{"guardContent": {"text": {
"text": "Source: 2026 leave policy. EMEA staff receive 18 weeks of paid parental leave.",
"qualifiers": ["grounding_source"],
}}},
{"text": "How many weeks of parental leave do EMEA employees get?"},
]}],
guardrailConfig={
"guardrailIdentifier": guardrail_id,
"guardrailVersion": "DRAFT",
"trace": "enabled",
},
)
ApplyGuardrail runs a guardrail against arbitrary text without invoking a model. Use it to screen completions from non-Bedrock models (OpenAI, Azure, a self-hosted Llama), to validate user-generated content before storage, or to gate any text crossing a trust boundary.
import boto3
from openai import OpenAI
openai = OpenAI()
runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
# 1. Screen the user prompt against the guardrail
prompt = "Tell me how to bypass my company's expense-policy approvals."
input_check = runtime.apply_guardrail(
guardrailIdentifier=guardrail_id,
guardrailVersion="3",
source="INPUT",
content=[{"text": {"text": prompt}}],
)
if input_check["action"] == "GUARDRAIL_INTERVENED":
print("Blocked at input:", input_check["assessments"])
raise SystemExit
# 2. Call OpenAI (or any non-Bedrock model)
gpt = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}],
).choices[0].message.content
# 3. Screen the model output before returning to the user
output_check = runtime.apply_guardrail(
guardrailIdentifier=guardrail_id,
guardrailVersion="3",
source="OUTPUT",
content=[{"text": {"text": gpt}}],
)
if output_check["action"] == "GUARDRAIL_INTERVENED":
final = output_check["outputs"][0]["text"] # the blockedOutputsMessaging string
else:
final = gpt
print(final)
This is the building block for "BYO model + AWS-native safety": the model runs anywhere; the guardrail runs in your AWS account with a complete CloudTrail audit log of every intervention.
With trace: "enabled", the response includes assessments showing exactly which filter triggered and at what confidence. Log these for tuning and incident review.
trace = resp.get("trace", {}).get("guardrail", {})
# inputAssessment maps guardrail ID -> assessment;
# outputAssessments maps guardrail ID -> list of assessments.
assessments = list(trace.get("inputAssessment", {}).values())
for per_guardrail in trace.get("outputAssessments", {}).values():
    assessments.extend(per_guardrail)
for asm in assessments:
    # Topic policy
    for t in asm.get("topicPolicy", {}).get("topics", []):
        print(f"TOPIC {t['name']}: {t['action']}")
    # Content policy
    for f in asm.get("contentPolicy", {}).get("filters", []):
        print(f"CONTENT {f['type']}: {f['action']} confidence={f['confidence']}")
    # PII
    for p in asm.get("sensitiveInformationPolicy", {}).get("piiEntities", []):
        print(f"PII {p['type']}: {p['action']} match='{p['match']}'")
    # Grounding
    for g in asm.get("contextualGroundingPolicy", {}).get("filters", []):
        print(f"GROUNDING {g['type']}: {g['action']} score={g['score']} threshold={g['threshold']}")
Guardrails are versioned the same way as Lambda: a DRAFT you can mutate, plus immutable numbered versions you can pin in production. Always pin a numeric version in production callers — never reference DRAFT from a live application, or a guardrail-author's edit can ship to prod accidentally.
# Edit the draft
bedrock.update_guardrail(guardrailIdentifier=guardrail_id, name="support-bot-guardrail", ...)
# Cut a new immutable version
v = bedrock.create_guardrail_version(
guardrailIdentifier=guardrail_id,
description="Tightened HATE input strength to HIGH after 2026-Q2 incident review.",
)
print("Pin this in callers:", v["version"])
On the Bedrock side, the differentiators are the standalone ApplyGuardrail API, native CloudTrail and CloudWatch integration, and pinning via versioned IDs; it also ties cleanly into Bedrock IAM policies. OpenAI's Moderation endpoint, by contrast, is classification only (hate/threatening, sexual/minors sub-categories): no PII redaction, no grounding check, no denied-topic concept — best as a cheap first line of defense, not a full guardrail layer. If you're multi-cloud, the practical pattern is: pick one guardrail surface as the canonical one (whichever cloud hosts most of your inference), then apply it to all model output via the standalone endpoints (ApplyGuardrail, Azure Content Safety REST). That way one team owns "the policy" and every channel enforces the same rules.
guardrailVersion: "3", not "DRAFT". Cut a new version (with a description that explains why) every time you change behavior, the same way you'd cut a release.assessments block to CloudWatch Logs Insights or a separate "guardrail events" S3 prefix. False positives are how you tune; false negatives are how you patch policy gaps.blockedInputMessaging and blockedOutputsMessaging to something product-appropriate, possibly with a deflection ("...try the support form at /help").ApplyGuardrail has its own per-second TPS quota separate from model TPS — track it independently.Six categories: content filters (hate, insults, sexual, violence, misconduct, prompt-attack) with NONE/LOW/MEDIUM/HIGH thresholds for input and output independently; denied topics defined by name plus natural-language definition and few-shot examples; word filters with the managed Profanity list and a custom-words list; sensitive-information filters for PII (BLOCK or MASK) plus custom regex; contextual grounding for RAG that scores answer faithfulness against retrieved sources; and image content filters for multimodal inputs.
ApplyGuardrail is a standalone synchronous API that evaluates arbitrary text against a guardrail without invoking a model. It lets you enforce the same policy across non-Bedrock providers (OpenAI, Azure OpenAI, self-hosted vLLM), pre-screen inputs before paying for an expensive model call, or post-screen outputs from any source. This is the key to running a multi-provider stack with a single AWS-native policy — the guardrail becomes a portable contract instead of being locked to InvokeModel.
Mask (action ANONYMIZE) when the downstream task can still proceed with a redacted value — e.g., summarizing a support ticket where the email address can be replaced with {EMAIL}. Block (action BLOCK) when the presence of PII means the request shouldn't be served at all — e.g., a public chatbot receiving an SSN should refuse and log the event for review. A common pattern is mask on input (let the model work) and block on output (never let real PII leak back to a user).
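One way to implement that split with nothing but the documented BLOCK/ANONYMIZE actions is two guardrails — a masking one screening inputs and a blocking one screening outputs — applied at each boundary via ApplyGuardrail. The IDs below are hypothetical and the runtime client is the one created earlier; this is a sketch, not the only way to get the effect.
# Hypothetical guardrail IDs: the first masks PII, the second blocks it.
INPUT_GUARDRAIL = ("in1234abcd", "1")    # EMAIL/NAME/PHONE -> ANONYMIZE
OUTPUT_GUARDRAIL = ("out5678efgh", "1")  # same entities -> BLOCK

def screen_input(text):
    gid, ver = INPUT_GUARDRAIL
    r = runtime.apply_guardrail(guardrailIdentifier=gid, guardrailVersion=ver,
                                source="INPUT", content=[{"text": {"text": text}}])
    # If the guardrail rewrote the text (masking), send the rewritten version to the model.
    if r["action"] == "GUARDRAIL_INTERVENED" and r.get("outputs"):
        return r["outputs"][0]["text"]
    return text

def screen_output(text):
    gid, ver = OUTPUT_GUARDRAIL
    r = runtime.apply_guardrail(guardrailIdentifier=gid, guardrailVersion=ver,
                                source="OUTPUT", content=[{"text": {"text": text}}])
    # On a block, outputs[0] holds the blockedOutputsMessaging string.
    if r["action"] == "GUARDRAIL_INTERVENED":
        return r["outputs"][0]["text"]
    return text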
Contextual grounding scores each generated response against the retrieved source documents on two axes: grounding (is every claim supported by the sources?) and relevance (does the answer address the user's question?). You set a 0–1 threshold; responses below it are blocked or flagged. It's specifically a RAG safety net that catches hallucinations the content filters wouldn't — the model can produce a polite, non-toxic, totally fabricated answer, and only grounding will catch that. Pair it with KB retrieval and a score-threshold guard for layered defense.
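The same check can be run standalone, without Converse, by qualifying ApplyGuardrail content blocks — grounding_source for the retrieved passages, query for the user question, guard_content for the candidate answer. The sketch below reuses guardrail_id and runtime from earlier; the grounding_source qualifier appears in the Converse example above, but treat the other qualifier names as assumptions to confirm against the current API reference.
grounding_check = runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion="DRAFT",
    source="OUTPUT",
    content=[
        # Retrieved passage(s) the answer must be supported by.
        {"text": {"text": "Source: 2026 leave policy. EMEA staff receive 18 weeks of paid parental leave.",
                  "qualifiers": ["grounding_source"]}},
        # The user question the answer must stay relevant to.
        {"text": {"text": "How many weeks of parental leave do EMEA employees get?",
                  "qualifiers": ["query"]}},
        # The candidate answer (from any model) being scored.
        {"text": {"text": "EMEA employees get 26 weeks of parental leave.",
                  "qualifiers": ["guard_content"]}},
    ],
)
for asm in grounding_check["assessments"]:
    for g in asm.get("contextualGroundingPolicy", {}).get("filters", []):
        print(g["type"], "score:", g["score"], "action:", g["action"])  # a low GROUNDING score triggers BLOCKED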
All three cover hate, sexual, violence, and self-harm classification. Bedrock adds first-class denied-topic definitions, PII redaction, contextual grounding for RAG, and the standalone ApplyGuardrail API; Azure AI Content Safety adds Prompt Shields and Protected Material detection; OpenAI Moderation is free and minimal — just classification. For an AWS-centric stack, Bedrock Guardrails plus a single policy is simplest. For multi-cloud, run Azure Content Safety and Bedrock Guardrails in parallel via a thin gateway and union the results.
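A thin gateway that unions the two verdicts could look like the sketch below; the Azure side is stubbed out because the exact Content Safety client code depends on your SDK, and all function names here are hypothetical.
def azure_flags(text: str) -> bool:
    """Placeholder: call Azure AI Content Safety text analysis here and return True
    if any category exceeds your severity threshold."""
    raise NotImplementedError  # hypothetical stub

def bedrock_flags(text: str, source: str = "OUTPUT") -> bool:
    r = runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id, guardrailVersion="3",
        source=source, content=[{"text": {"text": text}}],
    )
    return r["action"] == "GUARDRAIL_INTERVENED"

def allowed(text: str) -> bool:
    # Union the verdicts: either provider flagging the text is enough to stop it.
    return not (bedrock_flags(text) or azure_flags(text))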
Treat guardrails as code — store the JSON definition in git, deploy via Terraform or CDK, and version each guardrail (DRAFT plus immutable numbered versions). Maintain a fixture set of "must block" and "must pass" prompts per surface and run them in CI against the candidate version before promoting; treat false-positive and false-negative rates as SLOs. Stream InvokeModel CloudWatch metrics by guardrail outcome (blocked input, blocked output, intervention count) and review weekly; tune topic examples first (high impact, low collateral damage) and content-filter strength last (blunter instrument). Different surfaces (kids' app vs. internal coding assistant) get different guardrail aliases.
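A minimal CI gate along those lines — assuming pytest and a "must block"/"must pass" fixtures file you maintain yourself; the IDs, version, and path are illustrative — might be:
import json
import boto3
import pytest

runtime = boto3.client("bedrock-runtime", region_name="us-west-2")
GUARDRAIL_ID = "a1b2c3d4e5f6"   # hypothetical; injected from CI in practice
CANDIDATE_VERSION = "4"         # the version under evaluation, not yet pinned

with open("guardrail_fixtures.json") as f:  # {"must_block": [...], "must_pass": [...]}
    FIXTURES = json.load(f)

def _intervened(text, source="INPUT"):
    r = runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID, guardrailVersion=CANDIDATE_VERSION,
        source=source, content=[{"text": {"text": text}}],
    )
    return r["action"] == "GUARDRAIL_INTERVENED"

@pytest.mark.parametrize("prompt", FIXTURES["must_block"])
def test_must_block(prompt):
    assert _intervened(prompt), f"false negative: {prompt!r} was not blocked"

@pytest.mark.parametrize("prompt", FIXTURES["must_pass"])
def test_must_pass(prompt):
    assert not _intervened(prompt), f"false positive: {prompt!r} was blocked"
Promote the candidate version (and pin it in callers) only when both fixture suites pass and the observed false-positive rate stays inside your SLO.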