Guardrails are essential security and compliance measures that help ensure AI applications operate safely and responsibly. They can operate in two modes: Validate (reject inputs/outputs that violate rules) and Mutate (modify inputs/outputs to comply with rules).

Commonly Used Guardrails

PII Detection and Masking

Personally Identifiable Information (PII) protection prevents privacy breaches and ensures compliance. PII includes names, SSNs, email addresses, phone numbers, credit card numbers, and other sensitive data. Mode: Mutate - Replaces PII with placeholders like {REDACTED} Example:
Input: "My name is John Smith, SSN is 123-45-6789, email john@company.com"
Output: "My name is {REDACTED}, SSN is {REDACTED}, email {REDACTED}"

Content Moderation

Content moderation reviews and manages user-generated content to ensure it aligns with community guidelines and legal regulations. Mode: Validate - Rejects inappropriate content Key Areas:
  • Explicit content (sexual or violent material)
  • Hate speech and discriminatory language
  • Misinformation and false claims
  • Illegal content or activities

Prompt Injection Prevention

Prompt injection is a security vulnerability where malicious inputs manipulate an AI model’s behavior by overriding intended instructions. Mode: Validate - Blocks injection attempts Types of Attacks:
  • Direct prompt injection (explicit manipulation)
  • Indirect prompt injection (hidden in external content)
  • Jailbreaking (bypassing safety measures)
  • Encoding attacks (using special characters)

Hallucination Detection

Hallucination refers to AI generating outputs not grounded in input data or real-world knowledge, leading to misleading or incorrect information. Mode: Validate - Detects and flags hallucinated content Types of Hallucination:
  • Factual hallucination (incorrect facts)
  • Contextual hallucination (irrelevant information)
  • Logical hallucination (inconsistent reasoning)
  • Source hallucination (fake citations)

Topic Detection

Topic detection identifies and categorizes the subject matter of AI inputs and outputs to ensure content aligns with intended use cases and organizational policies. Mode: Validate - Rejects content outside allowed topics Common Use Cases:
  • Ensuring AI responses stay within business domain boundaries
  • Preventing discussion of sensitive or off-limits subjects
  • Maintaining focus on specific topics or industries
  • Filtering content based on organizational guidelines
Detection Methods:
  • Keyword-based filtering
  • Machine learning classification
  • Semantic analysis
  • Domain-specific rule sets

Provider Support Matrix

Here’s a comprehensive table of the most commonly used guardrails and which providers support them:
GuardrailModeInput GuardrailOutput GuardrailProviders
PII Detection and MaskingMutate
  • AWS Bedrock
  • Azure PII
  • Enkrypt AI
  • Custom Guardrail (using Presidio)
Content ModerationValidate
  • OpenAI
  • AWS Bedrock
  • Azure Content Safety
  • Enkrypt AI
  • Custom Guardrail (using Guardrails AI)
Prompt Injection DetectionValidate
  • AWS Bedrock
  • Azure Content Safety
  • Enkrypt AI
  • Custom Guardrail (using Guardrails AI)
Hallucination DetectionValidate
  • AWS Bedrock
  • Azure Content Safety
  • Enkrypt AI
  • Custom Guardrail (using Guardrails AI)
Topic DetectionValidate
  • AWS Bedrock
  • Azure Content Safety
  • Enkrypt AI
  • Custom Guardrail (using Guardrails AI)

Configuration

Guardrails are configured using a YAML file that specifies the rules to be applied. Each rule can include both request and response guardrails, defining the actions to be taken at each stage. To learn more about configuring guardrails, please refer to the Configure Guardrails page.