Truefoundry Docs

Guardrails are essential security and compliance measures that help ensure AI applications operate safely and responsibly. They can operate in two modes: Validate (reject inputs/outputs that violate rules) and Mutate (modify inputs/outputs to comply with rules).

Commonly Used Guardrails

PII Detection and Masking

Personally Identifiable Information (PII) protection prevents privacy breaches and ensures compliance. PII includes names, SSNs, email addresses, phone numbers, credit card numbers, and other sensitive data. Mode: Mutate - Replaces PII with placeholders like {REDACTED} Example:

Input: "My name is John Smith, SSN is 123-45-6789, email john@company.com"
Output: "My name is {REDACTED}, SSN is {REDACTED}, email {REDACTED}"

Content Moderation

Content moderation reviews and manages user-generated content to ensure it aligns with community guidelines and legal regulations. Mode: Validate - Rejects inappropriate content Key Areas:

Explicit content (sexual or violent material)
Hate speech and discriminatory language
Misinformation and false claims
Illegal content or activities

Prompt Injection Prevention

Prompt injection is a security vulnerability where malicious inputs manipulate an AI model’s behavior by overriding intended instructions. Mode: Validate - Blocks injection attempts Types of Attacks:

Direct prompt injection (explicit manipulation)
Indirect prompt injection (hidden in external content)
Jailbreaking (bypassing safety measures)
Encoding attacks (using special characters)

Hallucination Detection

Hallucination refers to AI generating outputs not grounded in input data or real-world knowledge, leading to misleading or incorrect information. Mode: Validate - Detects and flags hallucinated content Types of Hallucination:

Factual hallucination (incorrect facts)
Contextual hallucination (irrelevant information)
Logical hallucination (inconsistent reasoning)
Source hallucination (fake citations)

Topic Detection

Topic detection identifies and categorizes the subject matter of AI inputs and outputs to ensure content aligns with intended use cases and organizational policies. Mode: Validate - Rejects content outside allowed topics Common Use Cases:

Ensuring AI responses stay within business domain boundaries
Preventing discussion of sensitive or off-limits subjects
Maintaining focus on specific topics or industries
Filtering content based on organizational guidelines

Detection Methods:

Keyword-based filtering
Machine learning classification
Semantic analysis
Domain-specific rule sets

Provider Support Matrix

Here’s a comprehensive table of the most commonly used guardrails and which providers support them:

Guardrail	Mode	Input Guardrail	Output Guardrail	Providers
PII Detection and Masking	Mutate	✅	✅	AWS Bedrock Azure PII Enkrypt AI Custom Guardrail (using Presidio)
Content Moderation	Validate	✅	✅	OpenAI AWS Bedrock Azure Content Safety Enkrypt AI Custom Guardrail (using Guardrails AI)
Prompt Injection Detection	Validate	✅	✅	AWS Bedrock Azure Content Safety Enkrypt AI Custom Guardrail (using Guardrails AI)
Hallucination Detection	Validate	❌	✅	AWS Bedrock Azure Content Safety Enkrypt AI Custom Guardrail (using Guardrails AI)
Topic Detection	Validate	❌	✅	AWS Bedrock Azure Content Safety Enkrypt AI Custom Guardrail (using Guardrails AI)

Configuration

Guardrails are configured using a YAML file that specifies the rules to be applied. Each rule can include both request and response guardrails, defining the actions to be taken at each stage. To learn more about configuring guardrails, please refer to the Configure Guardrails page.

Get Started

Developer Guide

MCP Registry and Gateway

Observability

Integrations

Deployment

API Reference

Chat

Agent

MCP

Embeddings

Rerank

Responses

Image

Audio

Batch

Files

Moderations

Commonly Used Guardrails

Commonly Used Guardrails

PII Detection and Masking

Content Moderation

Prompt Injection Prevention

Hallucination Detection

Topic Detection

Provider Support Matrix

Configuration

Get Started

Developer Guide

MCP Registry and Gateway

Observability

Integrations

Deployment

API Reference

Chat

Agent

MCP

Embeddings

Rerank

Responses

Image

Audio

Batch

Files

Moderations

​Commonly Used Guardrails

​PII Detection and Masking

​Content Moderation

​Prompt Injection Prevention

​Hallucination Detection

​Topic Detection

​Provider Support Matrix

​Configuration

Commonly Used Guardrails

PII Detection and Masking

Content Moderation

Prompt Injection Prevention

Hallucination Detection

Topic Detection

Provider Support Matrix

Configuration