Commonly Used Guardrails
PII Detection and Masking
Personally Identifiable Information (PII) protection prevents privacy breaches and ensures compliance. PII includes names, SSNs, email addresses, phone numbers, credit card numbers, and other sensitive data. Mode: Mutate - Replaces PII with placeholders like{REDACTED}
Example:
Content Moderation
Content moderation reviews and manages user-generated content to ensure it aligns with community guidelines and legal regulations. Mode: Validate - Rejects inappropriate content Key Areas:- Explicit content (sexual or violent material)
- Hate speech and discriminatory language
- Misinformation and false claims
- Illegal content or activities
Prompt Injection Prevention
Prompt injection is a security vulnerability where malicious inputs manipulate an AI model’s behavior by overriding intended instructions. Mode: Validate - Blocks injection attempts Types of Attacks:- Direct prompt injection (explicit manipulation)
- Indirect prompt injection (hidden in external content)
- Jailbreaking (bypassing safety measures)
- Encoding attacks (using special characters)
Hallucination Detection
Hallucination refers to AI generating outputs not grounded in input data or real-world knowledge, leading to misleading or incorrect information. Mode: Validate - Detects and flags hallucinated content Types of Hallucination:- Factual hallucination (incorrect facts)
- Contextual hallucination (irrelevant information)
- Logical hallucination (inconsistent reasoning)
- Source hallucination (fake citations)
Topic Detection
Topic detection identifies and categorizes the subject matter of AI inputs and outputs to ensure content aligns with intended use cases and organizational policies. Mode: Validate - Rejects content outside allowed topics Common Use Cases:- Ensuring AI responses stay within business domain boundaries
- Preventing discussion of sensitive or off-limits subjects
- Maintaining focus on specific topics or industries
- Filtering content based on organizational guidelines
- Keyword-based filtering
- Machine learning classification
- Semantic analysis
- Domain-specific rule sets
Provider Support Matrix
Here’s a comprehensive table of the most commonly used guardrails and which providers support them:Guardrail | Mode | Input Guardrail | Output Guardrail | Providers |
---|---|---|---|---|
PII Detection and Masking | Mutate | ✅ | ✅ |
|
Content Moderation | Validate | ✅ | ✅ |
|
Prompt Injection Detection | Validate | ✅ | ✅ |
|
Hallucination Detection | Validate | ❌ | ✅ |
|
Topic Detection | Validate | ❌ | ✅ |
|