Guardrails
Guardrails allow you to validate and transform LLM inputs and outputs to ensure safety, quality, and compliance.
Config
The guardrails configuration contains an array of rules that are evaluated for each request. Only the first matching guardrail rule is applied to that request. Each rule can specify input and output guardrails that will be applied. Let's take a look at a sample configuration first.
Example Configuration
This sample configuration has one rule with one input guardrail that masks PII and two output guardrails: one that masks PII and another that fails the request if the LLM response touches any of the denied topics. It also has a when block, so the guardrails are applied only to requests that match it.
name: guardrails-config
type: gateway-guardrails-config
guardrails_service_url: https://guardrails.truefoundry.com
rules:
  - id: openai-guardrails
    when:
      models:
        - openai/gpt3-5
        - my-bedrock/anthropic-3-7
      metadata:
        internal-service: backend-svc # arbitrary key-value pairs
    input_guardrails:
      - type: pii
        action: transform
        options:
          entity_types:
            - email
            - ssn
            - name
            - address
    output_guardrails:
      - type: topics
        action: validate
        options:
          denied_topics:
            - medical advice
            - profanity
            - hate speech
            - violence
      - type: pii
        action: transform
        options:
          entity_types:
            - email
            - ssn
            - name
            - address
Guardrails Service URL
The guardrails_service_url field specifies the URL of the server that implements the guardrails APIs. This server exposes REST endpoints that validate and transform content according to the configured guardrails. In most cases, you should be able to use the standard TrueFoundry Guardrails Server.
Rules
Each rule has the following sections:
- id: A unique identifier for the rule
- when: Conditions for when this rule should be applied (an empty object {} means apply to all requests)
  - subjects: An array of users, teams, or virtual accounts from which the request originates, e.g. user:[email protected], team:team1, virtualaccount:virtualaccountname
  - models: An array of model IDs used to filter requests. These are the same model IDs that are passed in the model field of the request.
  - metadata: Key-value pairs of metadata used to filter the requests this rule applies to.
- input_guardrails: An array of guardrails to apply to the input prompt
- output_guardrails: An array of guardrails to apply to the LLM response
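Because only the first matching rule is applied, the order of entries in the rules array matters. The fragment below is a minimal sketch of a targeted rule followed by a catch-all. The rule ids are hypothetical, and the sketch assumes that rules are checked in the order they are listed.

```yaml
rules:
  # Targeted rule: listed first so it is the one applied to requests from team1.
  - id: team1-strict-pii           # hypothetical id
    when:
      subjects:
        - team:team1               # subject format as described in the when section
    input_guardrails:
      - type: pii
        action: transform
        options:
          entity_types:
            - ssn
            - credit_card
    output_guardrails:
      - type: pii
        action: transform
        options:
          entity_types:
            - ssn
            - credit_card
  # Catch-all rule: an empty when object applies to all requests,
  # so under the assumed ordering it only handles what the rule above did not match.
  - id: default-guardrails         # hypothetical id
    when: {}
    input_guardrails:
      - type: pii
        action: transform
        options:
          entity_types:
            - ssn
    output_guardrails:
      - type: topics
        action: validate
        options:
          denied_topics:
            - hate speech
```

If the catch-all were listed first, it would match every request and the team1 rule would never be applied.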
Each guardrail (under input_guardrails or output_guardrails) has:
- type: The type of guardrail to apply (e.g., "pii", "topics", "word")
- action: Either "validate" (check but don't modify) or "transform" (modify the content)
- options: Configuration specific to that guardrail type
Supported Guardrails
The following guardrail types are available in the standard deployment of the TrueFoundry Guardrails Server:
PII Detection (pii)
Detects and handles personally identifiable information such as emails, SSNs, and names.
Options:
- entity_types: An array of PII types to detect, e.g., ["email", "ssn", "name", "address"]
The following entity types are supported:
- phone
- ssn
- credit_card
- address
- name
- date_of_birth
- ip_address
- passport
- drivers_license
- crypto
- iban
- nrp
- medical_license
- url
- us_bank_number
- us_itin
- uk_nhs
- uk_nino
- es_nif
- es_nie
- it_fiscal_code
- it_driver_license
- it_vat_code
- it_passport
- it_identity_card
- pl_pesel
- sg_nric_fin
- sg_uen
- au_abn
- au_acn
- au_tfn
- au_medicare
- in_pan
- in_aadhaar
- in_vehicle_registration
- in_voter
- in_passport
- fi_personal_identity_code
Topic Control (topics)
Validates that content does not contain certain topics.
Options:
- denied_topics: An array of topics to disallow, e.g., ["medical advice", "profanity"]
Word Filtering (word)
Filters specific words and phrases.
Options:
- word_list: An array of words to filter
- case_sensitive: Whether matching is case sensitive (default false)
- whole_words_only: Match whole words only (default true)
- replacement: Text to replace filtered words with (default "[FILTERED]")
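As a sketch of how these options might fit into a rule, the fragment below configures a word guardrail on the output side. The words are placeholders, and pairing the word type with the transform action (so that matches are replaced rather than the request being failed) is an assumption based on the replacement option.

```yaml
output_guardrails:
  - type: word
    action: transform              # assumed: replace matches instead of failing the request
    options:
      word_list:                   # placeholder entries
        - codename
        - confidential
      case_sensitive: false        # documented default, shown for clarity
      whole_words_only: true       # documented default, shown for clarity
      replacement: "[REDACTED]"    # overrides the default "[FILTERED]"
```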