Guardrails

Guardrails allow you to validate and transform LLM inputs and outputs to ensure safety, quality, and compliance.

Config

The guardrails configuration contains an array of rules that are evaluated for each request. Only the first matching guardrail rule is applied to that request. Each rule can specify input and output guardrails that will be applied. Let's take a look at a sample configuration first.

Example Configuration

This sample configuration has a single rule. The rule applies one input guardrail, which masks PII, and two output guardrails: one fails the request if the LLM response touches any of the denied topics, and the other masks PII. The rule also has a when block, so the guardrails apply only to requests that match it.

name: guardrails-config
type: gateway-guardrails-config
guardrails_service_url: https://guardrails.truefoundry.com
rules:
  - id: openai-guardrails
    when:
      models:
        - openai/gpt3-5
        - my-bedrock/anthropic-3-7
      metadata:
        internal-service: backend-svc # arbitrary key-value pairs
    input_guardrails:
      - type: pii
        action: transform
        options:
          entity_types:
            - email
            - ssn
            - name
            - address
    output_guardrails:
      - type: topics
        action: validate
        options:
          denied_topics:
            - medical advice
            - profanity
            - hate speech
            - violence
      - type: pii
        action: transform
        options:
          entity_types:
            - email
            - ssn
            - name
            - address

Guardrails Service URL

The guardrails_service_url field specifies the URL of the server that implements the guardrails APIs. This server exposes REST endpoints that validate and transform content according to the configured guardrails. In most cases, you should be able to use the standard TrueFoundry Guardrails Server.
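
If you run your own server that implements the guardrails APIs, this field would point at that server instead. The fragment below is only a sketch: the hostname is a made-up placeholder, and the API contract such a server must satisfy is not covered in this section.

```yaml
name: guardrails-config
type: gateway-guardrails-config
# Hypothetical self-hosted endpoint; replace it with the URL of your own server.
guardrails_service_url: https://guardrails.internal.example.com
```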

Rules

Each rule has four fields:

id: A unique identifier for the rule
when: Conditions that determine which requests the rule applies to (an empty object, {}, applies the rule to all requests)
- subjects: An array of users, teams, or virtual accounts from which the request originates, e.g. user:[email protected], team:team1, virtualaccount:virtualaccountname
- models: An array of model IDs used to filter requests. These are the same model IDs passed in the model field of the request.
- metadata: Key-value pairs of request metadata; the rule applies only to requests that carry matching metadata.
input_guardrails: An array of guardrails to apply to the input prompt
output_guardrails: An array of guardrails to apply to the LLM response
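
Because only the first matching rule is applied, the order of the rules matters. The sketch below places a narrow rule ahead of a catch-all rule. It assumes that rules are evaluated in the order they appear in the array and that a rule may omit input_guardrails or output_guardrails; the rule ids and the chosen entity types and topics are illustrative, not required values.

```yaml
rules:
  # Listed first: applies only to requests that originate from team1.
  - id: team1-pii-masking        # hypothetical rule id
    when:
      subjects:
        - team:team1
    input_guardrails:
      - type: pii
        action: transform
        options:
          entity_types:
            - email
            - ssn
  # Catch-all: the empty when object matches every request, so this rule
  # only takes effect for requests that did not match the rule above.
  - id: default-topics           # hypothetical rule id
    when: {}
    output_guardrails:
      - type: topics
        action: validate
        options:
          denied_topics:
            - hate speech
```

With this ordering, a request from team1 gets PII masking on its input but not the topics check, since the second rule is never reached for that request.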

Each guardrail (under input_guardrails or output_guardrails) has:

type: The type of guardrail to apply (e.g., "pii", "topics", "word")
action: Either "validate" (check but don't modify) or "transform" (modify the content)
options: Configuration specific to that guardrail type
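
As a quick reference, the fragment below annotates the three fields on two guardrail entries. The values come from the sample configuration earlier on this page; the comments are a reading of the field descriptions above, not an exhaustive specification.

```yaml
output_guardrails:
  - type: topics        # which guardrail to run
    action: validate    # check only; the content is not modified
    options:            # settings specific to the topics guardrail
      denied_topics:
        - medical advice
  - type: pii
    action: transform   # the content is modified (here, PII is masked)
    options:            # settings specific to the pii guardrail
      entity_types:
        - email
```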