Guardrails

Guardrails allow you to validate and transform LLM inputs and outputs to ensure safety, quality, and compliance.

Config

The guardrails configuration contains an array of rules that are evaluated for each request. Only the first matching guardrail rule is applied to that request. Each rule can specify input and output guardrails that will be applied. Let's take a look at a sample configuration first.

Example Configuration

This sample configuration has a single rule. The rule has one input guardrail that masks PII and two output guardrails: one that fails the request if the LLM response touches any of the denied topics, and one that masks PII. It also has a when block, so the guardrails are applied only to requests that match it.

name: guardrails-config
type: gateway-guardrails-config
guardrails_service_url: https://guardrails.truefoundry.com
rules:
  - id: openai-guardrails
    when:
      models:
        - openai/gpt3-5
        - my-bedrock/anthropic-3-7
      metadata:
        internal-service: backend-svc # arbitrary key-value pairs
    input_guardrails:
      - type: pii
        action: transform
        options:
          entity_types:
            - email
            - ssn
            - name
            - address
    output_guardrails:
      - type: topics
        action: validate
        options:
          denied_topics:
            - medical advice
            - profanity
            - hate speech
            - violence
      - type: pii
        action: transform
        options:
          entity_types:
            - email
            - ssn
            - name
            - address

Guardrails Service URL

The guardrails_service_url field specifies the URL of the server that implements the guardrails APIs. This server exposes REST endpoints that validate and transform content according to the configured guardrails. In most cases, you should be able to use the standard TrueFoundry Guardrails Server.

Rules

Each rule has four sections:

  • id: A unique identifier for the rule
  • when: Conditions for when this rule should be applied (an empty object {} means the rule applies to all requests)
    • subjects: An array of users, teams, or virtual accounts from which the request originates, e.g. user:[email protected], team:team1, virtualaccount:virtualaccountname
    • models: An array of model ids used to filter requests. The model ids are the same as the value passed in the model field of the request.
    • metadata: Key-value pairs of metadata used to filter the requests this rule applies to.
  • input_guardrails: An array of guardrails to apply to the input prompt
  • output_guardrails: An array of guardrails to apply to the LLM response
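The rule selection described above can be sketched in Python. This is only an illustration, not the gateway's actual implementation. It assumes that all conditions present in a when block must hold together, that an array condition matches if the request value equals any one entry, and that a request carries a single subject. The function names and the request dictionary shape are hypothetical.

```python
from typing import Any, Optional


def rule_matches(when: dict[str, Any], request: dict[str, Any]) -> bool:
    """Return True if the request satisfies every condition present in `when`.

    Assumption: conditions are AND-ed; an empty `when` matches every request.
    """
    if "subjects" in when and request.get("subject") not in when["subjects"]:
        return False
    if "models" in when and request.get("model") not in when["models"]:
        return False
    # Every metadata key named in the rule must be present with the same value.
    request_metadata = request.get("metadata", {})
    for key, value in when.get("metadata", {}).items():
        if request_metadata.get(key) != value:
            return False
    return True


def select_rule(rules: list[dict[str, Any]], request: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first rule whose `when` block matches, or None if none does."""
    for rule in rules:
        if rule_matches(rule.get("when", {}), request):
            return rule
    return None


rules = [
    {
        "id": "openai-guardrails",
        "when": {
            "models": ["openai/gpt3-5", "my-bedrock/anthropic-3-7"],
            "metadata": {"internal-service": "backend-svc"},
        },
    },
    {"id": "catch-all", "when": {}},  # hypothetical second rule
]

request = {
    "model": "openai/gpt3-5",
    "metadata": {"internal-service": "backend-svc"},
}
print(select_rule(rules, request)["id"])  # openai-guardrails
```

Because only the first matching rule is applied, more specific rules should come before broader ones such as the catch-all in this sketch.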

Each guardrail (under input_guardrails or output_guardrails) has:

  • type: The type of guardrail to apply (e.g., "pii", "topics", "word")
  • action: Either "validate" (check but don't modify) or "transform" (modify the content)
  • options: Configuration specific to that guardrail type
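To make the difference between the two actions concrete, here is a minimal Python sketch. It assumes a guardrail can be modeled as a detector that returns the character spans of offending text. The names apply_guardrail, GuardrailViolation, and detect_digits, and the "[REDACTED]" mask, are hypothetical and not part of the TrueFoundry API.

```python
import re
from typing import Callable

# A detector returns the (start, end) spans of offending text.
Detector = Callable[[str], list[tuple[int, int]]]


class GuardrailViolation(Exception):
    """Raised when a "validate" guardrail finds a violation."""


def apply_guardrail(action: str, text: str, detect: Detector, mask: str = "[REDACTED]") -> str:
    spans = detect(text)
    if action == "validate":
        # Check only: fail on any violation, otherwise pass the text through unchanged.
        if spans:
            raise GuardrailViolation(f"{len(spans)} violation(s) found")
        return text
    if action == "transform":
        # Modify: replace spans from the end so earlier offsets stay valid.
        for start, end in sorted(spans, reverse=True):
            text = text[:start] + mask + text[end:]
        return text
    raise ValueError(f"unknown action: {action!r}")


def detect_digits(text: str) -> list[tuple[int, int]]:
    """Toy detector used only for this example: flags runs of digits."""
    return [m.span() for m in re.finditer(r"\d+", text)]


print(apply_guardrail("transform", "order 12345 shipped", detect_digits))
# order [REDACTED] shipped
```

With action "validate", the same detector would raise GuardrailViolation for that input instead of rewriting it.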

Supported Guardrails

The following guardrail types are available in the standard deployment of TrueFoundry Guardrails Server:

PII Detection (pii)

Detects and handles personally identifiable information like emails, SSNs, names, etc.

Options:

  • entity_types: An array of PII types to detect, e.g., ["email", "ssn", "name", "address"]

The following entity types are supported:

  • email
  • phone
  • ssn
  • credit_card
  • address
  • name
  • date_of_birth
  • ip_address
  • passport
  • drivers_license
  • crypto
  • iban
  • nrp
  • medical_license
  • url
  • us_bank_number
  • us_itin
  • uk_nhs
  • uk_nino
  • es_nif
  • es_nie
  • it_fiscal_code
  • it_driver_license
  • it_vat_code
  • it_passport
  • it_identity_card
  • pl_pesel
  • sg_nric_fin
  • sg_uen
  • au_abn
  • au_acn
  • au_tfn
  • au_medicare
  • in_pan
  • in_aadhaar
  • in_vehicle_registration
  • in_voter
  • in_passport
  • fi_personal_identity_code
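As a rough illustration of what a pii guardrail with action transform does, the Python sketch below masks two of the entity types using regular expressions. The patterns, the mask_pii function, and the placeholder format (for example "<EMAIL>") are assumptions made for this example. The Guardrails Server's real detection logic and masking format are not described here and may differ.

```python
import re

# Hypothetical, deliberately simple patterns for two of the entity types.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str, entity_types: list[str]) -> str:
    """Replace each detected entity with a placeholder naming its type."""
    for entity_type in entity_types:
        pattern = PII_PATTERNS.get(entity_type)
        if pattern is None:
            continue  # entity type not covered by this sketch
        text = pattern.sub(f"<{entity_type.upper()}>", text)
    return text


masked = mask_pii("Contact jane@example.com, SSN 123-45-6789.", ["email", "ssn"])
print(masked)  # Contact <EMAIL>, SSN <SSN>.
```

Only the entity types listed in entity_types are masked, so passing ["email"] alone would leave the SSN in place.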

Topic Control (topics)

Validates that content does not contain certain topics.

Options:

  • denied_topics: An array of topics to disallow, e.g., ["medical advice", "profanity"]

Word Filtering (word)

Filters specific words and phrases.

Options:

  • word_list: An array of words to filter
  • case_sensitive: Whether matching is case sensitive (default false)
  • whole_words_only: Match whole words only (default true)
  • replacement: Text to replace filtered words with (default "[FILTERED]")
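The interaction of these four options can be sketched in Python as follows. This is an illustrative approximation, not the server's implementation. The filter_words function is hypothetical, and it assumes that "whole words" means regex word boundaries.

```python
import re


def filter_words(
    text: str,
    word_list: list[str],
    case_sensitive: bool = False,
    whole_words_only: bool = True,
    replacement: str = "[FILTERED]",
) -> str:
    """Replace every occurrence of a listed word or phrase with `replacement`."""
    if not word_list:
        return text
    # Try longer entries first so a phrase wins over a shorter word it starts with.
    ordered = sorted(word_list, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    pattern = rf"\b(?:{alternatives})\b" if whole_words_only else rf"(?:{alternatives})"
    flags = 0 if case_sensitive else re.IGNORECASE
    # A function replacement avoids treating backslashes in `replacement` as escapes.
    return re.sub(pattern, lambda _match: replacement, text, flags=flags)


print(filter_words("Darn, that darned bug!", ["darn"]))
# [FILTERED], that darned bug!
print(filter_words("Darn, that darned bug!", ["darn"], whole_words_only=False))
# [FILTERED], that [FILTERED]ed bug!
```

With the defaults, matching ignores case and leaves "darned" untouched because "darn" is only part of that word. Setting whole_words_only to false also filters the substring.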