Guardrails allow you to validate and transform LLM inputs and outputs to ensure safety, quality, and compliance.

Config

The guardrails configuration contains an array of rules that are evaluated for each request. Only the first matching guardrail rule is applied to that request. Each rule can specify input and output guardrails that will be applied. Let’s take a look at a sample configuration first.

Example Configuration

This sample configuration has one rule that applies two input guardrails and two output guardrails, each referenced by its FQN. It also has a when block, so the guardrails are applied only to requests that match the listed models and metadata.

name: guardrails-config
type: gateway-guardrails-config
rules:
  - id: openai-guardrails
    when:
      models:
        - openai/gpt3-5
        - my-bedrock/anthropic-3-7
      metadata:
        internal-service: backend-svc # arbitrary key-value pairs; this block is optional
    input_guardrails:
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:nikp-moderations #fqn of guardrail integration
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:custom-dummy-in #fqn of guardrail integration
    output_guardrails:
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:nikp-bedrock-guardrail #fqn of guardrail integration
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:custom-dummy-out #fqn of guardrail integration

Rules

Each rule has four sections:

  • id: A unique identifier for the rule

  • when: Defines the subset of requests on which the rule applies. If set to an empty object ({}), the rule applies to all requests.

    • subjects: List of users, teams, or virtual accounts that originate the request. Specify as user:email@example.com, team:team-name, or virtual-account:account-name.
    • models: List of model identifiers to match. These should correspond to the model names used in the request’s model field.
    • metadata: Key-value pairs to match against request metadata. For example, specifying metadata: { environment: "production" } will apply the rule only if the request includes the header X-TFY-METADATA with environment=production.
  • input_guardrails: An array of FQNs (fully qualified names) of guardrail integrations to apply to the input prompt.

  • output_guardrails: An array of FQNs of guardrail integrations to apply to the LLM response.
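
As a minimal sketch of how the when fields and the first-match behavior might be combined, the hypothetical configuration below places a narrowly scoped rule ahead of a catch-all rule. The rule ids, the team name, and the pairing of guardrails with rules are illustrative assumptions; the FQNs are reused from the sample configuration. It also assumes that rules are evaluated top to bottom and that every condition inside a when block must match, neither of which this section states explicitly.

```yaml
name: guardrails-config
type: gateway-guardrails-config
rules:
  # Hypothetical rule: narrowly scoped, so it is listed first.
  # Assumption: subjects, models, and metadata must all match.
  - id: team-production
    when:
      subjects:
        - team:team-name
      models:
        - openai/gpt3-5
      metadata:
        environment: production
    input_guardrails:
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:nikp-moderations
    output_guardrails:
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:nikp-bedrock-guardrail
  # Hypothetical catch-all: an empty when block applies to all requests.
  # Because only the first matching rule is applied, this rule is reached
  # only by requests that the rule above did not match.
  - id: default
    when: {}
    input_guardrails:
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:custom-dummy-in
    output_guardrails:
      - truefoundry:guardrail-config-group:test-guardrail:guardrail-config:custom-dummy-out
```

Under those assumptions, a production request from team-name to openai/gpt3-5 would receive only the guardrails of the first rule, and every other request would fall through to the default rule.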

How to get the FQN of a guardrail integration

To get the FQN of a guardrail integration, go to the Guardrail tab in AI Gateway and click the “Copy FQN” button next to that integration.
