Fallback allows users to configure fallback models in case requests to certain models fail. This is especially useful when the same model is available from multiple providers and we want to fall back to another provider if one provider is down or hitting rate limits. It helps provide higher reliability to end users.

The fallback configuration contains an array of rules. Every request is evaluated against the set of rules, and only the first matching rule is applied - the subsequent rules are ignored.

For each rule, we have two sections:

  1. when (Define the subset of requests on which the rule applies): TrueFoundry AI Gateway provides a very flexible configuration for defining the exact subset of requests a rule applies to. Requests can be matched on the user calling the model, the model name, or any of the custom metadata keys present in the request header X-TFY-METADATA. The subjects, models and metadata fields are combined in an AND fashion - meaning that the rule will only match if all the conditions are met. If an incoming request doesn't match the when block of one rule, the next rule will be evaluated. An illustrative when block follows this list.

    • subjects: Filter based on the list of users / teams / virtual accounts calling the model. A user can be specified as user:john-doe, a team as team:engineering-team, and a virtual account as virtual-account:acct_1234567890.
    • models: Rule matches if the model name in the request matches any of the models in the list.
    • metadata: Rule matches if the metadata in the request matches the metadata in the rule. For example, if we specify metadata: {environment: "production"}, the rule will only match if the request has the metadata key environment with value production in the request header X-TFY-METADATA.
    • response_status_codes: The response status codes on which the fallbacks will be executed. This is important since we only want to retry on recoverable error codes like rate limit exceeded (429), etc.
  2. fallback_models (the models to route traffic to when the rule matches):

    • target: The model ID to which traffic should be routed.
    • override_params (optional): A key-value object used to modify or extend the request body when falling back to the target model. Each key is a parameter name, and its value is what will be sent to the target. This lets you override parameters from the original request or introduce entirely new ones, customizing the request for each target model during fallback, as shown in the second sketch below.
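
For instance, here is a minimal sketch of a when block that combines all four fields (the specific subjects, model name, metadata and status codes are illustrative); a request must satisfy all of the conditions together for the rule to match:

when:
  subjects: ["user:john-doe", "team:engineering-team"]
  models: ["openai-main/gpt-4"]
  metadata:
    environment: production
  response_status_codes: [429, 500, 503]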
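
And a minimal sketch of a fallback target that both overrides an existing request parameter and introduces a new one (the target name and parameter values are illustrative):

fallback_models:
  - target: azure/gpt-4
    override_params:
      temperature: 0.2 # replaces the temperature from the original request
      max_tokens: 500 # added even if the original request did not set it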

Let’s say you want to set up fallbacks based on the following rules:

  1. Fall back to gpt-4 on azure and aws if openai-main/gpt-4 fails with 500 or 503. The azure target also overrides a few request parameters like temperature and max_tokens.
  2. Fall back to llama3 on azure and aws if bedrock/llama3 fails with 500 or 429 for customer1.

Your fallback config would look like this:

name: model-fallback-config
type: gateway-fallback-config
# The rules are evaluated in order and once a request matches one rule, 
# the subsequent rules are not checked
rules:
  # Fall back to gpt-4 on azure and aws if openai-main/gpt-4 fails with 500 or 503. The azure target also overrides a few request parameters like temperature and max_tokens
  - id: "openai-gpt4-fallback"
    when:
      models: ["openai-main/gpt-4"]
      response_status_codes: [500, 503]
    fallback_models:
      - target: azure/gpt-4
        override_params:
          temperature: 0.9
          max_tokens: 800
      - target: aws/gpt-4
  # Fallback to llama3 of azure, aws if bedrock/llama3 fails with 500 or 429 for customer1.
  - id: "llama-bedrock-customer1-fallback"
    when:
      models: ["bedrock/llama3"]
      metadata:
        customer-id: customer1
      response_status_codes: [500, 429]
    fallback_models:
      - target: azure/llama3
      - target: aws/llama3
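
Note that for the second rule to match, the caller must include the customer-id key in the X-TFY-METADATA request header, for example X-TFY-METADATA: {"customer-id": "customer1"} (assuming, as in the metadata example above, that the header carries the metadata as a JSON object). Requests without this metadata will not match the rule.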

Configure Fallback on Gateway

This is straightforward: go to the Config tab in the Gateway, add your configuration, and save.