Fallback
Fall back to another model when a request to one model fails
Fallbacks allow users to configure backup models to use when requests to certain models fail. This is especially useful when the same model is available from multiple providers and we want to fall back to another provider if one is down or hits rate limits. It helps provide higher reliability to end users.
The fallback configuration contains an array of rules. Every request is evaluated against the set of rules, and only the first matching rule is applied; subsequent rules are ignored.
For each rule, we have three sections:
- **when** (defines the subset of requests to which the rule applies): Truefoundry AI Gateway provides a flexible configuration for defining the exact subset of requests a rule applies to. You can match on the user calling the model, the model name, or any custom metadata key present in the request header `X-TFY-METADATA`. The `subjects`, `models`, and `metadata` fields are combined with AND semantics, meaning the rule matches only if all the conditions are met. If an incoming request doesn't match the `when` block of one rule, the next rule is evaluated.
  - `subjects`: Filters based on a list of users / teams / virtual accounts calling the model. A subject can be specified as `user:john-doe`, `team:engineering-team`, or `virtual-account:acct_1234567890`.
  - `models`: The rule matches if the model name in the request matches any of the models in the list.
  - `metadata`: The rule matches if the metadata in the request matches the metadata in the rule. For example, if we specify `metadata: {environment: "production"}`, the rule will only match if the request header `X-TFY-METADATA` contains the key `environment` with the value `production`.
- **response_status_codes**: The response status codes on which the fallback will be executed. This is important because we only want to retry on recoverable error codes, such as rate limit exceeded (429).
- **fallback_models**: The list of models to fall back to.
  - `target`: The model ID to which traffic should be routed.
  - `override_params` (optional): A key-value object used to modify or extend the request body when falling back to the target model. Each key is a parameter name, and its value is what will be sent to the target. This lets you override parameters from the original request or introduce entirely new ones, enabling flexible per-target customization during fallback.
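As a sketch of how a client would supply the metadata that the `metadata` condition matches against, the snippet below builds the `X-TFY-METADATA` request header. The JSON encoding of the header value and the bearer-token `Authorization` header are assumptions for illustration; only the header name comes from the text above.

```python
import json

# Hypothetical example: attaching custom metadata that fallback rules can
# match on via the X-TFY-METADATA request header. The token placeholder is
# illustrative; JSON-encoding the metadata object is an assumption.
metadata = {"environment": "production", "customer": "customer1"}
headers = {
    "Authorization": "Bearer <your-gateway-token>",
    "X-TFY-METADATA": json.dumps(metadata),
}
```

A rule with `metadata: {environment: "production"}` in its `when` block would match a request carrying these headers.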
Let’s say you want to set up fallbacks based on the following rules:
- Fall back to gpt-4 on Azure, then AWS, if `openai-main/gpt-4` fails with a 500 or 503. The Azure target also overrides a few request parameters such as `temperature` and `max_tokens`.
- Fall back to llama3 on Azure, then AWS, if `bedrock/llama3` fails with a 500 or 429 for customer1.
Your fallback config would look like this:
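Below is a sketch of such a config. The rule layout follows the fields described above, but treat the exact schema, the top-level keys, and the provider-prefixed model IDs (`azure/gpt-4`, `aws/gpt-4`, etc.) as illustrative assumptions, and verify them against your Gateway version; matching customer1 via a `customer` metadata key is likewise an assumption.

```yaml
# Illustrative sketch - field names follow the rule structure described
# above; verify the exact schema against your Gateway version.
name: fallback-config
type: gateway-fallback-config
rules:
  # Rule 1: openai-main/gpt-4 -> gpt-4 on Azure, then AWS, on 500/503
  - id: openai-gpt4-fallback
    when:
      models: ["openai-main/gpt-4"]
    response_status_codes: [500, 503]
    fallback_models:
      - target: azure/gpt-4       # assumed model ID for the Azure deployment
        override_params:
          temperature: 0.5        # example override values
          max_tokens: 512
      - target: aws/gpt-4         # assumed model ID for the AWS deployment
  # Rule 2: bedrock/llama3 -> llama3 on Azure, then AWS, on 500/429,
  # only for requests tagged with customer1 metadata (assumption)
  - id: bedrock-llama3-fallback
    when:
      models: ["bedrock/llama3"]
      metadata:
        customer: "customer1"
    response_status_codes: [500, 429]
    fallback_models:
      - target: azure/llama3
      - target: aws/llama3
```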
Configure Fallback on Gateway
Go to the Config tab in the Gateway, add your configuration, and save.