Fallback
Fallback a request to another model on failure from one model
Fallback allows users to configure fallback models in case failures happen for certain models. This is useful specially in cases, where we have the same model from multiple providers and we want to fallback to the other provider in case one provider is down or has rate limits. It helps provide higher reliability to end users.
The fallback configuration contains an array of rules. Every request is evaluated against the set of rules and only the first matching rule is evaluated - the subsequent rules are ignored.
For each rule, we have three sections:
- when
- subjects: An array of user or teams from which requests is originated - for e.g. user:bob, team:team1
- models: An array of model ids which will be used to filter the requests. The model ids are the same as what we pass in the model field in the request.
- metadata
- response_status_codes: The response status on which the fallbacks will be executed. This is important since we only want to retry on recoverable errors codes like rate limit exceeded (429), etc.
- fallback_models: Array of models ids on which to fallback to. Fallback will happen in a chained way wherein we will first retry on the first model in the list, and then subsequently the next ones.
A sample rate limiting config will look like the following:
name: model-fallback-config
type: gateway-fallback-config
# The rules are evaluated in order and once a request matches one rule,
# the subsequent rules are not checked
rules:
# Fallback to gpt-4 of azure, aws if openai-main/gpt-4 fails with 500 or 503.
- id: "openai-gpt4-fallback"
when:
models: ["openai-main/gpt4"]
response_status_codes: [500, 503]
fallback_models: ["azure/gpt-4", "aws/gpt-4"]
# Fallback to llama3 of azure, aws if bedrock/llama3 fails with 500 or 429 for customer1.
- id: "llama-bedrock-customer1-fallback"
when:
models: ["bedrock/llama3"]
metadata:
customer-id: customer1
response_status_codes: [500, 429]
fallback_models: ["azure/llama3", "aws/llama3"]
Updated about 1 month ago