Optimize LLM performance and reliability through intelligent request distribution
Figures: OpenAI Status Page; Anthropic Status Page; latency variance of models over the course of a month; Azure OpenAI rate limits.
For example, when a request for `gpt-4o` is received, the gateway can split the traffic so that 90% is routed to `azure/gpt-4o` and 10% to `openai/gpt-4o`.

Figure: Load Balancing Configuration Interface
The load balancing configuration can be kept under version control and applied with the `tfy apply` command. This enables enforcing a PR review process for any changes to the load balancing configuration.
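To make the example above concrete, the snippet below is a minimal sketch of what such a weight-based rule could look like in a YAML file applied via `tfy apply`. The field names (`rules`, `when`, `load_balance_targets`, `weight`) are illustrative assumptions, not the gateway's documented schema.

```yaml
# Minimal sketch of a weighted routing rule (field names are assumptions).
name: gpt-4o-load-balancing
type: gateway-load-balancing-config
rules:
  - id: gpt-4o-weighted-split
    when:
      models: ["gpt-4o"]            # incoming model name to match
    load_balance_targets:
      - target: azure/gpt-4o        # receives 90% of matching requests
        weight: 90
      - target: openai/gpt-4o       # receives the remaining 10%
        weight: 10
```

Because the file lives in Git, the 90/10 weights can only change through a reviewed pull request.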
Load balancing supports a range of use cases, including:

- Fallback to the next model when rate limited
- Canary Deployment (see the sketch after this list)
- Priority-based Failover
- Latency-based Performance Optimization
- Environment-based Routing
- User-Proximity Routing
- Burst Spillover to the Fastest Healthy Target
- Cost-based Weighted Routing
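As one illustration, a canary deployment is just a small weight assigned to a new target. The sketch below reuses the same assumed schema as the earlier example; the deployment names and the 95/5 split are hypothetical.

```yaml
# Hypothetical canary rule: route a small slice of traffic to a new deployment
# while the stable deployment continues to serve the rest (schema assumed).
rules:
  - id: gpt-4o-canary
    when:
      models: ["gpt-4o"]
    load_balance_targets:
      - target: azure/gpt-4o-stable   # 95% stays on the stable deployment
        weight: 95
      - target: azure/gpt-4o-canary   # 5% canary slice for the new deployment
        weight: 5
```

If the canary misbehaves, lowering its weight back to 0 reverts all traffic to the stable target.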
Advanced Configuration with Fallback and Retry
For example, a rule can apply to requests from the `engineering-team` only, ensuring targeted traffic management. A target can be marked as a fallback candidate with `fallback_candidate` (defaults to `true` if not specified). The `fallback_status_codes` field controls which response status codes trigger a fallback to the next target. If not set, the default is ["401", "403", "404", "429", "500", "502", "503"].
Retries are configured per target with `retry_config`. If `retry_config` is not provided for a target, the following defaults are automatically applied:
- `attempts`: 2
- `delay`: 100 (milliseconds)
- `on_status_codes`: ["429", "500", "502", "503"]
This ensures robust retry behavior out of the box, while still allowing you to override these settings per target as needed.
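Putting these pieces together, here is a sketch of an advanced rule. The `fallback_candidate`, `fallback_status_codes`, and `retry_config` fields (with `attempts`, `delay`, and `on_status_codes`) are the ones described above; the surrounding rule structure and the `subjects` filter for `engineering-team` are assumptions carried over from the earlier sketches.

```yaml
# Sketch of an advanced rule with per-target fallback and retry settings.
# Rule structure and the subjects filter are illustrative assumptions.
rules:
  - id: gpt-4o-engineering-team
    when:
      subjects: ["engineering-team"]   # assumed filter: apply only to this team
      models: ["gpt-4o"]
    load_balance_targets:
      - target: azure/gpt-4o
        weight: 90
        fallback_candidate: true                        # eligible as a fallback (default: true)
        fallback_status_codes: ["429", "500", "503"]    # override the default status-code list
        retry_config:
          attempts: 3                                   # override the default of 2
          delay: 200                                    # milliseconds
          on_status_codes: ["429", "500", "502", "503"]
      - target: openai/gpt-4o
        weight: 10
        # No retry_config here, so the defaults apply:
        # attempts: 2, delay: 100 ms, on_status_codes: ["429", "500", "502", "503"]
```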