Optimize LLM performance and reliability through intelligent request distribution
Load balancing requests across multiple models and providers matters for several reasons:
Service Outages and Downtime: LLM providers experience periodic outages and degraded performance, as reflected on the OpenAI Status Page and the Anthropic Status Page.
Latency Variance among Models: response latency differs across providers and can fluctuate significantly for the same model over time.
[Figure: Latency variance of models over the course of a month]
Rate Limits of Models: each provider enforces request and token rate limits (see, for example, the Azure OpenAI Rate Limits).
Canary Testing: routing a small share of traffic to a new model or provider lets you validate it before a full rollout.
For example, for gpt-4o, you can route 90% of the requests to azure/gpt-4o and 10% to openai/gpt-4o. For claude-3-opus, you can route 100% of the requests to anthropic/claude-3-opus and, if there is a failure, fall back to anthropic/claude-3-sonnet.
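The two examples above can be written down as declarative routing rules. The sketch below is illustrative only: the rule structure and field names (`model`, `targets`, `weight`, `fallback`) mirror the prose above rather than any specific gateway's configuration schema.

```python
# Illustrative only: field names and structure are assumptions,
# not a specific gateway's configuration schema.
ROUTING_RULES = [
    {
        # Split gpt-4o traffic 90/10 between two providers.
        "model": "gpt-4o",
        "targets": [
            {"target": "azure/gpt-4o", "weight": 90},
            {"target": "openai/gpt-4o", "weight": 10},
        ],
    },
    {
        # Send all claude-3-opus traffic to one provider, with a
        # fallback used only when that target fails.
        "model": "claude-3-opus",
        "targets": [
            {"target": "anthropic/claude-3-opus", "weight": 100},
        ],
        "fallback": ["anthropic/claude-3-sonnet"],
    },
]
```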
Weight-based Routing
Weight-based routing splits traffic across targets according to configured weights. For example, when a request for gpt-4o is received, you can configure 90% of the requests to go to azure/gpt-4o and 10% to openai/gpt-4o.
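A minimal sketch of how such a weighted split can be implemented: pick a target at random in proportion to its weight. The target names and weights come from the example above; the table and function names are assumptions for illustration.

```python
import random

# Hypothetical weight table mirroring the example above:
# 90% of gpt-4o traffic to Azure, 10% to OpenAI.
WEIGHTED_TARGETS = {
    "gpt-4o": [("azure/gpt-4o", 90), ("openai/gpt-4o", 10)],
}

def pick_target(model: str) -> str:
    """Choose a target for the requested model in proportion to its weight."""
    targets = WEIGHTED_TARGETS[model]
    names = [name for name, _ in targets]
    weights = [weight for _, weight in targets]
    # random.choices performs weighted sampling; over many requests the
    # observed split converges to the configured 90/10 ratio.
    return random.choices(names, weights=weights, k=1)[0]

# Example: route one incoming gpt-4o request.
print(pick_target("gpt-4o"))
```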
Latency-based Routing
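The explanatory text for this heading is not preserved here, so the following is only one plausible interpretation: keep a rolling latency measurement per target and send each request to the currently fastest one. The window size, data structures, and function names are all assumptions.

```python
from collections import defaultdict, deque

# Rolling window of recent latencies (seconds) per target -- an assumed
# tracking mechanism for illustration, not a documented one.
_WINDOW = 20
_latencies: dict[str, deque] = defaultdict(lambda: deque(maxlen=_WINDOW))

def record_latency(target: str, seconds: float) -> None:
    """Record an observed request latency for a target."""
    _latencies[target].append(seconds)

def fastest_target(candidates: list[str]) -> str:
    """Pick the candidate with the lowest average recent latency.

    Targets with no observations yet are preferred so they get measured.
    """
    def avg(target: str) -> float:
        samples = _latencies[target]
        return sum(samples) / len(samples) if samples else 0.0
    return min(candidates, key=avg)

# Example: after recording a few latencies, route to the faster target.
record_latency("azure/gpt-4o", 0.8)
record_latency("openai/gpt-4o", 1.4)
print(fastest_target(["azure/gpt-4o", "openai/gpt-4o"]))
```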
Priority-based Routing
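The body of this section is also missing, so here is a hedged sketch of one common formulation of priority-based routing: targets are tried in priority order, and a lower-priority target is used only when the higher-priority one fails (for example, the claude-3-opus to claude-3-sonnet fallback described earlier). The `call_model` function is a placeholder for whatever client actually performs the request.

```python
# Targets in priority order: index 0 is tried first. The claude pairing
# mirrors the earlier fallback example; the structure itself is an assumption.
PRIORITY_TARGETS = {
    "claude-3-opus": ["anthropic/claude-3-opus", "anthropic/claude-3-sonnet"],
}

def call_model(target: str, prompt: str) -> str:
    """Placeholder for the real provider call; raises on provider failure."""
    raise NotImplementedError

def route_with_priority(model: str, prompt: str) -> str:
    """Try targets in priority order, falling back on failure."""
    last_error: Exception | None = None
    for target in PRIORITY_TARGETS[model]:
        try:
            return call_model(target, prompt)
        except Exception as error:  # e.g. outage, rate limit, timeout
            last_error = error      # remember why this target failed
    raise RuntimeError(f"all targets failed for {model}") from last_error
```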
Load Balancing Configuration Interface
The load balancing configuration can also be managed as code and applied with the tfy apply command. This enables enforcing a PR review process for any changes in the load balancing configuration.