Configure latency-based load balancing for optimal AI model performance
Latency-based routing automatically monitors the performance of multiple models and routes requests to the fastest available models. This strategy is ideal for optimizing response times and ensuring the best user experience.
# Configuration Details
name: string  # Required: Configuration name (e.g. "latency-based-config")
type: gateway-load-balancing-config

# Rules
rules:
  - id: string  # Required: Unique identifier for the rule
    type: "latency-based-routing"  # Required: Must be "latency-based-routing"
    when:  # Required: Conditions for when to apply this rule
      subjects: string[]  # Optional: List of user/virtual account identifiers
      models: string[]  # Required: List of model names to match
      metadata: object  # Optional: Additional matching criteria
    load_balance_targets:  # Required: List of models to route to
      - target: string  # Required: Model identifier
        override_params: object  # Optional: Model-specific parameters to override
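As a rough illustration of the required fields listed in the schema, the sketch below checks a single rule, represented as a plain Python dictionary, before it is submitted. The function name validate_rule and the model identifiers in the example are hypothetical and not part of the gateway. The checks cover only the Required/Optional annotations shown in the schema, not any further validation the gateway itself may perform.

```python
from typing import Any, Dict, List


def validate_rule(rule: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one rule; an empty list means these checks passed."""
    problems: List[str] = []

    if not rule.get("id"):
        problems.append("id is required")
    if rule.get("type") != "latency-based-routing":
        problems.append('type must be "latency-based-routing"')

    when = rule.get("when")
    if not isinstance(when, dict):
        problems.append("when is required")
    elif not when.get("models"):
        problems.append("when.models is required")

    targets = rule.get("load_balance_targets")
    if not targets:
        problems.append("load_balance_targets is required")
    else:
        for i, entry in enumerate(targets):
            if not isinstance(entry, dict) or not entry.get("target"):
                problems.append(f"load_balance_targets[{i}].target is required")

    return problems


# Hypothetical rule; the model identifiers are placeholders.
example_rule = {
    "id": "latency-rule-1",
    "type": "latency-based-routing",
    "when": {"models": ["example-model"]},
    "load_balance_targets": [
        {"target": "provider-a/example-model"},
        {"target": "provider-b/example-model"},
    ],
}

print(validate_rule(example_rule))
```

A rule that omits when.models or gives a target entry without a target key would produce one message per missing field.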
Model A: 500ms average (fastest)
Model B: 550ms average (1.1x - considered fast)
Model C: 650ms average (1.3x - considered slow)
Model D: 700ms average (1.4x - considered slow)
Result: Traffic distributed between Model A and Model B only
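The selection in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the gateway's actual implementation. The function name select_fast_models and the cutoff of 1.2 times the fastest average are assumptions; the example only shows that 1.1x counts as fast and 1.3x as slow, so any cutoff between those two values would give the same result.

```python
from typing import Dict, List


def select_fast_models(avg_latency_ms: Dict[str, float], threshold: float = 1.2) -> List[str]:
    """Return models whose average latency is within `threshold` times the fastest.

    The default threshold of 1.2 is an assumed value chosen to match the example.
    """
    if not avg_latency_ms:
        return []
    fastest = min(avg_latency_ms.values())
    return [name for name, ms in avg_latency_ms.items() if ms <= fastest * threshold]


# Average latencies from the example above, in milliseconds.
latencies = {"model-a": 500.0, "model-b": 550.0, "model-c": 650.0, "model-d": 700.0}

fast = select_fast_models(latencies)
print(fast)  # ['model-a', 'model-b']
```

Traffic would then be spread across the returned models only; how it is split among them is not specified in the example.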