Configure priority-based load balancing for AI models with automatic fallback
Priority-based routing sends each request to the highest-priority model that is healthy and available. If that model fails, the gateway automatically falls back to the next-highest-priority model. This strategy is well suited to high-availability setups and failover.
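The fallback behavior described above can be illustrated with a minimal Python sketch. It is not the gateway's implementation: the `Target` class, `route_with_fallback`, `demo_call`, and the model names are hypothetical, and the sketch assumes that a failure shows up as a raised exception (how the gateway actually detects an unhealthy model is not specified here). The `target` and `priority` fields mirror the configuration schema on this page, where a lower number means a higher priority.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Target:
    target: str    # model identifier
    priority: int  # lower number = higher priority


def route_with_fallback(
    targets: Iterable[Target], call: Callable[[str], str]
) -> Tuple[str, str]:
    """Try targets in ascending priority number; return the first success."""
    errors = []
    for t in sorted(targets, key=lambda t: t.priority):
        try:
            return t.target, call(t.target)
        except Exception as exc:  # assumption: any exception counts as a failure
            errors.append((t.target, exc))
    raise RuntimeError(f"all targets failed: {errors}")


def demo_call(model: str) -> str:
    # Hypothetical stand-in for a real model request: the primary always fails.
    if model == "primary-model":
        raise ConnectionError("primary-model is unavailable")
    return f"response from {model}"


targets = [Target("backup-model", 1), Target("primary-model", 0)]
chosen, reply = route_with_fallback(targets, demo_call)
print(chosen, reply)  # prints: backup-model response from backup-model
```

Here `primary-model` has priority 0 and is tried first. Because it raises an error, the request falls through to `backup-model`, the next target in priority order.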
# Configuration Details
name: string  # Required: Configuration name (e.g. "priority-based-config")
type: gateway-load-balancing-config

# Rules
rules:
  - id: string  # Required: Unique identifier for the rule
    type: "priority-based-routing"  # Required: Must be "priority-based-routing"
    when:  # Required: Conditions for when to apply this rule
      subjects: string[]  # Optional: List of user/virtual account identifiers
      models: string[]  # Required: List of model names to match
      metadata: object  # Optional: Additional matching criteria
    load_balance_targets:  # Required: List of models to route to
      - target: string  # Required: Model identifier
        priority: integer  # Required: Priority level (lower number = higher priority)
        override_params: object  # Optional: Model-specific parameters to override
The following example shows a priority-based routing configuration: requests go to the highest-priority model that is healthy and available, and fall back to the next-highest-priority model if that model fails.