Priority-based routing sends each request to the highest-priority model that is healthy and available. If that model fails, the gateway automatically falls back to the next-highest-priority model. This strategy suits high-availability setups and failover mechanisms.

Overview

Priority-based routing provides:
  • Automatic failover: Seamless switching to backup models when the primary fails
  • High availability: Requests are routed to whichever configured model is healthy and available
  • Cost optimization: Route to cheaper models as fallbacks
  • Performance optimization: Use the fastest models as primary and slower ones as backup

Configuration Structure

# Configuration Details
name: string                        # Required: Configuration name (e.g. "priority-based-config")
type: gateway-load-balancing-config

# Rules
rules:
  - id: string                      # Required: Unique identifier for the rule
    type: "priority-based-routing"  # Required: Must be "priority-based-routing"
    when:                           # Required: Conditions for when to apply this rule
      subjects: string[]            # Optional: List of user/virtual account identifiers
      models: string[]              # Required: List of model names to match
      metadata: object              # Optional: Additional matching criteria
    load_balance_targets:           # Required: List of models to route to
      - target: string              # Required: Model identifier
        priority: integer           # Required: Priority level (lower number = higher priority)
        override_params: object     # Optional: Model-specific parameters to override
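
The required fields in this structure can be checked before a configuration is submitted. The following Python sketch is not part of the gateway. It assumes a rule has already been loaded into a plain dict, and the function name validate_rule is hypothetical. It also assumes that when.models and load_balance_targets must be non-empty and that priorities are non-negative (since 0 is described as the highest priority), none of which the structure above states explicitly.

```python
REQUIRED_RULE_FIELDS = ("id", "type", "when", "load_balance_targets")


def validate_rule(rule: dict) -> list[str]:
    """Return a list of problems found in one rule; an empty list means none were found."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_RULE_FIELDS
        if field not in rule
    ]

    if "type" in rule and rule["type"] != "priority-based-routing":
        problems.append('type must be "priority-based-routing"')

    # Assumption: a required list of models should contain at least one entry.
    if "when" in rule and not rule["when"].get("models"):
        problems.append("when.models must list at least one model")

    # Assumption: a required list of targets should contain at least one entry.
    if "load_balance_targets" in rule and not rule["load_balance_targets"]:
        problems.append("load_balance_targets must list at least one target")

    for index, target in enumerate(rule.get("load_balance_targets", [])):
        if not isinstance(target.get("target"), str):
            problems.append(f"load_balance_targets[{index}].target must be a string")
        priority = target.get("priority")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(priority, bool) or not isinstance(priority, int) or priority < 0:
            problems.append(
                f"load_balance_targets[{index}].priority must be a non-negative integer"
            )

    return problems
```

Collecting every problem in a list, rather than raising on the first one, lets a single pass report all missing fields at once.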

Key Requirements

Priority Configuration

  • Each target must have a priority value
  • Priority is an integer where lower number = higher priority
  • 0 is the highest priority
  • The gateway routes to the highest-priority model that is healthy and available
  • If that model fails, the gateway automatically falls back to the next-highest-priority model

Note: All conditions in the when block are combined with AND logic.
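
To illustrate the AND logic, the following Python sketch evaluates a when block against a request. It is an interpretation, not the gateway's matcher. The request shape (model, subject, and metadata keys) and the subject string "user:alice" are hypothetical. The sketch also assumes that a request matches a list if it equals any one entry, and that metadata is compared by exact key and value equality.

```python
def rule_matches(when: dict, request: dict) -> bool:
    """Return True only if every condition present in `when` is satisfied."""
    # models is required, so it is always checked.
    if request.get("model") not in when["models"]:
        return False

    # subjects is optional; when absent it places no restriction on the request.
    subjects = when.get("subjects")
    if subjects is not None and request.get("subject") not in subjects:
        return False

    # metadata is optional; every listed key must match (assumed exact equality).
    wanted = when.get("metadata") or {}
    actual = request.get("metadata") or {}
    return all(actual.get(key) == value for key, value in wanted.items())


when = {
    "models": ["claude-3"],
    "subjects": ["user:alice"],   # hypothetical identifier format
    "metadata": {"env": "prod"},  # hypothetical metadata key
}
request = {"model": "claude-3", "subject": "user:alice", "metadata": {"env": "prod"}}
print(rule_matches(when, request))
```

Changing any one of the three request fields so that it no longer matches makes the whole rule fail, which is what combining conditions with AND means here.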

Example Configurations

The following example applies to requests for claude-3 and defines three targets. The gateway routes to anthropic/claude-3-opus (priority 0) while it is healthy and available, falls back to anthropic/claude-3-sonnet (priority 1) if it fails, and then to anthropic/claude-3-haiku (priority 2).

rules:
  - id: "robust-claude-routing"
    type: "priority-based-routing"
    when:
      models:
        - "claude-3"
    load_balance_targets:
      - target: "anthropic/claude-3-opus"
        priority: 0  # Primary - highest performance
      - target: "anthropic/claude-3-sonnet"
        priority: 1  # Secondary - good performance
      - target: "anthropic/claude-3-haiku"
        priority: 2  # Tertiary - fastest response

How Priority-Based Routing Works

  1. Health Check: The gateway checks whether the highest-priority model is healthy
  2. Primary Routing: The gateway routes to the highest-priority available model
  3. Failure Detection: The gateway monitors for failures (timeouts, errors, status codes)
  4. Automatic Fallback: If the primary fails, the gateway automatically switches to the next priority
  5. Recovery: The gateway can switch back to higher-priority models once they become healthy again
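
The steps above can be sketched in a few lines of Python. This is a simplified model of the behavior described, not the gateway's implementation. The names route_with_fallback, is_healthy, and send are hypothetical, and how the gateway actually tracks health or classifies failures is not specified here. In this sketch any exception raised by send counts as a failure.

```python
from typing import Callable

# Targets from the example configuration, listed out of order on purpose.
TARGETS = [
    {"target": "anthropic/claude-3-sonnet", "priority": 1},
    {"target": "anthropic/claude-3-opus", "priority": 0},
    {"target": "anthropic/claude-3-haiku", "priority": 2},
]


def route_with_fallback(
    targets: list[dict],
    is_healthy: Callable[[str], bool],
    send: Callable[[str], str],
) -> tuple[str, str]:
    """Try targets in ascending priority order and return (target name, response)."""
    failures = []
    # Lower number = higher priority, so an ascending sort gives the fallback order.
    for entry in sorted(targets, key=lambda t: t["priority"]):
        name = entry["target"]
        if not is_healthy(name):           # steps 1-2: skip unhealthy targets
            failures.append(f"{name}: unhealthy")
            continue
        try:
            return name, send(name)        # step 2: route to the best available target
        except Exception as exc:           # step 3: treat any exception as a failure
            failures.append(f"{name}: {exc}")
            # step 4: the loop continues with the next priority
    raise RuntimeError("all targets failed: " + "; ".join(failures))
```

Because every call starts again from the top of the sorted list, a higher-priority target is picked up as soon as is_healthy reports it healthy, which corresponds to the recovery step. A caller would pass its own health check and transport function, for example route_with_fallback(TARGETS, my_health_check, my_send), where both arguments are placeholders.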

Best Practices

  1. Priority Planning: Design clear priority hierarchies based on performance, cost, or reliability
  2. Fallback Strategy: Ensure fallback models have sufficient capacity
  3. Monitoring: Monitor failover events and recovery times
  4. Testing: Regularly test failover scenarios
  5. Capacity Planning: Ensure backup models can handle full load if needed

Use Cases

  • High Availability: Ensure service continuity with automatic failover
  • Cost Optimization: Use expensive models as primary, cheaper as backup
  • Performance Optimization: Route to fastest models first, slower as fallback
  • Geographic Distribution: Route to closest data centers first
  • Compliance: Route to compliant models as primary, others as backup
  • A/B Testing: Use new models as primary, established models as backup