Truefoundry Docs

Priority-based routing sends each request to the highest-priority model that is healthy and available. If that model fails, the gateway automatically falls back to the next-highest-priority model. This makes the strategy well suited to high-availability setups and failover mechanisms.

Overview

Priority-based routing provides:

Automatic failover: Seamless switching to backup models when the primary model fails
High availability: Ensures requests are always routed to available models
Cost optimization: Route to cheaper models as fallbacks
Performance optimization: Use the fastest models as primary and slower ones as backups

Configuration Structure

# Configuration Details
name: string                              # Required: Configuration name (e.g. "priority-based-config")
type: gateway-load-balancing-config

# Rules
rules:
  - id: string                            # Required: Unique identifier for the rule
    type: "priority-based-routing"        # Required: Must be "priority-based-routing"
    when:                                 # Required: Conditions for when to apply this rule
      subjects: string[]                  # Optional: List of user/virtual account identifiers
      models: string[]                    # Required: List of model names to match
      metadata: object                    # Optional: Additional matching criteria
    load_balance_targets:                 # Required: List of models to route to
      - target: string                    # Required: Model identifier
        priority: integer                 # Required: Priority level (lower number = higher priority)
        override_params: object           # Optional: Model-specific parameters to override
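As a rough illustration of the required fields, the following Python sketch checks a config that has already been loaded into a dict (for example with a YAML parser). The function name `validate_priority_config` and its messages are hypothetical, and the check that priorities are non-negative is an assumption inferred from 0 being the highest priority.

```python
# Hypothetical validator for a priority-based routing config already loaded into a dict.
# Field names follow the structure above; names, messages, and "assumed" checks are illustrative.


def validate_priority_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks valid."""
    errors = []
    if not config.get("name"):
        errors.append("missing 'name'")
    if config.get("type") != "gateway-load-balancing-config":
        errors.append("'type' should be 'gateway-load-balancing-config'")
    seen_ids = set()
    for i, rule in enumerate(config.get("rules", [])):
        where = f"rules[{i}]"
        rule_id = rule.get("id")
        if not rule_id:
            errors.append(f"{where}: missing 'id'")
        elif rule_id in seen_ids:
            errors.append(f"{where}: duplicate id {rule_id!r}")  # ids must be unique
        else:
            seen_ids.add(rule_id)
        if rule.get("type") != "priority-based-routing":
            errors.append(f"{where}: 'type' must be 'priority-based-routing'")
        when = rule.get("when")
        if not isinstance(when, dict) or not when.get("models"):
            errors.append(f"{where}: 'when.models' is required")
        targets = rule.get("load_balance_targets") or []
        if not targets:
            errors.append(f"{where}: 'load_balance_targets' is required")
        for j, target in enumerate(targets):
            loc = f"{where}.load_balance_targets[{j}]"
            if not target.get("target"):
                errors.append(f"{loc}: missing 'target'")
            priority = target.get("priority")
            # Assumed: priority is a non-negative int (0 = highest). bool is excluded
            # explicitly because it is a subclass of int in Python.
            if not isinstance(priority, int) or isinstance(priority, bool) or priority < 0:
                errors.append(f"{loc}: 'priority' must be a non-negative integer")
    return errors


config = {
    "name": "priority-based-config",
    "type": "gateway-load-balancing-config",
    "rules": [{
        "id": "robust-claude-routing",
        "type": "priority-based-routing",
        "when": {"models": ["claude-3"]},
        "load_balance_targets": [
            {"target": "anthropic/claude-3-opus", "priority": 0},
            {"target": "anthropic/claude-3-sonnet", "priority": 1},
        ],
    }],
}
print(validate_priority_config(config))  # []
```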

Key Requirements

Priority Configuration

Each target must have a priority value
Priority is an integer where a lower number means a higher priority
0 is the highest priority
The gateway routes to the highest-priority model that is healthy and available
If the highest-priority model fails, the gateway automatically falls back to the next-highest-priority model

Note: All conditions in the when block are combined with AND logic.
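The AND semantics can be sketched in Python as follows. The function `rule_matches`, the subject identifier format (`user:alice`), and the metadata semantics (every configured key must equal the request's value; an omitted optional field places no constraint) are assumptions for illustration, not documented gateway behavior.

```python
from typing import Any, Dict, Optional


def rule_matches(when: Dict[str, Any], model: str, subject: str,
                 metadata: Optional[Dict[str, Any]] = None) -> bool:
    metadata = metadata or {}
    # models is required: the requested model must be listed
    model_ok = model in when.get("models", [])
    # subjects is optional: assumed to match everyone when omitted
    subjects = when.get("subjects")
    subject_ok = subjects is None or subject in subjects
    # metadata is optional: assumed to require equality for every configured key
    wanted = when.get("metadata") or {}
    metadata_ok = all(metadata.get(k) == v for k, v in wanted.items())
    # All conditions are combined with AND logic
    return model_ok and subject_ok and metadata_ok


when = {"models": ["claude-3"], "subjects": ["user:alice"], "metadata": {"env": "prod"}}
print(rule_matches(when, "claude-3", "user:alice", {"env": "prod"}))  # True
print(rule_matches(when, "claude-3", "user:bob", {"env": "prod"}))    # False: subject fails
```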

Example Configurations

The following example applies to requests for the claude-3 model. The gateway sends them to Claude 3 Opus (priority 0); if Opus fails, it falls back to Claude 3 Sonnet (priority 1), and then to Claude 3 Haiku (priority 2).

name: "priority-based-config"
type: gateway-load-balancing-config
rules:
  - id: "robust-claude-routing"
    type: "priority-based-routing"
    when:
      models:
        - "claude-3"
    load_balance_targets:
      - target: "anthropic/claude-3-opus"
        priority: 0  # Primary - highest performance
      - target: "anthropic/claude-3-sonnet"
        priority: 1  # Secondary - good performance
      - target: "anthropic/claude-3-haiku"
        priority: 2  # Tertiary - fastest response
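To make the ordering concrete, here is a small Python sketch that transcribes the example's targets (listed out of order on purpose) and sorts them by priority. It assumes the gateway orders targets by the priority field rather than by their position in the list; the `unhealthy` set is a hypothetical stand-in for the gateway's health state.

```python
# The example's targets, transcribed by hand (deliberately shuffled).
targets = [
    {"target": "anthropic/claude-3-haiku", "priority": 2},
    {"target": "anthropic/claude-3-opus", "priority": 0},
    {"target": "anthropic/claude-3-sonnet", "priority": 1},
]

# Lower number = higher priority, so an ascending sort gives the attempt order.
attempt_order = [t["target"] for t in sorted(targets, key=lambda t: t["priority"])]
print(attempt_order)
# ['anthropic/claude-3-opus', 'anthropic/claude-3-sonnet', 'anthropic/claude-3-haiku']

# Hypothetical health state: if Opus is down, the first healthy target serves the request.
unhealthy = {"anthropic/claude-3-opus"}
served_by = next(t for t in attempt_order if t not in unhealthy)
print(served_by)  # anthropic/claude-3-sonnet
```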

How Priority-Based Routing Works

Health Check: The gateway checks whether the highest-priority model is healthy
Primary Routing: Routes each request to the highest-priority available model
Failure Detection: Monitors for failures such as timeouts, errors, and failure status codes
Automatic Fallback: If the primary model fails, automatically switches to the next priority level
Recovery: Can switch back to higher-priority models once they become healthy again
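The steps above can be sketched as a small client-side router in Python. This is an illustration under assumptions, not the gateway's implementation: `PriorityRouter`, the `call` callback, and the fixed cooldown used to model health checks and recovery are all hypothetical. A failed target is marked unhealthy for `cooldown_s` seconds and becomes eligible again afterwards, so recovery here is lazy rather than driven by active health probes.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class PriorityRouter:
    """Hypothetical client-side sketch of priority-based failover (not the gateway's code)."""

    def __init__(self, targets: List[dict], cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        # Lower number = higher priority, so sort ascending once up front
        self.targets = sorted(targets, key=lambda t: t["priority"])
        self.cooldown_s = cooldown_s
        self.clock = clock
        # target name -> time until which it is treated as unhealthy
        self.unhealthy_until: Dict[str, float] = {}

    def is_healthy(self, name: str) -> bool:
        # Health check: healthy unless still inside its cooldown window
        return self.clock() >= self.unhealthy_until.get(name, float("-inf"))

    def route(self, call: Callable[[str], str]) -> Tuple[str, str]:
        """Try healthy targets in priority order and return (target, response)."""
        last_error: Optional[Exception] = None
        for t in self.targets:
            name = t["target"]
            if not self.is_healthy(name):
                continue  # skip targets that failed recently
            try:
                return name, call(name)
            except Exception as exc:  # failure detection: any exception counts here
                last_error = exc
                # Automatic fallback: mark unhealthy, then move on to the next priority
                self.unhealthy_until[name] = self.clock() + self.cooldown_s
        raise RuntimeError("all targets failed or are unhealthy") from last_error


# Usage with a fake clock so the cooldown is easy to follow.
now = [0.0]
router = PriorityRouter(
    [{"target": "opus", "priority": 0}, {"target": "sonnet", "priority": 1}],
    cooldown_s=30.0, clock=lambda: now[0])


def flaky_call(name: str) -> str:
    if name == "opus" and now[0] < 10:
        raise TimeoutError(name)  # simulated outage of the primary
    return f"reply from {name}"


print(router.route(flaky_call))  # ('sonnet', 'reply from sonnet'): opus failed, fell back
now[0] = 31.0
print(router.route(flaky_call))  # ('opus', 'reply from opus'): cooldown over, recovered
```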

Best Practices

Priority Planning: Design clear priority hierarchies based on performance, cost, or reliability
Fallback Strategy: Ensure fallback models have sufficient capacity
Monitoring: Monitor failover events and recovery times
Testing: Regularly test failover scenarios
Capacity Planning: Ensure backup models can handle full load if needed
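For the monitoring practice, one simple signal is the failover rate: the share of requests answered by a target other than the primary. The sketch below assumes a hypothetical request log of (target, priority) pairs and a hypothetical `failover_stats` helper; adapt it to wherever your request logs actually come from.

```python
from collections import Counter


def failover_stats(served, primary_priority=0):
    """Summarize failover activity.

    `served` holds one (target, priority) pair per request: the target that
    finally answered and its configured priority. This log shape is hypothetical.
    """
    total = len(served)
    fallbacks = sum(1 for _, priority in served if priority != primary_priority)
    return {
        "total": total,
        # Share of requests answered by something other than the primary target
        "failover_rate": fallbacks / total if total else 0.0,
        "by_target": Counter(target for target, _ in served),
    }


log = [("opus", 0), ("opus", 0), ("sonnet", 1), ("opus", 0)]
stats = failover_stats(log)
print(stats["failover_rate"])  # 0.25
print(stats["by_target"])      # Counter({'opus': 3, 'sonnet': 1})
```

A rising failover rate, or fallbacks concentrated in particular time windows, is a cue to check the primary model's health and the backups' capacity.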

Use Cases

High Availability: Ensure service continuity with automatic failover
Cost Optimization: Use expensive models as primary and cheaper ones as backups
Performance Optimization: Route to the fastest models first and slower ones as fallbacks
Geographic Distribution: Route to the closest data centers first
Compliance: Route to compliant models as primary, others as backup
A/B Testing: Use new models as primary, established models as backup
