Weight-based routing distributes traffic across multiple model targets based on specified weight percentages. This strategy is ideal for gradual rollouts, A/B testing, or distributing load across different model providers.

Overview

Weight-based routing allows you to:
  • Distribute traffic proportionally across multiple models
  • Perform gradual rollouts with controlled traffic distribution
  • A/B test different model providers or configurations
  • Balance load across multiple endpoints

Configuration Structure

# Configuration Details
name: string   # Required: Configuration name (e.g., "weight-based-config")
type: gateway-load-balancing-config 

# Rules
rules:
  - id: string                    # Required: Unique identifier for the rule                  
    type: "weight-based-routing"  # Required: Must be "weight-based-routing"
    when:                         # Required: Conditions for when to apply this rule        
      subjects: string[]          # Optional: List of user/virtual account identifiers
      models: string[]            # Required: List of model names to match 
      metadata: object            # Optional: Additional matching criteria
    load_balance_targets:         # Required: List of models to route to
      - target: string            # Required: Model identifier
        weight: integer           # Required: Integer weight value, 0-100
        override_params: object   # Optional: Model-specific parameters to override

Key Requirements

Weight Configuration

  • Each target must have a weight value
  • Weights must be integers between 0 and 100
  • Sum of all weights in a rule must equal 100
  • Weights represent the percentage of traffic sent to each target
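The requirements above can be expressed as a small validation helper. This is an illustrative Python sketch, not the gateway's actual validator; the dict shape mirrors the `load_balance_targets` entries shown in the configuration structure:

```python
def validate_targets(targets):
    """Check weight rules: integer weights in [0, 100] summing to exactly 100.

    `targets` is a list of {"target": str, "weight": int} dicts, matching the
    load_balance_targets schema. Raises ValueError on any violation.
    """
    total = 0
    for t in targets:
        w = t.get("weight")
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(w, int) or isinstance(w, bool) or not 0 <= w <= 100:
            raise ValueError(f"invalid weight for {t.get('target')}: {w!r}")
        total += w
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")
```

Running this at config-load time catches a mis-summed rule before any traffic is routed.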

Example Configurations

The following example demonstrates a weight-based routing configuration that directs 80% of requests to the Azure GPT-4 model and 20% to the OpenAI GPT-4 model. The when block specifies the conditions under which this rule applies: it matches requests for the gpt-4 model, coming from members of the engineering-team, and only in the production environment. All conditions in the when block must be satisfied for the rule to take effect.
rules:
  - id: "production-rollout"
    type: "weight-based-routing"
    when:
      models: 
        - "gpt-4"
      subjects:
        - "team:engineering-team"
      metadata:
        environment: "production"
    load_balance_targets:
      - target: "azure/gpt4"
        weight: 80
      - target: "openai/gpt4"
        weight: 20
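To see the 80/20 split in action, here is a small Python simulation of weighted target selection (illustrative only; the target names come from the example above, and the gateway's actual sampling mechanism may differ):

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the simulation is reproducible

targets = ["azure/gpt4", "openai/gpt4"]
weights = [80, 20]

# random.choices samples proportionally to `weights`; with 10,000 draws
# we expect roughly an 8000 / 2000 split between the two targets.
counts = Counter(rng.choices(targets, weights=weights, k=10_000))
```

Individual requests are routed probabilistically, so short windows can deviate from 80/20; the split converges to the configured weights as request volume grows.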

Best Practices

  1. Weight Validation: Always ensure weights sum to 100 for each rule
  2. Gradual Rollouts: Start with small weights for new models and gradually increase
  3. Monitoring: Monitor performance metrics for each target to adjust weights
  4. Fallback Strategy: Configure fallback candidates for robust error handling
  5. Testing: Test configurations in staging environments before production
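A gradual rollout (practice 2) amounts to a schedule of weight pairs that always sum to 100. The stage values and target names below are hypothetical, shown only to make the pattern concrete:

```python
# Hypothetical rollout schedule: shift traffic from the current target to a
# new one in stages, keeping the weights summing to 100 at every step.
ROLLOUT_STAGES = [(90, 10), (75, 25), (50, 50), (20, 80), (0, 100)]

def targets_for_stage(stage):
    """Return load_balance_targets for a given rollout stage."""
    old_w, new_w = ROLLOUT_STAGES[stage]
    return [
        {"target": "azure/gpt4", "weight": old_w},   # current model
        {"target": "openai/gpt4", "weight": new_w},  # model being rolled out
    ]
```

Advancing a stage is then just a config change to the two weight values, which pairs naturally with the monitoring practice above: promote a stage only after its metrics look healthy.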

Use Cases

  • Gradual Model Rollouts: Start with 10% traffic to new models
  • A/B Testing: Compare different model providers or configurations
  • Load Distribution: Balance traffic across multiple providers
  • Cost Optimization: Route to cheaper models for non-critical requests
  • Performance Optimization: Route to faster models for time-sensitive requests