Skip to main content
Virtual Models are reusable entities in TrueFoundry AI Gateway that allow you to create a single model interface that intelligently routes requests to one or more underlying models based on your configured routing strategy. Instead of directly calling individual models, you can use a Virtual Model that automatically handles load balancing, failover, and retries across multiple providers.

Why Use Virtual Models?

Virtual Models provide several key benefits:
  • Abstraction: Use a single model identifier instead of managing multiple provider-specific models
  • Reliability: Automatically route to healthy models when others fail or experience issues
  • Performance: Distribute traffic based on weights, latency, or priority to optimize response times
  • Flexibility: Easily switch between providers or adjust routing strategies without changing your application code

Creating a Virtual Model

1

Navigate to Virtual Models in AI Gateway

From the TrueFoundry dashboard, navigate to AI Gateway > Models and select Virtual Model.
Navigate to Virtual Models in AI Gateway

Navigate to Virtual Models in AI Gateway

Virtual Models are organized into Virtual Model Provider Groups. When creating a new Virtual Model, you can either add to the already existing group or create a new one.
2

Create or Select a Virtual Model Provider Group and Set Access Controls

Give your Virtual Model Provider Group a unique name to organize your virtual models. The group name must be 3 to 64 characters long, alphanumeric with hyphens allowed, and cannot start with a number. Within a group, you can add multiple virtual models.Next, configure collaborators for the provider group to control who can access and manage your virtual models. You can assign:
  • User Role: Allows users/teams to use the virtual models for inference
  • Manager Role: Allows users/teams to modify the virtual model configuration
Create Virtual Model Provider Group and configure access controls

Create Virtual Model Provider Group and configure access controls

Learn more about access control here.
3

Configure Virtual Model Details, Routing Strategy, and Targets

For each virtual model in your group, configure the following:
  • Name: A unique identifier for your virtual model (e.g., gpt-4-production)
  • Model Types: Select the supported operation types (chat, completion, embedding, etc.)
    You can select multiple model types if your virtual model needs to support different operation types.
  • Routing Strategy: Choose how requests should be distributed across your target models. The AI Gateway supports three main routing strategies:
    Distribute traffic based on assigned weights. For example, with weights of 80 for one model and 20 for another, roughly 80% of requests go to the first and 20% to the second.
    Route requests to the model with the lowest response latency. The gateway monitors response times (per output token) for each model and chooses the fastest healthy model.
    Route requests in priority order with automatic fallback. Requests go to the highest priority model first (0 is highest). If it fails, the gateway falls back to the next one.
    For more on routing strategies, see the Load Balancing Overview. For configuration examples, check Commonly Used Routing Configurations.
  • Target Models: For your chosen routing strategy, specify one or more target models that will receive traffic. For each target, you can configure:
    • Target: Select the model from the dropdown.
    • Retry Configuration: Number of retry attempts, delay between retries, and status codes that trigger retries.
    • Fallback Status Codes: HTTP status codes that should cause fallback to other targets.
    • Fallback Candidate: Whether this target can act as a fallback for others.
    • Override Parameters: (Optional) Set request parameters to override when routing to this target (e.g., temperature, max_tokens).
Configure virtual model details, routing strategy, and target models

Configure virtual model details, routing strategy, and target models

Using Virtual Models

Once created, you can use virtual models just like any other model in the AI Gateway. The virtual model name follows the format: virtual-model-group-name/virtual-model-name.

Try Out Virtual Models in Playground

You can test your virtual models directly in the TrueFoundry Playground: Option 1: Click the try in playground button you see once you create the virtual model.
Try in playground button next to virtual model

Try in playground button next to virtual model

Option 2: Go to the Playground directly and select your virtual model from the model dropdown
Select virtual model from model dropdown in playground

Select virtual model from playground dropdown

The playground allows you to interact with your virtual model and see how it routes requests to the underlying target models based on your configured routing strategy.

FAQ

Yes, you can update the routing configuration at any time. Changes take effect immediately for new requests. Existing in-flight requests will complete with their current routing.
The gateway provides observability metrics that show which targets received traffic, their success rates, and latency. You can view these metrics in the AI Gateway dashboard.
Yes, you can configure a virtual model to support multiple model types (e.g., chat, completion, embedding). However, all target models must support the requested operation type.
If all configured targets fail and exhaust their retry attempts, the request will fail with an error. Ensure you have sufficient fallback targets configured for critical use cases.
No, you can’t use virtual models as targets within other routing configurations. Only real (deployed) models can be the targets in routing rules.
When routing a request, the AI Gateway determines which routing configuration to use in the following order of precedence:
  1. Virtual Model Configuration: If no header-based override is present, the routing rules defined within the virtual model itself will be used.
  2. Header-based routing: If you specify a routing configuration override in the request headers, this configuration will take the highest precedence for that request.
  3. Global Configuration: If neither of the above exist for the model or request, the system-wide (global) routing settings are applied.
The typical order of precedence is: Virtual Model Configuration > Header-based routing > Global configuration.