Budget Limiting

Cost budgeting lets organizations set and enforce spending limits across teams, users, and accounts when running LLM workloads, which helps keep usage efficient and costs under control.

Prevent Runaway Costs: Protect against unexpected cost spikes due to code bugs, infinite loops, or high-volume usage.
Enforce Budget Allocations: Set specific spending limits for teams, users, or virtual accounts. This ensures each group stays within its allocated budget and provides visibility into consumption patterns.

Configure Budget Limiting in TrueFoundry AI Gateway

Using the budget limiting feature, you can assign budgets to specific users, teams, virtual accounts, applications, models, or any combination of these. The budget limiting configuration is defined as a YAML file with the following fields:

name: The name of the budget limiting configuration - it can be anything and is only used for reference in logs.
type: This should be gateway-budget-config. It helps TrueFoundry identify that this is a budget limiting configuration file.
rules: An array of rules.

The budget limiting configuration contains an array of rules. Every request is evaluated against all of the rules, and every matching rule is applied. If any matching rule has exceeded its limit, the request is rejected with an error. For example, if you have a per-developer daily budget and a per-team daily budget, each request is checked against both rules and is rejected if either budget has been exceeded. Each rule has four fields:

id: A unique identifier for the rule. Only used for reference in logs and metrics.
- You can use dynamic values like {user} and {model}, which are replaced by the actual user or model in the request.
  1. If you set the ID to {user}-daily-limit, the system creates a separate rule for each user (for example, alice-daily-limit, bob-daily-limit) and applies the limit to each user individually.
  2. If you set the ID to just daily-limit (without placeholders), the rule applies collectively to the combined spend of all users included in the when block.
when (defines the subset of requests to which the rule applies): A rule can match on the user calling the model, the model name, or any custom metadata key present in the X-TFY-METADATA request header. The subjects, models, and metadata fields are combined with AND, meaning the rule matches only if all of the specified conditions are met. If an incoming request doesn't match the when block of one rule, the next rule is evaluated.
- subjects: Filters on the list of users, teams, or virtual accounts calling the model. Specify a subject as user:john-doe, team:engineering-team, or virtual-account:acct_1234567890.
- models: The rule matches if the model name in the request matches any of the models in the list.
- metadata: The rule matches if the metadata in the request matches the metadata in the rule. For example, if you specify metadata: {environment: "production"}, the rule matches only if the X-TFY-METADATA request header contains the key environment with the value production.
limit_to: Integer value which, together with unit, specifies the limit (for example, 1000 dollars a month).
unit: Possible values are cost_per_day and cost_per_month (both in dollars).
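
The matching and ID-placeholder behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the gateway's implementation: the Request class, the matches and budget_key functions, and the assumption that a subjects list matches when any one entry applies to the caller are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request shape; these field names are assumptions for illustration.
    user: str
    model: str
    subjects: set[str]  # e.g. {"user:bob@email.com", "team:backend"}
    metadata: dict[str, str] = field(default_factory=dict)


def matches(when: dict, req: Request) -> bool:
    """Return True if the request satisfies every condition present in `when`."""
    subjects = when.get("subjects")
    # Assumption: a subjects list matches if the caller is any one of the listed subjects.
    if subjects and not (set(subjects) & req.subjects):
        return False
    models = when.get("models")
    if models and req.model not in models:
        return False
    # Every metadata key in the rule must be present in the request with the same value.
    for key, value in when.get("metadata", {}).items():
        if req.metadata.get(key) != value:
            return False
    # An empty `when` block has no conditions, so it matches every request.
    return True


def budget_key(rule_id: str, req: Request) -> str:
    """Expand {user} and {model} so each distinct value gets its own budget bucket."""
    return rule_id.replace("{user}", req.user).replace("{model}", req.model)
```

Under this sketch, an ID of {user}-daily-limit yields one key per user, while a plain daily-limit yields a single shared key for every request that matches the when block.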

Sample Configuration

name: "budget-limiting-config"
type: "gateway-budget-config"
rules:
  # 1. Limit daily spending for a specific user on a specific model
  - id: "bob-gpt4-daily-budget"
    when:
      subjects: ["user:bob@email.com"]
      models: ["openai/gpt-4"]
    limit_to: 50
    unit: "cost_per_day"
  
  # 2. Limit monthly spending for a team across all models
  - id: "backend-monthly-budget"
    when:
      subjects: ["team:backend"]
    limit_to: 2000
    unit: "cost_per_month"
  
  # 3. Limit monthly spending for a virtual account
  - id: "virtualaccount1-monthly-budget"
    when:
      subjects: ["virtual-account:virtualaccount1"]
    limit_to: 1000
    unit: "cost_per_month"
  
  # 4. Set per-model daily budget limits
  - id: "{model}-daily-budget"
    when: {}
    limit_to: 100
    unit: "cost_per_day"
  
  # 5. Set per-user monthly budget limits
  - id: "{user}-monthly-budget"
    when: {}
    limit_to: 500
    unit: "cost_per_month"
  
  # 6. Set per-user, per-model budget limits
  - id: "{user}-{model}-daily-budget"
    when: {}
    limit_to: 20
    unit: "cost_per_day"

How Budget Evaluation Works

Each incoming request is evaluated against all rules.
Rules are matched based on the when block. If multiple rules match, all applicable rules are enforced.
If any matching rule has exceeded its limit, the request is rejected with an error.
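
As a rough illustration of these three steps, the following Python sketch keeps a running spend per budget and rejects a request when any matching rule is exhausted. It is a simplified model under stated assumptions, not the gateway's code: BudgetTracker, BudgetExceeded, check, and record are hypothetical names, metadata matching and the daily or monthly window reset are omitted, and a budget is treated as exhausted once spend reaches limit_to.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a matching rule has no budget left (hypothetical error type)."""


class BudgetTracker:
    def __init__(self, rules):
        self.rules = rules
        # Budget key -> dollars spent in the current window (window reset not modeled).
        self.spent = defaultdict(float)

    def check(self, user, model, subjects):
        """Return the budget keys of all matching rules, or raise if any is exhausted."""
        keys = []
        for rule in self.rules:
            when = rule.get("when", {})
            if when.get("subjects") and not set(when["subjects"]) & set(subjects):
                continue  # subjects condition not met; try the next rule
            if when.get("models") and model not in when["models"]:
                continue  # models condition not met; try the next rule
            key = rule["id"].replace("{user}", user).replace("{model}", model)
            # Assumption: the budget counts as exhausted once spend reaches the limit.
            if self.spent[key] >= rule["limit_to"]:
                raise BudgetExceeded(key)
            keys.append(key)
        return keys

    def record(self, keys, cost):
        """Charge the cost of a completed request to every matching budget."""
        for key in keys:
            self.spent[key] += cost


# A per-developer daily budget combined with a per-team daily budget.
rules = [
    {"id": "{user}-daily-budget", "when": {}, "limit_to": 20, "unit": "cost_per_day"},
    {"id": "backend-daily-budget", "when": {"subjects": ["team:backend"]},
     "limit_to": 30, "unit": "cost_per_day"},
]
tracker = BudgetTracker(rules)
keys = tracker.check("alice", "openai/gpt-4", ["user:alice", "team:backend"])
tracker.record(keys, 20.0)  # alice's personal budget is now used up
```

In this sketch, a further request from alice would be rejected by her personal rule, while another backend developer could still spend until the shared team budget also runs out.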
