Cost budgeting is a critical feature for effectively managing LLM workloads. It allows organizations to set and enforce cost boundaries across teams, users, and accounts—helping ensure operational efficiency and financial control.
  1. Prevent Runaway Costs: Protect against unexpected cost spikes due to code bugs, infinite loops, or high-volume usage.
  2. Enforce Budget Allocations: Set specific spending limits for teams, users, or virtual accounts. This ensures each group stays within their allocated budget and provides visibility into consumption patterns.

Configure Budget Limiting in TrueFoundry AI Gateway

Using the budget limiting feature, you can assign budgets to specific users, teams, virtual accounts, applications, models, or any combination of these. The budget limiting configuration is defined as a YAML file with the following fields:
  1. name: The name of the budget limiting configuration - it can be anything and is only used for reference in logs.
  2. type: This should be gateway-budget-config. It helps TrueFoundry identify that this is a budget limiting configuration file.
  3. rules: An array of rules.
The budget limiting configuration contains an array of rules. Every request is evaluated against all the rules, and every matching rule is applied: if any matching rule has exceeded its limit, the request is rejected with an error. For example, if you have a per-developer daily budget and a per-team daily budget, every request is checked against both rules and is rejected if either budget has been exceeded. Each rule has four fields:
  1. id: A unique identifier for the rule. Only used for reference in logs and metrics.
    • You can use dynamic placeholders that will be replaced by actual values from the request:
      • {user} - Replaced by the actual user making the request (e.g., user:alice@example.com)
      • {model} - Replaced by the model name being requested (e.g., openai-main/gpt-4)
      • {metadata.key_name} - Replaced by the value of a custom metadata field sent in the X-TFY-METADATA request header
    • How placeholders work:
      1. If you set the ID as {user}-daily-budget, the system will create a separate budget for each user (for example, {user:alice@example.com}-daily-budget, {user:bob@example.com}-daily-budget) and track spending individually.
      2. If you set the ID as {metadata.project_id}-budget, and your request includes X-TFY-METADATA: {"project_id": "proj-123"}, the system will track budget for {proj-123}-budget for that specific project.
      3. If you set the ID as just daily-budget (without placeholders), the budget will apply collectively to all requests included in the when block.
  2. when (defines the subset of requests to which the rule applies): The TrueFoundry AI Gateway lets you select requests by the user calling the model, the model name, or any custom metadata key present in the X-TFY-METADATA request header. The subjects, models, and metadata conditions are combined with AND, meaning the rule matches only if all specified conditions are met. If an incoming request doesn't match the when block of a rule, that rule is skipped and the next rule is evaluated.
    • subjects: Filter based on the list of users, teams, or virtual accounts calling the model. A subject is specified as user:john-doe, team:engineering-team, or virtualaccount:acct_1234567890.
    • models: Rule matches if the model name in the request matches any of the models in the list.
    • metadata: Rule matches if the metadata in the request matches the metadata in the rule. For example, if we specify metadata: {environment: "production"}, the rule will only match if the X-TFY-METADATA request header contains the key environment with the value production.
  3. limit_to: Integer value that, together with unit, specifies the limit (for example, 1000 with cost_per_month means 1000 dollars a month).
  4. unit: The budget window. Possible values are cost_per_day and cost_per_month (both in dollars).
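To make the id placeholders and the AND semantics of the when block concrete, here is a minimal Python sketch. It is not the gateway's implementation: the function names resolve_rule_id and rule_matches and the shape of the request dictionary are assumptions made for illustration. The sketch assumes that a request carries the full list of subjects it belongs to (the user plus any teams), that an omitted condition matches every request, and that a placeholder with no value in the request is left unresolved. The resolved ids keep the braces shown in the examples above.

```python
import re

# Placeholders supported in a rule id: {user}, {model}, {metadata.<key>}
PLACEHOLDER = re.compile(r"\{(user|model|metadata\.[A-Za-z0-9_]+)\}")


def resolve_rule_id(rule_id, request):
    """Expand id placeholders using values taken from the request."""
    def substitute(match):
        name = match.group(1)
        if name in ("user", "model"):
            value = request.get(name)
        else:
            key = name.split(".", 1)[1]
            value = request.get("metadata", {}).get(key)
        if value is None:
            return match.group(0)  # assumption: leave unknown values unresolved
        return "{" + value + "}"
    return PLACEHOLDER.sub(substitute, rule_id)


def rule_matches(when, request):
    """Return True only if every condition present in `when` is satisfied."""
    subjects = when.get("subjects")
    if subjects and not set(subjects) & set(request.get("subjects", [])):
        return False
    models = when.get("models")
    if models and request.get("model") not in models:
        return False
    for key, expected in when.get("metadata", {}).items():
        if request.get("metadata", {}).get(key) != expected:
            return False
    return True


# Hypothetical request, already parsed from the incoming call and its headers.
request = {
    "user": "user:alice@example.com",
    "subjects": ["user:alice@example.com", "team:backend"],
    "model": "openai-main/gpt-4",
    "metadata": {"project_id": "proj-123", "environment": "production"},
}

print(resolve_rule_id("{user}-daily-budget", request))
print(resolve_rule_id("{metadata.project_id}-budget", request))
print(rule_matches({"subjects": ["team:backend"],
                    "models": ["openai-main/gpt-4"]}, request))
print(rule_matches({"subjects": ["team:backend"],
                    "metadata": {"environment": "staging"}}, request))
```

The last call returns False even though the subject matches, because the metadata condition fails and all conditions must hold together.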
Let’s say you want to set budget limits based on the following rules:
  1. Limit daily spending for a specific user on a specific model (user:bob@email.com on openai-main/gpt-4) to $50 per day
  2. Limit monthly spending for team:backend across all models to $2000 per month
  3. Limit monthly spending for virtualaccount:virtualaccount1 to $1000 per month
  4. Set per-model daily budget limits of $100 per day
  5. Set per-user monthly budget limits of $500 per month
  6. Set per-user, per-model budget limits of $20 per day
  7. Limit each project (identified by custom metadata) to $100 per day
Your budget limit config would look like this:
name: budget-limiting-config
type: gateway-budget-config
# All matching rules are evaluated, and if any rule exceeds its limit, the request is rejected.
rules:
  # Limit daily spending for a specific user on a specific model
  - id: 'bob-gpt4-daily-budget'
    when:
      subjects: ['user:bob@email.com']
      models: ['openai-main/gpt-4']
    limit_to: 50
    unit: cost_per_day

  # Limit monthly spending for a team across all models
  - id: 'backend-monthly-budget'
    when:
      subjects: ['team:backend']
    limit_to: 2000
    unit: cost_per_month

  # Limit monthly spending for a virtual account
  - id: 'virtualaccount1-monthly-budget'
    when:
      subjects: ['virtualaccount:virtualaccount1']
    limit_to: 1000
    unit: cost_per_month

  # Set per-model daily budget limits
  - id: '{model}-daily-budget'
    when: {}
    limit_to: 100
    unit: cost_per_day

  # Set per-user monthly budget limits
  - id: '{user}-monthly-budget'
    when: {}
    limit_to: 500
    unit: cost_per_month

  # Set per-user, per-model budget limits
  - id: '{user}-{model}-daily-budget'
    when: {}
    limit_to: 20
    unit: cost_per_day

  # Limit each project (identified by custom metadata) to $100 per day
  # Requests must include X-TFY-METADATA: {"project_id": "your-project-id"}
  - id: 'project-{metadata.project_id}-daily-budget'
    when: {}
    limit_to: 100
    unit: cost_per_day
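
The project rule above only has a value to track when the caller sends project_id in the X-TFY-METADATA header. As a minimal Python sketch, the header can be built by JSON-encoding the metadata. The header name and JSON shape come from this page; the helper name build_metadata_headers is hypothetical, and any other headers your client needs (such as authentication) are not shown.

```python
import json


def build_metadata_headers(metadata):
    """Encode custom metadata as a JSON object in the X-TFY-METADATA header."""
    return {"X-TFY-METADATA": json.dumps(metadata)}


# Pass these headers along with the request sent through the gateway.
headers = build_metadata_headers({"project_id": "proj-123"})
print(headers["X-TFY-METADATA"])  # {"project_id": "proj-123"}
```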

How Budget Evaluation Works

  • Each incoming request is evaluated against all rules.
  • Rules are matched based on the when block. If multiple rules match, all applicable rules are enforced.
  • If any matching rule has exceeded its limit, the request is rejected with an error.
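
The evaluation flow above can be sketched in Python as follows. This is an illustrative model, not the gateway's code: the names check_budgets, record_cost, and BudgetExceededError, the in-memory spend dictionary, and the request shape are all assumptions. For brevity, the sketch uses fixed rule ids, ignores metadata conditions, does not reset spend at day or month boundaries, and treats a budget as exhausted once spend reaches the limit.

```python
class BudgetExceededError(Exception):
    """Raised when a matching rule has already used up its budget."""


def matches(when, request):
    """AND together the subject and model conditions that are present."""
    subjects = when.get("subjects")
    models = when.get("models")
    if subjects and not set(subjects) & set(request["subjects"]):
        return False
    if models and request["model"] not in models:
        return False
    return True


def check_budgets(rules, spend, request):
    """Reject the request if any matching rule has reached its limit."""
    for rule in rules:
        if not matches(rule.get("when", {}), request):
            continue  # non-matching rules are skipped
        if spend.get(rule["id"], 0.0) >= rule["limit_to"]:
            raise BudgetExceededError(rule["id"])


def record_cost(rules, spend, request, cost):
    """Charge the request's cost to every matching rule."""
    for rule in rules:
        if matches(rule.get("when", {}), request):
            spend[rule["id"]] = spend.get(rule["id"], 0.0) + cost


rules = [
    {"id": "bob-gpt4-daily-budget",
     "when": {"subjects": ["user:bob@email.com"],
              "models": ["openai-main/gpt-4"]},
     "limit_to": 50, "unit": "cost_per_day"},
    {"id": "backend-monthly-budget",
     "when": {"subjects": ["team:backend"]},
     "limit_to": 2000, "unit": "cost_per_month"},
]
request = {"subjects": ["user:bob@email.com", "team:backend"],
           "model": "openai-main/gpt-4"}

spend = {}
check_budgets(rules, spend, request)      # both budgets are unused: allowed
record_cost(rules, spend, request, 50.0)  # the cost counts against both rules
try:
    check_budgets(rules, spend, request)
except BudgetExceededError as exc:
    print(f"rejected by rule: {exc}")     # rejected by rule: bob-gpt4-daily-budget
```

Because the cost is charged to every matching rule, the team budget keeps accumulating Bob's spend even though his personal daily budget is the one that blocks the second request.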