- Prevent Runaway Costs: Protect against unexpected cost spikes due to code bugs, infinite loops, or high-volume usage.
- Enforce Budget Allocations: Set specific spending limits for teams, users, or virtual accounts. This ensures each group stays within their allocated budget and provides visibility into consumption patterns.
Configure Budget Limiting in TrueFoundry AI Gateway
Using the budget limiting feature, you can assign budgets to specific users, teams, virtual accounts, applications, models, or any combination of these. The budget limiting configuration is defined as a YAML file with the following fields:
- name: The name of the budget limiting configuration. It can be anything and is only used for reference in logs.
- type: This should be gateway-budget-config. It helps TrueFoundry identify that this is a budget limiting configuration file.
- rules: An array of rules.
- id: A unique identifier for the rule. Only used for reference in logs and metrics.
  - You can use dynamic placeholders that will be replaced by actual values from the request:
    - {user}: Replaced by the actual user making the request (e.g., user:alice@example.com)
    - {model}: Replaced by the model name being requested (e.g., openai-main/gpt-4)
    - {metadata.key_name}: Replaced by the value of a custom metadata field sent in the X-TFY-METADATA request header
  - How placeholders work:
    - If you set the ID as {user}-daily-budget, the system will create a separate budget for each user (for example, {user:alice@example.com}-daily-budget, {user:bob@example.com}-daily-budget) and track spending individually.
    - If you set the ID as {metadata.project_id}-budget, and your request includes X-TFY-METADATA: {"project_id": "proj-123"}, the system will track budget for {proj-123}-budget for that specific project.
    - If you set the ID as just daily-budget (without placeholders), the budget will apply collectively to all requests included in the when block.
- when: Defines the subset of requests to which the rule applies. The TrueFoundry AI Gateway lets you select requests by the user calling the model, the model name, or any custom metadata key present in the X-TFY-METADATA request header. The subjects, models, and metadata fields are combined with AND, meaning that the rule matches only if all the conditions are met. If an incoming request doesn't match the when block of one rule, the next rule is evaluated.
  - subjects: Filters on the list of users, teams, or virtual accounts calling the model. Specify them as user:john-doe, team:engineering-team, or virtualaccount:acct_1234567890.
  - models: The rule matches if the model name in the request matches any of the models in the list.
  - metadata: The rule matches if the metadata in the request matches the metadata in the rule. For example, if you specify metadata: {environment: "production"}, the rule matches only requests whose X-TFY-METADATA header contains the metadata key environment with the value production.
- limit_to: Integer value that, together with unit, specifies the limit (e.g., 1000 dollars a month).
- unit: Possible values are cost_per_day and cost_per_month (in dollars).
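A minimal sketch of how these fields might fit together in one file. The configuration name, rule id, team, and metadata values are illustrative assumptions rather than values from a real deployment; only the field layout follows the descriptions above.

```yaml
# Illustrative budget config; all names and values are hypothetical.
name: example-budget-config
type: gateway-budget-config
rules:
  - id: "backend-prod-gpt4-monthly"
    when:
      # All three conditions must hold (AND) for the rule to match.
      subjects: ["team:backend"]
      models: ["openai-main/gpt-4"]
      metadata:
        environment: "production"
    # limit_to and unit together mean: 1000 dollars per month.
    limit_to: 1000
    unit: cost_per_month
```

With this rule, only requests from team:backend to openai-main/gpt-4 that also carry environment set to production in X-TFY-METADATA would match and be held to the $1000 monthly limit.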
For example, suppose you want to enforce the following budget rules:
- Limit daily spending for a specific user on a specific model (user:bob@email.com on openai-main/gpt-4) to $50 per day
- Limit monthly spending for team:backend across all models to $2000 per month
- Limit monthly spending for virtualaccount:virtualaccount1 to $1000 per month
- Set per-model daily budget limits of $100 per day
- Set per-user monthly budget limits of $500 per month
- Set per-user, per-model budget limits of $20 per day
- Limit each project (identified by custom metadata) to $100 per day
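The rules listed above could be written roughly as follows. This is a sketch assembled from the field descriptions in this section, not a verified configuration: the rule ids are made up, the metadata key project_id is taken from the placeholder example, and both the empty when block (used here to match every request) and the combination of two placeholders in one id are assumptions to check against the gateway's actual behavior.

```yaml
name: budget-limiting-example   # hypothetical name
type: gateway-budget-config
rules:
  # $50 per day for one user on one model
  - id: "bob-gpt4-daily-budget"
    when:
      subjects: ["user:bob@email.com"]
      models: ["openai-main/gpt-4"]
    limit_to: 50
    unit: cost_per_day
  # $2000 per month for a team, across all models
  - id: "backend-team-monthly-budget"
    when:
      subjects: ["team:backend"]
    limit_to: 2000
    unit: cost_per_month
  # $1000 per month for a virtual account
  - id: "virtualaccount1-monthly-budget"
    when:
      subjects: ["virtualaccount:virtualaccount1"]
    limit_to: 1000
    unit: cost_per_month
  # $100 per day for each model, tracked separately via {model}
  - id: "{model}-daily-budget"
    when: {}   # assumption: an empty when block matches all requests
    limit_to: 100
    unit: cost_per_day
  # $500 per month for each user, tracked separately via {user}
  - id: "{user}-monthly-budget"
    when: {}
    limit_to: 500
    unit: cost_per_month
  # $20 per day for each user and model pair
  - id: "{user}-{model}-daily-budget"
    when: {}
    limit_to: 20
    unit: cost_per_day
  # $100 per day for each project named in X-TFY-METADATA
  - id: "{metadata.project_id}-daily-budget"
    when: {}
    limit_to: 100
    unit: cost_per_day
```

The ids are quoted because a YAML value that begins with an opening brace would otherwise be parsed as a flow mapping rather than a string.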
How Budget Evaluation Works
- Each incoming request is evaluated against all rules.
- Rules are matched based on the when block. If multiple rules match, all applicable rules are enforced.
- If any matching rule has exceeded its limit, the request is rejected with an error.
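To make the evaluation steps concrete, here is a small sketch with two overlapping rules. The configuration name, ids, and limits are illustrative assumptions.

```yaml
name: overlapping-budgets-example   # hypothetical name
type: gateway-budget-config
rules:
  # Matches only Bob's requests to gpt-4.
  - id: "bob-gpt4-daily"
    when:
      subjects: ["user:bob@email.com"]
      models: ["openai-main/gpt-4"]
    limit_to: 50
    unit: cost_per_day
  # Matches every caller's requests to gpt-4; no placeholder, so one shared budget.
  - id: "gpt4-daily"
    when:
      models: ["openai-main/gpt-4"]
    limit_to: 100
    unit: cost_per_day
```

A request from user:bob@email.com to openai-main/gpt-4 matches both rules, so it is rejected if either the $50 personal budget or the $100 shared budget has already been exceeded. A request from any other user to the same model is checked only against gpt4-daily.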