The rate limiting configuration lets you limit certain sets of requests to a specified number of requests or tokens per minute, hour, or day. The configuration contains an array of rules. Every request is evaluated against the rules in order, and only the first matching rule is applied; subsequent rules are ignored. So keep specialised rules at the top and generic ones at the bottom.

Each rule has four sections (a minimal single-rule example follows this list):

  1. id
    1. A unique identifier for the rule.
    2. You can use dynamic placeholders like {user} and {model}, which are replaced with the actual user or model from the request.
    3. For example:
      1. If you set the ID as {user}-daily-limit, the system will create a separate rule for each user (for example, alice-daily-limit, bob-daily-limit) and apply the limit individually to each one.
      2. If you set the ID as just daily-limit (without placeholders), the rule will apply collectively to the total number of requests from all users included in the when block.
  2. when
    1. subjects: An array of users, teams, or virtual accounts from which the request originates - e.g. user:bob@email.com, team:team1, virtualaccount:virtualaccountname
    2. models: An array of model ids which will be used to filter the requests. The model ids are in the format <provider_name>/<model_name> (for example, `openai-main/gpt-4`). You can find and copy the exact model ID from the LLM Playground UI. When you select a model, the full model identifier is displayed along with a copy button.
    3. metadata: Key-value pairs used to filter requests based on the metadata sent with them.
  3. limit_to: An integer that, together with unit, specifies the limit (e.g. 100000 tokens per minute)
  4. unit: Possible values are requests_per_minute, requests_per_hour, requests_per_day, tokens_per_minute, tokens_per_hour, tokens_per_day
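
As a quick illustration, a minimal config with a single rule might look like the sketch below (the rule ID, team name, and model ID are placeholders for your own values):

name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  # Illustrative rule: cap requests from team1 to openai-main/gpt-4 at 100 requests per minute
  - id: "team1-gpt4-requests-per-minute"
    when:
      subjects: ["team:team1"]
      models: ["openai-main/gpt-4"]
    limit_to: 100
    unit: requests_per_minute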

Let’s say you want to rate limit requests based on the following rules:

  1. Limit all requests to gpt4 model from openai-main account for user:bob@email.com to 1000 requests per day
  2. Limit all requests to gpt4 model for team:backend to 20000 tokens per minute
  3. Limit all requests to gpt4 model for virtualaccount:virtualaccount1 to 20000 tokens per minute
  4. Limit all models to have a limit of 1000000 tokens per day
  5. Limit all users to have a limit of 1000000 tokens per day
  6. Limit all users to have a limit of 1000000 tokens per day for each model

Your rate limit config would look like this:

name: ratelimiting-config
type: gateway-rate-limiting-config
# The rules are evaluated in order, and only the first matching rule is applied; subsequent rules are ignored.
rules:
  # Limit all requests to gpt4 model from openai-main account for user:bob@email.com to
  # 1000 requests per day
  - id: "openai-gpt4-dev-env"
    when: 
      subjects: ["user:bob@email.com"]
      models: ["openai-main/gpt4"]
    limit_to: 1000
    unit: requests_per_day
  # Limit all requests to gpt4 model for team:backend to 20000 tokens per minute
  - id: "openai-gpt4-dev-env"
    when: 
      subjects: ["team:backend"]
      models: ["openai-main/gpt4"]
    limit_to: 20000
    unit: tokens_per_minute
  # Limit all requests to gpt4 model for virtualaccount:virtualaccount1 to 20000 tokens per minute
  - id: "openai-gpt4-dev-env"
    when: 
      subjects: ["virtualaccount:virtualaccount1"]
      models: ["openai-main/gpt4"]
    limit_to: 20000
    unit: tokens_per_minute
  # Limit all models to have a limit of 1000000 tokens per day 
  - id: "{model}-daily-limit"
    when: {}
    limit_to: 1000000
    unit: tokens_per_day
  # Limit all users to have a limit of 1000000 tokens per day 
  - id: "{user}-daily-limit"
    when: {}
    limit_to: 1000000
    unit: tokens_per_day
  # Limit all users to have a limit of 1000000 tokens per day for each model 
  - id: "{user}-{model}-daily-limit"
    when: {}
    limit_to: 1000000
    unit: tokens_per_day

Configure Rate Limiting on the Gateway

It’s straightforward: go to the Config tab in the Gateway, add your configuration, and save.

Examples to Set Up Rate Limits for Users, Teams, and Virtual Accounts

TrueFoundry allows you to set up rate limits for specific users, teams, and virtual accounts.

Set up rate limits for users

Say you want to limit all requests to the gpt4 model from the openai-main account for the users bob@email.com and jack@email.com to 1000 requests per day:

name: ratelimiting-config
type: gateway-rate-limiting-config
# The rules are evaluated in order, and only the first matching rule is applied; subsequent rules are ignored.
rules:
  # Limit all requests to gpt4 model from openai-main account for user:bob@email.com and user:jack@email.com to 1000 requests per day
  - id: "openai-gpt4-dev-env"
    when: 
      subjects: ["user:bob@email.com", "user:jack@email.com"]
      models: ["openai-main/gpt4"]
    limit_to: 1000
    unit: requests_per_day
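
If you instead want each user to get their own 1000-requests-per-day budget rather than sharing a single one, you can use the {user} placeholder in the rule ID, as described earlier. A sketch (the rule ID is illustrative):

name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  # {user} expands to a separate rule per user, so bob@email.com and
  # jack@email.com each get an individual 1000 requests/day budget
  - id: "{user}-openai-gpt4-daily-limit"
    when:
      subjects: ["user:bob@email.com", "user:jack@email.com"]
      models: ["openai-main/gpt4"]
    limit_to: 1000
    unit: requests_per_day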

Set up rate limits for teams

Say you want to limit all requests from the team frontend to 5000 requests per day:

name: ratelimiting-config
type: gateway-rate-limiting-config
# The rules are evaluated in order, and only the first matching rule is applied; subsequent rules are ignored.
rules:
  # Limit all requests for team frontend to 5000 requests per day
  - id: "openai-gpt4-dev-env"
    when: 
      subjects: ["team:frontend"]
    limit_to: 5000
    unit: requests_per_day

Set up rate limits for virtual accounts

Say you want to limit all requests from the virtual account va-james to 1500 requests per day:

name: ratelimiting-config
type: gateway-rate-limiting-config
# The rules are evaluated in order, and only the first matching rule is applied; subsequent rules are ignored.
rules:
  # Limit all requests for virtual account va-james to 1500 requests per day
  - id: "openai-gpt4-dev-env"
    when: 
      subjects: ["virtualaccount:va-james"]
    limit_to: 1500
    unit: requests_per_day
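
Because only the first matching rule is applied, you can also combine a subject-specific rule with a generic fallback. The sketch below (rule IDs are illustrative) gives va-james a dedicated request budget while every other request shares a default token budget:

name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  # Specific rule first: requests from va-james match here and never
  # reach the fallback rule below
  - id: "va-james-daily-limit"
    when:
      subjects: ["virtualaccount:va-james"]
    limit_to: 1500
    unit: requests_per_day
  # Generic fallback: all other requests share one 1000000 tokens/day budget
  - id: "default-daily-limit"
    when: {}
    limit_to: 1000000
    unit: tokens_per_day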