Configure Rate Limits
Overview
This document outlines the rate limit configuration specification and provides a detailed explanation of each field.
Rate Limit Configuration Specification
Below is a sample configuration for setting up rate limits in the Truefoundry LLM Gateway:
configs:
  - tenantName: your-tenant-name
    segments:
      - id: "segment-id"
        user: "user-identifier"
        model: "model-name/version"
        limit: number
        unit: "requests" # Can be "requests" or "tokens"
        metadata:
          key: value
Explanation of Configuration Fields
1. tenantName
- Description: Specifies the name of the tenant, representing your organization or entity using the Truefoundry LLM Gateway.
- Example: "acme-corp"
2. segments
- Description: Defines a list of rate limit segments, each tailored for specific user and model combinations.
- Structure: An array of objects containing the following fields:
  - id
  - user
  - model
  - limit
  - unit
  - metadata
3. id
- Description: A unique identifier for the segment, used to differentiate between different rate limit configurations.
- Example: "dev-team-gpt4"
4. user
- Description: Identifies the user or group for whom the rate limit applies, allowing tailored rate limits for different users or teams.
- Example: "dev-team"
5. model
- Description: Specifies the name and version of the LLM model being used. Different models can have different rate limits.
- Example: "gpt-4/v1"
6. limit
- Description: Defines the maximum number of units allowed per minute. This number can represent either API requests or tokens, depending on the unit specified.
- Example: 100 (for 100 units per minute)
7. unit
- Description: Determines whether the limit is applied to the number of API requests or the number of tokens processed.
- Valid Values:
  - "requests": Limit based on the number of API calls
  - "tokens": Limit based on the number of tokens processed
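To illustrate how limit and unit combine, here is a hedged sketch of a single segment that caps token usage rather than request count. The id, user, and limit values are made up for illustration; the model name is borrowed from the example configuration in this document.

```yaml
configs:
  - tenantName: acme-corp
    segments:
      # Hypothetical segment: at most 50,000 tokens per minute
      - id: "support-bot-tokens"        # illustrative id, not from the source
        user: "support-team"            # illustrative user
        model: "prod-openai/gpt-3.5-turbo"
        limit: 50000                    # interpreted per minute
        unit: "tokens"                  # "requests" would count API calls instead
```

With unit set to "requests", the same limit value would instead mean 50,000 API calls per minute.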
8. metadata
- Description: Specifies the metadata header key-value pairs used to select the requests that belong to this segment.
- Example: With the following metadata configuration, the segment applies to all requests whose metadata headers contain these key-value pairs:
    metadata:
      app-name: frontend
      environment: production
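For context, a complete segment that uses metadata might be sketched as follows. The id, user, model, and limit values here are hypothetical; only the metadata keys come from the example in this section. Whether user and model may be omitted when metadata is present is not stated here, so the sketch keeps every field from the specification.

```yaml
configs:
  - tenantName: acme-corp
    segments:
      # Hypothetical segment selected by metadata headers
      - id: "frontend-prod"             # illustrative id
        user: "prod-team"               # illustrative user
        model: "prod-openai/gpt-3.5-turbo"
        limit: 500                      # illustrative value, per minute
        unit: "requests"
        metadata:
          app-name: frontend            # request metadata must contain this pair
          environment: production       # and this pair
```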
Example Configuration
Here's a more realistic example of a rate limit configuration:
configs:
  - tenantName: acme-corp
    segments:
      - id: "dev-team-gpt4"
        user: "dev-team"
        model: "dev-openai/gpt-4"
        limit: 100
        unit: "requests"
      - id: "prod-team-gpt35"
        user: "prod-team"
        model: "prod-openai/gpt-3.5-turbo"
        limit: 10000
        unit: "tokens"
In this example, the development team is limited to 100 requests per minute for the GPT-4 model, while the production team has a limit of 10,000 tokens per minute for the GPT-3.5 Turbo model.