Configure Rate Limits
Overview
This document outlines the rate limit configuration specification and provides a detailed explanation of each field.
Rate Limit Configuration Specification
Below is a sample configuration for setting up rate limits in the Truefoundry LLM Gateway:
segments:
  - id: rate-limiter-1
    subject: user:[email protected] # can be a virtual account like serviceaccount:my-svc-account
    model: my-openai/gpt-4
    limit: 100 # numeric value: maximum requests or tokens per minute
    unit: requests # Can be "requests" or "tokens"
    metadata: # you can also select requests based on metadata keys
      key: value
Understanding the fields
segments - Defines a list of rate limit segments. Each segment targets a particular user, model, or set of requests and specifies how many requests or tokens it can process per minute. This is the top-level entry in the configuration; a combined sketch follows this list.
id (required) - A unique identifier for each segment, which helps differentiate multiple configurations.
subject (optional) - Specifies the user or service account to which the rate limit applies. For users, use the form user:<email>; for virtual (service) accounts, use serviceaccount:<serviceaccount-name>.
model (optional) - Restricts the segment to a specific model (for example, my-openai/gpt-4), allowing different limits per model.
limit (required) - The numeric value representing the maximum allowed units (requests or tokens) per minute.
unit (required) - Specifies whether the limit counts API requests or the number of tokens processed. Valid values are "requests" and "tokens".
metadata (optional) - Key-value pairs used to further narrow which requests fall under this rate limit segment.
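Putting the fields together, a single configuration can list several segments side by side. The sketch below is illustrative only; every id, subject, model, and metadata value is a placeholder:
segments:
  # Cap one user's requests per minute
  - id: "per-user-requests"
    subject: "user:[email protected]"
    limit: 200
    unit: "requests"
  # Cap tokens per minute for one model, regardless of caller
  - id: "gpt4-token-budget"
    model: "my-openai/gpt-4"
    limit: 100000
    unit: "tokens"
  # Cap requests carrying a particular metadata tag
  - id: "batch-jobs"
    limit: 50
    unit: "requests"
    metadata:
      app-name: "batch-pipeline"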
Example Configuration
Segment based only on metadata: This configuration applies a rate limit based purely on specific request characteristics defined by metadata. This approach targets requests from a particular application or environment without considering the user or the model.
In this segment, any request containing metadata with app-name set to "frontend" and environment to "staging" will be limited to 300 requests per minute, universally applied regardless of the user or the model.
segments:
  - id: "frontend-requests"
    limit: 300
    unit: "requests"
    metadata:
      app-name: "frontend"
      environment: "staging"
Segment based only on user: This example keys the rate limit solely on the user, which is useful for per-user restrictions that do not depend on the model or metadata.
segments:
  - id: "specific-user-limit"
    subject: "user:[email protected]"
    limit: 100
    unit: "requests"
Here, the rate limit of 100 requests per minute is imposed on [email protected] directly, independent of the model being accessed or any other distinguishing attributes.
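The same pattern applies to virtual accounts. The sketch below is a hypothetical example; my-svc-account is a placeholder service account name:
segments:
  - id: "svc-account-limit"
    subject: "serviceaccount:my-svc-account"
    limit: 500
    unit: "requests"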
Segment based on both user and model: This configuration scopes the limit to a specific user on a specific model, counting tokens rather than requests.
segments:
  - id: "team-model-specific"
    subject: "user:[email protected]"
    model: "company-ai/gpt-custom"
    limit: 50
    unit: "tokens"
This configuration enforces a limit of 50 tokens per minute for [email protected] when using the company-ai/gpt-custom model, keeping that user's consumption of the model within a defined budget.
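If you need both a request cap and a token cap for the same user and model, one possible layout is to define two segments with the same subject and model. This sketch assumes each segment is evaluated separately; treat it as an illustration rather than documented behavior:
segments:
  # Request cap for the user on this model
  - id: "team-model-requests"
    subject: "user:[email protected]"
    model: "company-ai/gpt-custom"
    limit: 20
    unit: "requests"
  # Token cap for the same user and model
  - id: "team-model-tokens"
    subject: "user:[email protected]"
    model: "company-ai/gpt-custom"
    limit: 50
    unit: "tokens"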