Configure Rate Limits
Overview
This document outlines the rate limit configuration specification and provides a detailed explanation of each field.
Rate Limit Configuration Specification
Below is a sample configuration for setting up rate limits in the Truefoundry LLM Gateway:
segments:
  - id: rate-limiter-1
    subject: user:[email protected] # can be a virtual account like serviceaccount:my-svc-account
    model: my-openai/gpt-4
    limit: 100 # numeric value: maximum requests or tokens per minute
    unit: requests # Can be "requests" or "tokens"
    metadata: # you can also select requests based on metadata keys
      key: value
Understanding the fields
segments - Defines a list of rate limit segments. Each segment targets a particular user, model, or set of requests and specifies how many requests or tokens it can process per minute. This is the top-level entry in the configuration; a combined sketch follows this list.
id (required) - A unique identifier for each segment, which helps differentiate multiple configurations.
subject (optional) - Specifies the user or service account to which the rate limit applies. For users, use the form user:<email>; for virtual (service) accounts, use serviceaccount:<serviceaccount-name>.
model (optional) - Restricts the segment to a specific model (for example, my-openai/gpt-4), allowing different limits per model.
limit (required) - The numeric value representing the maximum allowed units (requests or tokens) per minute.
unit (required) - Specifies whether the limit counts API requests or the number of tokens processed. Valid values are "requests" and "tokens".
metadata (optional) - Key-value pairs used to further narrow which requests fall under this rate limit segment.
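Putting the fields together, a single configuration can list several segments side by side. The sketch below is illustrative only; every id, subject, model, and metadata value is a placeholder:
segments:
  # Cap one user's requests per minute
  - id: "per-user-requests"
    subject: "user:[email protected]"
    limit: 200
    unit: "requests"
  # Cap tokens per minute for one model, regardless of caller
  - id: "gpt4-token-budget"
    model: "my-openai/gpt-4"
    limit: 100000
    unit: "tokens"
  # Cap requests carrying a particular metadata tag
  - id: "batch-jobs"
    limit: 50
    unit: "requests"
    metadata:
      app-name: "batch-pipeline"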
Example Configuration
Segment based only on metadata: This configuration applies a rate limit based purely on specific request characteristics defined by metadata. This approach targets requests from a particular application or environment without considering the user or the model.
In this segment, any request containing metadata with app-name set to "frontend" and environment to "staging" will be limited to 300 requests per minute, universally applied regardless of the user or the model.
segments:
  - id: "frontend-requests"
    limit: 300
    unit: "requests"
    metadata:
      app-name: "frontend"
      environment: "staging"
Segment based only on user: This example keys the rate limit solely on the user, which is useful for per-user restrictions that do not depend on the model or metadata.
segments:
  - id: "specific-user-limit"
    subject: "user:[email protected]"
    limit: 100
    unit: "requests"
Here, the rate limit of 100 requests per minute is imposed on [email protected] directly, independent of the model being accessed or any other distinguishing attributes.
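The same pattern applies to virtual accounts. The sketch below is a hypothetical example; my-svc-account is a placeholder service account name:
segments:
  - id: "svc-account-limit"
    subject: "serviceaccount:my-svc-account"
    limit: 500
    unit: "requests"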
Segment based on both user and model: This configuration scopes the limit to a specific user on a specific model, counting tokens rather than requests.
segments:
  - id: "team-model-specific"
    subject: "user:[email protected]"
    model: "company-ai/gpt-custom"
    limit: 50
    unit: "tokens"
This configuration enforces a limit of 50 tokens per minute for [email protected] when using the company-ai/gpt-custom model, keeping that user's consumption of the model within a defined budget.
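If you need both a request cap and a token cap for the same user and model, one possible layout is to define two segments with the same subject and model. This sketch assumes each segment is evaluated separately; treat it as an illustration rather than documented behavior:
segments:
  # Request cap for the user on this model
  - id: "team-model-requests"
    subject: "user:[email protected]"
    model: "company-ai/gpt-custom"
    limit: 20
    unit: "requests"
  # Token cap for the same user and model
  - id: "team-model-tokens"
    subject: "user:[email protected]"
    model: "company-ai/gpt-custom"
    limit: 50
    unit: "tokens"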