Configure Rate Limits

Overview

This document outlines the rate limit configuration specification and provides a detailed explanation of each field.

Rate Limit Configuration Specification

Below is a sample configuration for setting up rate limits in the Truefoundry LLM Gateway:

configs:
  - tenantName: your-tenant-name
    segments:
      - id: "segment-id"
        user: "user-identifier"
        model: "model-name/version"
        limit: number
        unit: "requests"  # Can be "requests" or "tokens"
        metadata:
          key: value
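To make the nesting of the specification explicit, here is a minimal Python sketch that mirrors it with dataclasses. It assumes the YAML has already been parsed into plain dictionaries (for example with a YAML library). The names `Segment`, `TenantConfig`, and `parse_configs` are hypothetical and are not part of the Truefoundry LLM Gateway.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Segment:
    """One entry under `segments` (hypothetical mirror of the YAML)."""
    id: str
    user: str
    model: str
    limit: int
    unit: str  # "requests" or "tokens"
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class TenantConfig:
    """One entry under `configs`."""
    tenant_name: str
    segments: list[Segment]


def parse_configs(raw: dict[str, Any]) -> list[TenantConfig]:
    """Convert the parsed YAML document into typed objects."""
    return [
        TenantConfig(
            tenant_name=entry["tenantName"],
            # Each segment dict's keys map directly onto Segment's fields.
            segments=[Segment(**seg) for seg in entry.get("segments", [])],
        )
        for entry in raw["configs"]
    ]
```

Because `Segment(**seg)` rejects keys it does not know, a misspelled field name (such as `limt`) raises a `TypeError` instead of being silently ignored.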

Explanation of Configuration Fields

1. tenantName

  • Description: Specifies the name of the tenant, representing your organization or entity using the Truefoundry LLM Gateway.
  • Example: "acme-corp"
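Since each entry under `configs` is identified by its `tenantName`, finding one organization's settings is a simple scan. The helper below, `find_tenant`, is a hypothetical sketch that works on the parsed YAML as nested dictionaries; it is not a gateway API.

```python
from typing import Any, Optional


def find_tenant(raw: dict[str, Any], tenant_name: str) -> Optional[dict[str, Any]]:
    """Return the `configs` entry whose tenantName matches, or None."""
    for entry in raw.get("configs", []):
        if entry.get("tenantName") == tenant_name:
            return entry
    return None
```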

2. segments

  • Description: Defines a list of rate limit segments, each tailored for specific user and model combinations.
  • Structure: An array of objects containing the following fields:
    • id
    • user
    • model
    • limit
    • unit
    • metadata
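A quick pre-deployment check is to confirm that every segment defines the `id`, `user`, `model`, `limit`, and `unit` fields. This sketch assumes those five are required and treats `metadata` as optional, because the example configuration later on this page omits it; the gateway's own validation rules may differ. `missing_fields` is a hypothetical helper.

```python
from typing import Any

# Assumption: these five fields are required; metadata is optional.
REQUIRED_FIELDS = ("id", "user", "model", "limit", "unit")


def missing_fields(segment: dict[str, Any]) -> list[str]:
    """Return the required segment fields absent from `segment`."""
    return [name for name in REQUIRED_FIELDS if name not in segment]
```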

3. id

  • Description: A unique identifier for the segment, used to distinguish it from the other rate limit configurations.
  • Example: "dev-team-gpt4"
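Because `id` must be unique, it is worth detecting duplicates before applying a configuration. A minimal sketch, using a hypothetical `duplicate_ids` helper over the parsed segment dictionaries:

```python
from collections import Counter
from typing import Any


def duplicate_ids(segments: list[dict[str, Any]]) -> list[str]:
    """Return segment ids that appear more than once, in first-seen order."""
    counts = Counter(seg["id"] for seg in segments)
    return [seg_id for seg_id, n in counts.items() if n > 1]
```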

4. user

  • Description: Identifies the user or group to whom the rate limit applies, so that different users or teams can have their own limits.
  • Example: "dev-team"

5. model

  • Description: Specifies the name and version of the LLM the limit applies to. Different models can have different rate limits.
  • Example: "gpt-4/v1"
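Taken together, `user` and `model` pick out which segment governs a request. The sketch below assumes exact string matches and that the first matching segment wins; the gateway's actual matching and precedence rules may differ. `find_segment` is a hypothetical helper.

```python
from typing import Any, Optional


def find_segment(
    segments: list[dict[str, Any]], user: str, model: str
) -> Optional[dict[str, Any]]:
    """Return the first segment whose user and model both match exactly."""
    for seg in segments:
        if seg.get("user") == user and seg.get("model") == model:
            return seg
    return None  # no segment applies to this user/model pair
```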

6. limit

  • Description: Defines the maximum number of units allowed per minute. This number can represent either API requests or tokens, depending on the unit specified.
  • Example: 100 (for 100 units per minute)
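One way to enforce "at most `limit` units per minute" is a fixed-window counter that resets every 60 seconds. This is a sketch of that technique only; the gateway may use a different algorithm (such as a sliding window), and `FixedWindowLimiter` and `try_consume` are hypothetical names. The injectable `clock` makes the behavior easy to test.

```python
import time
from typing import Callable, Optional


class FixedWindowLimiter:
    """Allow at most `limit` units in each 60-second window.

    Fixed-window counting is an illustrative assumption, not necessarily
    the algorithm the gateway uses.
    """

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock
        self.window_start: Optional[float] = None
        self.used = 0

    def try_consume(self, units: int = 1) -> bool:
        now = self.clock()
        # Open a new window on first use or once 60 seconds have elapsed.
        if self.window_start is None or now - self.window_start >= 60:
            self.window_start = now
            self.used = 0
        if self.used + units > self.limit:
            return False  # would exceed the per-minute limit
        self.used += units
        return True
```

A fixed window is simple but allows bursts at window boundaries (up to twice the limit across two adjacent windows), which is the usual reason to prefer a sliding window in production.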

7. unit

  • Description: Determines whether the limit is applied to the number of API requests or the number of tokens processed.
  • Valid Values:
    • "requests": Limit based on the number of API calls
    • "tokens": Limit based on the number of tokens processed
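The `unit` field decides how much each call counts against `limit`. In this hypothetical sketch, a request costs 1 under `"requests"` and its token count under `"tokens"`. Which tokens count as "processed" (prompt, completion, or both) is an assumption left to the caller, who supplies `tokens_used`.

```python
def units_for_request(unit: str, tokens_used: int) -> int:
    """Return how much one request counts against a segment's limit."""
    if unit == "requests":
        return 1  # every API call counts once, regardless of size
    if unit == "tokens":
        return tokens_used  # counts the tokens attributed to this call
    raise ValueError(f"unit must be 'requests' or 'tokens', got {unit!r}")
```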

8. metadata

  • Description: Specifies the metadata header key-value pairs used to select requests that belong to this segment.
  • Example:
    • With the following metadata configuration, all requests whose metadata headers contain these key-value pairs belong to this segment:
      • metadata:
          app-name: frontend
          environment: production
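The metadata selection can be sketched as a subset check: a request belongs to the segment when every configured key-value pair is present with the same value. This assumes that all pairs must match, that extra metadata on the request is ignored, and that the request's metadata has already been extracted from its headers into a dictionary; `matches_metadata` is a hypothetical helper.

```python
def matches_metadata(required: dict[str, str], request_metadata: dict[str, str]) -> bool:
    """True if every key-value pair in `required` appears in the request metadata."""
    return all(request_metadata.get(key) == value for key, value in required.items())
```

Under this reading, a segment with no `metadata` matches every request, since `all()` over an empty mapping is `True`.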

Example Configuration

Here's a more realistic example of a rate limit configuration:

configs:
  - tenantName: acme-corp
    segments:
      - id: "dev-team-gpt4"
        user: "dev-team"
        model: "dev-openai/gpt-4"
        limit: 100
        unit: "requests"
      - id: "prod-team-gpt35"
        user: "prod-team"
        model: "prod-openai/gpt-3.5-turbo"
        limit: 10000
        unit: "tokens"

In this example, the development team is limited to 100 requests per minute for the GPT-4 model, while the production team has a limit of 10,000 tokens per minute for the GPT-3.5 Turbo model.
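The arithmetic behind this example can be replayed with a short simulation of one minute of traffic per segment. The sketch assumes rejected requests consume no budget and that a request's token count is known when it is checked; a real gateway may instead count tokens after a response completes. `simulate_minute` is a hypothetical helper.

```python
# The two segments from the example configuration, as parsed dictionaries.
SEGMENTS = [
    {"id": "dev-team-gpt4", "user": "dev-team", "model": "dev-openai/gpt-4",
     "limit": 100, "unit": "requests"},
    {"id": "prod-team-gpt35", "user": "prod-team", "model": "prod-openai/gpt-3.5-turbo",
     "limit": 10000, "unit": "tokens"},
]


def simulate_minute(segment: dict, token_counts: list[int]) -> list[bool]:
    """Replay one minute of requests against a single segment.

    Each element of `token_counts` is one request's token usage. Returns,
    per request, whether it fits within the segment's per-minute limit.
    """
    used = 0
    results = []
    for tokens in token_counts:
        cost = 1 if segment["unit"] == "requests" else tokens
        allowed = used + cost <= segment["limit"]
        if allowed:
            used += cost  # only admitted requests consume budget
        results.append(allowed)
    return results
```

For the development team, 101 GPT-4 requests in one minute leave the last one rejected regardless of size. For the production team, three 4,000-token requests would need 12,000 tokens, so only the first two fit within 10,000.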