Truefoundry Docs

Deploying the LLM Benchmarking Tool

The LLM Benchmarking Tool allows you to measure key performance metrics of language models, including token generation throughput, Time to First Token (TTFT), and Inter Token Latency.

Deployment via Application Catalog

The simplest way to deploy the LLM Benchmarking Tool is through the Application Catalog:

Navigate to the Application Catalog and select “Benchmark LLM performance”

Application Catalog

Fill in the required deployment information, including Name and Host endpoint for your benchmarking tool.

Model Configuration

Basic Configuration Parameters

Load Test Parameters

Number of users (peak concurrency): Maximum number of concurrent users for the load test
Ramp up (users started/second): Rate at which new users are added to the test
Host: Base URL for LLM API Server with OpenAI compatible endpoints (v1/chat/completions)

Model Settings

Tokenizer (HuggingFace tokenizer to use to count tokens): HuggingFace tokenizer identifier used to count tokens in prompts and responses
Model (Name of the model in chat/completions payload): Model identifier used in the API request payload

Finding Parameters for TrueFoundry Deployed Models

When using a model deployed on TrueFoundry, you can find the required parameters in the model’s deployment spec:

Model Name: Look under the env section for MODEL_NAME

env:
  MODEL_NAME: nousresearch-meta-llama-3-1-8b-instruct

Host: Find the host URL under the ports section
ports: - host: example-model.truefoundry.tech

Tokenizer: Look for the model_id under artifacts_download.artifacts

artifacts_download:
  artifacts:
    - type: huggingface-hub
      model_id: NousResearch/Meta-Llama-3.1-8B-Instruct

Finding Parameters for External Models

When using external models (like GPT-4), you’ll need to configure the following parameters:

Model Name: Use the model identifier from your provider (e.g., gpt-4, gpt-4o)
Tokenizer: Find an equivalent tokenizer on HuggingFace (e.g., Quivr/gpt-4o)
Host: Your external provider’s API endpoint
API Key: Your provider’s API key (e.g., OpenAI API key)

Finding Parameters for TrueFoundry AI Gateway Models

When using models through TrueFoundry’s AI Gateway:

Navigate to the AI Gateway in your TrueFoundry workspace
Select the model you want to benchmark
Click on the </> Code button to view the API integration code
From the code example, you can find:
- Host: The base URL in the request (e.g., https://truefoundry.tech/api/llm/api/inference/openai/chat/completions)
- Model Name: The model identifier in the request payload (e.g., "model": "openai-main/gpt-4o")
- OpenAI API Key: Generate one using the “Generate API Key” button
- Tokenizer: Find an equivalent tokenizer on HuggingFace (e.g., Quivr/gpt-4o)

Prompt Configuration

Max Output Tokens: Maximum number of tokens allowed in the model’s response
Use Random Prompts: Whether to use randomly generated prompts for testing
Use Single Prompt: Whether to use a single prompt for all test requests
Ignore EOS: Whether to ignore end-of-sequence tokens during token counting

Prompt Min Tokens: Minimum number of tokens in the input prompt (not used with random or single prompts)
Prompt Max Tokens: Maximum number of tokens in the input prompt (not used with random or single prompts)

Viewing Benchmark Results

After running the benchmark, you’ll see comprehensive performance metrics displayed in charts:

Requests per Second
Active Users
Tokens per Second
Response Time Seconds
Response Time First Token (ms)
Inter Token Latency (ms)

Benchmarking Results

LLM Deployment

LLM Finetuning

Prompt Management

LLM Tracing

Benchmarking LLMs

Deploying the LLM Benchmarking Tool

Deployment via Application Catalog

Model Configuration

Basic Configuration Parameters

Load Test Parameters

Model Settings

Finding Parameters for TrueFoundry Deployed Models

Finding Parameters for External Models

Finding Parameters for TrueFoundry AI Gateway Models

Prompt Configuration

Viewing Benchmark Results

LLM Deployment

LLM Finetuning

Prompt Management

LLM Tracing

​Deploying the LLM Benchmarking Tool

​Deployment via Application Catalog

​Model Configuration

​Basic Configuration Parameters

​Load Test Parameters

​Model Settings

​Finding Parameters for TrueFoundry Deployed Models

​Finding Parameters for External Models

​Finding Parameters for TrueFoundry AI Gateway Models

​Prompt Configuration

​Viewing Benchmark Results

Deploying the LLM Benchmarking Tool

Deployment via Application Catalog

Model Configuration

Basic Configuration Parameters

Load Test Parameters

Model Settings

Finding Parameters for TrueFoundry Deployed Models

Finding Parameters for External Models

Finding Parameters for TrueFoundry AI Gateway Models

Prompt Configuration

Viewing Benchmark Results