The Gateway Model Metrics API provides aggregated insights on model usage, performance, cost, and user activity — helping you analyze usage trends and optimize spend.

Access control

  • Tenant admins: Can access metrics for the entire organization (tenant-wide).
  • Users: Can access their own data and their teams’ data.
  • Virtual accounts: Can access their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.

Authentication

You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or Virtual Account Token (VAT).
To generate an API key:
  1. Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
  2. Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
For detailed authentication setup, see our Authentication guide.
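Whichever token type you use, it is passed the same way: as a Bearer token in the Authorization header. A minimal sketch of building the headers, assuming the key is stored in an environment variable (the variable name TFY_API_KEY is illustrative, not a TrueFoundry convention):

```python
import os

# Read the API key from the environment rather than hard-coding it;
# TFY_API_KEY is an assumed variable name for this example.
api_key = os.environ.get("TFY_API_KEY", "<your_api_key>")

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```

These headers can then be passed to every request shown below.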

Quick Start

Get started by fetching your first metrics report with a simple API call.
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={"startTime": "2025-01-21T00:00:00.000Z"}
)

print(response.json())

Key Parameters

  • startTime — Start time in ISO 8601 format (e.g., “2025-01-21T00:00:00.000Z”)
  • Authorization — Replace <your_api_key> with your Personal Access Token or Virtual Account Token
Example response:
{
    "data": [
        {
            "user": "example_virtualaccount",
            "userType": "virtualaccount",
            "modelName": "openai-main/text-embedding-ada-002",
            "totalRequests": 12717,
            "totalInputTokens": 202678,
            "totalOutputTokens": 0,
            "avgLatencyMs": 238.30577887866608,
            "avgTimeToFirstToken": 0,
            "avgInterTokenLatencyMs": 0,
            "costInUSD": 0.01
        }
    ]
}
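Before reading fields out of the response, it is worth guarding against an unexpected body (an expired token or a malformed timestamp may return an error payload instead of metrics). A minimal sketch; the error body shape is not specified here, so the helper only checks for the documented {"data": [...]} shape:

```python
def extract_rows(payload):
    """Return the list of metric rows, or an empty list if the
    payload does not have the expected {"data": [...]} shape."""
    if not isinstance(payload, dict):
        return []
    rows = payload.get("data")
    return rows if isinstance(rows, list) else []

# Using the response shape shown above:
sample = {"data": [{"modelName": "openai-main/text-embedding-ada-002", "totalRequests": 12717}]}
rows = extract_rows(sample)
```

In practice you would call `extract_rows(response.json())` after also checking `response.status_code`.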

API Reference

Endpoint

POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch

import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": ["modelName", "team"]
    }
)

print(response.json())

Parameters

startTime
string
required
ISO 8601 timestamp for the start of the data range (e.g., “2025-01-21T00:00:00.000Z”)
endTime
string
ISO 8601 timestamp for the end of the data range (e.g., “2025-01-21T23:59:59.999Z”). If not provided, data is fetched from startTime to the current time.
groupBy
array
Array of fields to group the metrics by. Available options:
  • modelName - Group by model name
  • userEmail - Group by user email
  • virtualaccount - Group by virtual account
  • team - Group by team
  • metadata.your_custom_key - Group by the custom metadata key your_custom_key. You can use any metadata key that was logged with your requests.
If the groupBy array is empty, the API returns a summarized overview of all requests within the specified time range.
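The timestamps must be ISO 8601 strings with millisecond precision, as in the examples above. A small sketch of building a request body for a rolling window with Python's standard datetime module (the seven-day window is an arbitrary choice for illustration):

```python
from datetime import datetime, timedelta, timezone

def iso_ms(dt):
    """Format a datetime as ISO 8601 with millisecond precision and a
    Z suffix, e.g. 2025-01-21T00:00:00.000Z."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"

now = datetime.now(timezone.utc)
body = {
    "startTime": iso_ms(now - timedelta(days=7)),  # last 7 days
    "endTime": iso_ms(now),
    "groupBy": ["modelName"],
}
```

The resulting `body` dict can be passed directly as the `json=` argument of `requests.post`.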

Response Format

The API returns metrics data in JSON format by default.

JSON Response Fields

data
array
Array of metrics objects containing aggregated usage data
Groupby Response Fields

These fields are included in the response only when the corresponding field is specified in the groupBy array of the API request.
data[].modelName
string
The name of the model used for the requests.
data[].team
string
The name of the team used for the requests.
data[].user
string
The email address of the user or the name of the virtual account associated with the requests. This field is included in the response if either the userEmail or virtualaccount options are specified in the groupBy array of the API request.
data[].userType
string
The type of user for the requests. This field is included in the response if either the userEmail or virtualaccount options are specified in the groupBy array of the API request. Possible values are user and virtualaccount.
data[].metadata_custom_key
string
The value of the custom metadata key for the requests. Included when metadata.your_custom_key is specified in the groupBy array; the dot in the key is replaced with an underscore in the response field name (e.g., metadata.custom becomes metadata_custom).
Common Response Fields
data[].totalRequests
number
Total number of requests made
data[].totalInputTokens
number
Total input tokens consumed
data[].totalOutputTokens
number
Total output tokens generated
data[].avgLatencyMs
number
Average request latency in milliseconds
data[].avgTimeToFirstToken
number
Average time to first token in milliseconds
data[].avgInterTokenLatencyMs
number
Average inter-token latency in milliseconds
data[].costInUSD
number
Total cost in USD
Example response:
{
    "data": [
        {
            "modelName": "openai-main/gpt-4o",
            "team": "developer",
            "totalRequests": 2,
            "totalInputTokens": 28,
            "totalOutputTokens": 16,
            "avgLatencyMs": 1669.8449999999998,
            "avgTimeToFirstToken": 1571.1599999999999,
            "avgInterTokenLatencyMs": 14.094999999999999,
            "costInUSD": 0
        },
        {
            "modelName": "openai-main/gpt-4o1",
            "team": "qa",
            "totalRequests": 1,
            "totalInputTokens": 0,
            "totalOutputTokens": 0,
            "avgLatencyMs": 0,
            "avgTimeToFirstToken": 0,
            "avgInterTokenLatencyMs": 0,
            "costInUSD": 0
        }
    ]
}
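Because each row is one group, tenant-wide totals are just sums over data. A minimal sketch of aggregating cost and token counts across grouped rows, using the field names documented above:

```python
def total_cost_and_tokens(payload):
    """Sum cost and token counts across all grouped rows of a metrics response."""
    rows = payload.get("data", [])
    cost = sum(row.get("costInUSD", 0) for row in rows)
    tokens = sum(
        row.get("totalInputTokens", 0) + row.get("totalOutputTokens", 0)
        for row in rows
    )
    return cost, tokens

# Illustrative figures, not real usage data:
sample = {
    "data": [
        {"costInUSD": 0.15, "totalInputTokens": 25000, "totalOutputTokens": 8000},
        {"costInUSD": 0.10, "totalInputTokens": 20000, "totalOutputTokens": 4000},
    ]
}
cost, tokens = total_cost_and_tokens(sample)  # tokens == 57000
```

Note that the latency fields (avgLatencyMs, avgTimeToFirstToken, avgInterTokenLatencyMs) are per-group averages, so they should be weighted by totalRequests rather than summed.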

Response Formats

JSON Format (Default)

The API returns data in JSON format by default, which is ideal for programmatic processing and integration with analytics tools.

CSV Format

Add the Accept: text/csv header to receive data in CSV format, which is convenient for spreadsheet analysis and reporting.
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Accept": "text/csv",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": ["modelName", "team"]
    }
)
Example CSV output:
Model Name,Total Requests,Total Input Tokens,Total Output Tokens,Average Request Latency (ms),Average Time to First Token (ms),Average Inter-Token Latency (ms),Cost ($)
test-openai/gpt-4o,223,2874,4411,1364.84,710.02,12.32,0.05
automation-model/gpt-4,40,1060,561,1026.32,944.43,7.96,0.07
test-openai/gpt-4o-mini,38,17098,1147,1250.27,705.06,17.22,0
llm-gateway-test-azure-openai/gpt-5,38,17618,13321,5655.01,4871.13,1.57,0
test-sambanova/deepseek-r1,36,72,1600,1692.21,1049.59,3.92,0
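The CSV body can be consumed directly with Python's standard csv module. A short sketch; csv_text stands in for response.text from the request above:

```python
import csv
import io

# A truncated copy of the CSV output shown above, standing in for response.text.
csv_text = (
    "Model Name,Total Requests,Total Input Tokens,Total Output Tokens,"
    "Average Request Latency (ms),Average Time to First Token (ms),"
    "Average Inter-Token Latency (ms),Cost ($)\n"
    "test-openai/gpt-4o,223,2874,4411,1364.84,710.02,12.32,0.05\n"
)

# DictReader keys each row by the header names in the first line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
requests_by_model = {r["Model Name"]: int(r["Total Requests"]) for r in rows}
```

From here the rows can be written to a file or loaded into a spreadsheet tool.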

Common Use Cases

Fetch all metrics for a specific time period:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",  # Start timestamp
        "endTime": "2025-01-21T23:59:59.999Z",  # End timestamp
        "groupBy": []
    }
)

print(response.json())
{
    "data": [
        {
            "totalRequests": 150,
            "totalInputTokens": 45000,
            "totalOutputTokens": 12000,
            "avgLatencyMs": 1250.5,
            "avgTimeToFirstToken": 800.2,
            "avgInterTokenLatencyMs": 15.8,
            "costInUSD": 0.25
        }
    ]
}
Fetch all metrics for a specific time period by model:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["modelName"]
    }
)

print(response.json())
{
    "data": [
        {
            "modelName": "openai-main/gpt-4o",
            "totalRequests": 75,
            "totalInputTokens": 25000,
            "totalOutputTokens": 8000,
            "avgLatencyMs": 1400.2,
            "avgTimeToFirstToken": 900.5,
            "avgInterTokenLatencyMs": 18.3,
            "costInUSD": 0.15
        },
        {
            "modelName": "openai-main/gpt-3.5-turbo",
            "totalRequests": 75,
            "totalInputTokens": 20000,
            "totalOutputTokens": 4000,
            "avgLatencyMs": 1100.8,
            "avgTimeToFirstToken": 700.1,
            "avgInterTokenLatencyMs": 13.2,
            "costInUSD": 0.10
        }
    ]
}
Fetch all metrics for a specific time period by team:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["team"]
    }
)

print(response.json())
{
    "data": [
        {
            "team": "teamA",
            "totalRequests": 75,
            "totalInputTokens": 25000,
            "totalOutputTokens": 8000,
            "avgLatencyMs": 1400.2,
            "avgTimeToFirstToken": 900.5,
            "avgInterTokenLatencyMs": 18.3,
            "costInUSD": 0.15
        },
        {
            "team": "teamB",
            "totalRequests": 75,
            "totalInputTokens": 20000,
            "totalOutputTokens": 4000,
            "avgLatencyMs": 1100.8,
            "avgTimeToFirstToken": 700.1,
            "avgInterTokenLatencyMs": 13.2,
            "costInUSD": 0.10
        }
    ]
}
Fetch all metrics for a specific time period by user email:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["userEmail"]
    }
)

print(response.json())
{
    "data": [
        {
            "user": "user1@example.com",
            "userType": "user",
            "totalRequests": 75,
            "totalInputTokens": 25000,
            "totalOutputTokens": 8000,
            "avgLatencyMs": 1400.2,
            "avgTimeToFirstToken": 900.5,
            "avgInterTokenLatencyMs": 18.3,
            "costInUSD": 0.15
        },
        {
            "user": "user2@example.com",
            "userType": "user",
            "totalRequests": 75,
            "totalInputTokens": 20000,
            "totalOutputTokens": 4000,
            "avgLatencyMs": 1100.8,
            "avgTimeToFirstToken": 700.1,
            "avgInterTokenLatencyMs": 13.2,
            "costInUSD": 0.10
        }
    ]
}
Fetch all metrics for a specific time period by virtual account:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["virtualaccount"]
    }
)

print(response.json())
{
    "data": [
        {
            "user": "virtualaccount_1",
            "userType": "virtualaccount",
            "totalRequests": 75,
            "totalInputTokens": 25000,
            "totalOutputTokens": 8000,
            "avgLatencyMs": 1400.2,
            "avgTimeToFirstToken": 900.5,
            "avgInterTokenLatencyMs": 18.3,
            "costInUSD": 0.15
        },
        {
            "user": "virtualaccount_2",
            "userType": "virtualaccount",
            "totalRequests": 75,
            "totalInputTokens": 20000,
            "totalOutputTokens": 4000,
            "avgLatencyMs": 1100.8,
            "avgTimeToFirstToken": 700.1,
            "avgInterTokenLatencyMs": 13.2,
            "costInUSD": 0.10
        }
    ]
}
Fetch all metrics for a specific time period by user email and virtual account:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["userEmail", "virtualaccount"]
    }
)

print(response.json())
{
    "data": [
        {
            "user": "virtualaccount_1",
            "userType": "virtualaccount",
            "totalRequests": 75,
            "totalInputTokens": 25000,
            "totalOutputTokens": 8000,
            "avgLatencyMs": 1400.2,
            "avgTimeToFirstToken": 900.5,
            "avgInterTokenLatencyMs": 18.3,
            "costInUSD": 0.15
        },
        {
            "user": "user@example.com",
            "userType": "user",
            "totalRequests": 75,
            "totalInputTokens": 20000,
            "totalOutputTokens": 4000,
            "avgLatencyMs": 1100.8,
            "avgTimeToFirstToken": 700.1,
            "avgInterTokenLatencyMs": 13.2,
            "costInUSD": 0.10
        }
    ]
}
Fetch all metrics for a specific time period by custom metadata key:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["metadata.custom"]
    }
)

print(response.json())
{
    "data": [
        {
            "metadata_custom": "value1",
            "totalRequests": 75,
            "totalInputTokens": 25000,
            "totalOutputTokens": 8000,
            "avgLatencyMs": 1400.2,
            "avgTimeToFirstToken": 900.5,
            "avgInterTokenLatencyMs": 18.3,
            "costInUSD": 0.15
        },
        {
            "metadata_custom": "value2",
            "totalRequests": 74,
            "totalInputTokens": 25000,
            "totalOutputTokens": 8000,
            "avgLatencyMs": 1400.2,
            "avgTimeToFirstToken": 900.5,
            "avgInterTokenLatencyMs": 18.3,
            "costInUSD": 0.15
        }
    ]
}
Get detailed metrics for a specific time range with multiple grouping dimensions:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": [
            "modelName",
            "virtualaccount",
            "team",
            "metadata.department"
        ]
    }
)

print(response.json())
{
    "data": [
        {
            "modelName": "openai-main/gpt-4o",
            "user": "dev-team-account",
            "userType": "virtualaccount",
            "team": "engineering",
            "metadata_department": "AI",
            "totalRequests": 25,
            "totalInputTokens": 8000,
            "totalOutputTokens": 3000,
            "avgLatencyMs": 1600.5,
            "avgTimeToFirstToken": 1000.2,
            "avgInterTokenLatencyMs": 20.1,
            "costInUSD": 0.05
        },
        {
            "modelName": "openai-main/gpt-3.5-turbo",
            "user": "qa-team-account",
            "userType": "virtualaccount",
            "team": "quality-assurance",
            "metadata_department": "Testing",
            "totalRequests": 30,
            "totalInputTokens": 12000,
            "totalOutputTokens": 2000,
            "avgLatencyMs": 950.3,
            "avgTimeToFirstToken": 600.8,
            "avgInterTokenLatencyMs": 12.5,
            "costInUSD": 0.03
        }
    ]
}