The Gateway Model Metrics API provides aggregated insights on model usage, performance, cost, and user activity — helping you analyze usage trends and optimize spend.
Access control
Tenant admins: Can access metrics for the entire organization (tenant-wide).
Users: Can access their own data and their teams’ data.
Virtual accounts: Can access their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.
Authentication
You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT).
To generate an API key:
Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
For detailed authentication setup, see our Authentication guide.
Quick Start
Get started by fetching your first metrics report with a simple API call.
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={"startTime": "2025-01-21T00:00:00.000Z"},
)
print(response.json())
Key Parameters
startTime — Start time in ISO 8601 format (e.g., "2025-01-21T00:00:00.000Z")
Authorization — Replace <your_api_key> with your Personal Access Token or Virtual Account Token
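Both timestamp fields expect millisecond-precision UTC ISO 8601 strings; one way to build them in Python (a sketch, not part of the API):

```python
from datetime import datetime, timedelta, timezone

# Build a startTime covering the last 24 hours, in the millisecond-precision
# UTC format the API expects (e.g. "2025-01-21T00:00:00.000Z").
start = datetime.now(timezone.utc) - timedelta(days=1)
start_time = start.strftime("%Y-%m-%dT%H:%M:%S.") + f"{start.microsecond // 1000:03d}Z"
print(start_time)
```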
Example response:
{
  "data": [
    {
      "user": "example_virtualaccount",
      "userType": "virtualaccount",
      "modelName": "openai-main/text-embedding-ada-002",
      "totalRequests": 12717,
      "totalInputTokens": 202678,
      "totalOutputTokens": 0,
      "avgLatencyMs": 238.30577887866608,
      "avgTimeToFirstToken": 0,
      "avgInterTokenLatencyMs": 0,
      "costInUSD": 0.01
    }
  ]
}
API Reference
Endpoint
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": ["modelName", "team"],
    },
)
print(response.json())
Parameters
startTime (required) — ISO 8601 timestamp for the start of the data range (e.g., "2025-01-21T00:00:00.000Z")
endTime (optional) — ISO 8601 timestamp for the end of the data range (e.g., "2025-01-21T23:59:59.999Z"). If not provided, data is fetched from startTime to the current time.
groupBy (optional) — Array of fields to group the metrics by. Available options:
modelName - Group by model name
userEmail - Group by user email
virtualaccount - Group by virtual account
team - Group by team
metadata.your_custom_key - Group by the custom metadata key your_custom_key. You can use any metadata key attached to your requests during logging.
If the groupBy array is empty, the API returns a summarized overview of all requests within the specified time range.
The API returns metrics data in JSON format by default.
JSON Response Fields
data — Array of metrics objects containing aggregated usage data.
Groupby Response Fields
These fields are included in the response only when the corresponding field is specified in the groupBy array of the API request.
data[].modelName — The name of the model used for the requests.
data[].team — The name of the team associated with the requests.
data[].user — The email address of the user or the name of the virtual account associated with the requests. Included when either userEmail or virtualaccount is specified in the groupBy array.
data[].userType — The type of caller that made the requests. Included when either userEmail or virtualaccount is specified in the groupBy array. Possible values are user and virtualaccount.
data[].metadata_custom_key — Value of the custom metadata key custom_key used for the requests.
Common Response Fields
data[].totalRequests — Total number of requests made
data[].totalInputTokens — Total input tokens consumed
data[].totalOutputTokens — Total output tokens generated
data[].avgLatencyMs — Average request latency in milliseconds
data[].avgTimeToFirstToken — Average time to first token in milliseconds
data[].avgInterTokenLatencyMs — Average inter-token latency in milliseconds
data[].costInUSD — Total cost of the requests in US dollars
{
  "data": [
    {
      "modelName": "openai-main/gpt-4o",
      "team": "developer",
      "totalRequests": 2,
      "totalInputTokens": 28,
      "totalOutputTokens": 16,
      "avgLatencyMs": 1669.8449999999998,
      "avgTimeToFirstToken": 1571.1599999999999,
      "avgInterTokenLatencyMs": 14.094999999999999,
      "costInUSD": 0
    },
    {
      "modelName": "openai-main/gpt-4o1",
      "team": "qa",
      "totalRequests": 1,
      "totalInputTokens": 0,
      "totalOutputTokens": 0,
      "avgLatencyMs": 0,
      "avgTimeToFirstToken": 0,
      "avgInterTokenLatencyMs": 0,
      "costInUSD": 0
    }
  ]
}
The API returns data in JSON format by default, which is ideal for programmatic processing and integration with analytics tools.
Add the Accept: text/csv header to receive data in CSV format, which is convenient for spreadsheet analysis and reporting.
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Accept": "text/csv",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": ["modelName", "team"],
    },
)
Model Name, Total Requests, Total Input Tokens, Total Output Tokens, Average Request Latency (ms), Average Time to First Token (ms), Average Inter-Token Latency (ms), Cost ($)
test-openai/gpt-4o, 223, 2874, 4411, 1364.84, 710.02, 12.32, 0.05
automation-model/gpt-4, 40, 1060, 561, 1026.32, 944.43, 7.96, 0.07
test-openai/gpt-4o-mini, 38, 17098, 1147, 1250.27, 705.06, 17.22, 0
llm-gateway-test-azure-openai/gpt-5, 38, 17618, 13321, 5655.01, 4871.13, 1.57, 0
test-sambanova/deepseek-r1, 36, 72, 1600, 1692.21, 1049.59, 3.92, 0
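The CSV body can be parsed with Python's standard csv module; a minimal sketch using a truncated copy of the sample output above (skipinitialspace handles the space after each comma):

```python
import csv
import io

# Truncated copy of the sample CSV response above, used as illustrative input.
csv_body = """Model Name, Total Requests, Total Input Tokens, Total Output Tokens, Average Request Latency (ms), Average Time to First Token (ms), Average Inter-Token Latency (ms), Cost ($)
test-openai/gpt-4o, 223, 2874, 4411, 1364.84, 710.02, 12.32, 0.05
automation-model/gpt-4, 40, 1060, 561, 1026.32, 944.43, 7.96, 0.07"""

# skipinitialspace=True strips the space after each comma, so the header
# names come out clean ("Total Requests" rather than " Total Requests").
reader = csv.DictReader(io.StringIO(csv_body), skipinitialspace=True)
rows = list(reader)
total_requests = sum(int(r["Total Requests"]) for r in rows)
print(total_requests)  # 263
```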
Common Use Cases
Fetch all metrics summary
Fetch all metrics for a specific time period:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",  # start of the data range
        "endTime": "2025-01-21T23:59:59.999Z",  # end of the data range
        "groupBy": [],
    },
)
print(response.json())
{
  "data": [
    {
      "totalRequests": 150,
      "totalInputTokens": 45000,
      "totalOutputTokens": 12000,
      "avgLatencyMs": 1250.5,
      "avgTimeToFirstToken": 800.2,
      "avgInterTokenLatencyMs": 15.8,
      "costInUSD": 0.25
    }
  ]
}
Fetch all metrics for a specific time period by model:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["modelName"],
    },
)
print(response.json())
{
  "data": [
    {
      "modelName": "openai-main/gpt-4o",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "modelName": "openai-main/gpt-3.5-turbo",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
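A grouped-by-model response like this can be ranked client-side to spot the most expensive models; a minimal sketch using illustrative rows shaped like the example response (not real metrics):

```python
# Sample rows mirroring the shape of a grouped-by-modelName response;
# field names follow the JSON response schema documented above.
data = [
    {"modelName": "openai-main/gpt-4o", "totalRequests": 75, "costInUSD": 0.15},
    {"modelName": "openai-main/gpt-3.5-turbo", "totalRequests": 75, "costInUSD": 0.10},
]

# Sort descending by cost to surface the biggest spenders first.
by_cost = sorted(data, key=lambda row: row["costInUSD"], reverse=True)
print(by_cost[0]["modelName"])  # openai-main/gpt-4o
```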
Fetch all metrics for a specific time period by team:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["team"],
    },
)
print(response.json())
{
  "data": [
    {
      "team": "teamA",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "team": "teamB",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Fetch all metrics for a specific time period by user email:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["userEmail"],
    },
)
print(response.json())
{
  "data": [
    {
      "user": "user1@example.com",
      "userType": "user",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "user": "user2@example.com",
      "userType": "user",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Fetch all metrics for a specific time period by virtual account:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["virtualaccount"],
    },
)
print(response.json())
{
  "data": [
    {
      "user": "virtualaccount_1",
      "userType": "virtualaccount",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "user": "virtualaccount_2",
      "userType": "virtualaccount",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Group by user email and virtual account
Fetch all metrics for a specific time period by user email and virtual account:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["userEmail", "virtualaccount"],
    },
)
print(response.json())
{
  "data": [
    {
      "user": "virtualaccount_1",
      "userType": "virtualaccount",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "user": "user@example.com",
      "userType": "user",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Get detailed metrics for a specific time range with multiple grouping dimensions:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": [
            "modelName",
            "virtualaccount",
            "team",
            "metadata.department",
        ],
    },
)
print(response.json())
{
  "data": [
    {
      "modelName": "openai-main/gpt-4o",
      "user": "dev-team-account",
      "userType": "virtualaccount",
      "team": "engineering",
      "metadata_department": "AI",
      "totalRequests": 25,
      "totalInputTokens": 8000,
      "totalOutputTokens": 3000,
      "avgLatencyMs": 1600.5,
      "avgTimeToFirstToken": 1000.2,
      "avgInterTokenLatencyMs": 20.1,
      "costInUSD": 0.05
    },
    {
      "modelName": "openai-main/gpt-3.5-turbo",
      "user": "qa-team-account",
      "userType": "virtualaccount",
      "team": "quality-assurance",
      "metadata_department": "Testing",
      "totalRequests": 30,
      "totalInputTokens": 12000,
      "totalOutputTokens": 2000,
      "avgLatencyMs": 950.3,
      "avgTimeToFirstToken": 600.8,
      "avgInterTokenLatencyMs": 12.5,
      "costInUSD": 0.03
    }
  ]
}
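Rows from a multi-dimension grouping like this can be rolled up further on the client; a minimal sketch that totals cost per metadata_department, using illustrative rows shaped like the response above (not real metrics):

```python
from collections import defaultdict

# Illustrative rows shaped like a multi-dimension grouped response; custom
# metadata keys surface in the response as metadata_<key>.
rows = [
    {"metadata_department": "AI", "costInUSD": 0.05, "totalRequests": 25},
    {"metadata_department": "Testing", "costInUSD": 0.03, "totalRequests": 30},
    {"metadata_department": "AI", "costInUSD": 0.02, "totalRequests": 10},
]

# Accumulate cost per department, then round to avoid float-noise in output.
cost_by_dept = defaultdict(float)
for row in rows:
    cost_by_dept[row["metadata_department"]] += row["costInUSD"]
summary = {dept: round(cost, 4) for dept, cost in cost_by_dept.items()}
print(summary)  # {'AI': 0.07, 'Testing': 0.03}
```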