The Gateway Model Metrics API provides aggregated insights on model usage, performance, cost, and user activity — helping you analyze usage trends and optimize spend.
Access control
Tenant admins: Can access metrics for the entire organization (tenant-wide).
Users: Can access their own data and their teams’ data.
Virtual accounts: Can access their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.
Authentication
You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT).
To generate an API key:
Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
For detailed authentication setup, see our Authentication guide.
Quick Start
Get started by fetching your first metrics report with a simple API call.
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={"startTime": "2025-01-21T00:00:00.000Z"},
)
print(response.json())
Key Parameters
startTime — Start time in ISO 8601 format (e.g., "2025-01-21T00:00:00.000Z")
Authorization — Replace <your_api_key> with your Personal Access Token or Virtual Account Token
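Both timestamp fields expect millisecond-precision UTC ISO 8601 strings; one way to build them in Python (a sketch, not part of the API):

```python
from datetime import datetime, timedelta, timezone

# Build a startTime covering the last 24 hours, in the millisecond-precision
# UTC format the API expects (e.g. "2025-01-21T00:00:00.000Z").
start = datetime.now(timezone.utc) - timedelta(days=1)
start_time = start.strftime("%Y-%m-%dT%H:%M:%S.") + f"{start.microsecond // 1000:03d}Z"
print(start_time)
```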
Example response:
{
  "data": [
    {
      "user": "example_virtualaccount",
      "userType": "virtualaccount",
      "modelName": "openai-main/text-embedding-ada-002",
      "totalRequests": 12717,
      "totalInputTokens": 202678,
      "totalOutputTokens": 0,
      "avgLatencyMs": 238.30577887866608,
      "avgTimeToFirstToken": 0,
      "avgInterTokenLatencyMs": 0,
      "costInUSD": 0.01
    }
  ]
}
API Reference
Endpoint
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": ["modelName", "team"],
    },
)
print(response.json())
Parameters
startTime (required) — ISO 8601 timestamp for the start of the data range (e.g., "2025-01-21T00:00:00.000Z")
endTime (optional) — ISO 8601 timestamp for the end of the data range (e.g., "2025-01-21T23:59:59.999Z"). If not provided, data is fetched from startTime to the current time.
groupBy (optional) — Array of fields to group the metrics by. Available options:
modelName - Group by model name
userEmail - Group by user email
virtualaccount - Group by virtual account
team - Group by team
metadata.your_custom_key - Group by the custom metadata key your_custom_key. You can use any metadata key attached to your requests during logging.
If the groupBy array is empty, the API returns a summarized overview of all requests within the specified time range.
The API returns metrics data in JSON format by default.
JSON Response Fields
data — Array of metrics objects containing aggregated usage data.
Groupby Response Fields
These fields are included in the response only when the corresponding field is specified in the groupBy array of the API request.
data[].modelName — The name of the model used for the requests.
data[].team — The name of the team associated with the requests.
data[].user — The email address of the user or the name of the virtual account associated with the requests. Included when either userEmail or virtualaccount is specified in the groupBy array.
data[].userType — The type of caller that made the requests. Included when either userEmail or virtualaccount is specified in the groupBy array. Possible values are user and virtualaccount.
data[].metadata_custom_key — Value of the custom metadata key custom_key used for the requests.
Common Response Fields
data[].totalRequests — Total number of requests made
data[].totalInputTokens — Total input tokens consumed
data[].totalOutputTokens — Total output tokens generated
data[].avgLatencyMs — Average request latency in milliseconds
data[].avgTimeToFirstToken — Average time to first token in milliseconds
data[].avgInterTokenLatencyMs — Average inter-token latency in milliseconds
data[].costInUSD — Total cost of the requests in US dollars
{
  "data": [
    {
      "modelName": "openai-main/gpt-4o",
      "team": "developer",
      "totalRequests": 2,
      "totalInputTokens": 28,
      "totalOutputTokens": 16,
      "avgLatencyMs": 1669.8449999999998,
      "avgTimeToFirstToken": 1571.1599999999999,
      "avgInterTokenLatencyMs": 14.094999999999999,
      "costInUSD": 0
    },
    {
      "modelName": "openai-main/gpt-4o1",
      "team": "qa",
      "totalRequests": 1,
      "totalInputTokens": 0,
      "totalOutputTokens": 0,
      "avgLatencyMs": 0,
      "avgTimeToFirstToken": 0,
      "avgInterTokenLatencyMs": 0,
      "costInUSD": 0
    }
  ]
}
The API returns data in JSON format by default, which is ideal for programmatic processing and integration with analytics tools.
Add the Accept: text/csv header to receive data in CSV format, which is convenient for spreadsheet analysis and reporting.
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Accept": "text/csv",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": ["modelName", "team"],
    },
)
Model Name, Total Requests, Total Input Tokens, Total Output Tokens, Average Request Latency (ms), Average Time to First Token (ms), Average Inter-Token Latency (ms), Cost ($)
test-openai/gpt-4o, 223, 2874, 4411, 1364.84, 710.02, 12.32, 0.05
automation-model/gpt-4, 40, 1060, 561, 1026.32, 944.43, 7.96, 0.07
test-openai/gpt-4o-mini, 38, 17098, 1147, 1250.27, 705.06, 17.22, 0
llm-gateway-test-azure-openai/gpt-5, 38, 17618, 13321, 5655.01, 4871.13, 1.57, 0
test-sambanova/deepseek-r1, 36, 72, 1600, 1692.21, 1049.59, 3.92, 0
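The CSV body can be parsed with Python's standard csv module; a minimal sketch using a truncated copy of the sample output above (skipinitialspace handles the space after each comma):

```python
import csv
import io

# Truncated copy of the sample CSV response above, used as illustrative input.
csv_body = """Model Name, Total Requests, Total Input Tokens, Total Output Tokens, Average Request Latency (ms), Average Time to First Token (ms), Average Inter-Token Latency (ms), Cost ($)
test-openai/gpt-4o, 223, 2874, 4411, 1364.84, 710.02, 12.32, 0.05
automation-model/gpt-4, 40, 1060, 561, 1026.32, 944.43, 7.96, 0.07"""

# skipinitialspace=True strips the space after each comma, so the header
# names come out clean ("Total Requests" rather than " Total Requests").
reader = csv.DictReader(io.StringIO(csv_body), skipinitialspace=True)
rows = list(reader)
total_requests = sum(int(r["Total Requests"]) for r in rows)
print(total_requests)  # 263
```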
Common Use Cases
Fetch all metrics summary
Fetch all metrics for a specific time period:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",  # start of the data range
        "endTime": "2025-01-21T23:59:59.999Z",  # end of the data range
        "groupBy": [],
    },
)
print(response.json())
{
  "data": [
    {
      "totalRequests": 150,
      "totalInputTokens": 45000,
      "totalOutputTokens": 12000,
      "avgLatencyMs": 1250.5,
      "avgTimeToFirstToken": 800.2,
      "avgInterTokenLatencyMs": 15.8,
      "costInUSD": 0.25
    }
  ]
}
Fetch all metrics for a specific time period by model:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["modelName"],
    },
)
print(response.json())
{
  "data": [
    {
      "modelName": "openai-main/gpt-4o",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "modelName": "openai-main/gpt-3.5-turbo",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
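A grouped-by-model response like this can be ranked client-side to spot the most expensive models; a minimal sketch using illustrative rows shaped like the example response (not real metrics):

```python
# Sample rows mirroring the shape of a grouped-by-modelName response;
# field names follow the JSON response schema documented above.
data = [
    {"modelName": "openai-main/gpt-4o", "totalRequests": 75, "costInUSD": 0.15},
    {"modelName": "openai-main/gpt-3.5-turbo", "totalRequests": 75, "costInUSD": 0.10},
]

# Sort descending by cost to surface the biggest spenders first.
by_cost = sorted(data, key=lambda row: row["costInUSD"], reverse=True)
print(by_cost[0]["modelName"])  # openai-main/gpt-4o
```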
Fetch all metrics for a specific time period by team:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["team"],
    },
)
print(response.json())
{
  "data": [
    {
      "team": "teamA",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "team": "teamB",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Fetch all metrics for a specific time period by user email:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["userEmail"],
    },
)
print(response.json())
{
  "data": [
    {
      "user": "user1@example.com",
      "userType": "user",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "user": "user2@example.com",
      "userType": "user",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Fetch all metrics for a specific time period by virtual account:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["virtualaccount"],
    },
)
print(response.json())
{
  "data": [
    {
      "user": "virtualaccount_1",
      "userType": "virtualaccount",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "user": "virtualaccount_2",
      "userType": "virtualaccount",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Group by user email and virtual account
Fetch all metrics for a specific time period by user email and virtual account:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "groupBy": ["userEmail", "virtualaccount"],
    },
)
print(response.json())
{
  "data": [
    {
      "user": "virtualaccount_1",
      "userType": "virtualaccount",
      "totalRequests": 75,
      "totalInputTokens": 25000,
      "totalOutputTokens": 8000,
      "avgLatencyMs": 1400.2,
      "avgTimeToFirstToken": 900.5,
      "avgInterTokenLatencyMs": 18.3,
      "costInUSD": 0.15
    },
    {
      "user": "user@example.com",
      "userType": "user",
      "totalRequests": 75,
      "totalInputTokens": 20000,
      "totalOutputTokens": 4000,
      "avgLatencyMs": 1100.8,
      "avgTimeToFirstToken": 700.1,
      "avgInterTokenLatencyMs": 13.2,
      "costInUSD": 0.10
    }
  ]
}
Get detailed metrics for a specific time range with multiple grouping dimensions:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/model/fetch",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json",
    },
    json={
        "startTime": "2025-01-21T00:00:00.000Z",
        "endTime": "2025-01-21T23:59:59.999Z",
        "groupBy": [
            "modelName",
            "virtualaccount",
            "team",
            "metadata.department",
        ],
    },
)
print(response.json())
{
  "data": [
    {
      "modelName": "openai-main/gpt-4o",
      "user": "dev-team-account",
      "userType": "virtualaccount",
      "team": "engineering",
      "metadata_department": "AI",
      "totalRequests": 25,
      "totalInputTokens": 8000,
      "totalOutputTokens": 3000,
      "avgLatencyMs": 1600.5,
      "avgTimeToFirstToken": 1000.2,
      "avgInterTokenLatencyMs": 20.1,
      "costInUSD": 0.05
    },
    {
      "modelName": "openai-main/gpt-3.5-turbo",
      "user": "qa-team-account",
      "userType": "virtualaccount",
      "team": "quality-assurance",
      "metadata_department": "Testing",
      "totalRequests": 30,
      "totalInputTokens": 12000,
      "totalOutputTokens": 2000,
      "avgLatencyMs": 950.3,
      "avgTimeToFirstToken": 600.8,
      "avgInterTokenLatencyMs": 12.5,
      "costInUSD": 0.03
    }
  ]
}
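Rows from a multi-dimension grouping like this can be rolled up further on the client; a minimal sketch that totals cost per metadata_department, using illustrative rows shaped like the response above (not real metrics):

```python
from collections import defaultdict

# Illustrative rows shaped like a multi-dimension grouped response; custom
# metadata keys surface in the response as metadata_<key>.
rows = [
    {"metadata_department": "AI", "costInUSD": 0.05, "totalRequests": 25},
    {"metadata_department": "Testing", "costInUSD": 0.03, "totalRequests": 30},
    {"metadata_department": "AI", "costInUSD": 0.02, "totalRequests": 10},
]

# Accumulate cost per department, then round to avoid float-noise in output.
cost_by_dept = defaultdict(float)
for row in rows:
    cost_by_dept[row["metadata_department"]] += row["costInUSD"]
summary = {dept: round(cost, 4) for dept, cost in cost_by_dept.items()}
print(summary)  # {'AI': 0.07, 'Testing': 0.03}
```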