Learn how to monitor your TrueFoundry AI Gateway using Prometheus metrics and Grafana dashboards for performance, cost, and usage insights.
The gateway exposes a `/metrics` endpoint that can be scraped by your Prometheus instance. This is the standard way to collect metrics.
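For example, a Prometheus scrape job for the gateway might look like this (the job name, host, and port are placeholders for your deployment, not values from this guide):

```yaml
scrape_configs:
  - job_name: tfy-llm-gateway        # any name you like
    metrics_path: /metrics           # the endpoint exposed by the gateway
    static_configs:
      - targets:
          # placeholder host:port for the gateway service
          - tfy-llm-gateway.my-namespace.svc.cluster.local:8080
```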
Alternatively, if your setup uses an OpenTelemetry Collector, you can configure the gateway to push metrics directly. To do this, set the following environment variables for the `tfy-llm-gateway` service:
- `ENABLE_OTEL_METRICS`: Set to `"true"`.
- `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`: The endpoint of your OTEL metrics exporter.
- `OTEL_EXPORTER_OTLP_METRICS_HEADERS`: (Optional) Headers for authentication.

Metrics are tagged with the following labels:

Label | Description |
---|---|
`model_name` | The name of the model used for the request (e.g., `gpt-4o`). |
`tenant_name` | The name of the tenant associated with the request. |
`username` | The user associated with the request. |
`tool_name` | The name of the tool called by an agent (only for agent metrics). |
`llm_gateway_metadata_*` | Custom labels generated from `LLM_GATEWAY_METADATA_LOGGING_KEYS`. For example, `customer_id` becomes `llm_gateway_metadata_customer_id`. |
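If you push metrics through an OpenTelemetry Collector, the environment variables described above could be set on the `tfy-llm-gateway` service like this (the endpoint and header values are placeholders for your environment):

```yaml
env:
  - name: ENABLE_OTEL_METRICS
    value: "true"
  - name: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
    # placeholder: your collector's OTLP HTTP metrics endpoint
    value: "http://otel-collector.observability.svc.cluster.local:4318/v1/metrics"
  - name: OTEL_EXPORTER_OTLP_METRICS_HEADERS
    # optional; placeholder auth header
    value: "Authorization=Bearer <token>"
```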
The following counters track token usage and cost:

Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_input_tokens` | Counter | The number of input tokens processed. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_output_tokens` | Counter | The number of output tokens generated. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_request_cost` | Counter | The estimated cost of the tokens used. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
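Since these are counters, query them with `rate()` or `increase()`. As a sketch, estimated spend per model over the last 24 hours:

```promql
sum by (model_name) (increase(llm_gateway_request_cost[24h]))
```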
The following histograms track latency:

Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_request_processing_ms` | Histogram | The total time taken to process a request. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_first_token_latency_ms` | Histogram | The time to receive the first token from the model. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_inter_token_latency_ms` | Histogram | The average time between subsequent tokens in a stream. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
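Histograms can be queried with `histogram_quantile()`. For example, the p95 time-to-first-token per model, assuming the standard Prometheus `_bucket` series naming:

```promql
histogram_quantile(
  0.95,
  sum by (le, model_name) (rate(llm_gateway_first_token_latency_ms_bucket[5m]))
)
```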
The following counters track failures:

Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_request_model_inference_failure` | Counter | The number of failed model inference requests. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_config_parsing_failures` | Counter | The number of configuration parsing errors. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
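For alerting, a per-model failure rate can be derived from the counter, for example:

```promql
sum by (model_name) (rate(llm_gateway_request_model_inference_failure[5m]))
```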
The following counters track gateway features (rate limiting, load balancing, fallbacks, budgets, and guardrails):

Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_rate_limit_requests_total` | Counter | Total number of requests that hit rate limits. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_load_balanced_requests_total` | Counter | Total number of requests that were load balanced. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_fallback_requests_total` | Counter | Total number of requests that were served by fallback. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_budget_requests_total` | Counter | Total number of requests that hit budget limits. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_guardrails_requests_total` | Counter | Total number of requests that hit guardrails. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
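The `ruleId` label lets you attribute these events to a specific configuration rule. As a sketch, to see which rules triggered rate limits over the last hour:

```promql
sum by (ruleId, model_name) (increase(llm_gateway_rate_limit_requests_total[1h]))
```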
The following metrics track agent requests, tool calls, and MCP connections:

Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_agent_request_duration_ms` | Histogram | The total duration of an agent request. | `model_name`, `status`, `tenant_name`, `username` |
`llm_gateway_agent_request_tool_calls_total` | Histogram | The number of tool calls in a single agent request. | `model_name`, `tenant_name`, `username` |
`llm_gateway_agent_request_iteration_limit_reached_total` | Counter | The number of requests hitting the iteration limit. | `model_name`, `iteration_limit`, `tenant_name`, `username` |
`llm_gateway_agent_llm_latency_ms` | Histogram | The latency of LLM calls within an agent. | `model_name`, `iteration_number`, `status`, `tenant_name`, `username` |
`llm_gateway_agent_tool_calls_total` | Counter | The total number of tool calls made by agents. | `tool_name`, `integration_fqn`, `tenant_name`, `username` |
`llm_gateway_agent_tool_latency_ms` | Histogram | The execution time of each tool call. | `tool_name`, `integration_fqn`, `status`, `tenant_name`, `username` |
`llm_gateway_agent_mcp_connect_latency_ms` | Histogram | The time taken to connect to an MCP server and fetch tools. | `integration_fqn` (or `server_url`), `tenant_name`, `username` |
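For example, p99 tool execution latency per tool, assuming the standard Prometheus `_bucket` series naming:

```promql
histogram_quantile(
  0.99,
  sum by (le, tool_name) (rate(llm_gateway_agent_tool_latency_ms_bucket[5m]))
)
```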
**Note**: You can find the Grafana dashboard JSON at the following link:
https://github.com/truefoundry/infra-charts/blob/main/charts/tfy-grafana/dashboards/llm-gateway-metrics.json
The dashboard breaks metrics down by `model_name` to compare model performance, by `username` to monitor usage patterns, and by `ruleId` to show the impact of gateway configurations. By default, it lets you filter by `model_name`, `tenant_name`, and `username`. If you use custom metadata labels (via `LLM_GATEWAY_METADATA_LOGGING_KEYS`), you can add them as filters to your dashboard for more granular analysis.
For example, to filter by `customer_id`, add a new variable to your dashboard settings:

- **Name**: `customer_id`
- **Type**: Query
- **Label**: Customer ID
- **Query**: `label_values(llm_gateway_input_tokens, llm_gateway_metadata_customer_id)`

Then update your panel queries to filter on the selected `customer_id`.
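As a sketch, a panel query filtered on that variable might look like this (the base query is an assumption; only the label matcher comes from the variable setup above):

```promql
sum by (model_name) (
  rate(llm_gateway_input_tokens{llm_gateway_metadata_customer_id=~"$customer_id"}[5m])
)
```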