`/metrics` endpoint that can be scraped by your Prometheus instance. This is the standard way to collect metrics.
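As a rough illustration of what a scrape returns, the sketch below parses a few lines of Prometheus text exposition format using only the Python standard library. The sample payload, its label values, and the `parse_samples` helper are invented for illustration; real gateway output will contain many more series, and this simple parser skips lines that carry a trailing timestamp.

```python
import re

# Hypothetical excerpt of a /metrics response; the values are made up.
SAMPLE = '''\
# TYPE llm_gateway_input_tokens counter
llm_gateway_input_tokens{model_name="gpt-4o",tenant_name="acme",username="alice"} 1200
llm_gateway_output_tokens{model_name="gpt-4o",tenant_name="acme",username="alice"} 340
'''

# metric_name{label="value",...} value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_samples(SAMPLE)
for name, labels, value in samples:
    print(name, labels["model_name"], value)
```

In practice Prometheus does this parsing for you; a snippet like this is mainly useful for a quick sanity check that the endpoint exposes the series you expect.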
Alternatively, if your setup uses an OpenTelemetry Collector, you can configure the gateway to push metrics directly. To do this, set the following environment variables for the `tfy-llm-gateway` service:

- `ENABLE_OTEL_METRICS`: Set to `"true"`.
- `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`: The endpoint of your OTEL metrics exporter.
- `OTEL_EXPORTER_OTLP_METRICS_HEADERS`: (Optional) Headers for authentication.

Label | Description |
---|---|
`model_name` | The name of the model used for the request (e.g., `gpt-4o`). |
`tenant_name` | The name of the tenant associated with the request. |
`username` | The user associated with the request. |
`tool_name` | The name of the tool called by an agent (only for agent metrics). |
`llm_gateway_metadata_*` | Custom labels generated from `LLM_GATEWAY_METADATA_LOGGING_KEYS`. For example, `customer_id` becomes `llm_gateway_metadata_customer_id`. |
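
The naming rule for custom labels can be sketched in a couple of lines. This assumes the gateway simply prepends the `llm_gateway_metadata_` prefix, as the `customer_id` example suggests; any additional sanitization of key names is not covered here. The `metadata_label` helper and the `team` key are hypothetical.

```python
# Assumption: the label name is the fixed prefix followed by the metadata key.
PREFIX = "llm_gateway_metadata_"


def metadata_label(key):
    """Map a metadata key to the Prometheus label name it is exposed under."""
    return PREFIX + key


# "customer_id" comes from the table above; "team" is a made-up second key.
keys = ["customer_id", "team"]
labels = [metadata_label(k) for k in keys]
print(labels)
```
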
Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_input_tokens` | Counter | The number of input tokens processed. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_output_tokens` | Counter | The number of output tokens generated. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_request_cost` | Counter | The estimated cost of the tokens used. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
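
Counters like these are usually queried with `rate()` or `increase()` and then aggregated by label. The sketch below builds two such PromQL expressions as strings. The `counter_query` helper is hypothetical, and the metric names are used exactly as listed in the table; some exporters append suffixes such as `_total`, so check the series names your Prometheus actually stores.

```python
def counter_query(metric, group_by, window="5m", func="rate"):
    """Build a PromQL string: sum by (<labels>) (<func>(<metric>[<window>]))."""
    by = ", ".join(group_by)
    return f"sum by ({by}) ({func}({metric}[{window}]))"


# Input tokens per second, broken down by model.
tokens_per_model = counter_query("llm_gateway_input_tokens", ["model_name"])

# Estimated cost accrued over the last hour, per tenant and user.
cost_last_hour = counter_query(
    "llm_gateway_request_cost",
    ["tenant_name", "username"],
    window="1h",
    func="increase",
)

print(tokens_per_model)
print(cost_last_hour)
```
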
Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_request_processing_ms` | Histogram | The total time taken to process a request. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_first_token_latency_ms` | Histogram | The time to receive the first token from the model. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_inter_token_latency_ms` | Histogram | The average time between subsequent tokens in a stream. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
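
Histograms are stored as cumulative bucket counts, and percentiles are estimated from them by interpolation. The sketch below shows that idea in plain Python, in the same spirit as PromQL's `histogram_quantile`. The bucket boundaries and counts are invented, not the gateway's actual bucket layout, and `estimate_quantile` is a hypothetical helper.

```python
def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_ms, cumulative_count) sorted by bound,
    ending with an open-ended bucket whose bound is float("inf").
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # Cannot interpolate inside the open-ended bucket.
                return prev_bound
            # Linear interpolation within the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up cumulative counts for a latency histogram in milliseconds.
example_buckets = [(100, 10), (250, 60), (500, 90), (1000, 100), (float("inf"), 100)]
p50 = estimate_quantile(0.5, example_buckets)
p95 = estimate_quantile(0.95, example_buckets)
print(p50, p95)
```

Because the estimate depends on the bucket boundaries, a percentile that falls inside a wide bucket is only approximate.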
Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_request_model_inference_failure` | Counter | The number of failed model inference requests. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
`llm_gateway_config_parsing_failures` | Counter | The number of configuration parsing errors. | `model_name`, `tenant_name`, `username`, `llm_gateway_metadata_*` |
Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_rate_limit_requests_total` | Counter | Total number of requests that hit rate limits. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_load_balanced_requests_total` | Counter | Total number of requests that were load balanced. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_fallback_requests_total` | Counter | Total number of requests that were served by fallback. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_budget_requests_total` | Counter | Total number of requests that hit budget limits. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
`llm_gateway_guardrails_requests_total` | Counter | Total number of requests that hit guardrails. | `model_name`, `tenant_name`, `username`, `ruleId`, `llm_gateway_metadata_*` |
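
The `ruleId` label makes it possible to see which rules fire most often. A minimal sketch, assuming you want the busiest rules over a time window, is shown below; `top_rules_query` is a hypothetical helper that only builds the PromQL string, and the `k` and window values are arbitrary.

```python
def top_rules_query(metric, k=5, window="1h"):
    """Build a PromQL string ranking rules by how often they were triggered."""
    return f"topk({k}, sum by (ruleId) (increase({metric}[{window}])))"


# The five rate-limit rules hit most often in the last hour.
rate_limited = top_rules_query("llm_gateway_rate_limit_requests_total")

# The three fallback rules used most often in the last 24 hours.
fallbacks = top_rules_query("llm_gateway_fallback_requests_total", k=3, window="24h")

print(rate_limited)
print(fallbacks)
```
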
Metric Name | Type | Description | Labels |
---|---|---|---|
`llm_gateway_agent_request_duration_ms` | Histogram | The total duration of an agent request. | `model_name`, `status`, `tenant_name`, `username` |
`llm_gateway_agent_request_tool_calls_total` | Histogram | The number of tool calls in a single agent request. | `model_name`, `tenant_name`, `username` |
`llm_gateway_agent_request_iteration_limit_reached_total` | Counter | The number of requests hitting the iteration limit. | `model_name`, `iteration_limit`, `tenant_name`, `username` |
`llm_gateway_agent_llm_latency_ms` | Histogram | The latency of LLM calls within an agent. | `model_name`, `iteration_number`, `status`, `tenant_name`, `username` |
`llm_gateway_agent_tool_calls_total` | Counter | The total number of tool calls made by agents. | `tool_name`, `integration_fqn`, `tenant_name`, `username` |
`llm_gateway_agent_tool_latency_ms` | Histogram | The execution time of each tool call. | `tool_name`, `integration_fqn`, `status`, `tenant_name`, `username` |
`llm_gateway_agent_integration_connect_latency_ms` | Histogram | The time taken to connect to an MCP server and fetch tools. | `integration_fqn` (or `server_url`), `tenant_name`, `username` |
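
For the agent histograms, a mean per label value is often a useful first view. The sketch below builds such a query by dividing the rate of a histogram's `_sum` series by the rate of its `_count` series. It assumes the histograms are exposed with the standard Prometheus `_sum` and `_count` suffixes, which this page does not confirm; `mean_latency_query` is a hypothetical helper.

```python
def mean_latency_query(histogram, group_by, window="5m"):
    """Build a PromQL string for the mean of a histogram, grouped by labels."""
    by = ", ".join(group_by)
    # Assumption: standard Prometheus histogram series <name>_sum and <name>_count.
    total = f"sum by ({by}) (rate({histogram}_sum[{window}]))"
    count = f"sum by ({by}) (rate({histogram}_count[{window}]))"
    return f"{total} / {count}"


# Mean execution time per tool over the last five minutes.
tool_latency = mean_latency_query("llm_gateway_agent_tool_latency_ms", ["tool_name"])
print(tool_latency)
```
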
`customer_id`, add a variable to your dashboard: