The tfy-llm-gateway provides detailed, Prometheus-compatible metrics to monitor the health, performance, and cost of your LLM applications, which you can visualize in Grafana.

Setup

The gateway exposes a /metrics endpoint that can be scraped by your Prometheus instance. This is the standard way to collect metrics. Alternatively, if your setup uses an OpenTelemetry Collector, you can configure the gateway to push metrics directly. To do this, set the following environment variables for the tfy-llm-gateway service:
  • ENABLE_OTEL_METRICS: Set to "true".
  • OTEL_EXPORTER_OTLP_METRICS_ENDPOINT: The endpoint of your OTEL metrics exporter.
  • OTEL_EXPORTER_OTLP_METRICS_HEADERS: (Optional) Headers for authentication.

Example Push Configuration

```yaml
ENABLE_OTEL_METRICS: 'true'
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT: https://<prometheus-host>/api/v1/otlp/v1/metrics
OTEL_EXPORTER_OTLP_METRICS_HEADERS: 'Authorization=Bearer <your-token>'
LLM_GATEWAY_METADATA_LOGGING_KEYS: '["customer_id", "request_type"]'
```
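If you use the pull-based approach instead, a standard Prometheus scrape job pointed at the `/metrics` endpoint is sufficient. The job name and target below are illustrative placeholders, not values shipped with the gateway — substitute your own service address:

```yaml
# Hypothetical scrape job for the gateway's /metrics endpoint;
# replace the target with your tfy-llm-gateway service address.
scrape_configs:
  - job_name: "tfy-llm-gateway"
    metrics_path: /metrics
    static_configs:
      - targets: ["tfy-llm-gateway.internal:8080"]
```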

Labels

Labels provide dimensions for filtering and aggregating metrics.
| Label | Description |
| --- | --- |
| model_name | The name of the model used for the request (e.g., gpt-4o). |
| tenant_name | The name of the tenant associated with the request. |
| username | The user associated with the request. |
| tool_name | The name of the tool called by an agent (only for agent metrics). |
| llm_gateway_metadata_* | Custom labels generated from LLM_GATEWAY_METADATA_LOGGING_KEYS. For example, customer_id becomes llm_gateway_metadata_customer_id. |
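As a sketch, these labels let you slice any metric by a business dimension. The query below isolates input-token throughput for a single customer, assuming `customer_id` is configured in LLM_GATEWAY_METADATA_LOGGING_KEYS (the value `acme-corp` is a hypothetical example):

```promql
# Input tokens per second for one customer, broken down by model
sum by (model_name) (
  rate(llm_gateway_input_tokens{llm_gateway_metadata_customer_id="acme-corp"}[5m])
)
```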

Gateway Metrics

These metrics provide an overview of the gateway’s performance and usage.

Token Usage and Cost

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| llm_gateway_input_tokens | Counter | The number of input tokens processed. | model_name, tenant_name, username, llm_gateway_metadata_* |
| llm_gateway_output_tokens | Counter | The number of output tokens generated. | model_name, tenant_name, username, llm_gateway_metadata_* |
| llm_gateway_request_cost | Counter | The estimated cost of the tokens used. | model_name, tenant_name, username, llm_gateway_metadata_* |
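Since these are counters, use `rate` or `increase` rather than the raw value. For example, estimated spend per tenant over the last day can be sketched as:

```promql
# Estimated cost accrued per tenant over the last 24 hours
sum by (tenant_name) (increase(llm_gateway_request_cost[24h]))
```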

Latency

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| llm_gateway_request_processing_ms | Histogram | The total time taken to process a request. | model_name, tenant_name, username, llm_gateway_metadata_* |
| llm_gateway_first_token_latency_ms | Histogram | The time to receive the first token from the model. | model_name, tenant_name, username, llm_gateway_metadata_* |
| llm_gateway_inter_token_latency_ms | Histogram | The average time between subsequent tokens in a stream. | model_name, tenant_name, username, llm_gateway_metadata_* |
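Assuming these histograms follow the standard Prometheus convention of exposing per-bucket series with a `_bucket` suffix and an `le` label, a p95 time-to-first-token per model can be sketched as:

```promql
# 95th-percentile time to first token, per model
histogram_quantile(
  0.95,
  sum by (le, model_name) (rate(llm_gateway_first_token_latency_ms_bucket[5m]))
)
```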

Errors and Failures

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| llm_gateway_request_model_inference_failure | Counter | The number of failed model inference requests. | model_name, tenant_name, username, llm_gateway_metadata_* |
| llm_gateway_config_parsing_failures | Counter | The number of configuration parsing errors. | model_name, tenant_name, username, llm_gateway_metadata_* |
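A simple alerting or dashboard query on failures might look like the following sketch, which surfaces the per-model inference failure rate:

```promql
# Inference failures per second, per model, over a 5-minute window
sum by (model_name) (rate(llm_gateway_request_model_inference_failure[5m]))
```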

Configuration Metrics

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| llm_gateway_rate_limit_requests_total | Counter | Total number of requests that hit rate limits. | model_name, tenant_name, username, ruleId, llm_gateway_metadata_* |
| llm_gateway_load_balanced_requests_total | Counter | Total number of requests that were load balanced. | model_name, tenant_name, username, ruleId, llm_gateway_metadata_* |
| llm_gateway_fallback_requests_total | Counter | Total number of requests that were served by fallback. | model_name, tenant_name, username, ruleId, llm_gateway_metadata_* |
| llm_gateway_budget_requests_total | Counter | Total number of requests that hit budget limits. | model_name, tenant_name, username, ruleId, llm_gateway_metadata_* |
| llm_gateway_guardrails_requests_total | Counter | Total number of requests that hit guardrails. | model_name, tenant_name, username, ruleId, llm_gateway_metadata_* |
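The `ruleId` label makes it possible to see which configured rule is responsible for a behavior. As a sketch, the query below shows which rate-limit rules triggered over the last hour and for which users:

```promql
# Rate-limit hits in the last hour, grouped by rule and user
sum by (ruleId, username) (increase(llm_gateway_rate_limit_requests_total[1h]))
```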

Agent-Specific Metrics

These metrics are for monitoring agent-based interactions.
| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| llm_gateway_agent_request_duration_ms | Histogram | The total duration of an agent request. | model_name, status, tenant_name, username |
| llm_gateway_agent_request_tool_calls_total | Histogram | The number of tool calls in a single agent request. | model_name, tenant_name, username |
| llm_gateway_agent_request_iteration_limit_reached_total | Counter | The number of requests hitting the iteration limit. | model_name, iteration_limit, tenant_name, username |
| llm_gateway_agent_llm_latency_ms | Histogram | The latency of LLM calls within an agent. | model_name, iteration_number, status, tenant_name, username |
| llm_gateway_agent_tool_calls_total | Counter | The total number of tool calls made by agents. | tool_name, integration_fqn, tenant_name, username |
| llm_gateway_agent_tool_latency_ms | Histogram | The execution time of each tool call. | tool_name, integration_fqn, status, tenant_name, username |
| llm_gateway_agent_integration_connect_latency_ms | Histogram | The time taken to connect to an MCP server and fetch tools. | integration_fqn (or server_url), tenant_name, username |
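To spot tools that are slowing agents down, the per-tool latency histogram can be ranked. This sketch again assumes the standard `_bucket`/`le` histogram convention:

```promql
# Five slowest tools by p95 execution time
topk(5,
  histogram_quantile(
    0.95,
    sum by (le, tool_name) (rate(llm_gateway_agent_tool_latency_ms_bucket[5m]))
  )
)
```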

Grafana Integration

Use custom metadata to create powerful, filterable dashboards in Grafana.

Add a Dashboard Variable

To filter by a custom metadata key like customer_id, add a variable to your dashboard:
```json
{
  "definition": "label_values(llm_gateway_input_tokens, llm_gateway_metadata_customer_id)",
  "label": "customer_id",
  "multi": true,
  "name": "customer_id",
  "query": "label_values(llm_gateway_input_tokens, llm_gateway_metadata_customer_id)",
  "refresh": 2,
  "type": "query"
}
```

Use the Variable in Queries

Update your Prometheus queries to use the variable:
```promql
sum(rate(llm_gateway_input_tokens{model_name=~"$model_name", tenant_name=~"$tenant_name", llm_gateway_metadata_customer_id=~"$customer_id"}[5m]))
```
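The same variable works in any panel. For example, a cost-per-customer panel might use the following sketch, where `$__range` is Grafana's built-in variable for the dashboard's selected time range:

```promql
# Estimated cost per customer over the dashboard's time range
sum by (llm_gateway_metadata_customer_id) (
  increase(llm_gateway_request_cost{llm_gateway_metadata_customer_id=~"$customer_id"}[$__range])
)
```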