Tracing
OpenTelemetry tracing allows you to capture detailed traces of requests as they flow through the AI Gateway. This enables debugging, performance analysis, and end-to-end visibility.

How to Enable Tracing
Set the following environment variables:
- ENABLE_OTEL_TRACING: Set to "true" to enable OTEL tracing.
- OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: The OTEL traces exporter endpoint (e.g., your collector or backend).
- OTEL_EXPORTER_OTLP_TRACES_HEADERS: Any required headers for authentication/configuration.
- OTEL_SERVICE_NAME: The name of your service as recognized by OTEL.
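If you want to confirm that the endpoint and headers are valid before enabling tracing on the gateway, one option is to export a test span using the same values. The following is a minimal sketch using the OpenTelemetry Python SDK (it assumes the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http packages are installed; the endpoint, token, and service name are placeholders):

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Placeholder values; use the same endpoint/headers you plan to give the gateway.
exporter = OTLPSpanExporter(
    endpoint="https://otel-collector.example.com/v1/traces",
    headers={"Authorization": "Bearer <token>"},
)
provider = TracerProvider(resource=Resource.create({"service.name": "ai-gateway"}))
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Emit one test span; it is flushed to the collector on shutdown.
with trace.get_tracer("connectivity-check").start_as_current_span("otel-smoke-test"):
    pass
provider.shutdown()
```

If the span shows up in your tracing backend, the same endpoint and header values should work for the gateway.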
Example for TrueFoundry Tracing Project
Example Trace Overview

AI Gateway - OpenTelemetry Tracing
The highlighted span is of type genai (LLM), capturing the lifecycle of a large language model (LLM) inference request.
LLM Request Data:
- Model: openai-main/gpt-4o
- Max tokens: 200
- Top-p: 1
- Temperature: 0.1
The system prompt, user question, and assistant's response are all visible, providing full transparency into the LLM interaction.
Span Metadata: Includes the span name, service name, trace and span IDs, and OTEL scope.
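For context, a request like the following sketch (using the OpenAI Python client against the gateway's OpenAI-compatible endpoint; the base URL, API key, and messages are placeholders, while the model and sampling parameters mirror the trace above) would produce a genai span of this shape:

```python
from openai import OpenAI

# Placeholder base URL and API key; point these at your AI Gateway.
client = OpenAI(
    base_url="https://your-gateway.example.com/v1",
    api_key="<your-gateway-api-key>",
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o",  # model shown in the example trace
    max_tokens=200,
    top_p=1,
    temperature=0.1,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is OpenTelemetry tracing?"},
    ],
)
print(response.choices[0].message.content)
```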
AI Gateway Spans
The following sections describe the various spans available in the AI Gateway and their attributes.

Chat Completions API
Span Name: chatCompletions
Description: Streaming spans are created when a request involves streaming data, such as chat completions. These spans capture the details of the streaming process, including the model used and the parameters affecting the streaming behavior.
Attributes:
- gen_ai.request.model: The model being used for the chat completion request.
- gen_ai.request.max_tokens: The maximum number of tokens allowed in the chat completion request.
- gen_ai.request.temperature: The temperature setting used in the chat completion request.
- gen_ai.operation.name: The operation being performed, such as 'chat'.
- gen_ai.system: The system or platform being used, e.g., 'openai'.
- gen_ai.request.top_p: The top-p sampling parameter used in the request.
- gen_ai.system.message: Events related to system messages in the request.
- gen_ai.user.message: Events related to user messages in the request.
- gen_ai.assistant.message: Events related to assistant messages in the request.
- gen_ai.tool.message: Events related to tool messages in the request.
- gen_ai.unknown.message: Events related to unknown message roles in the request.
- gen_ai.prompt.{index}.content: The content of the message at a specific index in the request.
- gen_ai.prompt.{index}.role: The role of the message at a specific index in the request.
- gen_ai.completion.{index}.content: The content of the completion message at a specific index.
- gen_ai.completion.{index}.role: The role of the completion message at a specific index.
- gen_ai.completion.{index}.finish_reason: The reason why the completion finished, at a specific index.
- gen_ai.completion.{index}.tool_calls.{toolIndex}.name: The name of the tool call at a specific index in the completion.
- gen_ai.completion.{index}.tool_calls.{toolIndex}.id: The ID of the tool call at a specific index in the completion.
- gen_ai.completion.{index}.tool_calls.{toolIndex}.arguments: The arguments of the tool call at a specific index in the completion.
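To make the indexed naming concrete, the sketch below shows how the attributes of a chatCompletions span for a single-turn request might look; every value is illustrative:

```python
# Illustrative attribute set for a chatCompletions span; all values are placeholders.
chat_completions_span_attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.system": "openai",
    "gen_ai.request.model": "openai-main/gpt-4o",
    "gen_ai.request.max_tokens": 200,
    "gen_ai.request.temperature": 0.1,
    "gen_ai.request.top_p": 1,
    # Prompt messages are flattened with a running {index}.
    "gen_ai.prompt.0.role": "system",
    "gen_ai.prompt.0.content": "You are a helpful assistant.",
    "gen_ai.prompt.1.role": "user",
    "gen_ai.prompt.1.content": "What is OpenTelemetry tracing?",
    # Completions follow the same indexed pattern.
    "gen_ai.completion.0.role": "assistant",
    "gen_ai.completion.0.content": "OpenTelemetry tracing records ...",
    "gen_ai.completion.0.finish_reason": "stop",
}
```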

Agent Responses API
Span Name: agentResponsesHandler
Description: This span is created when handling agent responses. It captures details about the request method and URL.
Attributes:
- handler.name: The name of the handler.
- request.method: The HTTP method of the request.
- request.url: The URL of the request.
- gen_ai.prompt.{index}.content: The content of the message at a specific index in the request.
- gen_ai.prompt.{index}.role: The role of the message at a specific index in the request.
- gen_ai.completion.{index}.content: The content of the completion message at a specific index.
- gen_ai.completion.{index}.role: The role of the completion message at a specific index.
- gen_ai.completion.{index}.finish_reason: The reason why the completion finished, at a specific index.
- gen_ai.completion.{index}.tool_calls.{toolIndex}.name: The name of the tool call at a specific index in the completion.
- gen_ai.completion.{index}.tool_calls.{toolIndex}.id: The ID of the tool call at a specific index in the completion.
- gen_ai.completion.{index}.tool_calls.{toolIndex}.arguments: The arguments of the tool call at a specific index in the completion.
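Because tool calls are flattened into indexed attributes, they are easy to pull back out when analyzing traces. The following is a sketch that assumes you have fetched a span's attribute map from your tracing backend as a plain dictionary and that the tool calls of interest are on completion index 0:

```python
from typing import Any, Dict, List

def extract_tool_calls(attributes: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect tool calls recorded on an agentResponsesHandler span.

    `attributes` is the span's attribute map; scanning only completion
    index 0 is a simplifying assumption for this sketch.
    """
    calls = []
    tool_index = 0
    while f"gen_ai.completion.0.tool_calls.{tool_index}.name" in attributes:
        prefix = f"gen_ai.completion.0.tool_calls.{tool_index}"
        calls.append({
            "name": attributes[f"{prefix}.name"],
            "id": attributes.get(f"{prefix}.id"),
            "arguments": attributes.get(f"{prefix}.arguments"),
        })
        tool_index += 1
    return calls
```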

MCP Server Spans
Description: These spans are created during the process of connecting to an MCP server and listing available tools.
Spans:
- MCP Server Initialization:
  - Span Name: MCP Server Initialization
  - Description: This span is created when initializing a connection to an MCP server.
  - Attributes:
    - mcp_server_fqn: The FQN of the MCP server being initialized.
- Connect to MCP Server:
  - Span Name: Connect to MCP Server
  - Description: This span is created when establishing a connection to an MCP server.
  - Attributes:
    - mcp_server_url: The URL of the MCP server being connected to.
- List Tools:
  - Span Name: List Tools
  - Description: This span is created when listing the tools available on an MCP server.
  - Attributes:
    - tools: The list of tools retrieved from the MCP server.
- Tool Call:
  - Span Name: Tool Call: <toolName>
  - Description: These spans are created for each tool call made during the processing of agent responses. They capture details about the tool being called and the arguments passed.
  - Attributes:
    - toolName: The name of the tool being called.
    - args: The arguments passed to the tool call.
    - integrationId: The integration ID associated with the tool call.
    - integrationFqn: The fully qualified name of the integration.
    - result: The result of the tool call.
    - status: The status of the tool call.
    - mcp_server_url: The URL of the MCP server used for the tool call.
    - tools: The list of tools used in the call.
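The spans above form a parent/child hierarchy in the trace. The sketch below reproduces that nesting with the OpenTelemetry Python SDK purely for illustration; the FQN, URL, tool names, and arguments are placeholders, and this is not the gateway's internal implementation:

```python
from opentelemetry import trace

tracer = trace.get_tracer("mcp-tracing-illustration")

# Mirrors the span hierarchy described above; all values are placeholders.
with tracer.start_as_current_span("MCP Server Initialization") as init_span:
    init_span.set_attribute("mcp_server_fqn", "my-account/my-mcp-server")
    with tracer.start_as_current_span("Connect to MCP Server") as connect_span:
        connect_span.set_attribute("mcp_server_url", "https://mcp.example.com/sse")
    with tracer.start_as_current_span("List Tools") as list_span:
        list_span.set_attribute("tools", ["search_docs", "fetch_page"])
    with tracer.start_as_current_span("Tool Call: search_docs") as call_span:
        call_span.set_attribute("toolName", "search_docs")
        call_span.set_attribute("args", '{"query": "tracing"}')
        call_span.set_attribute("status", "success")
```
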
Fallback
Span Name: fallbackRequest
Description: Fallback spans are created when a request to a primary model fails and a fallback model is invoked. These spans capture the transition from the primary model to the fallback model.
Attributes:
- fallback.http.url: The URL to which the fallback request is made.
- fallback.http.method: The HTTP method used for the fallback request.
- fallback.requested_model: The original model that was requested before the fallback.
- fallback.resolved_model: The model that is used as a fallback.
- fallback.config_id: The configuration ID associated with the fallback mechanism.
- fallback.max_tokens: The maximum number of tokens allowed in the fallback request.
- fallback.temperature: The temperature setting used in the fallback request.
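Fallback spans are useful when auditing reliability: filtering on the fallbackRequest span name shows which model ultimately served each request. A small sketch, assuming spans have been fetched from your tracing backend as (name, attributes) pairs:

```python
from typing import Any, Dict, Iterable, Tuple

def report_fallbacks(spans: Iterable[Tuple[str, Dict[str, Any]]]) -> None:
    """Print requested -> resolved model for every fallbackRequest span."""
    for name, attributes in spans:
        if name != "fallbackRequest":
            continue
        print(
            f"{attributes.get('fallback.requested_model')} -> "
            f"{attributes.get('fallback.resolved_model')} "
            f"(config {attributes.get('fallback.config_id')})"
        )
```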
Rate Limiting
Span Name: RateLimiterMiddleware
Description: These spans represent the execution of the rate limiting middleware. The span captures information about the user, the model being accessed, and the rate limiting rules applied.
Attributes:
- rate_limiter.model: The model being accessed by the request.
- rate_limiter.metadata: Additional metadata associated with the request.
- rate_limiter.user.subject_type: The type of user making the request.
- rate_limiter.user.subject_slug: A unique identifier for the user.
- rate_limiter.user.tenant_name: The tenant or organization to which the user belongs.
- rate_limiter.rules: The specific rate limiting rules applied to the request.
- rate_limiter.rule.id: The ID of a specific rate limiting rule that was checked.
- rate_limiter.status: The status of the rate limit check.
- rate_limiter.remaining: The number of requests remaining before the rate limit is exceeded.
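The rate_limiter.remaining attribute can be used to flag requests that are close to exhausting their quota. A sketch over a span's attribute map fetched from your tracing backend; the threshold is arbitrary:

```python
from typing import Any, Dict

def near_rate_limit(attributes: Dict[str, Any], threshold: int = 5) -> bool:
    """Return True when a RateLimiterMiddleware span shows few requests remaining."""
    remaining = attributes.get("rate_limiter.remaining")
    return remaining is not None and int(remaining) <= threshold
```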
Load Balancing
Span Name: loadBalanceMiddleware
Description: Load balancing spans are created when a request is processed through the load balancing middleware. These spans capture the details of the load balancing process.
Attributes:
- load_balance.http.url: The URL of the request being load balanced.
- load_balance.http.method: The HTTP method of the request being load balanced.
- user.tenantName: The tenant name of the user making the request.
- load_balance.requested_model: The model that was initially requested for load balancing.
- load_balance.resolved_model: The target model selected by the load balancing process.
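Comparing load_balance.requested_model with load_balance.resolved_model across spans shows how traffic is actually being routed. A sketch, again assuming span attribute maps fetched from your tracing backend:

```python
from collections import Counter
from typing import Any, Dict, Iterable

def routing_distribution(span_attributes: Iterable[Dict[str, Any]]) -> Counter:
    """Count how often each (requested, resolved) model pair occurs."""
    return Counter(
        (attrs.get("load_balance.requested_model"),
         attrs.get("load_balance.resolved_model"))
        for attrs in span_attributes
        if "load_balance.resolved_model" in attrs
    )
```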