AI Gateway is OpenTelemetry (OTEL) compliant, making it easy to integrate with modern observability tools and platforms. Both Tracing and Metrics are supported for deep observability and monitoring.
OpenTelemetry tracing allows you to capture detailed traces of requests as they flow through the AI Gateway. This enables debugging, performance analysis, and end-to-end visibility.
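Because the gateway emits standard OTLP data, any OpenTelemetry-compatible backend can ingest its traces, and a client application that calls the gateway can export its own spans to the same backend. The snippet below is a minimal, illustrative Python setup only: the service name and collector endpoint are placeholders, and the idea that the gateway honors W3C trace context propagation is an assumption here, not documented behavior.

```python
# Minimal sketch: a client app exporting its own spans over OTLP so they can
# sit alongside AI Gateway's traces in the same backend. Endpoint and service
# name are placeholders.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "my-client-app"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("call-ai-gateway") as span:
    # The request to the gateway would go here; if the gateway propagates
    # W3C trace context (an assumption), its spans join this trace.
    span.set_attribute("gen_ai.request.model", "openai-main/gpt-4o")
```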
In the trace view, each row on the left represents a request to the endpoint; selecting a trace shows a detailed breakdown of that request and its spans.
Highlighted Span – chatCompletions (LLM):
The highlighted span is of type genai, capturing the lifecycle of a large language model (LLM) inference request.
LLM Request Data:
Model: openai-main/gpt-4o
Max tokens: 200
Top-p: 1
Temperature: 0.1
Prompt and Completion:
The system prompt, user question, and assistant’s response are all visible, providing full transparency into the LLM interaction.
Span Metadata:
Includes span name, service name, trace and span IDs, and OTEL scope.
The following sections describe the various spans available in the AI Gateway and their attributes.
Span Name: chatCompletions
Description: Streaming spans are created when a request involves streaming data, such as chat completions. These spans capture the details of the streaming process, including the model used and the parameters affecting the streaming behavior.
Attributes:
gen_ai.request.model: The model being used for the chat completion request.
gen_ai.request.max_tokens: The maximum number of tokens allowed in the chat completion request.
gen_ai.request.temperature: The temperature setting used in the chat completion request.
gen_ai.operation.name: The operation being performed, such as ‘chat’.
gen_ai.system: The system or platform being used, e.g., ‘openai’.
gen_ai.request.top_p: The top-p sampling parameter used in the request.
gen_ai.system.message: Events related to system messages in the request.
gen_ai.user.message: Events related to user messages in the request.
gen_ai.assistant.message: Events related to assistant messages in the request.
gen_ai.tool.message: Events related to tool messages in the request.
gen_ai.unknown.message: Events related to unknown message roles in the request.
gen_ai.prompt.{index}.content: The content of the message at a specific index in the request.
gen_ai.prompt.{index}.role: The role of the message at a specific index in the request.
gen_ai.completion.{index}.content: The content of the completion message at a specific index.
gen_ai.completion.{index}.role: The role of the completion message at a specific index.
gen_ai.completion.{index}.finish_reason: The reason why the completion finished, at a specific index.
gen_ai.completion.{index}.tool_calls.{toolIndex}.name: The name of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.id: The ID of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.arguments: The arguments of the tool call at a specific index in the completion.
Usage: These spans help in identifying frequently used models and parameters, allowing for optimizations.
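The indexed attribute names above follow a simple pattern. As a rough illustration (not the gateway's implementation), the snippet below flattens a chat request and its completion into those keys; the message and choice dictionary shapes are assumptions made for the example.

```python
# Illustrative only: flattening messages into the indexed gen_ai.* keys above.
def prompt_attributes(messages):
    attrs = {}
    for i, msg in enumerate(messages):
        attrs[f"gen_ai.prompt.{i}.role"] = msg["role"]
        attrs[f"gen_ai.prompt.{i}.content"] = msg["content"]
    return attrs

def completion_attributes(choices):
    attrs = {}
    for i, choice in enumerate(choices):
        attrs[f"gen_ai.completion.{i}.role"] = choice["role"]
        attrs[f"gen_ai.completion.{i}.content"] = choice.get("content", "")
        attrs[f"gen_ai.completion.{i}.finish_reason"] = choice["finish_reason"]
        for t, call in enumerate(choice.get("tool_calls", [])):
            prefix = f"gen_ai.completion.{i}.tool_calls.{t}"
            attrs[f"{prefix}.name"] = call["name"]
            attrs[f"{prefix}.id"] = call["id"]
            attrs[f"{prefix}.arguments"] = call["arguments"]
    return attrs

# Example: a single-turn request answered by one completion choice.
attrs = {
    "gen_ai.system": "openai",
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "openai-main/gpt-4o",
    "gen_ai.request.max_tokens": 200,
    "gen_ai.request.temperature": 0.1,
    "gen_ai.request.top_p": 1,
}
attrs.update(prompt_attributes([{"role": "user", "content": "What is OpenTelemetry?"}]))
attrs.update(completion_attributes(
    [{"role": "assistant", "content": "An observability framework.", "finish_reason": "stop"}]
))
```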
Span Name: agentResponsesHandler
Description: This span is created when handling agent responses. It captures details about the request method and URL.
Attributes:
handler.name: The name of the handler.
request.method: The HTTP method of the request.
request.url: The URL of the request.
gen_ai.prompt.{index}.content: The content of the message at a specific index in the request.
gen_ai.prompt.{index}.role: The role of the message at a specific index in the request.
gen_ai.completion.{index}.content: The content of the completion message at a specific index.
gen_ai.completion.{index}.role: The role of the completion message at a specific index.
gen_ai.completion.{index}.finish_reason: The reason why the completion finished, at a specific index.
gen_ai.completion.{index}.tool_calls.{toolIndex}.name: The name of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.id: The ID of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.arguments: The arguments of the tool call at a specific index in the completion.
Usage: This span helps in monitoring agent response handling.
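As a hedged illustration of this span's shape, the sketch below wraps a hypothetical handler in an OpenTelemetry span carrying the attributes listed above; the handler function, URL, and return value are placeholders, not the gateway's actual code.

```python
# Hypothetical handler wrapped in a span mirroring agentResponsesHandler.
from opentelemetry import trace

tracer = trace.get_tracer("ai-gateway-tracing-example")

def handle_agent_response(method: str, url: str) -> dict:
    with tracer.start_as_current_span("agentResponsesHandler") as span:
        span.set_attribute("handler.name", "agentResponsesHandler")
        span.set_attribute("request.method", method)
        span.set_attribute("request.url", url)
        # Prompt and completion attributes (gen_ai.prompt.{index}.*,
        # gen_ai.completion.{index}.*) would be added as the response is built.
        return {"status": "ok"}

handle_agent_response("POST", "https://gateway.example.com/agent/responses")
```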
MCP Spans:
Description: These spans are created while connecting to an MCP (Model Context Protocol) server and listing its available tools.
Spans:
MCP Server Initialization:
Span Name: MCP Server Initialization
Description: This span is created when initializing a connection to an MCP server.
Attributes:
mcp_server_fqn: The fully qualified name (FQN) of the MCP server being initialized.
Connect to MCP Server:
Span Name: Connect to MCP Server
Description: This span is created when establishing a connection to an MCP server.
Attributes:
mcp_server_url: The URL of the MCP server being connected to.
List Tools:
Span Name: List Tools
Description: This span is created when listing the tools available on an MCP server.
Attributes:
tools: The list of tools retrieved from the MCP server.
Tool Call:
Span Name: Tool Call: <toolName>
Description: These spans are created for each tool call made during the processing of agent responses. They capture details about the tool being called and the arguments passed.
Attributes:
toolName: The name of the tool being called.
args: The arguments passed to the tool call.
integrationId: The integration ID associated with the tool call.
integrationFqn: The fully qualified name of the integration.
result: The result of the tool call.
status: The status of the tool call.
mcp_server_url: The URL of the MCP server used for the tool call.
tools: The list of tools used in the call.
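As a rough sketch of what such a span could contain (placeholders throughout, not the gateway's implementation), a tool invocation can be wrapped in a span named after the tool and annotated with the attributes above:

```python
# Illustrative tool-call span; the tool runner and attribute values are placeholders.
import json
from opentelemetry import trace

tracer = trace.get_tracer("ai-gateway-tracing-example")

def call_tool(tool_name: str, args: dict, mcp_server_url: str) -> dict:
    with tracer.start_as_current_span(f"Tool Call: {tool_name}") as span:
        span.set_attribute("toolName", tool_name)
        span.set_attribute("args", json.dumps(args))
        span.set_attribute("mcp_server_url", mcp_server_url)
        try:
            result = {"echoed": args}  # stand-in for the real MCP tool call
            span.set_attribute("result", json.dumps(result))
            span.set_attribute("status", "success")
            return result
        except Exception:
            span.set_attribute("status", "error")
            raise
```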
Span Name: fallbackRequest
Description: Fallback spans are created when a request to a primary model fails and a fallback model is invoked. These spans capture the transition from the primary model to the fallback model.
Attributes:
fallback.http.url: The URL to which the fallback request is made.
fallback.http.method: The HTTP method used for the fallback request.
fallback.requested_model: The original model that was requested before the fallback.
fallback.resolved_model: The model that is used as a fallback.
fallback.config_id: The configuration ID associated with the fallback mechanism.
fallback.max_tokens: The maximum number of tokens allowed in the fallback request.
fallback.temperature: The temperature setting used in the fallback request.
Usage: These spans help in identifying models that frequently fall back, allowing fallback configurations to be adjusted.
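For example, exported fallback spans can be aggregated to see which requested models fall back most often. The sketch below assumes the spans have already been pulled from your tracing backend as dictionaries of attribute values.

```python
# Count (requested model -> fallback model) transitions from exported spans.
from collections import Counter

def fallback_counts(spans):
    return Counter(
        (s["fallback.requested_model"], s["fallback.resolved_model"])
        for s in spans
        if "fallback.requested_model" in s
    )

spans = [
    {"fallback.requested_model": "openai-main/gpt-4o",
     "fallback.resolved_model": "openai-main/gpt-4o-mini"},
    {"fallback.requested_model": "openai-main/gpt-4o",
     "fallback.resolved_model": "openai-main/gpt-4o-mini"},
]
print(fallback_counts(spans).most_common(5))
```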
Span Name: RateLimiterMiddleware
Description: These spans represent the execution of the rate limiting middleware. Each span captures information about the user, the model being accessed, and the rate limiting rules applied.
Attributes:
rate_limiter.model: The model being accessed by the request.
rate_limiter.metadata: Additional metadata associated with the request.
rate_limiter.user.subject_type: The type of user making the request.
rate_limiter.user.subject_slug: A unique identifier for the user.
rate_limiter.user.tenant_name: The tenant or organization to which the user belongs.
rate_limiter.rules: The specific rate limiting rules applied to the request.
rate_limiter.rule.id: The ID of a specific rate limiting rule that was checked.
rate_limiter.status: The status of the rate limit check.
rate_limiter.remaining: The number of requests remaining before the rate limit is exceeded.
Usage: These spans help in identifying users or models frequently hitting rate limits.
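As one possible use, the sketch below groups rate limiter spans by user to surface who hits limits most often. The status value denoting a rejected request is an assumption; check it against the status values your gateway actually emits.

```python
# Group rejected requests by user; the "rate_limited" status value is assumed.
from collections import Counter

def top_limited_users(spans, limited_status="rate_limited"):
    return Counter(
        s.get("rate_limiter.user.subject_slug", "unknown")
        for s in spans
        if s.get("rate_limiter.status") == limited_status
    )

# Usage: top_limited_users(exported_spans).most_common(10)
```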
Span Name: loadBalanceMiddleware
Description: Load balancing spans are created when a request is processed through the load balancing middleware. These spans capture the details of the load balancing process.
Attributes:
load_balance.http.url: The URL of the request being load balanced.
load_balance.http.method: The HTTP method of the request being load balanced.
user.tenantName: The tenant name of the user making the request.
load_balance.requested_model: The model that was initially requested for load balancing.
load_balance.resolved_model: The target model selected by the load balancing process.
Usage: These spans help in identifying models that are frequently load balanced.
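In the same spirit as the fallback analysis, load balancing spans can be grouped per tenant to see where traffic for a requested model is actually routed; the span dictionaries are again assumed to come from your tracing backend.

```python
# Per-tenant view of requested -> resolved model routing decisions.
from collections import defaultdict, Counter

def routing_by_tenant(spans):
    routes = defaultdict(Counter)
    for s in spans:
        if "load_balance.requested_model" in s:
            tenant = s.get("user.tenantName", "unknown")
            routes[tenant][(s["load_balance.requested_model"],
                            s["load_balance.resolved_model"])] += 1
    return routes
```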