Standardized Telemetry: Captures and exports traces, metrics, and logs using OpenTelemetry SDKs and conventions.
OTLP Export: Sends telemetry via the OpenTelemetry Protocol (OTLP) over HTTP, compatible with popular backends (Grafana, Datadog, Jaeger, etc.).
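Because export happens over OTLP/HTTP, a trace can in principle be posted to any OTLP-compatible collector with a plain HTTP request. The sketch below builds a minimal single-span payload in the standard OTLP JSON encoding using only the Python standard library; the service name `ai-gateway`, the scope name, and the collector address in the comment are illustrative assumptions, not values the gateway guarantees.

```python
import json
import os
import time
import urllib.request

def build_otlp_payload(span_name, attributes):
    """Build a minimal OTLP/HTTP JSON trace payload containing one span."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                # Resource-level attribute identifying the emitting service (assumed name).
                {"key": "service.name", "value": {"stringValue": "ai-gateway"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # assumed instrumentation scope
                "spans": [{
                    "traceId": os.urandom(16).hex(),   # 16-byte trace ID, hex-encoded
                    "spanId": os.urandom(8).hex(),     # 8-byte span ID, hex-encoded
                    "name": span_name,
                    "kind": 1,                          # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(now),
                    "endTimeUnixNano": str(now + 1_000_000),
                    "attributes": [
                        {"key": k, "value": {"stringValue": str(v)}}
                        for k, v in attributes.items()
                    ],
                }],
            }],
        }],
    }

payload = build_otlp_payload("chatCompletions", {"gen_ai.request.model": "openai-main/gpt-4o"})
# To export, POST the JSON body to the collector's OTLP/HTTP traces endpoint
# (default OTLP/HTTP port 4318, path /v1/traces):
# urllib.request.urlopen(urllib.request.Request(
#     "http://localhost:4318/v1/traces",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# ))
```

In practice the gateway's own SDK-based exporter handles this; the point is only that the wire format is ordinary JSON over HTTP, so any backend that speaks OTLP can receive it.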
Example Trace Overview - AI Gateway:
AI Gateway - OpenTelemetry Tracing
Each row on the left represents a request to the endpoint, with the selected trace
showing a detailed breakdown of the request and its spans.
Highlighted Span – chatCompletions (LLM):
The highlighted span is of type genai (LLM), capturing the lifecycle of a large language model (LLM) inference request.
LLM Request Data:
Model: openai-main/gpt-4o
Max tokens: 200
Top-p: 1
Temperature: 0.1
Prompt and Completion:
The system prompt, user question, and assistant’s response are all visible, providing full transparency into the LLM interaction.
Span Metadata:
Includes span name, service name, trace and span IDs, and OTEL scope.
The following sections describe the various spans available in the AI Gateway and their attributes.
Span Name: chatCompletions
Description: These spans are created when a request involves streaming data, such as streamed chat completions. They capture the details of the streaming process, including the model used and the parameters that affect streaming behavior.
Attributes:
gen_ai.request.model: The model being used for the chat completion request.
gen_ai.request.max_tokens: The maximum number of tokens allowed in the chat completion request.
gen_ai.request.temperature: The temperature setting used in the chat completion request.
gen_ai.operation.name: The operation being performed, such as ‘chat’.
gen_ai.system: The system or platform being used, e.g., ‘openai’.
gen_ai.request.top_p: The top-p sampling parameter used in the request.
gen_ai.system.message: Events related to system messages in the request.
gen_ai.user.message: Events related to user messages in the request.
gen_ai.assistant.message: Events related to assistant messages in the request.
gen_ai.tool.message: Events related to tool messages in the request.
gen_ai.unknown.message: Events related to unknown message roles in the request.
gen_ai.prompt.{index}.content: The content of the message at a specific index in the request.
gen_ai.prompt.{index}.role: The role of the message at a specific index in the request.
gen_ai.completion.{index}.content: The content of the completion message at a specific index.
gen_ai.completion.{index}.role: The role of the completion message at a specific index.
gen_ai.completion.{index}.finish_reason: The reason why the completion finished, at a specific index.
gen_ai.completion.{index}.tool_calls.{toolIndex}.name: The name of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.id: The ID of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.arguments: The arguments of the tool call at a specific index in the completion.
Usage: These spans help in identifying frequently used models and parameters, allowing for optimizations.
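The indexed `gen_ai.prompt.{index}.*` and `gen_ai.completion.{index}.*` attributes above follow a flattening convention: each message in the request and response is expanded into position-keyed role/content pairs. A minimal sketch of that flattening, with a hypothetical helper and made-up messages:

```python
def flatten_chat(messages, completions):
    """Flatten chat messages into indexed span attributes, mirroring the
    gen_ai.prompt.{index}.* / gen_ai.completion.{index}.* convention."""
    attrs = {}
    for i, msg in enumerate(messages):
        attrs[f"gen_ai.prompt.{i}.role"] = msg["role"]
        attrs[f"gen_ai.prompt.{i}.content"] = msg["content"]
    for i, choice in enumerate(completions):
        attrs[f"gen_ai.completion.{i}.role"] = choice["role"]
        attrs[f"gen_ai.completion.{i}.content"] = choice["content"]
        attrs[f"gen_ai.completion.{i}.finish_reason"] = choice.get("finish_reason", "stop")
        # Tool calls nest one level deeper, keyed by their own index.
        for t, call in enumerate(choice.get("tool_calls", [])):
            attrs[f"gen_ai.completion.{i}.tool_calls.{t}.name"] = call["name"]
            attrs[f"gen_ai.completion.{i}.tool_calls.{t}.id"] = call["id"]
            attrs[f"gen_ai.completion.{i}.tool_calls.{t}.arguments"] = call["arguments"]
    return attrs

attrs = flatten_chat(
    [{"role": "system", "content": "You are helpful."},
     {"role": "user", "content": "Hi"}],
    [{"role": "assistant", "content": "Hello!", "finish_reason": "stop"}],
)
```

This is why the full prompt and completion are visible in the trace view: each message lands on the span as its own attribute pair.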
Span Name: agentResponsesHandler
Description: This span is created when handling agent responses. It captures details about the request method and URL.
Attributes:
handler.name: The name of the handler.
request.method: The HTTP method of the request.
request.url: The URL of the request.
gen_ai.prompt.{index}.content: The content of the message at a specific index in the request.
gen_ai.prompt.{index}.role: The role of the message at a specific index in the request.
gen_ai.completion.{index}.content: The content of the completion message at a specific index.
gen_ai.completion.{index}.role: The role of the completion message at a specific index.
gen_ai.completion.{index}.finish_reason: The reason why the completion finished, at a specific index.
gen_ai.completion.{index}.tool_calls.{toolIndex}.name: The name of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.id: The ID of the tool call at a specific index in the completion.
gen_ai.completion.{index}.tool_calls.{toolIndex}.arguments: The arguments of the tool call at a specific index in the completion.
Usage: This span helps in monitoring agent response handling.
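A rough sketch of how the handler-level attributes on this span could be assembled; the handler body and the URL are illustrative assumptions, not the gateway's actual implementation:

```python
def record_handler_attrs(handler, method, url):
    """Collect the request-level attributes recorded on a handler span (sketch)."""
    return {
        "handler.name": handler.__name__,
        "request.method": method,
        "request.url": url,
    }

def agentResponsesHandler(request):  # hypothetical handler stand-in
    return {"status": 200}

attrs = record_handler_attrs(agentResponsesHandler, "POST", "/agent/responses")
```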
Description: These spans are created during the process of connecting to an MCP server and listing available tools.
Spans:
MCP Server Initialization:
Span Name: MCP Server Initialization
Description: This span is created when initializing a connection to an MCP server.
Attributes:
mcp_server_fqn: The fully qualified name (FQN) of the MCP server being initialized.
Connect to MCP Server:
Span Name: Connect to MCP Server
Description: This span is created when establishing a connection to an MCP server.
Attributes:
mcp_server_url: The URL of the MCP server being connected to.
List Tools:
Span Name: List Tools
Description: This span is created when listing the tools available on an MCP server.
Attributes:
tools: The list of tools retrieved from the MCP server.
Tool Call:
Span Name: Tool Call: <toolName>
Description: These spans are created for each tool call made during the processing of agent responses. They capture details about the tool being called and the arguments passed.
Attributes:
toolName: The name of the tool being called.
args: The arguments passed to the tool call.
integrationId: The integration ID associated with the tool call.
integrationFqn: The fully qualified name of the integration.
result: The result of the tool call.
status: The status of the tool call.
mcp_server_url: The URL of the MCP server used for the tool call.
tools: The list of tools used in the call.
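A sketch of how a Tool Call span's attributes might be populated around a tool invocation, with `status` derived from success or failure; the tool itself (`get_weather`) is hypothetical, and the real gateway's error handling may differ:

```python
import json

def call_tool_with_attrs(tool_name, tool_fn, args):
    """Invoke a tool and collect the attributes a Tool Call span records (sketch)."""
    attrs = {"toolName": tool_name, "args": json.dumps(args)}
    try:
        result = tool_fn(**args)
        attrs["result"] = json.dumps(result)
        attrs["status"] = "success"
    except Exception as exc:
        # On failure, record the error text as the result and mark the status.
        attrs["result"] = str(exc)
        attrs["status"] = "error"
    return attrs

def get_weather(city):  # hypothetical tool
    return {"city": city, "temp_c": 21}

attrs = call_tool_with_attrs("get_weather", get_weather, {"city": "Paris"})
```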
Span Name: fallbackRequest
Description: Fallback spans are created when a request to a primary model fails and a fallback model is invoked. These spans capture the transition from the primary model to the fallback model.
Attributes:
fallback.http.url: The URL to which the fallback request is made.
fallback.http.method: The HTTP method used for the fallback request.
fallback.requested_model: The original model that was requested before the fallback.
fallback.resolved_model: The model that is used as a fallback.
fallback.config_id: The configuration ID associated with the fallback mechanism.
fallback.max_tokens: The maximum number of tokens allowed in the fallback request.
fallback.temperature: The temperature setting used in the fallback request.
Usage: These spans help in identifying models that frequently fall back, allowing the fallback configuration to be adjusted.
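The fallback flow described above can be sketched as follows. The backend callables and the fallback model name are stand-ins, and the gateway's actual retry logic is not documented here; this only shows how the `fallback.*` attributes relate to the transition from requested to resolved model.

```python
def complete_with_fallback(requested_model, fallback_model, backends, params):
    """Try the primary model; on failure, invoke the fallback and record
    the fallback.* attributes that would land on the fallback span (sketch)."""
    try:
        return backends[requested_model](**params), {}
    except Exception:
        attrs = {
            "fallback.requested_model": requested_model,
            "fallback.resolved_model": fallback_model,
            "fallback.max_tokens": params.get("max_tokens"),
            "fallback.temperature": params.get("temperature"),
        }
        return backends[fallback_model](**params), attrs

def flaky(**kw):       # stand-in for a failing primary backend
    raise RuntimeError("upstream 503")

def stable(**kw):      # stand-in for a healthy fallback backend
    return "ok"

result, attrs = complete_with_fallback(
    "openai-main/gpt-4o", "openai-main/gpt-4o-mini",
    {"openai-main/gpt-4o": flaky, "openai-main/gpt-4o-mini": stable},
    {"max_tokens": 200, "temperature": 0.1},
)
```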
Span Name: RateLimiterMiddleware
Description: These spans represent the execution of the rate limiting middleware. The span captures information about the user, the model being accessed, and the rate limiting rules applied.
Attributes:
rate_limiter.model: The model being accessed by the request.
rate_limiter.metadata: Additional metadata associated with the request.
rate_limiter.user.subject_type: The type of user making the request.
rate_limiter.user.subject_slug: A unique identifier for the user.
rate_limiter.user.tenant_name: The tenant or organization to which the user belongs.
rate_limiter.rules: The specific rate limiting rules applied to the request.
rate_limiter.rule.id: The ID of a specific rate limiting rule that was checked.
rate_limiter.status: The status of the rate limit check.
rate_limiter.remaining: The number of requests remaining before the rate limit is exceeded.
Usage: These spans help in identifying users or models that frequently hit rate limits.
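To illustrate how `rate_limiter.status` and `rate_limiter.remaining` relate, here is a minimal fixed-window limiter. The gateway's actual algorithm and rule format are not documented here, so this is only an assumption-laden sketch of how those two attributes could be derived from a per-user, per-model limit:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Minimal per-user, per-model fixed-window rate limiter (sketch)."""

    def __init__(self, limit_per_minute):
        self.limit = limit_per_minute
        self.counts = defaultdict(int)

    def check(self, subject_slug, model, now=None):
        # Bucket requests into one-minute windows; `now` is injectable for determinism.
        window = int((now if now is not None else time.time()) // 60)
        key = (subject_slug, model, window)
        self.counts[key] += 1
        remaining = max(self.limit - self.counts[key], 0)
        return {
            "rate_limiter.model": model,
            "rate_limiter.user.subject_slug": subject_slug,
            "rate_limiter.status": "allowed" if self.counts[key] <= self.limit else "rate_limited",
            "rate_limiter.remaining": remaining,
        }

limiter = FixedWindowLimiter(limit_per_minute=2)
first = limiter.check("alice", "openai-main/gpt-4o", now=0.0)
second = limiter.check("alice", "openai-main/gpt-4o", now=0.0)
third = limiter.check("alice", "openai-main/gpt-4o", now=0.0)
```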
Span Name: loadBalanceMiddleware
Description: Load balancing spans are created when a request is processed through the load balancing middleware. These spans capture the details of the load balancing process.
Attributes:
load_balance.http.url: The URL of the request being load balanced.
load_balance.http.method: The HTTP method of the request being load balanced.
user.tenantName: The tenant name of the user making the request.
load_balance.requested_model: The model that was initially requested for load balancing.
load_balance.resolved_model: The target model selected by the load balancing process.
Usage: These spans help in identifying models that are frequently load balanced.
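As a sketch of the requested-vs-resolved distinction, the following uses weighted random choice as a stand-in for the middleware's real selection strategy (which is not documented here); the target pools and weights are invented:

```python
import random

def load_balance(requested_model, targets, weights, rng=random):
    """Pick a target model for a requested model and return the attributes
    a load balancing span would record (sketch)."""
    resolved = rng.choices(targets[requested_model], weights=weights[requested_model])[0]
    return resolved, {
        "load_balance.requested_model": requested_model,
        "load_balance.resolved_model": resolved,
    }

# Hypothetical pool: route "gpt-4o" to one of two provider deployments, 80/20.
targets = {"gpt-4o": ["openai-main/gpt-4o", "azure-main/gpt-4o"]}
weights = {"gpt-4o": [80, 20]}
resolved, attrs = load_balance("gpt-4o", targets, weights)
```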