The LLM Gateway delivers detailed request logs through tracing. Retrieve these logs using the spans query API.
To better understand tracing concepts like traces, spans, and how they work together, see our tracing overview.

Example: A chat completion request creates a span hierarchy: Chat Completion Span (parent) → Model Span (child; stores input/output tokens, metrics, and costs) → Network Call Span (child; the actual provider call).

Chat completion request span hierarchy
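The parent/child relationships in this hierarchy can be reconstructed from queried spans. Below is a minimal sketch using plain dicts as stand-ins for span results; the field names `span_id` and `parent_span_id` are illustrative assumptions, not a confirmed SDK schema:

```python
# Sketch: group spans into a parent -> children map to rebuild a trace tree.
# The dicts stand in for queried span objects; span_id / parent_span_id
# are assumed field names for illustration only.
from collections import defaultdict

spans = [
    {"span_id": "a1", "parent_span_id": None, "span_name": "ChatCompletion"},
    {"span_id": "b2", "parent_span_id": "a1", "span_name": "Model"},
    {"span_id": "c3", "parent_span_id": "b2", "span_name": "NetworkCall"},
]

children = defaultdict(list)
for span in spans:
    children[span["parent_span_id"]].append(span["span_name"])

print(children[None])   # root spans (no parent)
print(children["a1"])   # children of the chat completion span
```

Root spans are the entries keyed by a missing parent; walking the map from there reproduces the hierarchy shown above.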

Set up the TrueFoundry CLI

Begin by installing the TrueFoundry SDK. Follow the CLI Setup guide for instructions.

Common Use Cases

Fetch all spans in a time range. Define the time range for your query using ISO 8601 timestamps.
from truefoundry import client
from truefoundry_sdk import SortDirection

spans = client.traces.query_spans(
    tracing_project_fqn="truefoundry:tracing-project:tfy-default",
    start_time="2025-10-08T00:00:00.000Z",
    end_time="2025-10-08T23:59:59.999Z",
    application_names=["tfy-llm-gateway"],
    limit=200,
    sort_direction=SortDirection.DESC
)

# Process all spans across all pages
for span in spans:
    print(span.span_name, span.duration, span.span_attributes.get("tfy.span_type"))
Fetch only root spans by passing parent_span_ids=[""]. A root span is the top-level span in a trace hierarchy that has no parent span. Define the time range for your query using ISO 8601 timestamps.

from truefoundry import client
from truefoundry_sdk import SortDirection

spans = client.traces.query_spans(
    tracing_project_fqn="truefoundry:tracing-project:tfy-default",
    start_time="2025-10-08T00:00:00.000Z",
    end_time="2025-10-08T23:59:59.999Z",
    application_names=["tfy-llm-gateway"],
    parent_span_ids=[""],
    limit=200,
    sort_direction=SortDirection.DESC
)

# Process all spans across all pages
for span in spans:
    print(span.span_name, span.duration, span.span_attributes.get("tfy.span_type"))
Fetch spans created by virtual accounts by filtering on created_by_subject_types. Define the time range for your query using ISO 8601 timestamps.
from truefoundry import client
from truefoundry_sdk import SortDirection

spans = client.traces.query_spans(
    tracing_project_fqn="truefoundry:tracing-project:tfy-default",
    start_time="2025-10-08T00:00:00.000Z",
    end_time="2025-10-08T23:59:59.999Z",
    application_names=["tfy-llm-gateway"],
    created_by_subject_types=["virtualaccount"],
    limit=200,
    sort_direction=SortDirection.DESC
)

# Process all spans across all pages
for span in spans:
    print(span.span_name, span.duration, span.span_attributes.get("tfy.span_type"))
Fetch spans created by a specific subject by filtering on created_by_subject_slugs (here, a virtual account slug). Define the time range for your query using ISO 8601 timestamps.
from truefoundry import client
from truefoundry_sdk import SortDirection

spans = client.traces.query_spans(
    tracing_project_fqn="truefoundry:tracing-project:tfy-default",
    start_time="2025-10-08T00:00:00.000Z",
    end_time="2025-10-08T23:59:59.999Z",
    application_names=["tfy-llm-gateway"],
    created_by_subject_slugs=["exampleaccount"],
    limit=200,
    sort_direction=SortDirection.DESC
)

# Process all spans across all pages
for span in spans:
    print(span.span_name, span.duration, span.span_attributes.get("tfy.span_type"))
Fetch spans created by users by filtering on created_by_subject_types. Define the time range for your query using ISO 8601 timestamps.
from truefoundry import client
from truefoundry_sdk import SortDirection

spans = client.traces.query_spans(
    tracing_project_fqn="truefoundry:tracing-project:tfy-default",
    start_time="2025-10-08T00:00:00.000Z",
    end_time="2025-10-08T23:59:59.999Z",
    application_names=["tfy-llm-gateway"],
    created_by_subject_types=["user"],
    limit=200,
    sort_direction=SortDirection.DESC
)

# Process all spans across all pages
for span in spans:
    print(span.span_name, span.duration, span.span_attributes.get("tfy.span_type"))
Fetch spans created by a specific user by filtering on created_by_subject_slugs with the user's email. Define the time range for your query using ISO 8601 timestamps.
from truefoundry import client
from truefoundry_sdk import SortDirection

spans = client.traces.query_spans(
    tracing_project_fqn="truefoundry:tracing-project:tfy-default",
    start_time="2025-10-08T00:00:00.000Z",
    end_time="2025-10-08T23:59:59.999Z",
    application_names=["tfy-llm-gateway"],
    created_by_subject_slugs=["example@email.com"],
    limit=200,
    sort_direction=SortDirection.DESC
)

# Process all spans across all pages
for span in spans:
    print(span.span_name, span.duration, span.span_attributes.get("tfy.span_type"))
Fetch the spans of a specific trace ID by filtering on trace_ids.
from truefoundry import client
from truefoundry_sdk import SortDirection

spans = client.traces.query_spans(
    tracing_project_fqn="truefoundry:tracing-project:tfy-default",
    start_time="2025-10-08T00:00:00.000Z",
    end_time="2025-10-08T23:59:59.999Z",
    trace_ids=[
        "0199c25e124a70989b0455584fbbf7b7"
    ],
    application_names=["tfy-llm-gateway"],
    limit=200,
    sort_direction=SortDirection.DESC
)

# Process all spans across all pages
for span in spans:
    print(span.span_name, span.duration, span.span_attributes.get("tfy.span_type"))

Understanding Span Attributes

Each span you query from LLM Gateway captures key request and model details. Recognizing these attributes helps you analyze and debug usage effectively.
This example shows the span attributes for a model request span (tfy.span_type: "Model"). Model spans capture the actual LLM inference call with complete input/output data, performance metrics (latency, token counts, cost), model configuration details, and error handling information. This span type is essential for monitoring model performance, tracking costs, and debugging inference issues.
 {
    "tfy.model.fqn": "truefoundry:openai:openai-main:model:gpt-5",
    "tfy.model.id": "cme6zartb0v0701pjeqk9fulg",
    "tfy.model.name": "openai-main/gpt-5",
    "tfy.input_short_hand": {
      "text": "What is 2+3?",
      "has_file": false,
      "has_image": false,
      "has_audio": false
    },
    "tfy.model.request_url": "https://api.openai.com/v1/chat/completions",
    "tfy.model.request_type": "ChatCompletion",
    "tfy.error_message": "",
    "tfy.model.metric.cost_in_usd": 0.00011625,
    "tfy.model.metric.inter_token_latency_in_ms": 3.03,
    "tfy.model.metric.time_to_first_token_in_ms": 1989.66,
    "tfy.model.metric.input_tokens": 13,
    "tfy.model.metric.latency_in_ms": 2016.9,
    "tfy.model.metric.output_tokens": 10,
    "http.response.status_code": 200,
    "tfy.should_trace": true,
    "tfy.model.streaming": true,
    "tfy.input": {
      "model": "openai-main/gpt-5",
      "messages": [
        {
          "role": "user",
          "content": "What is 2+3?"
        }
      ],
      "stream": true
    },
    "tfy.output": {
      "id": "chatcmpl-CRLtMzfkslZF0XJhFZm9xjL2jmApa",
      "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 10,
        "cache_read_tokens": 0,
        "cache_write_tokens": 0,
        "reasoning_tokens": 0,
        "total_tokens": 23
      },
      "object": "chat.completion",
      "model": "gpt-5-2025-08-07",
      "created": 1760635044,
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": null,
          "message": {
            "role": "assistant",
            "content": "5",
            "refusal": null
          }
        }
      ]
    },
    "tfy.span_type": "Model"
  }
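Once you have a Model span's attributes like the JSON above, the metric fields can be read directly by key. A minimal sketch over a plain dict containing a subset of the example attributes:

```python
# Sketch: pull cost and token metrics out of a Model span's attribute dict.
# The values below are a subset of the example span attributes shown above.
attrs = {
    "tfy.span_type": "Model",
    "tfy.model.metric.cost_in_usd": 0.00011625,
    "tfy.model.metric.input_tokens": 13,
    "tfy.model.metric.output_tokens": 10,
    "tfy.model.metric.latency_in_ms": 2016.9,
}

# Total tokens processed = input tokens + output tokens
total_tokens = (
    attrs["tfy.model.metric.input_tokens"]
    + attrs["tfy.model.metric.output_tokens"]
)
print(f"tokens={total_tokens} cost=${attrs['tfy.model.metric.cost_in_usd']:.8f}")
```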
This example shows the span attributes for an agent response span (tfy.span_type: "AgentResponse"). Agent response spans capture the orchestration of multiple tools and MCP servers, including complete input/output data, network details, tool execution results, and how the agent combines outputs from different tools into a coherent response. This span type is crucial for understanding agent behavior, tool usage patterns, and debugging multi-step agent workflows.
{
    "tfy.input_short_hand": {
      "text": "Call all tools",
      "has_file": false,
      "has_image": false,
      "has_audio": false
    },
    "tfy.model.request_type": "AgentResponse",
    "net.host.name": "internal.devtest.truefoundry.tech",
    "http.url": "http://internal.devtest.truefoundry.tech/agent/responses",
    "http.target": "/agent/responses",
    "tfy.model.name": "openai-main/gpt-5",
    "http.method": "POST",
    "http.scheme": "http",
    "http.host": "internal.devtest.truefoundry.tech",
    "http.status_code": 200,
    "tfy.input": {
      "model": "openai-main/gpt-5",
      "messages": [
        {
          "role": "user",
          "content": "Call all tools"
        }
      ],
      "stream": true,
      "mcp_servers": [
        {
          "integration_fqn": "common-tools",
          "enable_all_tools": false,
          "tools": [
            {
              "name": "web_search"
            },
            {
              "name": "code_executor"
            }
          ]
        }
      ]
    },
    "tfy.output": {
      "id": "chatcmpl-85318088-cd77-468b-b4c8-5980ae662bd3",
      "object": "chat.completion",
      "created": 1760636048,
      "model": "openai-main/gpt-5",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "I've called all available tools for you. Here are the results:\n\n- Web search (query: \"Hello world test search\"):\n  - Returned an overview of the classic "Hello, World!" program (example in C) and its use as a basic test across languages.\n  - Included sources such as:\n    - Wikipedia: https://en.wikipedia.org/wiki/%22Hello,_World!%22_program\n    - Programiz (C example): https://www.programiz.com/c-programming/examples/print-sentence\n    - GeeksforGeeks (Java/C examples): https://www.geeksforgeeks.org/java/java-hello-world-program/ and https://www.geeksforgeeks.org/c/c-hello-world-program/\n    - Docker Hub "hello-world" image: https://hub.docker.com/_/hello-world\n    - Rust by Example: https://doc.rust-lang.org/rust-by-example/hello.html\n\n- Code executor output:\n  - Stdout:\n    - Hello from the code executor\n    - Sum 1..10 = 55\n  - Stderr: none\n\nWould you like me to run a specific search or execute different code next?"
          },
          "logprobs": null,
          "finish_reason": "stop"
        }
      ]
    },
    "tfy.span_type": "AgentResponse"
  }
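The tfy.input payload of an AgentResponse span records which MCP servers and tools were configured for the request. A minimal sketch that lists the configured tools per server, using a subset of the example input above:

```python
# Sketch: list the tools configured on each MCP server from an AgentResponse
# span's tfy.input payload (subset of the example attributes shown above).
tfy_input = {
    "model": "openai-main/gpt-5",
    "mcp_servers": [
        {
            "integration_fqn": "common-tools",
            "enable_all_tools": False,
            "tools": [{"name": "web_search"}, {"name": "code_executor"}],
        }
    ],
}

# Map each MCP server's FQN to the names of its configured tools
configured_tools = {
    server["integration_fqn"]: [tool["name"] for tool in server["tools"]]
    for server in tfy_input.get("mcp_servers", [])
}
print(configured_tools)
```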

Core Span Attributes

| Attribute | Description |
|---|---|
| tfy.span_type | Type of span. Possible values: "ChatCompletion", "Completion", "MCP", "Rerank", "Embedding", "Model", "AgentResponse", "Guardrail" |
| tfy.tracing_project_fqn | Fully qualified name of the tracing project |
| tfy.input | Complete input data sent to the model, MCP server, guardrail, etc. |
| tfy.output | Complete output response from the model, MCP server, guardrail, etc. |
| tfy.input_short_hand | Abbreviated version of the input for display purposes |
| tfy.error_message | Error message if the request failed |
| tfy.prompt_version_fqn | FQN of the prompt version used (if applicable) |
| tfy.prompt_variables | Variables used in prompt templating |
| tfy.triggered_guardrail_fqns | List of guardrails that were triggered during the request |
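Since tfy.span_type distinguishes the different kinds of spans, a quick breakdown by type is often the first analysis step. A minimal sketch using plain attribute dicts as stand-ins for queried spans:

```python
# Sketch: count spans by tfy.span_type, using plain attribute dicts as
# stand-ins for the span objects returned by a query.
from collections import Counter

span_attribute_dicts = [
    {"tfy.span_type": "ChatCompletion"},
    {"tfy.span_type": "Model"},
    {"tfy.span_type": "Model"},
    {"tfy.span_type": "Guardrail"},
]

counts = Counter(attrs["tfy.span_type"] for attrs in span_attribute_dicts)
print(counts.most_common())
```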

Request Context Attributes

| Attribute | Description |
|---|---|
| tfy.request.model_name | Name of the model that was requested |
| tfy.request.created_by_subject | Subject (user/service account) that made the request |
| tfy.request.created_by_subject_teams | Teams associated with the requesting subject |
| tfy.request.metadata | Additional metadata associated with the request |
| tfy.request.conversation_id | Unique identifier for the conversation (if part of a chat) |

Model Attributes

| Attribute | Description |
|---|---|
| tfy.model.id | Unique identifier of the model |
| tfy.model.name | Display name of the model |
| tfy.model.fqn | Fully qualified name of the model |
| tfy.model.request_url | URL endpoint used for the model request |
| tfy.model.streaming | Whether the request used streaming mode |
| tfy.model.request_type | Type of request (e.g., "chat", "completion") |

Model Performance Metrics

| Attribute | Description |
|---|---|
| tfy.model.metric.time_to_first_token_in_ms | Time taken to receive the first token (streaming) |
| tfy.model.metric.latency_in_ms | Total request latency in milliseconds |
| tfy.model.metric.input_tokens | Number of tokens in the model input |
| tfy.model.metric.output_tokens | Number of tokens in the model output |
| tfy.model.metric.cost_in_usd | Cost of the request in USD |
| tfy.model.metric.inter_token_latency_in_ms | Average latency between tokens (streaming) |
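These per-span metrics can be aggregated across a query result to track spend and latency. A minimal sketch over attribute dicts (the first entry reuses values from the Model span example earlier; the second is illustrative):

```python
# Sketch: aggregate cost and average latency across Model spans'
# attribute dicts. The second entry's values are illustrative.
model_span_attrs = [
    {"tfy.model.metric.cost_in_usd": 0.00011625,
     "tfy.model.metric.latency_in_ms": 2016.9},
    {"tfy.model.metric.cost_in_usd": 0.00020000,
     "tfy.model.metric.latency_in_ms": 1500.0},
]

total_cost = sum(a["tfy.model.metric.cost_in_usd"] for a in model_span_attrs)
avg_latency = (
    sum(a["tfy.model.metric.latency_in_ms"] for a in model_span_attrs)
    / len(model_span_attrs)
)
print(f"total_cost=${total_cost:.8f} avg_latency={avg_latency:.1f}ms")
```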

Load Balancing Attributes

| Attribute | Description |
|---|---|
| applied_loadbalance_rule_ids | IDs of load balancing rules that were applied |

Budget Control Attributes

| Attribute | Description |
|---|---|
| applied_budget_rule_ids | IDs of budget rules that were applied to this request |

Rate Limiting Attributes

| Attribute | Description |
|---|---|
| applied_ratelimit_rule_ids | IDs of all rate limiting rules that were applied |

MCP (Model Context Protocol) Server Attributes

| Attribute | Description |
|---|---|
| tfy.mcp_server.id | Unique identifier of the MCP server |
| tfy.mcp_server.name | Display name of the MCP server |
| tfy.mcp_server.url | URL endpoint of the MCP server |
| tfy.mcp_server.fqn | Fully qualified name of the MCP server |
| tfy.mcp_server.server_name | Internal name of the MCP server |
| tfy.mcp_server.method | MCP method that was called |
| tfy.mcp_server.primitive_name | Name of the MCP primitive used |
| tfy.mcp_server.error_code | Error code if the MCP call failed |
| tfy.mcp_server.is_tool_call_execution_error | Whether the error was from tool call execution |

MCP Server Metrics

| Attribute | Description |
|---|---|
| tfy.mcp_server.metric.latency_in_ms | Latency of the MCP server call in milliseconds |
| tfy.mcp_server.metric.number_of_tools | Number of tools available in the MCP server |

Guardrail Attributes

| Attribute | Description |
|---|---|
| tfy.guardrail.id | Unique identifier of the guardrail |
| tfy.guardrail.name | Display name of the guardrail |
| tfy.guardrail.fqn | Fully qualified name of the guardrail |
| tfy.guardrail.result | Result of the guardrail check (e.g., "passed", "failed", "blocked") |
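The tfy.guardrail.result attribute makes it easy to surface requests where a guardrail check did not pass. A minimal sketch over attribute dicts; the guardrail names here are hypothetical examples:

```python
# Sketch: find guardrail spans whose check did not pass. The dicts stand in
# for queried spans, and the guardrail names are hypothetical examples.
guardrail_span_attrs = [
    {"tfy.guardrail.name": "pii-filter", "tfy.guardrail.result": "passed"},
    {"tfy.guardrail.name": "toxicity-check", "tfy.guardrail.result": "blocked"},
]

flagged = [
    a["tfy.guardrail.name"]
    for a in guardrail_span_attrs
    if a["tfy.guardrail.result"] != "passed"
]
print(flagged)
```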

Guardrail Applied Entity Attributes

| Attribute | Description |
|---|---|
| tfy.guardrail.applied_on_entity.type | Type of entity the guardrail was applied to |
| tfy.guardrail.applied_on_entity.id | ID of the entity |
| tfy.guardrail.applied_on_entity.name | Name of the entity |
| tfy.guardrail.applied_on_entity.fqn | FQN of the entity |
| tfy.guardrail.applied_on_entity.scope | Scope of the entity |

Guardrail Metrics

| Attribute | Description |
|---|---|
| tfy.guardrail.metric.latency_in_ms | Time taken for the guardrail check in milliseconds |

HTTP Response Attributes

| Attribute | Description |
|---|---|
| http.response.status_code | HTTP status code of the response |