LLM tracing provides detailed visibility into the inner workings of your LLM applications, capturing interactions and latencies across each step. This granular insight allows for pinpointing performance bottlenecks, debugging complex interactions, and optimizing costs associated with LLM usage. TrueFoundry makes it easy to trace your LLM applications with an open-source SDK and OTEL based tracing solution.
Before diving into how to start tracing your LLM applications, let’s understand the key concepts of tracing.
Trace
A trace represents the complete lifecycle of a request or task as it flows through the various services and components that interact with the LLM. This could involve multiple stages, such as receiving input from the user, sending the input to the model for inference, processing the model’s output, and delivering the result back to the user.
Span
A span represents an individual unit of work or operation that occurs within the trace. It could be a function call, an HTTP request, a database query, or any other significant unit of work. In the context of an LLM application, spans are used to capture each distinct task or action that is performed during the lifecycle of a request.
For example:
Imagine a user queries an LLM for a recommendation. The trace would look something like this:
Trace: Tracks the entire user request from input to response.
Spans: The individual steps within the trace, such as receiving the user’s input, sending the input to the model for inference, processing the model’s output, and delivering the recommendation back to the user.
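To make the trace/span relationship concrete, here is a minimal sketch using the standard opentelemetry-sdk. The span names and the `call_llm` stub are illustrative, not part of any TrueFoundry or Traceloop API, and the console exporter is used only so the spans are visible when you run it:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Print spans to the console for illustration; a real setup would export
# to an OTLP collector endpoint instead.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("recommendation-app")

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (e.g., a chat completion request).
    return "You might enjoy 'The Hitchhiker's Guide to the Galaxy'."

def handle_request(user_query: str) -> str:
    # The root span represents the whole trace: one user request, end to end.
    with tracer.start_as_current_span("recommendation-request"):
        with tracer.start_as_current_span("receive-input"):
            prompt = f"Recommend something for: {user_query}"
        with tracer.start_as_current_span("llm-inference"):
            completion = call_llm(prompt)
        with tracer.start_as_current_span("process-output"):
            return completion.strip()

print(handle_request("a science fiction novel"))
```

Each `start_as_current_span` call opens a child span under the active one, so the exported trace mirrors the stages described above.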
TrueFoundry provides an OpenTelemetry collector backend that can store OpenTelemetry-based traces and comes with a user-friendly UI to query and analyze them. It can receive traces from any OpenTelemetry-compatible SDK; however, for LLM use cases we strongly recommend the Traceloop SDK, as it is built to capture LLM-specific traces and metrics and ships with many built-in integrations. You can also use the standard opentelemetry-sdk to instrument your application and send traces, but that requires considerably more manual coding and instrumentation on your part.
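As a rough sketch of the recommended path, initializing the Traceloop SDK and pointing it at an OpenTelemetry-compatible endpoint looks like this; the endpoint URL and header value are placeholders, taken from your tracing provider’s settings in practice:

```python
from traceloop.sdk import Traceloop

# The endpoint and authorization header below are placeholders; substitute
# the actual values from your TrueFoundry tracing settings.
Traceloop.init(
    app_name="recommendation-app",
    api_endpoint="https://<your-otel-collector-endpoint>",
    headers={"Authorization": "Bearer <your-api-key>"},
)
```

After this one-time initialization, the SDK’s built-in integrations instrument supported libraries automatically; no per-call changes are needed.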
The list of providers and frameworks supported by Traceloop is available here. Here’s a quick summary of the list (a usage sketch follows the tables below):
| Model SDK | Python | TypeScript |
|---|---|---|
| Azure OpenAI | ✅ | ✅ |
| Aleph Alpha | ✅ | ❌ |
| Anthropic | ✅ | ✅ |
| Amazon Bedrock | ✅ | ✅ |
| Amazon SageMaker | ✅ | ❌ |
| Cohere | ✅ | ✅ |
| IBM watsonx | ✅ | ⏳ |
| Google Gemini | ✅ | ✅ |
| Google VertexAI | ✅ | ✅ |
| Groq | ✅ | ❌ |
| Mistral AI | ✅ | ❌ |
| Ollama | ✅ | ❌ |
| OpenAI | ✅ | ✅ |
| Replicate | ✅ | ❌ |
| together.ai | ✅ | ❌ |
| HuggingFace Transformers | ✅ | ❌ |

| Vector DB | Python | TypeScript |
|---|---|---|
| Chroma DB | ✅ | ✅ |
| Elasticsearch | ✅ | ✅ |
| LanceDB | ✅ | ❌ |
| Marqo | ✅ | ❌ |
| Milvus | ✅ | ❌ |
| pgvector | ✅ | ✅ |
| Pinecone | ✅ | ✅ |
| Qdrant | ✅ | ✅ |
| Weaviate | ✅ | ❌ |

| Framework | Python | TypeScript |
|---|---|---|
| Burr | ✅ | ❌ |
| CrewAI | ✅ | ❌ |
| Haystack by deepset | ✅ | ❌ |
| Langchain | ✅ | ✅ |
| LlamaIndex | ✅ | ✅ |
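Once `Traceloop.init` has run, calls made through any of the SDKs above are traced automatically. Here is a minimal sketch with the OpenAI Python client; the model name and prompt are illustrative, and the `@workflow` decorator simply groups the call under a named parent span:

```python
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# Endpoint and headers omitted for brevity; see the init sketch above.
Traceloop.init(app_name="recommendation-app")

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@workflow(name="recommend")
def recommend(query: str) -> str:
    # This chat completion is captured automatically by Traceloop's
    # OpenAI instrumentation, including prompts, tokens, and latency.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": f"Recommend something for: {query}"}],
    )
    return response.choices[0].message.content

print(recommend("a science fiction novel"))
```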