Overview
LLM tracing provides detailed visibility into the inner workings of your LLM applications, capturing interactions and latencies across each step. This granular insight lets you pinpoint performance bottlenecks, debug complex interactions, and optimize costs associated with LLM usage. TrueFoundry makes it easy to trace your LLM applications with an open-source SDK and an OTEL-based tracing solution.
Before diving into how to start tracing your LLM applications, let’s understand the key concepts of tracing.
Key Concepts
Trace
A trace represents the complete lifecycle of a request or task as it flows through the various services and components that interact with the LLM. This can involve multiple stages, such as receiving input from the user, sending the input to the model for inference, processing the model's output, and delivering the result back to the user.
Span
A span represents an individual unit of work or operation that occurs within the trace. It could be a function call, an HTTP request, a database query, or any other significant unit of work. In the context of an LLM application, spans are used to capture each distinct task or action that is performed during the lifecycle of a request.
For example, imagine a user queries an LLM for a recommendation. The trace would look something like this:
- Trace: Tracks the entire user request from input to response.
  - Span 1: Input preprocessing (parsing and tokenizing the user query).
  - Span 2: Model inference (running the LLM to generate a recommendation).
  - Span 3: Post-processing (formatting the recommendation output).
  - Span 4: Output delivery (returning the recommendation to the user).
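To make this concrete, here is a minimal sketch of the same request instrumented with the OpenTelemetry Python SDK. The span names mirror the list above, and the processing steps are illustrative stand-ins for real application logic:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Set up a tracer that prints spans to the console, just for illustration.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("recommendation-app")

def handle_request(user_query: str) -> str:
    # The root span represents the whole trace for this request.
    with tracer.start_as_current_span("recommendation-request"):
        with tracer.start_as_current_span("input-preprocessing"):
            tokens = user_query.lower().split()  # stand-in for real tokenization
        with tracer.start_as_current_span("model-inference"):
            # Stand-in for an actual LLM call.
            recommendation = "recommendation for: " + " ".join(tokens)
        with tracer.start_as_current_span("post-processing"):
            formatted = recommendation.capitalize()
        with tracer.start_as_current_span("output-delivery"):
            return formatted
```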
TrueFoundry’s Tracing UI
How does TrueFoundry's Tracing work?
TrueFoundry provides an OpenTelemetry collector backend that can store OpenTelemetry-based traces, along with a user-friendly UI to query and analyze them. It can receive traces from any OpenTelemetry-compatible SDK. For LLM use cases, however, we strongly recommend the Traceloop SDK, as it is built to capture LLM-specific traces and metrics and ships with many built-in integrations. You can also instrument your application with the standard opentelemetry-sdk and send traces, but that requires significantly more coding and instrumentation on your part.
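For instance, a minimal Traceloop setup looks roughly like this. The endpoint URL and authorization header below are placeholders; substitute the collector URL and API key from your TrueFoundry tracing setup:

```python
from traceloop.sdk import Traceloop

# Placeholder endpoint and credentials -- replace with the OTEL collector
# URL and API key from your TrueFoundry account.
Traceloop.init(
    app_name="my-llm-app",
    api_endpoint="https://<your-truefoundry-otel-collector>",
    headers={"Authorization": "Bearer <your-api-key>"},
)
```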
The full list of providers and frameworks supported by Traceloop is available here. Here's a quick summary:
Models
| Model SDK | Python | Typescript |
| --- | --- | --- |
| Azure OpenAI | ✅ | ✅ |
| Aleph Alpha | ✅ | ❌ |
| Anthropic | ✅ | ✅ |
| Amazon Bedrock | ✅ | ✅ |
| Amazon SageMaker | ✅ | ❌ |
| Cohere | ✅ | ✅ |
| IBM watsonx | ✅ | ⏳ |
| Google Gemini | ✅ | ✅ |
| Google VertexAI | ✅ | ✅ |
| Groq | ✅ | ❌ |
| Mistral AI | ✅ | ❌ |
| Ollama | ✅ | ❌ |
| OpenAI | ✅ | ✅ |
| Replicate | ✅ | ❌ |
| together.ai | ✅ | ❌ |
| HuggingFace Transformers | ✅ | ❌ |
Vector DBs
| Vector DB | Python | Typescript |
| --- | --- | --- |
| Chroma DB | ✅ | ✅ |
| Elasticsearch | ✅ | ✅ |
| LanceDB | ✅ | ❌ |
| Marqo | ✅ | ❌ |
| Milvus | ✅ | ❌ |
| pgvector | ✅ | ✅ |
| Pinecone | ✅ | ✅ |
| Qdrant | ✅ | ✅ |
| Weaviate | ✅ | ❌ |
Frameworks
| Framework | Python | Typescript |
| --- | --- | --- |
| Burr | ✅ | ❌ |
| CrewAI | ✅ | ❌ |
| Haystack by deepset | ✅ | ❌ |
| Langchain | ✅ | ✅ |
| LlamaIndex | ✅ | ✅ |
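Once `Traceloop.init` has been called, supported SDKs such as OpenAI's are instrumented automatically. The sketch below (the model name and workflow name are illustrative) groups an OpenAI call under a named workflow span using Traceloop's decorators:

```python
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

Traceloop.init(app_name="recommendation-app")  # endpoint/headers as shown earlier

client = OpenAI()

@workflow(name="recommend")
def recommend(query: str) -> str:
    # This call is auto-instrumented by Traceloop: the prompt, completion,
    # and token usage are recorded as attributes on the resulting span.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

print(recommend("Suggest a sci-fi book for a fan of dystopian fiction."))
```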