Quickstart Guide

TrueFoundry's unified LLM gateway is compatible with the OpenAI API signature, so you can connect to it through LangChain's ChatOpenAI interface.

Installation & Setup

  1. Sign up for a TrueFoundry account
  2. Follow the steps in the Quick start guide to generate a Personal Access Token (PAT)

Get the Base URL and Model Name from the Unified Code Snippet

  • Set the base_url to your TrueFoundry endpoint
  • Set the api_key to your TRUEFOUNDRY_PAT
  • Use TrueFoundry model names in the format provider-main/model-name
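
For example, you might keep these values in environment variables rather than hardcoding them. This is a minimal sketch; the variable names below are a convention for this guide, not something TrueFoundry requires:

import os

# Illustrative convention: load gateway credentials from the environment.
# The variable names here are assumptions, not required by TrueFoundry.
TRUEFOUNDRY_PAT = os.environ["TRUEFOUNDRY_PAT"]
TRUEFOUNDRY_BASE_URL = os.environ["TRUEFOUNDRY_BASE_URL"]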

Installation

pip install langchain-openai

Basic Setup

Connect to TrueFoundry by configuring LangChain's ChatOpenAI to use your gateway:

from langchain_openai import ChatOpenAI

TRUEFOUNDRY_PAT = "..."  # Your TrueFoundry Personal Access Token
TRUEFOUNDRY_BASE_URL = "..."  # Your TrueFoundry unified endpoint

llm = ChatOpenAI(
    api_key=TRUEFOUNDRY_PAT,
    base_url=TRUEFOUNDRY_BASE_URL,
    model="openai-main/gpt-4o",  # similarly, you can call any model from any provider, e.g. Anthropic or Gemini
)

llm.invoke("What is the meaning of life, universe and everything?")

The request is routed through your TrueFoundry gateway to the specified model provider. TrueFoundry automatically handles authentication, load balancing, and logging.
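
The same interface works for any provider configured in your gateway. As a sketch, calling an Anthropic model might look like this (the model name anthropic-main/claude-3-5-sonnet is an assumption; use whatever name your TrueFoundry unified code snippet shows):

claude = ChatOpenAI(
    api_key=TRUEFOUNDRY_PAT,
    base_url=TRUEFOUNDRY_BASE_URL,
    model="anthropic-main/claude-3-5-sonnet",  # assumed name; check your gateway
)
claude.invoke("Name one theme of The Hitchhiker's Guide to the Galaxy.")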

Advanced Example with LangGraph
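
The example below reuses the llm object from Basic Setup and assumes LangGraph is installed as a separate package:

pip install langgraph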

from langgraph.graph import StateGraph, MessagesState
from langchain_core.messages import HumanMessage

# Node that calls the model through the TrueFoundry gateway
def call_model(state: MessagesState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# Build workflow
workflow = StateGraph(MessagesState)
workflow.add_node("agent", call_model)
workflow.set_entry_point("agent")
workflow.set_finish_point("agent")

app = workflow.compile()

# Run agent through TrueFoundry
result = app.invoke({"messages": [HumanMessage(content="Hello!")]})
print(result["messages"][-1].content)
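
Because ChatOpenAI is a standard LangChain chat model, streaming also works through the gateway. A minimal sketch, assuming the upstream model supports streaming:

# Tokens arrive incrementally as they are generated
for chunk in llm.stream("Tell me a short joke."):
    print(chunk.content, end="", flush=True)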

Observability and Governance

Monitor your LangChain applications through TrueFoundry's metrics tab. With TrueFoundry's AI gateway, you can monitor and analyze:
  • Performance Metrics: Track key latency metrics like Request Latency, Time to First Token (TTFT), and Inter-Token Latency (ITL), with P50, P90, and P99 percentiles
  • Cost and Token Usage: Gain visibility into your application’s costs with detailed breakdowns of input/output tokens and the associated expenses for each model
  • Usage Patterns: Understand how your application is being used with detailed analytics on user activity, model distribution, and team-based usage
  • Rate Limiting and Load Balancing: Set up rate limits, load balancing, and fallbacks for your models
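
The gateway records these metrics server-side. Purely as an illustration, you can also sanity-check round-trip latency from the client to compare against the gateway's numbers:

import time

# Illustrative client-side timing to complement the gateway's latency metrics
start = time.perf_counter()
llm.invoke("ping")
print(f"Round-trip latency: {time.perf_counter() - start:.2f}s")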