LlamaIndex is an open-source data framework for building applications with large language models (LLMs). TrueFoundry integrates seamlessly with LlamaIndex, allowing you to route all LLM requests through the TrueFoundry Gateway for enhanced security, load balancing, cost management, and more. This guide walks you through connecting LlamaIndex to TrueFoundry.
Use only the OpenAILike and OpenAIEmbedding classes from LlamaIndex. These classes are designed for custom OpenAI-compatible endpoints, so your TrueFoundry-specific model names work directly. The standard OpenAI class validates model names against OpenAI's public list and will raise errors for gateway model names.

Prerequisites

Before you begin, ensure you have the following:
  1. Authentication Token: A TrueFoundry API key. Follow the instructions in Generating Tokens to create one.
  2. Gateway Base URL: Your TrueFoundry Gateway URL, which looks like https://<control_plane_url>/api/llm/api/inference/openai.
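
Rather than hard-coding credentials, you can read them from the environment and assemble the base URL once. A small sketch — the environment variable names TFY_API_KEY and TFY_CONTROL_PLANE_URL are illustrative placeholders, not names mandated by TrueFoundry:

```python
import os

# Illustrative variable names -- pick whatever fits your deployment
control_plane_url = os.environ.get("TFY_CONTROL_PLANE_URL", "my-org.truefoundry.cloud")
api_key = os.environ.get("TFY_API_KEY", "")

# The gateway serves an OpenAI-compatible API under this fixed path
base_url = f"https://{control_plane_url}/api/llm/api/inference/openai"
print(base_url)
```

Both code examples below can then pass `api_key=api_key` and `api_base=base_url` instead of literal strings.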

Code Examples

The following examples demonstrate how to use LlamaIndex with TrueFoundry.

Chat Completion

This example shows how to perform a chat completion.

```python
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

# Configure the LLM
llm = OpenAILike(
    model="tfy_model_name",
    api_key="tfy_api_key",
    api_base="https://<control_plane_url>/api/llm/api/inference/openai",
    is_chat_model=True,
)

# Set the LLM for LlamaIndex to use globally
Settings.llm = llm

# Perform a chat completion
response = llm.complete("What is the capital of France?")
print(response.text)
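
For multi-turn conversations, OpenAILike also exposes a chat interface that takes a list of ChatMessage objects. A minimal sketch, using the same placeholder model name, key, and URL as above (running it requires a live gateway and valid credentials):

```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="tfy_model_name",
    api_key="tfy_api_key",
    api_base="https://<control_plane_url>/api/llm/api/inference/openai",
    is_chat_model=True,
)

# Build a multi-turn conversation as a list of role-tagged messages
messages = [
    ChatMessage(role="system", content="You are a concise assistant."),
    ChatMessage(role="user", content="What is the capital of France?"),
]

response = llm.chat(messages)
print(response.message.content)
```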

Text Embedding

This example shows how to generate text embeddings.

```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure the embedding model
embedding_model = OpenAIEmbedding(
    model_name="tfy_model_name",
    api_key="tfy_api_key",
    api_base="https://<control_plane_url>/api/llm/api/inference/openai",
)

# Set the embedding model for LlamaIndex to use globally
Settings.embed_model = embedding_model

# Generate a text embedding
embedding = embedding_model.get_text_embedding("This is a sample text.")
print(f"Embedding length: {len(embedding)}")
```
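
With Settings.llm and Settings.embed_model both pointing at the gateway, the rest of LlamaIndex works unchanged. A minimal end-to-end sketch, assuming the two configuration snippets above have already run (both the embedding calls at index time and the LLM call at query time are routed through the TrueFoundry Gateway):

```python
from llama_index.core import Document, VectorStoreIndex

# Index a document -- embeddings are generated via the gateway
documents = [Document(text="Paris is the capital of France.")]
index = VectorStoreIndex.from_documents(documents)

# Query the index -- retrieval uses the embeddings, synthesis uses the LLM
query_engine = index.as_query_engine()
response = query_engine.query("What is the capital of France?")
print(response)
```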

FAQs