TrueFoundry AI Gateway provides a universal API for all supported models via the standard OpenAI /chat/completions endpoint. This unified interface allows you to seamlessly work with models from different providers through a consistent API.

Contents

| Section | Description |
|---------|-------------|
| Getting Started | Basic setup and configuration |
| Input Controls | System prompts and request parameters |
| Working with Media | Images, audio, and video support |
| Function Calling | Enabling models to invoke functions |
| Response Format | Structured JSON outputs |
| Prompt Caching | Optimize API usage with caching |
| Reasoning Models | Access model reasoning processes |

Getting Started

You can use the standard OpenAI client to send requests to the gateway:
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai" # e.g. https://my-company.truefoundry.cloud/api/llm/api/inference/openai
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini", # this is the truefoundry model id
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)

Configuration

You will need to configure the following:
  1. base_url: Your TrueFoundry control plane URL with /api/llm/api/inference/openai appended (e.g. https://my-company.truefoundry.cloud/api/llm/api/inference/openai)
  2. api_key: API key generated from Personal Access Tokens
  3. model: TrueFoundry model ID in the format provider_account/model_name (available in the LLM playground UI)
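
The same call can also be made over plain HTTP without the SDK. The snippet below is a minimal sketch using the requests library; the control plane URL, API key, and model ID are the placeholder values from the example above:

import requests

# Placeholder values; substitute your own control plane URL, API key, and model ID
BASE_URL = "https://my-company.truefoundry.cloud/api/llm/api/inference/openai"
API_KEY = "your_truefoundry_api_key"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "openai-main/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])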

Input Controls

System Prompts

System prompts set the behavior and context for the model by defining the assistant’s role, tone, and constraints:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
        {"role": "user", "content": "How do I write a function to calculate factorial?"}
    ]
)

Request Parameters

Fine-tune model behavior with these common parameters:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,       # Controls randomness (0.0 to 2.0 for OpenAI models)
    max_tokens=100,        # Maximum tokens to generate
    verbosity="high",      # Constrains verbosity: low, medium, high (newer OpenAI models only)
    top_p=0.9,             # Nucleus sampling parameter
    frequency_penalty=0.0, # Reduces repetition
    presence_penalty=0.0,  # Encourages new topics
    stop=["\n", "Human:"]  # Stop sequences
)
Some models don’t support all parameters. For example, temperature is not supported by o-series models such as o3-mini.

Working with Media

The API supports multiple media types, including images, audio, video, and PDF documents.

Vision

TrueFoundry supports vision models from all integrated providers as they become available. These models can analyze and interpret images alongside text, enabling multimodal AI applications.
| Provider | Models |
|----------|--------|
| OpenAI | gpt-4-vision-preview, gpt-4o, gpt-4o-mini |
| Anthropic | claude-3-sonnet, claude-3-haiku, claude-3-opus, claude-3.5-sonnet, claude-3.5-haiku, claude-4-opus, claude-4-sonnet, claude-3-7-sonnet |
| Gemini | gemini-1.0-pro-vision, gemini-1.5-flash, gemini-1.5-flash-8b, gemini-1.5-pro, gemini-2.5-pro, gemini-2.5-flash |
| AWS Bedrock | anthropic.claude-3-5-sonnet, anthropic.claude-3-5-haiku, anthropic.claude-3-5-sonnet-20240620-v1:0 |
| Azure OpenAI | gpt-4-vision-preview, gpt-4o, gpt-4o-mini |

Using Vision Models with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="http://{controlPlaneURL}/api/llm"
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message)
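
Audio input follows the same content-part pattern. The sketch below is illustrative: the model ID openai-main/gpt-4o-audio-preview and the file path are placeholders, and it assumes the configured provider accepts OpenAI-style input_audio content parts:

import base64

# Read a local audio file and base64-encode it (path is a placeholder)
with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai-main/gpt-4o-audio-preview",  # placeholder TrueFoundry model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please transcribe and answer the question in this recording."},
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)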

Function Calling

Function calling allows models to invoke defined functions during conversations, enabling them to perform specific actions or retrieve external information.

Basic Usage

Define functions that the model can call:
from openai import OpenAI
import json

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)

# Define a function
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
}]

# Make the request
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    tools=tools
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    print(f"Function called: {function_name}")
    print(f"Arguments: {function_args}")

Function Definition Reference
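
Each tool definition follows the OpenAI function-calling schema:

| Field | Description |
|-------|-------------|
| type | Always "function" for function tools |
| function.name | Name the model uses to refer to the function |
| function.description | Natural-language description that helps the model decide when to call the function |
| function.parameters | JSON Schema object describing the accepted arguments |
| function.parameters.required | Argument names the model must always supply |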

Implementation Workflows
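
A typical workflow executes the requested function locally and sends the result back to the model in a follow-up request. The sketch below continues the example above; get_weather is stubbed out and its return value is made up for illustration:

def get_weather(location, unit="celsius"):
    # Stub implementation; in practice you would call a real weather API here
    return {"location": location, "temperature": 22, "unit": unit, "condition": "sunny"}

messages = [{"role": "user", "content": "What's the weather in New York?"}]

# First request: the model decides whether to call the function
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=messages,
    tools=tools,
)
message = response.choices[0].message

if message.tool_calls:
    # Append the assistant's tool call, then one tool message per call with the result
    messages.append(message)
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })

    # Second request: the model uses the tool result to compose its final answer
    final = client.chat.completions.create(
        model="openai-main/gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)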

Response Format

The chat completions API supports structured response formats, enabling you to receive consistent, predictable outputs in JSON format. This is useful for parsing responses programmatically.

JSON Response Options
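
The simplest option is JSON mode: setting response_format={"type": "json_object"} constrains the model to emit valid JSON. For OpenAI models the prompt itself must mention JSON; support on other providers depends on the underlying model. A minimal sketch:

import json

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the requested fields and reply in JSON."},
        {"role": "user", "content": "My name is Ada Lovelace and I live in London."},
    ],
    response_format={"type": "json_object"},
)

# The message content is a JSON string that can be parsed directly
data = json.loads(response.choices[0].message.content)
print(data)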

Advanced Schema Integration
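
For stricter guarantees, a JSON Schema can be attached via response_format={"type": "json_schema", ...} (OpenAI structured outputs). The sketch below assumes the underlying model supports structured outputs; the schema itself is an arbitrary example:

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "My name is Ada Lovelace and I live in London."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "city": {"type": "string"},
                },
                "required": ["name", "city"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # JSON string matching the schema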

Prompt Caching

Prompt caching optimizes API usage by reusing previously processed prompt prefixes. This significantly reduces processing time and cost for repetitive tasks or prompts with consistent elements.
Prompt caching is supported by multiple providers, each with their own implementation.

Supported Providers

| Provider | Implementation | Documentation |
|----------|----------------|---------------|
| OpenAI | Automatic prompt caching (KV cache) | OpenAI Prompt Caching |
| Anthropic | Requires explicit cache_control parameter | Anthropic Prompt Caching |
| Azure OpenAI | Automatic (inherited from OpenAI) | Azure OpenAI Prompt Caching |
| Groq | Automatic (similar to OpenAI) | Groq Prompt Caching |
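
For providers with automatic caching, no request changes are needed; the provider detects reuse of a long, stable prefix such as a large system prompt. One way to verify caching for OpenAI models is to inspect the usage object on the response, which reports cached prompt tokens (this field is not returned by every provider):

# A long, stable system prompt shared across requests makes the prefix cacheable
system_prompt = "You are a support assistant for Acme Corp. ..."  # imagine >1024 tokens here

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How do I reset my password?"},
    ],
)

details = getattr(response.usage, "prompt_tokens_details", None)
if details is not None:
    # cached_tokens > 0 indicates part of the prompt was served from the cache
    print("Cached prompt tokens:", details.cached_tokens)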

Supported Models

Reasoning Models

TrueFoundry AI Gateway provides access to model reasoning processes through thinking/reasoning tokens, available for models from multiple providers including Anthropic, OpenAI, Azure OpenAI, Groq, and Vertex. These models expose their internal reasoning process, allowing you to see how they arrive at conclusions. The thinking/reasoning tokens provide step-by-step insight into the model's reasoning.

Supported Reasoning Models

Response Format

When reasoning tokens are enabled, the response includes both thinking and content sections:
{
  "id": "1742890579083",
  "object": "chat.completion",
  "created": 1742890579,
  "model": "",
  "provider": "aws",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "To compute 3^3^3, you need to understand that exponentiation is evaluated from right to left (right-associative). So:\n\n3^3^3 = 3^(3^3), not (3^3)^3\n\nHere's how to calculate it step by step:\n\n**Step 1:** Calculate the rightmost exponent first\n- 3^3 = 27\n\n**Step 2:** Now calculate 3^27\n- 3^27 = 7,625,597,484,987\n\nTherefore: **3^3^3 = 7,625,597,484,987**\n\n---\n\n**Note:** If you meant (3^3)^3 instead, that would be:\n- (3^3)^3 = 27^3 = 19,683\n\nThis is much smaller than 3^3^3 because:\n- (3^3)^3 = 3^(3×3) = 3^9\n- While 3^3^3 = 3^27\n\nThe difference between 3^9 and 3^27 is enormous!",
        "reasoning_content": "The user is asking how to compute 3^3^3. This is a question about exponentiation and specifically about the order of operations when dealing with repeated exponentiation (also known as tetration or power towers).\n\nThe key thing to understand here is that exponentiation is right-associative, meaning 3^3^3 = 3^(3^3), not (3^3)^3.\n\nSo we need to:\n1. First compute 3^3 = 27\n2. Then compute 3^27\n\nLet me work through this:\n- 3^3 = 27\n- 3^27 = 3^27\n\nNow 3^27 is a large number. Let me think about how to compute it:\n3^27 = 3^(3×9) = (3^3)^9 = 27^9\n\nOr we could compute it directly:\n3^1 = 3\n3^2 = 9\n3^3 = 27\n3^4 = 81\n3^5 = 243\n3^6 = 729\n3^7 = 2,187\n3^8 = 6,561\n3^9 = 19,683\n3^10 = 59,049\n...\n\nActually, let me just state that 3^27 = 7,625,597,484,987\n\nSo 3^3^3 = 3^(3^3) = 3^27 = 7,625,597,484,987"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 180,
    "total_tokens": 225
  }
}
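
With the OpenAI Python SDK, reasoning_content is not part of the standard message type, but it is typically still reachable on the parsed message object; depending on the SDK version it may instead appear under message.model_extra. This is a sketch, not a guaranteed interface, and the model ID is a placeholder for any reasoning-capable model configured in your gateway (how reasoning is enabled varies by provider and is not shown here):

response = client.chat.completions.create(
    model="your-provider/your-reasoning-model",  # placeholder TrueFoundry model ID
    messages=[{"role": "user", "content": "How do I compute 3^3^3?"}],
)

message = response.choices[0].message

# reasoning_content is a provider/gateway extension, so fall back to model_extra if needed
reasoning = getattr(message, "reasoning_content", None)
if reasoning is None and message.model_extra:
    reasoning = message.model_extra.get("reasoning_content")

print("Reasoning:", reasoning)
print("Answer:", message.content)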

Streaming with Reasoning Tokens

For streaming responses, the thinking section is always sent before the content section.
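
A minimal streaming sketch, assuming reasoning deltas arrive on a reasoning_content field of each chunk's delta (the field name mirrors the non-streaming response above and is an assumption for streaming); the model ID is again a placeholder:

stream = client.chat.completions.create(
    model="your-provider/your-reasoning-model",  # placeholder reasoning-capable model ID
    messages=[{"role": "user", "content": "How do I compute 3^3^3?"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Reasoning deltas are sent first (extension field), followed by the answer itself
    reasoning_piece = getattr(delta, "reasoning_content", None)
    if reasoning_piece:
        print(reasoning_piece, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
print()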

API Reference

For detailed API specifications, parameters, and response schemas, see the Chat Completions API Reference.