Learn how to use TrueFoundry’s unified Chat Completions API to interact with models from multiple providers through a consistent interface
TrueFoundry AI Gateway provides a universal API for all supported models via the standard OpenAI /chat/completions endpoint. This unified interface allows you to seamlessly work with models from different providers through a consistent API.
You can use the standard OpenAI client to send requests to the gateway:
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"  # e.g. https://my-company.truefoundry.cloud/api/llm/api/inference/openai
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",  # this is the TrueFoundry model ID
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)
System prompts set the behavior and context for the model by defining the assistant’s role, tone, and constraints:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
        {"role": "user", "content": "How do I write a function to calculate factorial?"}
    ]
)
The API supports various media types, including images, audio, video, and PDF documents.
Images
Supported Models: GPT-4o, GPT-4 Vision, Claude 3, Gemini Pro Vision

Send images as part of your chat completion requests using either URLs or base64 encoding:
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encode_image('image.jpeg')}"
                    }
                }
            ]
        }
    ]
)
Audio
Supported Models: Google Gemini models (Gemini 2.0 Flash, etc.)

Send audio files in supported formats (MP3, WAV, etc.). Currently supported for Google Gemini models:
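A minimal sketch, assuming audio uses the same base64 data-URI pattern as the video example in the next section (the file name and MIME type here are illustrative, not confirmed API details):

import base64

def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio clip"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:audio/mpeg;base64,{encode_audio('audio.mp3')}",  # illustrative file name
                        "mime_type": "audio/mpeg"  # assumed; Gemini models require an explicit MIME type
                    }
                }
            ]
        }
    ]
)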
Videos

Supported Models: Google Gemini models (Gemini 2.0 Flash, etc.)

Send video files as base64-encoded data. Currently supported for Google Gemini models:

import base64

def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:video/mp4;base64,{encode_video('video.mp4')}",
                        "mime_type": "video/mp4"  # required for Gemini models
                    }
                }
            ]
        }
    ]
)
PDF Documents
Supported Models: Google Gemini models (Gemini 2.5 Flash, etc.)

PDF document processing allows models to analyze and extract information from PDF files:
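A minimal sketch, assuming PDFs follow the same base64 data-URI pattern as the audio and video examples above (the model ID, file name, and MIME-type handling are assumptions, not confirmed API details):

import base64

def encode_pdf(pdf_path):
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-5-flash",  # placeholder TrueFoundry model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this document"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:application/pdf;base64,{encode_pdf('document.pdf')}",
                        "mime_type": "application/pdf"  # assumed, mirroring the video example
                    }
                }
            ]
        }
    ]
)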
TrueFoundry supports vision models from all integrated providers as they become available. These models can analyze and interpret images alongside text, enabling multimodal AI applications.
Function calling allows models to invoke defined functions during conversations, enabling them to perform specific actions or retrieve external information.
from openai import OpenAI
import json

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)

# Define a function
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location"]
        }
    }
}]

# Make the request
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    tools=tools
)

# Check if the model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)
    print(f"Function called: {function_name}")
    print(f"Arguments: {function_args}")
Process function calls and continue the conversation:
# Initial request
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=messages,
    tools=tools
)

# Handle function call
if response.choices[0].message.tool_calls:
    messages.append(response.choices[0].message)

    for tool_call in response.choices[0].message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        # Execute your function (simulated here)
        if function_name == "get_weather":
            result = f"The weather in {function_args['location']} is 22°C and sunny"

            # Add the function result to the conversation
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

    # Continue the conversation
    final_response = client.chat.completions.create(
        model="openai-main/gpt-4o-mini",
        messages=messages
    )
    print(final_response.choices[0].message.content)
Controlling When and How Functions Are Called
Use the tool_choice parameter to control when and how functions are called:
# Force a specific function call
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

# Allow automatic function calling (default)
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=tools,
    tool_choice="auto"
)

# Prevent function calling
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=tools,
    tool_choice="none"
)

# Force any function call
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=tools,
    tool_choice="required"
)
The chat completions API supports structured response formats, enabling you to receive consistent, predictable outputs in JSON format. This is useful for parsing responses programmatically.
Basic JSON Mode: Getting Valid JSON Without Structure Constraints
JSON mode ensures the model’s output is valid JSON without enforcing a specific structure:
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "Extract information about the 2020 World Series winner"}
    ],
    response_format={"type": "json_object"}
)

print(response.choices[0].message.content)
Output:
{ "team": "Los Angeles Dodgers", "year": 2020, "opponent": "Tampa Bay Rays", "games_played": 6, "series_result": "4-2"}
JSON Schema Mode: Enforcing Specific Data Structures
JSON Schema mode provides strict structure validation using predefined schemas:
from openai import OpenAI
import json

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)

# Define JSON schema
user_info_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
        "occupation": {"type": "string"},
        "location": {"type": "string"},
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name", "age", "occupation", "location", "skills"],
    "additionalProperties": False
}

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "Extract user information and respond according to the provided JSON schema."
        },
        {
            "role": "user",
            "content": "My name is Sarah Johnson, I'm 28 years old, and I work as a data scientist in New York. I'm skilled in Python, SQL, and machine learning."
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_info",
            "schema": user_info_schema,
            "strict": True
        }
    }
)

# Parse response
result = json.loads(response.choices[0].message.content)
When using JSON schema with strict mode set to true, every property defined in the schema must be included in the required array. If any property is defined but not marked as required, the API returns a 400 Bad Request error.
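For example, a hypothetical schema like the one sketched below would be rejected in strict mode, because occupation is defined but not listed as required:

# Invalid in strict mode: "occupation" appears under "properties"
# but not in "required", so the request fails with 400 Bad Request.
invalid_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "occupation": {"type": "string"}
    },
    "required": ["name"],
    "additionalProperties": False
}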
Pydantic provides automatic validation, serialization, and type hints for structured data:
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)

# Define Pydantic model
class UserInfo(BaseModel):
    name: str = Field(description="Full name of the user")
    age: int = Field(ge=0, description="Age in years")
    occupation: str = Field(description="Job title or profession")
    location: str = Field(description="City or location")
    skills: List[str] = Field(description="List of professional skills")

    class Config:
        extra = "forbid"  # Prevent additional fields

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "Extract user information and respond according to the provided schema."
        },
        {
            "role": "user",
            "content": "Hi, I'm Mike Chen, a 32-year-old software architect from Seattle. I specialize in cloud computing, microservices, and Kubernetes."
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_info",
            "schema": UserInfo.model_json_schema(),
            "strict": True
        }
    }
)

# Parse and validate with Pydantic
user_data = UserInfo.model_validate_json(response.choices[0].message.content)
When using OpenAI models with Pydantic models and strict mode set to true, the Pydantic model must not contain any optional fields. Optional fields are left out of the “required” section of the generated JSON schema, which strict mode rejects.
Streamlined Pydantic Integration with OpenAI's Beta Parse API
The beta parse client provides the most streamlined approach for Pydantic integration:
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import List, Optional

class UserInfo(BaseModel):
    name: str = Field(description="Full name of the user")
    age: int = Field(ge=0, description="Age in years")
    occupation: str = Field(description="Job title or profession")
    location: Optional[str] = Field(None, description="City or location")
    skills: List[str] = Field(default=[], description="List of professional skills")

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)

completion = client.beta.chat.completions.parse(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "Extract user information from the provided text."
        },
        {
            "role": "user",
            "content": "Hello, I'm Alex Rodriguez, a 29-year-old product manager from Austin. I have experience in agile methodologies, data analysis, and team leadership."
        }
    ],
    response_format=UserInfo,
)

user_result = completion.choices[0].message.parsed
This approach allows for optional fields in your Pydantic model and provides a cleaner API for structured responses.
Prompt caching optimizes API usage by allowing requests to resume from cached prefixes of your prompts. This significantly reduces processing time and costs for repetitive tasks or prompts with consistent elements.
Prompt caching is supported by multiple providers, each with their own implementation.
Supported models: All recent models, gpt-4o and newer.

Prompt caching is enabled for all recent models. You can use the prompt_cache_key parameter to improve cache hit rates when requests share common prefixes.
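For example, a minimal sketch of reusing a cached prefix; the key name and system prompt are illustrative, and prompt_cache_key is passed through extra_body so it works even on SDK versions that don’t expose it as a named parameter:

# A long, stable prefix shared across many requests (illustrative placeholder)
long_shared_system_prompt = "You are a support assistant for Acme Corp. ..."

# Requests that share this prefix and the same prompt_cache_key
# are more likely to be routed to the same cache entry.
response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {"role": "system", "content": long_shared_system_prompt},
        {"role": "user", "content": "Summarize today's ticket backlog"}
    ],
    extra_body={"prompt_cache_key": "support-assistant-v1"}  # illustrative key
)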
Supported models: Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, Claude Sonnet 3.5 (deprecated), Claude Haiku 3.5, Claude Haiku 3, Claude Opus 3 (deprecated)

For Anthropic models, you must explicitly add the cache_control parameter to any message content you want to cache:
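A sketch of the pattern, assuming the gateway forwards a cache_control field attached to OpenAI-format content parts to Anthropic’s native API (the model ID and prompt text are placeholders):

response = client.chat.completions.create(
    model="anthropic-main/claude-sonnet-4",  # placeholder TrueFoundry model ID
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a legal assistant. <long, stable instructions here>",
                    "cache_control": {"type": "ephemeral"}  # cache the prompt up to this block
                }
            ]
        },
        {"role": "user", "content": "Summarize the indemnification clause."}
    ]
)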
TrueFoundry AI Gateway provides access to model reasoning processes through thinking/reasoning tokens, available for models from multiple providers including Anthropic, OpenAI, Azure OpenAI, Groq, and Vertex. These models expose their internal reasoning process, allowing you to see how they arrive at conclusions; the thinking/reasoning tokens provide step-by-step insight into the model’s cognitive process.
Supported models: Claude Opus 4.1 (claude-opus-4-1-20250805), Claude Opus 4 (claude-opus-4-20250514), Claude Sonnet 4 (claude-sonnet-4-20250514), and Claude Sonnet 3.7 (claude-3-7-sonnet-20250219), via Anthropic, AWS Bedrock, and Google Vertex AI.
For Anthropic models (from Anthropic, Google Vertex AI, AWS Bedrock), TrueFoundry automatically translates the reasoning_effort parameter into Anthropic’s native thinking parameter format, since Anthropic doesn’t support the reasoning_effort parameter directly; the thinking token budget is derived from the max_tokens parameter.
For Gemini models (from Google Vertex AI), TrueFoundry likewise translates the reasoning_effort parameter into Gemini’s native thinking parameter format, since Gemini doesn’t support the reasoning_effort parameter directly. The translation uses the max_tokens parameter with the following ratios (see the request sketch after the note below):
low: 30% of max_tokens
medium: 60% of max_tokens
high: 90% of max_tokens
Note: Gemini 2.5 Pro and 2.5 Flash come with reasoning enabled by default.
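As a concrete illustration, a minimal request sketch is shown below. The model ID is a placeholder, and reasoning_effort is passed through extra_body in case the installed OpenAI SDK version does not expose it as a named parameter:

response = client.chat.completions.create(
    model="internal-google/gemini-2-5-flash",  # placeholder TrueFoundry model ID
    messages=[{"role": "user", "content": "How do I compute 3^3^3?"}],
    max_tokens=2048,  # with "medium", roughly 60% of this becomes the thinking budget
    extra_body={"reasoning_effort": "medium"}
)

# The answer is in message.content; when the model returns reasoning tokens,
# they appear in a separate reasoning_content field (see the response below).
print(response.choices[0].message.content)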
When reasoning tokens are enabled, the response includes both thinking and content sections:
{ "id": "1742890579083", "object": "chat.completion", "created": 1742890579, "model": "", "provider": "aws", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "To compute 3^3^3, you need to understand that exponentiation is evaluated from right to left (right-associative). So:\n\n3^3^3 = 3^(3^3), not (3^3)^3\n\nHere's how to calculate it step by step:\n\n**Step 1:** Calculate the rightmost exponent first\n- 3^3 = 27\n\n**Step 2:** Now calculate 3^27\n- 3^27 = 7,625,597,484,987\n\nTherefore: **3^3^3 = 7,625,597,484,987**\n\n---\n\n**Note:** If you meant (3^3)^3 instead, that would be:\n- (3^3)^3 = 27^3 = 19,683\n\nThis is much smaller than 3^3^3 because:\n- (3^3)^3 = 3^(3×3) = 3^9\n- While 3^3^3 = 3^27\n\nThe difference between 3^9 and 3^27 is enormous!", "reasoning_content": "The user is asking how to compute 3^3^3. This is a question about exponentiation and specifically about the order of operations when dealing with repeated exponentiation (also known as tetration or power towers).\n\nThe key thing to understand here is that exponentiation is right-associative, meaning 3^3^3 = 3^(3^3), not (3^3)^3.\n\nSo we need to:\n1. First compute 3^3 = 27\n2. Then compute 3^27\n\nLet me work through this:\n- 3^3 = 27\n- 3^27 = 3^27\n\nNow 3^27 is a large number. Let me think about how to compute it:\n3^27 = 3^(3×9) = (3^3)^9 = 27^9\n\nOr we could compute it directly:\n3^1 = 3\n3^2 = 9\n3^3 = 27\n3^4 = 81\n3^5 = 243\n3^6 = 729\n3^7 = 2,187\n3^8 = 6,561\n3^9 = 19,683\n3^10 = 59,049\n...\n\nActually, let me just state that 3^27 = 7,625,597,484,987\n\nSo 3^3^3 = 3^(3^3) = 3^27 = 7,625,597,484,987" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 45, "completion_tokens": 180, "total_tokens": 225 }}