Universal OpenAI Compatible API
TrueFoundry AI Gateway provides a universal API for all supported models via the standard OpenAI /chat/completions endpoint. You can use any chat-based LLM through the standard OpenAI client by pointing it at the gateway. Here is a sample code snippet:
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"  # e.g. https://my-company.truefoundry.cloud/api/llm/api/inference/openai
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",  # this is the TrueFoundry model ID
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
You will need to configure the following:
- base_url: The base URL of your TrueFoundry dashboard.
- api_key: An API key, which can be generated from Personal Access Tokens.
- model: The TrueFoundry model ID, in the format provider_account/model_name. You can find it in the TrueFoundry LLM playground UI.
Sending a system prompt
You can include a system prompt to set the behavior and context for the model. Here’s how to do it:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
        {"role": "user", "content": "How do I write a function to calculate factorial?"}
    ]
)
print(response.choices[0].message.content)
The system message helps guide the model’s responses and can be used to set specific instructions, tone, or expertise areas.
Multimodal Inputs
TrueFoundry AI Gateway supports various types of multimodal inputs, allowing you to work with different data formats.
Images
You can send images as part of your chat completion requests, either as a URL or as a base64-encoded image.
Send an image URL to the model:
response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIyLTA1L25zODIzMC1pbWFnZS5qcGc.jpg"
                    }
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
Send a base64-encoded image to the model:
import base64

def encode_image(image_path):
    """Read a local image file and return it as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encode_image('dogs.jpeg')}"
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
Audio
For audio inputs, you can send audio files in supported formats (MP3, WAV, etc.). Please make sure that the model supports audio inputs, otherwise the request will fail.
Audio inputs in chat completions are currently supported for Google Gemini models.
Using an audio input URL:
response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://raw.githubusercontent.com/prof3ssorSt3v3/media-sample-files/refs/heads/master/hal-9000.wav",
                        "mime_type": "audio/wav"  # this field is only required for gemini models
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
Using a local audio file, base64-encoded:
import base64

def encode_audio(audio_path):
    """Read a local audio file and return it as a base64-encoded string."""
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:audio/wav;base64,{encode_audio('/path-to-audio-file.wav')}"
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
Video
Video processing is natively supported for Google Gemini models. For other models, you can extract frames from the video and send them as images (see the frame-sampling sketch at the end of this section).
Send a video URL to the model:
response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.youtube.com/watch?v=fxqE27gIZcc",
                        "mime_type": "video/mp4"  # this field is only required for gemini models
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
Send a base64-encoded video to the model (make sure the video's size is within the provider's limits):
import base64

def encode_video(video_path):
    """Read a local video file and return it as a base64-encoded string."""
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:video/mp4;base64,{encode_video('path/to/video.mp4')}",
                        "mime_type": "video/mp4"  # this field is only required for gemini models
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
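For models without native video support, you can implement the frame-based approach mentioned above yourself. Here is a minimal sketch, assuming OpenCV (opencv-python) is installed and the target model accepts multiple images in one message; sample_frames is an illustrative helper, not part of the gateway:
import base64
import cv2  # pip install opencv-python

def sample_frames(video_path, num_frames=5):
    """Sample evenly spaced frames from a video, returned as base64-encoded JPEGs."""
    video = cv2.VideoCapture(video_path)
    total = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        video.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = video.read()
        if not ok:
            break
        ok, buffer = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(base64.b64encode(buffer.tobytes()).decode("utf-8"))
    video.release()
    return frames

frames = sample_frames("path/to/video.mp4")
# Reuses the `client` configured earlier; any image-capable model works here
response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                *[
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
                    for f in frames
                ],
            ],
        }
    ],
)
print(response.choices[0].message.content)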
Parameters Supported
The chat completions API supports all OpenAI-compatible parameters:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,        # Controls randomness (0.0 to 2.0)
    max_tokens=100,         # Maximum number of tokens to generate
    top_p=0.9,              # Nucleus sampling parameter
    frequency_penalty=0.0,  # Reduces repetition
    presence_penalty=0.0,   # Encourages new topics
    stop=["\n", "Human:"]   # Stop sequences
)
print(response.choices[0].message.content)
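Since the endpoint is OpenAI-compatible, the standard stream=True option should also pass through the gateway, letting you print tokens as they arrive:
stream = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    stream=True,  # responses arrive as ChatCompletionChunk objects
)
for chunk in stream:
    # Each chunk carries an incremental delta; content may be None on some chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()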
Function and Tool Calling
You can define functions that the model can call during the conversation. Here’s how to implement function calling:
response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }]
)
print(response.choices[0].message.tool_calls)
The model can then call these functions when appropriate, and you can handle the function calls in your application logic. This enables the model to perform specific actions or retrieve information from external sources.
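To close the loop, you execute the requested function yourself and send the result back to the model as a tool message. Below is a minimal sketch, assuming a hypothetical get_weather implementation that returns canned data:
import json

def get_weather(location, unit="celsius"):
    # Hypothetical implementation; replace with a real weather lookup
    return {"location": location, "temperature": 22, "unit": unit}

message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)

    # Send the function result back so the model can produce a final answer
    followup = client.chat.completions.create(
        model="openai-main/gpt-4o-mini",
        messages=[
            {"role": "user", "content": "What's the weather in New York?"},
            message,  # the assistant message containing the tool call
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            },
        ],
    )
    print(followup.choices[0].message.content)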