This provides a universal API for all supported models via the standard OpenAI /chat/completions endpoint.

Universal OpenAI Compatible API

TrueFoundry AI Gateway lets you call any chat-based LLM through the standard OpenAI /chat/completions endpoint. You can use the standard OpenAI client to send requests to the gateway. Here is a sample code snippet:

from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai" # e.g. https://my-company.truefoundry.cloud/api/llm/api/inference/openai
)

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini", # this is the truefoundry model id
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)

You will need to configure the following:

  1. base_url: The base URL of your TrueFoundry dashboard, followed by /api/llm/api/inference/openai.
  2. api_key: The API key, which can be generated from Personal Access Tokens.
  3. model: The TrueFoundry model ID, in the format provider_account/model_name. You can find it in the TrueFoundry LLM playground UI.
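
If you prefer not to hard-code these values, the OpenAI Python SDK (v1+) also reads them from environment variables; a minimal sketch (the values shown are placeholders):

import os
from openai import OpenAI

# The SDK picks these up automatically when api_key / base_url are not passed explicitly
os.environ["OPENAI_API_KEY"] = "your_truefoundry_api_key"
os.environ["OPENAI_BASE_URL"] = "<truefoundry-base-url>/api/llm/api/inference/openai"

client = OpenAI()  # reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",  # truefoundry model id: provider_account/model_name
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

print(response.choices[0].message.content)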

Sending a system prompt

You can include a system prompt to set the behavior and context for the model. Here’s how to do it:

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that specializes in Python programming."},
        {"role": "user", "content": "How do I write a function to calculate factorial?"}
    ]
)

print(response.choices[0].message.content) 

The system message helps guide the model’s responses and can be used to set specific instructions, tone, or expertise areas.

Multimodal Inputs

TrueFoundry AI Gateway supports various types of multimodal inputs, allowing you to work with different data formats.

Images

You can send images as part of your chat completion requests, either as a URL or as a base64-encoded image.

Sending an image URL to the model:

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIyLTA1L25zODIzMC1pbWFnZS5qcGc.jpg"
                    }
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

Sending a base64 encoded image to the model:

import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encode_image('dogs.jpeg')}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

Audio

For audio inputs, you can send audio files in supported formats (MP3, WAV, etc.). Make sure the model supports audio input, otherwise the request will fail; audio input in chat completions is currently supported for Google Gemini models.

Using an audio input URL:

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://raw.githubusercontent.com/prof3ssorSt3v3/media-sample-files/refs/heads/master/hal-9000.wav",
                        "mime_type": "audio/mp3" # this field is only required for gemini models
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

Using a local audio file, base64 encoded:

import base64

def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:audio/wav;base64,{encode_audio('/path-to-audio-file.wav')}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

Video

Video processing is natively supported for Google Gemini models. For other models, you can instead sample frames from the video and send them as images, as shown in the sketch at the end of this section.

Sending a video URL to the model:

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://www.youtube.com/watch?v=fxqE27gIZcc",
                        "mime_type": "video/mp4" # this field is only required for gemini models
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)

Sending a base64-encoded video to the model (make sure the video size is within the provider's limits):

import base64

def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user", 
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "image_url": f"data:image/jpeg;base64,{encode_video('path/to/video.mp4')}",
                        "mime_type": "video/mp4" # this field is only required for gemini models
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
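
For models that do not accept video natively, you can sample frames from the video and send them as images instead, as noted above. A minimal sketch using OpenCV; this assumes the opencv-python package is installed, the sample_frames helper is illustrative, and the target model accepts multiple image inputs:

import base64
import cv2  # requires the opencv-python package

def sample_frames(video_path, every_n_frames=30, max_frames=5):
    """Extract up to max_frames frames, one every every_n_frames frames, as base64-encoded JPEGs."""
    frames = []
    capture = cv2.VideoCapture(video_path)
    index = 0
    while capture.isOpened() and len(frames) < max_frames:
        success, frame = capture.read()
        if not success:
            break
        if index % every_n_frames == 0:
            _, buffer = cv2.imencode(".jpg", frame)
            frames.append(base64.b64encode(buffer.tobytes()).decode("utf-8"))
        index += 1
    capture.release()
    return frames

frames = sample_frames("path/to/video.mp4")

response = client.chat.completions.create(
    model="openai-main/gpt-4o",  # any vision-capable model that accepts image inputs
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "These are frames sampled from a video. Describe what's happening."},
                # each sampled frame is sent as a separate base64-encoded image
                *[
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}}
                    for frame in frames
                ],
            ],
        }
    ],
)

print(response.choices[0].message.content)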

Parameters Supported

The chat completions API supports all OpenAI-compatible parameters:

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,  # Controls randomness (0.0 to 2.0)
    max_tokens=100,   # Maximum number of tokens to generate
    top_p=0.9,        # Nucleus sampling parameter
    frequency_penalty=0.0,  # Reduces repetition
    presence_penalty=0.0,   # Encourages new topics
    stop=["\n", "Human:"]   # Stop sequences
)

print(response.choices[0].message.content)
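
Streaming is part of the standard OpenAI parameter set, so it can be used the same way through the gateway; a minimal sketch, assuming the selected model supports streaming:

stream = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    stream=True,  # receive the response incrementally instead of waiting for the full completion
)

for chunk in stream:
    # each chunk carries a small delta of the generated text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()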

Function and Tool Calling

You can define functions that the model can call during the conversation. Here’s how to implement function calling:

response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }]
)

print(response.choices[0].message.tool_calls)

The model can then call these functions when appropriate, and you can handle the function calls in your application logic. This enables the model to perform specific actions or retrieve information from external sources.
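
A typical follow-up loop looks roughly like this: execute the requested function yourself, append the result as a tool message, and call the API again so the model can compose the final answer. A minimal sketch, reusing the response from the snippet above; the local get_weather implementation here is hypothetical:

import json

def get_weather(location, unit="celsius"):
    # hypothetical local implementation; replace with a real weather lookup
    return {"location": location, "temperature": 22, "unit": unit}

messages = [{"role": "user", "content": "What's the weather in New York?"}]

tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)
result = get_weather(**arguments)

messages.append(response.choices[0].message)  # the assistant turn that requested the tool call
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),  # function output the model will read
})

final_response = client.chat.completions.create(
    model="openai-main/gpt-4o-mini",
    messages=messages,
)

print(final_response.choices[0].message.content)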