TrueFoundry AI Gateway supports multimodal inputs, letting you send images, audio, and video to models alongside text.
Images
You can send images as part of your chat completion requests, either as an image URL or as a base64 encoded image.
Using Image URLs
Send an image URL to the model:
from openai import OpenAI
client = OpenAI(
api_key="your_truefoundry_api_key",
base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)
response = client.chat.completions.create(
model="openai-main/gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://images.rawpixel.com/image_800/cHJpdmF0ZS9sci9pbWFnZXMvd2Vic2l0ZS8yMDIyLTA1L25zODIzMC1pbWFnZS5qcGc.jpg"
}
},
],
}
],
)
print(response.choices[0].message.content)
Using Base64 Encoded Images
Send a base64 encoded image to the model:
import base64
from openai import OpenAI
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
client = OpenAI(
api_key="your_truefoundry_api_key",
base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)
response = client.chat.completions.create(
model="openai-main/gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{encode_image('dogs.jpeg')}"
}
}
]
}
]
)
print(response.choices[0].message.content)
Audio
You can send audio files in supported formats (MP3, WAV, etc.) as part of your chat completion requests. Audio inputs in chat completions are currently supported only for Google Gemini models; if a model does not support audio inputs, the request will fail.
Using Audio URLs
response = client.chat.completions.create(
model="internal-google/gemini-2-0-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Transcribe this audio"},
{
"type": "image_url",
"image_url": {
"url": "https://raw.githubusercontent.com/prof3ssorSt3v3/media-sample-files/refs/heads/master/hal-9000.wav",
"mime_type": "audio/mp3" # this field is only required for gemini models
}
}
]
}
]
)
print(response.choices[0].message.content)
Using Base64 Encoded Audio
import base64
def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')
response = client.chat.completions.create(
model="internal-google/gemini-2-0-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Transcribe this audio"},
{
"type": "image_url",
"image_url": {
"url": f"data:audio/wav;base64,{encode_audio('/path-to-audio-file.wav')}"
}
}
]
}
]
)
print(response.choices[0].message.content)
Video
Video processing is natively supported for Google Gemini models. For other models, you can extract frames from the video and send them as images (see the frame-sampling sketch at the end of this section).
Using Video URLs
Send a video URL to the model:
response = client.chat.completions.create(
model="internal-google/gemini-2-0-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Describe what's happening in this video"},
{
"type": "image_url",
"image_url": {
"url": "https://www.youtube.com/watch?v=fxqE27gIZcc",
"mime_type": "video/mp4" # this field is only required for gemini models
}
}
]
}
]
)
print(response.choices[0].message.content)
Using Base64 Encoded Video
Send a base64 encoded video to the model (make sure the video size is within the provider's limits):
import base64
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode('utf-8')
response = client.chat.completions.create(
model="internal-google/gemini-2-0-flash",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Describe what's happening in this video"},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{encode_video('path/to/video.mp4')}",
"mime_type": "video/mp4" # this field is only required for gemini models
}
}
]
}
]
)
print(response.choices[0].message.content)
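For models without native video support, you can sample frames from the video and send them as a series of images. The snippet below is a minimal sketch of this approach, not a gateway-specific API; it assumes opencv-python is installed, reuses the client and the GPT-4o model from the image examples above, and uses an illustrative local file path:
import base64
import cv2  # pip install opencv-python

def sample_frames(video_path, every_n_frames=30):
    """Return base64-encoded JPEG frames sampled from the video."""
    frames = []
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        success, frame = capture.read()
        if not success:
            break
        if index % every_n_frames == 0:
            ok, buffer = cv2.imencode(".jpg", frame)
            if ok:
                frames.append(base64.b64encode(buffer.tobytes()).decode("utf-8"))
        index += 1
    capture.release()
    return frames

content = [{"type": "text", "text": "Describe what's happening in this video"}]
for frame in sample_frames("path/to/video.mp4")[:10]:  # cap frame count to stay within token limits
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{frame}"}
    })
response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)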
Supported Models
- Images: Most vision-capable models including GPT-4o, GPT-4 Vision, Claude 3, Gemini Pro Vision
- Audio: Google Gemini models (Gemini 2.0 Flash, etc.)
- Video: Google Gemini models (Gemini 2.0 Flash, etc.)
Make sure to check model capabilities before sending multimodal inputs to avoid errors.
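If you are unsure whether a model supports a given modality, you can also handle the rejection at request time. The following is a minimal sketch, assuming the openai Python SDK v1.x error classes; the helper name is illustrative:
import openai
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="<truefoundry-base-url>/api/llm/api/inference/openai"
)

def send_multimodal(model, messages):
    """Send a chat completion and surface provider-side rejections instead of crashing."""
    try:
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content
    except openai.BadRequestError as error:
        # Raised when the provider rejects the payload,
        # e.g. audio sent to a model without audio support
        return f"Request rejected: {error}"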