Batch Predictions with TrueFoundry LLM Gateway
This guide explains how to perform batch predictions using TrueFoundry's LLM Gateway with different providers.
Prerequisites
- TrueFoundry API Key
- Provider account configured in TrueFoundry (OpenAI or Vertex AI)
- Python environment with the openai library installed
Authentication
All API requests require authentication using your TrueFoundry API key and provider integration name. This is handled through the OpenAI client configuration:
from openai import OpenAI
BASE_URL = "https://internal.devtest.truefoundry.tech/api/llm"
API_KEY = "your-truefoundry-api-key"
# Configure OpenAI client with TrueFoundry settings
client = OpenAI(
    api_key=API_KEY,
    base_url=BASE_URL,
)
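In practice, you may prefer to read the API key from an environment variable rather than hardcoding it (see Best Practices below). A minimal sketch, assuming the key is exported as TFY_API_KEY (an assumed variable name, not one the gateway requires):

import os

from openai import OpenAI

# Read the TrueFoundry API key from an environment variable instead of hardcoding it.
# TFY_API_KEY is an assumed name; use whatever your environment defines.
client = OpenAI(
    api_key=os.environ["TFY_API_KEY"],
    base_url="https://internal.devtest.truefoundry.tech/api/llm",
)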
Provider Specific Extra Headers
When making requests, you'll need to specify provider-specific headers based on which LLM provider you're using:
OpenAI Provider Headers
extra_headers = {
    "x-tfy-provider-name": "openai-provider-name"  # name of tfy provider integration
}
Vertex AI Provider Headers
extra_headers = {
    "x-tfy-provider-name": "google-provider-name",  # name of tfy provider integration
    "x-tfy-vertex-storage-bucket-name": "your-bucket-name",
    "x-tfy-vertex-region": "your-region",  # e.g., "europe-west4"
    "x-tfy-provider-model": "gemini-2-0-flash"  # or any other supported model
}
Input File Format
The batch prediction system requires input files in JSONL (JSON Lines) format. Each line in the file must be a valid JSON object representing a single request. The file should not contain any empty lines or comments.
JSONL Format Requirements
- Each line must be a valid JSON object
- No empty lines between JSON objects
- No trailing commas
- No comments
- Each line must end with a newline character
Request Format
Example of a valid JSONL file (request.jsonl):
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4-vision-preview", "messages": [{"role": "user", "content": [{"type": "text", "text": "What's in this image?"}, {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}]}], "max_tokens": 1000}}
When using Vertex AI, you can omit the method, url, and body.model fields, since they are not used; see the example below.
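For illustration, a minimal Vertex AI request line might look like this (a sketch only; the custom_id and message content are placeholders):

{"custom_id": "request-1", "body": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello world!"}], "max_tokens": 1000}}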
Workflow
The batch prediction process involves four main steps:
- Upload input file
- Create batch job
- Check batch status
- Fetch results
1. Upload Input File
Upload your JSONL file using the OpenAI client:
# Upload the input file
file = client.files.create(
    file=open("request.jsonl", "rb"),
    purpose="batch",
    extra_headers=extra_headers
)
# The response will contain the file ID needed for creating the batch job
print(file.id) # Example: file-PnFGrFLN5LjjcWr4eFsStK
2. Create Batch Job
Create a batch job using the file ID from the upload step:
batch_job = client.batches.create(
    input_file_id=file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_headers=extra_headers
)
# The response includes a batch ID for tracking
print(batch_job.id) # Example: batch_67f7bfc50b288190893f242d9fa47c52
3. Check Batch Status
Monitor the batch job status:
batch_status = client.batches.retrieve(
    batch_job.id,
    extra_headers=extra_headers
)
print(batch_status.status) # Example: completed, validating, in_progress, etc.
The status can be one of:
- validating: Initial validation of the batch
- in_progress: Processing the requests
- completed: All requests processed successfully
- failed: Batch processing failed
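Since batch jobs can take a while to finish, you will typically poll the status until it reaches a terminal state. A minimal polling sketch (the 30-second interval is an arbitrary choice, not a gateway requirement):

import time

# Poll the batch status until it reaches a terminal state.
while True:
    batch_status = client.batches.retrieve(
        batch_job.id,
        extra_headers=extra_headers
    )
    if batch_status.status in ("completed", "failed"):
        break
    time.sleep(30)  # arbitrary polling interval

print(batch_status.status)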
4. Fetch Results
Once the batch is completed, fetch the results:
if batch_status.status == "completed":
output_content = client.files.content(
batch_status.output_file_id,
extra_headers=extra_headers
)
print(output_content.content)
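The output file is itself JSONL, with one result object per input request. A sketch of parsing it into a dictionary keyed by custom_id, assuming each output line carries the custom_id and response fields of the standard batch output format:

import json

# Parse the JSONL output into a dict keyed by custom_id.
results = {}
for line in output_content.content.decode("utf-8").splitlines():
    if not line.strip():
        continue
    record = json.loads(line)
    results[record["custom_id"]] = record.get("response")

print(list(results.keys()))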
Complete Example
Here's a complete example that puts it all together:
from openai import OpenAI
# Initialize client
client = OpenAI(
    api_key="your-api-key",
    base_url="https://internal.devtest.truefoundry.tech/api/llm",
)
extra_headers = {"x-tfy-provider-name": "openai-main"}
# 1. Upload file
file = client.files.create(
    file=open("request.jsonl", "rb"),
    purpose="batch",
    extra_headers=extra_headers
)
# 2. Create batch job
batch_job = client.batches.create(
    input_file_id=file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_headers=extra_headers
)
# 3. Check status
batch_status = client.batches.retrieve(
    batch_job.id,
    extra_headers=extra_headers
)
# 4. Fetch results when completed
if batch_status.status == "completed":
    output_content = client.files.content(
        batch_status.output_file_id,
        extra_headers=extra_headers
    )
    print(output_content.content)
Best Practices
- Use meaningful custom_id values in your JSONL requests to track individual requests
- Implement proper error handling around API calls (see the sketch after this list)
- Monitor batch status regularly with appropriate polling intervals
- For Vertex AI:
  - Ensure proper bucket permissions
  - Use appropriate region settings
  - Handle URL encoding for file IDs
- For OpenAI:
  - Follow OpenAI's rate limits
  - Use appropriate model parameters
- Store API keys securely and never hardcode them in your application
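As an example of the error-handling point above, here is a minimal sketch that wraps the status check in a retry loop (the retry count and back-off are arbitrary illustration values):

import time

from openai import APIError

# Retry the status check a few times before giving up.
batch_status = None
for attempt in range(3):  # arbitrary retry count
    try:
        batch_status = client.batches.retrieve(
            batch_job.id,
            extra_headers=extra_headers
        )
        break
    except APIError as exc:
        print(f"Attempt {attempt + 1} failed: {exc}")
        time.sleep(5)  # arbitrary back-off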