Batch Predictions with TrueFoundry AI Gateway
This guide explains how to perform batch predictions using TrueFoundry’s AI Gateway with different providers.
Prerequisites
- TrueFoundry API Key
- Provider account configured in TrueFoundry (OpenAI or Vertex AI)
- Python environment with the `openai` library installed
Authentication
All API requests require authentication using your TrueFoundry API key and provider integration name. This is handled through the OpenAI client configuration:
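Below is a minimal sketch of this configuration, assuming the gateway exposes an OpenAI-compatible endpoint. The base URL is a placeholder; substitute the gateway URL shown in your TrueFoundry account.

```python
from openai import OpenAI

# Minimal sketch: authenticate against the TrueFoundry AI Gateway with your
# TrueFoundry API key (not the provider's key). The base URL below is a
# placeholder; use the gateway URL from your TrueFoundry account.
client = OpenAI(
    api_key="<your-truefoundry-api-key>",
    base_url="https://<your-control-plane-url>/api/llm",  # placeholder gateway URL
)
```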
Provider Specific Extra Headers
When making requests, you’ll need to specify provider-specific headers based on which LLM provider you’re using:
OpenAI Provider Headers
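For example (a sketch only; the header name below is a placeholder for whatever header your gateway version expects, and the value is the name of your OpenAI provider integration in TrueFoundry):

```python
# Hypothetical header name shown for illustration; check your gateway's
# documentation for the exact header it expects.
openai_headers = {
    "x-tfy-provider-name": "<your-openai-provider-integration-name>",
}
```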
Vertex AI Provider Headers
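For Vertex AI, batch jobs additionally need a GCS bucket and region. The sketch below assumes these are passed as headers; the header names are placeholders, not confirmed gateway API:

```python
# Hypothetical header names shown for illustration; Vertex AI batch jobs are
# assumed here to take the provider integration name, GCP region, and GCS
# bucket as headers.
vertex_headers = {
    "x-tfy-provider-name": "<your-vertex-provider-integration-name>",
    "x-tfy-vertex-region": "<gcp-region>",           # e.g. us-central1
    "x-tfy-vertex-bucket": "gs://<your-gcs-bucket>",
}
```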
Input File Format
The batch prediction system requires input files in JSONL (JSON Lines) format. Each line in the file must be a valid JSON object representing a single request. The file should not contain any empty lines or comments.
JSONL Format Requirements
- Each line must be a valid JSON object
- No empty lines between JSON objects
- No trailing commas
- No comments
- Each line must end with a newline character
Request Format
Example of a valid JSONL file (`request.jsonl`):
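For illustration, each line can follow the OpenAI batch request schema; the model name and prompts below are examples only:

```jsonl
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "What is machine learning?"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Explain gradient descent."}]}}
```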
When using Vertex AI, you can skip the `method`, `url`, and `body.model` fields, since they are not used.
Workflow
The batch prediction process involves four main steps:
- Upload input file
- Create batch job
- Check batch status
- Fetch results
1. Upload Input File
Upload your JSONL file using the OpenAI client:
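A sketch of the upload step, reusing the provider headers defined earlier (`openai_headers` or `vertex_headers`, depending on your provider):

```python
# Upload the JSONL input file; purpose="batch" marks it for batch processing.
batch_input_file = client.files.create(
    file=open("request.jsonl", "rb"),
    purpose="batch",
    extra_headers=openai_headers,  # or vertex_headers for Vertex AI
)
print(batch_input_file.id)
```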
2. Create Batch Job
Create a batch job using the file ID from the upload step:
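A sketch of creating the job against the chat completions endpoint with a 24-hour completion window:

```python
# Create the batch job using the uploaded file's ID.
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_headers=openai_headers,  # or vertex_headers for Vertex AI
)
print(batch.id)
```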
3. Check Batch Status
Monitor the batch job status:
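A sketch of retrieving the batch by ID and inspecting its status:

```python
# Retrieve the batch and check its current status.
batch = client.batches.retrieve(batch.id, extra_headers=openai_headers)
print(batch.status)
```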
The status can be one of:
- `validating`: Initial validation of the batch
- `in_progress`: Processing the requests
- `completed`: All requests processed successfully
- `failed`: Batch processing failed
4. Fetch Results
Once the batch is completed, fetch the results:
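A sketch of downloading the output file referenced by the batch:

```python
# Download the output file once the batch has completed.
if batch.status == "completed":
    result = client.files.content(batch.output_file_id, extra_headers=openai_headers)
    result.write_to_file("results.jsonl")
```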
Complete Example
Here’s a complete example that puts it all together:
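The end-to-end sketch below strings the four steps together; as in the earlier snippets, the base URL and header names are placeholders to be replaced with the values from your TrueFoundry account:

```python
import time

from openai import OpenAI

# Placeholder base URL and header name; replace with your gateway's values.
client = OpenAI(
    api_key="<your-truefoundry-api-key>",
    base_url="https://<your-control-plane-url>/api/llm",
)
headers = {"x-tfy-provider-name": "<your-provider-integration-name>"}

# 1. Upload the input file
batch_input_file = client.files.create(
    file=open("request.jsonl", "rb"),
    purpose="batch",
    extra_headers=headers,
)

# 2. Create the batch job
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    extra_headers=headers,
)

# 3. Poll until the batch finishes
while True:
    batch = client.batches.retrieve(batch.id, extra_headers=headers)
    if batch.status in ("completed", "failed"):
        break
    time.sleep(30)

# 4. Fetch the results
if batch.status == "completed":
    result = client.files.content(batch.output_file_id, extra_headers=headers)
    result.write_to_file("results.jsonl")
else:
    print(f"Batch ended with status: {batch.status}")
```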
Best Practices
- Use meaningful `custom_id` values in your JSONL requests to track individual requests
- Implement proper error handling around API calls
- Monitor batch status regularly with appropriate polling intervals
- For Vertex AI:
  - Ensure proper bucket permissions
  - Use appropriate region settings
  - Handle URL encoding for file IDs
- For OpenAI:
  - Follow OpenAI's rate limits
  - Use appropriate model parameters
- Store API keys securely and never hardcode them in your application