Authentication

To authenticate with the AI Gateway, provide your TrueFoundry API key as a bearer token in the Authorization header:
Authorization: Bearer your-api-key
You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT) as your API key. For detailed information on creating and managing these tokens, refer to our Access Control documentation.
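As a minimal sketch, the header can be assembled in Python as shown here. The helper name `build_auth_headers` and the environment variable `TFY_API_KEY` are hypothetical names chosen for illustration; neither is defined by the gateway.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header for a PAT or VAT."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key}"}


# TFY_API_KEY is an assumed environment variable name, used only for illustration.
headers = build_auth_headers(os.environ.get("TFY_API_KEY", "your-api-key"))
```

Reading the key from the environment rather than hard-coding it keeps the token out of source control.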

Request Headers

| Name | Description |
| --- | --- |
| Authorization | Your TrueFoundry API key as a bearer token |
| x-tfy-metadata | Stringified JSON where both keys and values must be strings. Used for request routing and metrics filtering |
| x-tfy-provider-name | Required for the responses API, file upload API, and batch APIs to route requests to the correct provider account |
| x-tfy-strict-openai | Boolean flag to enable strict OpenAI compatibility (set to false for Claude reasoning model responses with thinking tokens) |
| x-tfy-retry-config | JSON object to configure retry behavior for failed requests (see example below) |
| x-tfy-request-timeout | Number in milliseconds specifying the maximum time to wait for a response |

Example with Metadata

# Assumes `client` is an OpenAI SDK client already configured for the gateway with your API key.
client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are an AI bot."},
        {"role": "user", "content": "Enter your prompt here"},
    ],
    model="tfy-ai-bedrock/us-anthropic-claude-sonnet-4-20250514-v1-0",
    stream=True,
    extra_headers={
        "X-TFY-METADATA": '{"tfy_log_request":"true", "custom_field":"value"}'
    }
)
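Because the metadata header must be stringified JSON with string keys and string values, it can help to build it programmatically instead of writing the JSON by hand. The sketch below is one way to do that; the helper name `build_metadata_header` is hypothetical, and coercing non-string values (including rendering booleans as "true"/"false") is an assumption made for this example, not documented gateway behavior.

```python
import json


def build_metadata_header(metadata: dict) -> dict[str, str]:
    """Serialize metadata for the X-TFY-METADATA header with string keys and values."""
    cleaned = {}
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} must be a string")
        # Assumption: booleans are rendered as "true"/"false", as in the example above.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cleaned[key] = str(value)
    return {"X-TFY-METADATA": json.dumps(cleaned)}


extra_headers = build_metadata_header(
    {"tfy_log_request": True, "custom_field": "value"}
)
```

The resulting dictionary can be passed as `extra_headers` in the same way as the hand-written string.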

Example with Retry Configuration

# Assumes `client` is an OpenAI SDK client already configured for the gateway with your API key.
client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Generate a summary of this article"},
    ],
    model="tfy-ai-openai/gpt-4o",
    extra_headers={
        "X-TFY-RETRY-CONFIG": '{"attempts": 3, "onStatusCodes": [429, 500, 503], "useRetryAfterHeader": true}'
    }
)
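The retry configuration is passed as a JSON string, so generating it with `json.dumps` avoids quoting mistakes such as writing Python's `True` instead of JSON's `true`. The sketch below uses only the three fields that appear in the example above; the helper name `build_retry_header` and its parameter names are hypothetical.

```python
import json


def build_retry_header(
    attempts: int,
    on_status_codes: list[int],
    use_retry_after_header: bool = True,
) -> dict[str, str]:
    """Serialize a retry configuration for the X-TFY-RETRY-CONFIG header."""
    config = {
        "attempts": attempts,
        "onStatusCodes": list(on_status_codes),
        "useRetryAfterHeader": use_retry_after_header,
    }
    # json.dumps renders Python True/False as JSON true/false.
    return {"X-TFY-RETRY-CONFIG": json.dumps(config)}


extra_headers = build_retry_header(3, [429, 500, 503])
```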

Response Headers

| Name | Description |
| --- | --- |
| x-tfy-resolved-model | The final TrueFoundry model ID used to process the request (may differ from the requested model due to load balancing or fallbacks) |
| x-tfy-applied-configurations | Dictionary of applied configurations including load balancing, fallback, model config, applied guardrails, and rate limiting |
| server-timing | For non-streaming requests only. Contains timing information for different processing stages including middlewares, guardrails, and model calls |

Example of Server-Timing Header

When inspecting network requests in your browser’s developer tools, you’ll see the server-timing header with timing information like this:
[Figure: Server-timing header in browser developer tools, showing a detailed processing time breakdown for middleware, guardrails, and model calls]
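Outside the browser, the same header can be parsed in code. The sketch below assumes the value follows the standard Server-Timing syntax (comma-separated metrics, each with an optional `dur` parameter in milliseconds). The metric names in `sample` are made up for illustration; the actual names emitted by the gateway may differ. The parser is deliberately simple and does not handle commas inside quoted `desc` values.

```python
def parse_server_timing(header_value: str) -> dict[str, float]:
    """Map each metric name to its duration in milliseconds.

    Metrics without a numeric dur parameter are skipped.
    """
    durations: dict[str, float] = {}
    for entry in header_value.split(","):
        parts = [part.strip() for part in entry.split(";")]
        name = parts[0]
        if not name:
            continue
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "dur":
                try:
                    durations[name] = float(value.strip().strip('"'))
                except ValueError:
                    pass  # Ignore malformed durations
    return durations


# Hypothetical header value; real metric names may differ.
sample = "middleware;dur=2.5, guardrails;dur=12, model;dur=840.3"
timings = parse_server_timing(sample)
slowest = max(timings, key=timings.get)
```

Here `slowest` identifies the stage with the largest duration, which is usually the first thing to check when a request is slower than expected.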