Truefoundry Docs

Authentication

To authenticate with the AI Gateway, provide your TrueFoundry API key as a bearer token in the Authorization header:

Authorization: Bearer your-api-key

You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT) as your API key. For detailed information on creating and managing these tokens, refer to our Access Control documentation.
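The same bearer-token header works for clients that don't use an SDK. The sketch below uses only the Python standard library to build, but not send, a plain HTTP request. The environment variable name `TFY_API_KEY`, the helper name `build_authenticated_request`, and the URL are hypothetical placeholders. The `/chat/completions` path assumes the gateway's OpenAI-compatible layout.

```python
import json
import os
import urllib.request

# Hypothetical placeholder: replace with your gateway's OpenAI-compatible base URL
GATEWAY_URL = "https://YOUR_GATEWAY_BASE_URL/chat/completions"


def build_authenticated_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key as a bearer token."""
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# TFY_API_KEY is an assumed variable name; a PAT or VAT both work as the key
api_key = os.environ.get("TFY_API_KEY", "your-api-key")
req = build_authenticated_request(
    api_key,
    {"model": "tfy-ai-openai/gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)
# urllib.request.urlopen(req) would send the request; it is omitted here
```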

Request Headers

Name	Description
`Authorization`	Your TrueFoundry API key, sent as a bearer token (`Bearer your-api-key`)
`x-tfy-metadata`	Stringified JSON object in which all keys and values must be strings. Used for request routing and metrics filtering
`x-tfy-provider-name`	Required for the Responses API, file upload API, and batch APIs to route requests to the correct provider account
`x-tfy-strict-openai`	Boolean flag to enable strict OpenAI compatibility (set to `false` for Claude reasoning model responses with thinking tokens)
`x-tfy-retry-config`	Stringified JSON object that configures retry behavior for failed requests (see example below)
`x-tfy-request-timeout`	Number in milliseconds specifying the maximum time to wait for a response
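HTTP header values are strings, so the boolean and numeric headers in the table above must be converted before they are sent. The helper below is a hypothetical convenience, not part of any TrueFoundry SDK. It builds an `extra_headers` dict for the optional routing headers. Two details are assumptions: the lowercase `"true"`/`"false"` spelling, and the provider account name you pass in.

```python
from typing import Optional


def gateway_request_headers(
    provider_name: Optional[str] = None,
    strict_openai: Optional[bool] = None,
    timeout_ms: Optional[int] = None,
) -> dict:
    """Build an extra_headers dict for the optional x-tfy-* request headers.

    Only headers whose argument is provided are included.
    """
    headers = {}
    if provider_name is not None:
        # Name of the provider account to route to (required for Responses, file, batch APIs)
        headers["x-tfy-provider-name"] = provider_name
    if strict_openai is not None:
        # Assumption: the gateway accepts lowercase "true"/"false"
        headers["x-tfy-strict-openai"] = "true" if strict_openai else "false"
    if timeout_ms is not None:
        if timeout_ms <= 0:
            raise ValueError("timeout_ms must be a positive number of milliseconds")
        headers["x-tfy-request-timeout"] = str(timeout_ms)
    return headers
```

For example, `gateway_request_headers(strict_openai=False, timeout_ms=30000)` can be passed as `extra_headers` to keep Claude thinking tokens in the response and cap the wait at 30 seconds.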

Example with Metadata

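import json
from openai import OpenAI

# Placeholders: your TrueFoundry API key and the gateway's OpenAI-compatible base URL
client = OpenAI(api_key="your-api-key", base_url="YOUR_GATEWAY_BASE_URL")

stream = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are an AI bot."},
        {"role": "user", "content": "Enter your prompt here"},
    ],
    model="tfy-ai-bedrock/us-anthropic-claude-sonnet-4-20250514-v1-0",
    stream=True,
    extra_headers={
        # Metadata keys and values must all be strings
        "X-TFY-METADATA": json.dumps({"tfy_log_request": "true", "custom_field": "value"})
    },
)
# With stream=True, the response arrives as an iterator of chunks
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")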
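A common mistake is to pass a Python `True` or a number as a metadata value. `json.dumps` would serialize these as JSON booleans or numbers rather than the strings the header requires. The hypothetical helper below checks the string-to-string rule on the client before the header is built.

```python
import json
from typing import Mapping


def metadata_header(metadata: Mapping[str, str]) -> str:
    """Serialize metadata for x-tfy-metadata, rejecting non-string keys or values."""
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                f"x-tfy-metadata entries must be str -> str, got {key!r}: {value!r}"
            )
    return json.dumps(dict(metadata))
```

For example, `metadata_header({"tfy_log_request": "true", "custom_field": "value"})` returns `'{"tfy_log_request": "true", "custom_field": "value"}'`. Passing `{"tfy_log_request": True}` raises `TypeError` instead.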

Example with Retry Configuration

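from openai import OpenAI

# Placeholders: your TrueFoundry API key and the gateway's OpenAI-compatible base URL
client = OpenAI(api_key="your-api-key", base_url="YOUR_GATEWAY_BASE_URL")

response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Generate a summary of this article"},
    ],
    model="tfy-ai-openai/gpt-4o",
    extra_headers={
        # Attempt count, status codes that trigger a retry, and whether to honor Retry-After
        "X-TFY-RETRY-CONFIG": '{"attempts": 3, "onStatusCodes": [429, 500, 503], "useRetryAfterHeader": true}'
    },
)
print(response.choices[0].message.content)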
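Hand-writing the JSON string invites quoting mistakes, such as a Python `True` in place of JSON `true`. The sketch below is a hypothetical typed builder whose camelCase output keys mirror the example above. Its defaults copy the example's values and are not documented gateway defaults.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class RetryConfig:
    """Typed builder for the x-tfy-retry-config header value.

    Defaults mirror the example request; they are not documented gateway defaults.
    """

    attempts: int = 3
    on_status_codes: List[int] = field(default_factory=lambda: [429, 500, 503])
    use_retry_after_header: bool = True

    def to_header(self) -> str:
        for code in self.on_status_codes:
            if not 100 <= code <= 599:
                raise ValueError(f"not an HTTP status code: {code}")
        # camelCase keys match the field names used in the documented example
        return json.dumps(
            {
                "attempts": self.attempts,
                "onStatusCodes": list(self.on_status_codes),
                "useRetryAfterHeader": self.use_retry_after_header,
            }
        )
```

`RetryConfig().to_header()` reproduces the example's header string exactly. A variant such as `RetryConfig(attempts=5, on_status_codes=[502])` can be passed the same way through `extra_headers={"X-TFY-RETRY-CONFIG": ...}`.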

Response Headers

Name	Description
`x-tfy-resolved-model`	The final TrueFoundry model ID used to process the request (may differ from requested model due to load balancing or fallbacks)
`x-tfy-applied-configurations`	Dictionary of applied configurations including load balancing, fallback, model config, applied guardrails, and rate limiting
`server-timing`	For non-streaming requests only. Contains timing information for different processing stages including middlewares, guardrails, and model calls
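These headers can be read back from any HTTP client. With the OpenAI Python SDK, `client.chat.completions.with_raw_response.create(...)` returns an object whose `.headers` holds the raw response headers and whose `.parse()` returns the usual completion. The hypothetical helper below pulls the gateway headers out of such a mapping. It assumes that `x-tfy-applied-configurations` is sent as JSON and falls back to the raw string if it is not.

```python
import json
from typing import Any, Mapping, Optional


def read_gateway_headers(headers: Mapping[str, str]) -> dict:
    """Extract the TrueFoundry response headers from any header mapping."""
    # HTTP header names are case-insensitive, so normalize to lowercase first
    lowered = {k.lower(): v for k, v in headers.items()}
    applied_raw: Optional[str] = lowered.get("x-tfy-applied-configurations")
    applied: Any = None
    if applied_raw is not None:
        try:
            # Assumption: the applied-configurations dictionary is sent as JSON
            applied = json.loads(applied_raw)
        except json.JSONDecodeError:
            applied = applied_raw  # fall back to the raw string
    return {
        "resolved_model": lowered.get("x-tfy-resolved-model"),
        "applied_configurations": applied,
        # Present only for non-streaming requests
        "server_timing": lowered.get("server-timing"),
    }
```

Comparing `resolved_model` with the model you requested shows whether load balancing or a fallback sent the request to a different model.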

Example of Server-Timing Header

When inspecting network requests in your browser’s developer tools, you’ll see the server-timing header with timing information like this:

[Figure: Server-timing header in browser developer tools, showing a processing-time breakdown for middlewares, guardrails, and model calls]
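Outside the browser, the header value can be parsed programmatically. It follows the W3C Server Timing format: comma-separated metrics, each a name followed by `;`-separated parameters such as `dur` (milliseconds) and `desc`. The metric names the gateway emits are not listed here, so the sample value in the usage note is made up. The parser is simplified and assumes no commas or semicolons inside quoted `desc` values.

```python
from typing import Dict, List


def parse_server_timing(value: str) -> List[Dict[str, object]]:
    """Parse a Server-Timing header value into a list of metric dicts.

    Simplified: assumes no commas or semicolons inside quoted desc values.
    """
    metrics = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, *params = [part.strip() for part in entry.split(";")]
        metric: Dict[str, object] = {"name": name}
        for param in params:
            key, _, raw = param.partition("=")
            key = key.strip().lower()
            raw = raw.strip().strip('"')
            if key == "dur":
                metric["dur"] = float(raw)  # duration in milliseconds
            elif key:
                metric[key] = raw
        metrics.append(metric)
    return metrics
```

For a hypothetical value such as `'guardrails;dur=12.5, model;dur=840.2;desc="model call"'`, the parser returns one dict per stage. Summing the `dur` fields gives a rough total of the gateway-side processing time.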
