Authentication

To authenticate with the AI Gateway, provide your TrueFoundry API key as a bearer token in the Authorization header:
Authorization: Bearer your-api-key
You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT) as your API key. For detailed information on creating and managing these tokens, refer to our Access Control documentation.
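As a minimal sketch of the header format above, the Python helper below returns the `Authorization` header for a given key. The `auth_headers` name and the `TFY_API_KEY` environment-variable fallback are illustrative assumptions, not part of the gateway's API. Any HTTP client that accepts a header dictionary can use the result.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization header for the AI Gateway.

    Falls back to a TFY_API_KEY environment variable (an illustrative
    choice, not something the gateway requires) when no key is passed.
    """
    key = api_key or os.environ.get("TFY_API_KEY")
    if not key:
        raise ValueError("No API key: pass one explicitly or set TFY_API_KEY")
    # The key can be a Personal Access Token (PAT) or a Virtual Account Token (VAT).
    return {"Authorization": f"Bearer {key}"}


if __name__ == "__main__":
    print(auth_headers("your-api-key"))  # {'Authorization': 'Bearer your-api-key'}
```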

Request Headers

| Name | Description | Example |
|---|---|---|
| Authorization | Your TrueFoundry API key, sent as a bearer token | `Authorization: Bearer TFY_API_KEY` |
| x-tfy-metadata | Stringified JSON object in which all keys and values must be strings. Used for request routing and metrics filtering | `x-tfy-metadata: {"custom_field":"value"}` |
| x-tfy-provider-name | Required for the Responses API, file upload API, and batch APIs to route the request to the correct provider account | `x-tfy-provider-name: openai` |
| x-tfy-strict-openai | Boolean flag that enables strict OpenAI compatibility. Set it to `false` for Claude reasoning-model responses with thinking tokens | `x-tfy-strict-openai: true` |
| x-tfy-retry-config | JSON object that configures retry behavior for failed requests | `x-tfy-retry-config: {"attempts": 3, "onStatusCodes": [429, 500, 503]}` |
| x-tfy-request-timeout | Maximum time, in milliseconds, to wait for a response from a single model. If retries or fallbacks are configured, the timeout applies to each model request separately: every attempt, including each fallback, gets its own timeout | `x-tfy-request-timeout: 60000` |
| x-tfy-logging-config | JSON object that configures request logging | `x-tfy-logging-config: {"enabled": true}` |
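Several of these headers carry JSON, which must be serialized to a string before sending. The sketch below uses a hypothetical `build_tfy_headers` helper (not a TrueFoundry SDK function) to assemble a header dictionary. It rejects non-string metadata entries before they reach the gateway. It also covers only the retry fields shown in the example above (`attempts` and `onStatusCodes`); any other retry options are out of scope here.

```python
import json
from typing import Dict, Iterable, Mapping, Optional


def build_tfy_headers(
    api_key: str,
    metadata: Optional[Mapping[str, str]] = None,
    provider_name: Optional[str] = None,
    retry_attempts: Optional[int] = None,
    retry_status_codes: Iterable[int] = (),
    timeout_ms: Optional[int] = None,
) -> Dict[str, str]:
    """Assemble AI Gateway request headers (hypothetical helper, not an SDK call)."""
    headers = {"Authorization": f"Bearer {api_key}"}

    if metadata is not None:
        # x-tfy-metadata: stringified JSON whose keys and values must all be strings.
        for key, value in metadata.items():
            if not isinstance(key, str) or not isinstance(value, str):
                raise TypeError(f"metadata must map str to str, got {key!r}: {value!r}")
        headers["x-tfy-metadata"] = json.dumps(dict(metadata))

    if provider_name is not None:
        # Required for the Responses, file upload, and batch APIs.
        headers["x-tfy-provider-name"] = provider_name

    if retry_attempts is not None:
        retry: Dict[str, object] = {"attempts": retry_attempts}
        codes = list(retry_status_codes)
        if codes:
            retry["onStatusCodes"] = codes
        headers["x-tfy-retry-config"] = json.dumps(retry)

    if timeout_ms is not None:
        if timeout_ms <= 0:
            raise ValueError("timeout_ms must be positive")
        # The gateway applies this per model request, i.e. to each retry and fallback.
        headers["x-tfy-request-timeout"] = str(timeout_ms)

    return headers


if __name__ == "__main__":
    h = build_tfy_headers(
        "TFY_API_KEY",
        metadata={"custom_field": "value"},
        retry_attempts=3,
        retry_status_codes=[429, 500, 503],
        timeout_ms=60000,
    )
    print(h["x-tfy-metadata"])         # {"custom_field": "value"}
    print(h["x-tfy-retry-config"])     # {"attempts": 3, "onStatusCodes": [429, 500, 503]}
    print(h["x-tfy-request-timeout"])  # 60000
```

By default, `json.dumps` uses `", "` and `": "` as separators, so the serialized retry config matches the table example character for character.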

Response Headers

| Name | Description |
|---|---|
| x-tfy-resolved-model | The final TrueFoundry model ID used to process the request. It may differ from the requested model because of load balancing or fallbacks |
| x-tfy-applied-configurations | Dictionary of the configurations applied to the request, including load balancing, fallback, model config, applied guardrails, and rate limiting |
| server-timing | Returned for non-streaming requests only. Contains timing information for each processing stage, including middlewares, guardrails, and model calls |
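To check which model actually served a request, read these headers from the response. This sketch assumes that `x-tfy-applied-configurations` arrives as a JSON-encoded dictionary (the table above only says "dictionary") and falls back to the raw string otherwise. The function names and the sample header values in the usage lines are hypothetical.

```python
import json
from typing import Any, Dict, Mapping, Optional


def read_gateway_headers(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Extract the TrueFoundry response headers from any header mapping."""
    # HTTP header names are case-insensitive, so normalise them first.
    lower = {name.lower(): value for name, value in headers.items()}

    applied_raw = lower.get("x-tfy-applied-configurations")
    applied: Optional[Any] = applied_raw
    if applied_raw:
        try:
            # Assumption: the dictionary is sent as JSON text.
            applied = json.loads(applied_raw)
        except json.JSONDecodeError:
            pass  # keep the raw string if it is not valid JSON

    return {
        "resolved_model": lower.get("x-tfy-resolved-model"),
        "applied_configurations": applied,
        # Only present on non-streaming responses.
        "server_timing": lower.get("server-timing"),
    }


def was_rerouted(requested_model: str, info: Mapping[str, Any]) -> bool:
    """True if load balancing or a fallback served a different model."""
    resolved = info.get("resolved_model")
    return resolved is not None and resolved != requested_model


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    sample = {
        "X-TFY-Resolved-Model": "openai-backup/gpt-4o",
        "x-tfy-applied-configurations": '{"fallback": {"triggered": true}}',
    }
    info = read_gateway_headers(sample)
    print(info["resolved_model"])          # openai-backup/gpt-4o
    print(info["applied_configurations"])  # {'fallback': {'triggered': True}}
    print(was_rerouted("openai-main/gpt-4o", info))  # True
```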

Example of Server-Timing Header

When you inspect a request in your browser’s developer tools, the server-timing header shows how long each processing stage took. The example below breaks down a single request:
| Processing Stage | Duration | Description |
|---|---|---|
| Authentication | 0.9 ms | User authentication |
| Input guardrails | 0.7 ms | Input validation and content filtering |
| Model call | 1350 ms | AI model response generation (the bulk of the time) |
| Output guardrails | 722.3 ms | Output validation and filtering |
| Logging | 1.1 ms | Request logging |
| Total | 2080 ms | Complete request processing time (2.08 seconds) |
Stages such as load balancing, rate limiting, and cost budget report 0 ms because those configurations were not triggered for this particular request. Use this breakdown to find bottlenecks in the request pipeline. In this example, the model call and output guardrails together account for almost all of the 2080 ms total.
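Assuming the header follows the standard W3C Server Timing syntax (`name;dur=123.4;desc="..."`, with entries separated by commas), which is the format browser developer tools parse, a small parser can turn it into per-stage durations and surface the bottleneck. In the sketch below, the metric names (`auth`, `model_call`, and so on) are hypothetical placeholders, because the exact names the gateway emits are not listed here. The durations come from the example table. For simplicity, the parser ignores backslash escapes inside quoted descriptions.

```python
from typing import Dict, List, Tuple


def _split_outside_quotes(text: str, sep: str) -> List[str]:
    """Split on `sep`, ignoring separators inside double-quoted strings."""
    parts, buf, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return [p.strip() for p in parts if p.strip()]


def parse_server_timing(header: str) -> Dict[str, float]:
    """Map each server-timing metric name to its `dur` value in milliseconds."""
    timings: Dict[str, float] = {}
    for entry in _split_outside_quotes(header, ","):
        name, *params = _split_outside_quotes(entry, ";")
        duration = 0.0  # metrics without a dur parameter count as 0 ms
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "dur":
                duration = float(value.strip().strip('"'))
        timings[name] = duration
    return timings


def slowest_stages(
    timings: Dict[str, float], total_key: str = "total", top: int = 2
) -> List[Tuple[str, float]]:
    """Return the `top` longest stages, excluding the overall total."""
    stages = [(n, d) for n, d in timings.items() if n != total_key]
    return sorted(stages, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical metric names; durations taken from the example table.
    header = (
        'auth;dur=0.9;desc="Authenticating User", input_guardrails;dur=0.7, '
        "model_call;dur=1350, output_guardrails;dur=722.3, logging;dur=1.1, "
        "load_balancing;dur=0, total;dur=2080"
    )
    timings = parse_server_timing(header)
    for name, dur in slowest_stages(timings):
        print(f"{name}: {dur} ms ({dur / timings['total']:.1%} of total)")
    # model_call: 1350.0 ms (64.9% of total)
    # output_guardrails: 722.3 ms (34.7% of total)
    print([n for n, d in timings.items() if d == 0])  # ['load_balancing']
```

The stages that report 0 ms correspond to configurations that did not run for this request, as described above.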
[Figure: Server-timing header in browser developer tools]