TrueFoundry AI Gateway can return a model's thinking (reasoning) tokens alongside its answer. This is currently supported for Claude 3.7 Sonnet served through Anthropic, AWS Bedrock, and Google Vertex AI. The thinking tokens contain the step-by-step reasoning the model produced before writing its final response, so you can inspect how it arrived at its conclusion.

Enabling Reasoning Tokens

To enable thinking/reasoning tokens, your request must include:
  1. The header X-TFY-STRICT-OPENAI: false
  2. A thinking object in the request body that sets type to "enabled" and a budget_tokens limit for reasoning
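```python
import json

import requests

# Replace these placeholders with your TrueFoundry control plane URL and API key.
CONTROL_PLANE_URL = "your-control-plane-url"
API_KEY = "your-api-key"

url = f"https://{CONTROL_PLANE_URL}/api/llm/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    # Turn off strict OpenAI mode so the gateway returns thinking tokens.
    "X-TFY-STRICT-OPENAI": "false",
}

payload = {
    "messages": [
        {"role": "user", "content": "How to compute 3^3^3?"}
    ],
    "model": "anthropic/claude-3-7",
    "thinking": {
        "type": "enabled",
        # Maximum tokens the model may spend on reasoning; keep it below max_tokens.
        "budget_tokens": 16000,
    },
    # Upper bound on all generated tokens (reasoning plus answer).
    "max_tokens": 18000,
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
```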
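Setting X-TFY-STRICT-OPENAI to false means the response is no longer strictly OpenAI-compatible: message.content becomes an array of typed blocks (including thinking blocks) instead of a string, a shape the OpenAI chat completions schema does not define. Clients or SDKs that expect content to be a string need to handle this format explicitly.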
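If you build these requests in several places, a small helper can assemble the header and payload and check the thinking budget before anything is sent. The sketch below uses a hypothetical build_reasoning_request function (not part of any TrueFoundry SDK). Its limits assume that Anthropic's documented extended-thinking rules, a budget_tokens of at least 1,024 that is strictly less than max_tokens, also apply when the model is reached through the gateway.

```python
from typing import Any, Dict, Tuple

STRICT_OPENAI_HEADER = "X-TFY-STRICT-OPENAI"


def build_reasoning_request(
    api_key: str,
    model: str,
    prompt: str,
    budget_tokens: int,
    max_tokens: int,
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Hypothetical helper: return (headers, payload) for a reasoning-enabled call.

    Assumes Anthropic's extended-thinking constraints: budget_tokens must be
    at least 1024 and strictly less than max_tokens.
    """
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")

    headers = {
        "Authorization": f"Bearer {api_key}",
        # Thinking tokens are only returned when strict OpenAI mode is off.
        STRICT_OPENAI_HEADER: "false",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "max_tokens": max_tokens,
    }
    return headers, payload
```

Failing fast on an invalid budget keeps the error local and readable instead of surfacing as a provider-side 400 response.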

Response Format

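When reasoning tokens are enabled, message.content is an array of typed blocks rather than a string: a thinking block holding the model's reasoning, followed by a text block holding the answer.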
```json
{
  "id": "1742890579083",
  "object": "chat.completion",
  "created": 1742890579,
  "model": "",
  "provider": "aws",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "thinking",
            "thinking": "The user has asked a complex question about quantum mechanics. To provide a useful answer, I should first break down the core concepts and then explain them in simple terms before diving into advanced details."
          },
          {
            "type": "text",
            "text": "Quantum mechanics is a branch of physics that explains how particles behave at very small scales. Unlike classical physics, where objects have definite positions and velocities, quantum particles exist in a superposition of states until measured. Would you like a more detailed explanation or examples?"
          }
        ]
      },
      "finish_reason": "end_turn"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 180,
    "total_tokens": 225
  }
}
```
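To consume this response, a client has to walk the content array instead of reading a single string. The following sketch uses a hypothetical split_reasoning_response helper to separate the reasoning from the answer. It assumes the response shape shown above, and it also tolerates a plain-string content field, as in a standard OpenAI-style response.

```python
from typing import Any, Dict, Tuple


def split_reasoning_response(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (thinking, text) from a non-strict chat completion response."""
    content = response["choices"][0]["message"]["content"]
    if isinstance(content, str):
        # OpenAI-style string content: no thinking blocks present.
        return "", content

    thinking_parts = []
    text_parts = []
    for block in content:
        block_type = block.get("type")
        if block_type == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif block_type == "text":
            text_parts.append(block.get("text", ""))
        # Any other block types are ignored in this sketch.
    return "\n".join(thinking_parts), "\n".join(text_parts)
```

For example, thinking, answer = split_reasoning_response(response.json()) after the request shown earlier.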

Streaming with Reasoning Tokens

When streaming, chunks carrying thinking tokens (delta.thinking) are sent before chunks carrying the answer (delta.content).

Thinking Token Chunk

```json
{
  "id": "aws-1742890615621",
  "object": "chat.completion.chunk",
  "created": 1742890615,
  "model": "",
  "provider": "aws",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "thinking": "The user is asking about the differences between AI and machine learning. I should start by defining AI in general and then narrow down to how ML fits into it."
      }
    }
  ]
}
```

Content Token Chunk

```json
{
  "id": "aws-1742890615621",
  "object": "chat.completion.chunk",
  "created": 1742890615,
  "model": "",
  "provider": "aws",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Artificial Intelligence (AI) is a broad field of computer science focused on building systems that can perform tasks requiring human intelligence. Machine Learning (ML) is a subset of AI that enables computers to learn patterns from data and improve performance over time without explicit programming."
      }
    }
  ]
}
```
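Unlike the non-streaming response, where reasoning and answer are separate blocks inside message.content, a streamed chunk carries reasoning in delta.thinking and answer text in delta.content, both as plain strings. Each field arrives incrementally across multiple chunks, so concatenate them to reconstruct the full reasoning and answer.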
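A client can apply these streaming rules by keeping two buffers and appending each delta to the matching one. The sketch below uses hypothetical helpers parse_sse_line and accumulate_stream, and assumes the gateway frames the stream as OpenAI-style server-sent events (data: {...} lines terminated by data: [DONE]). Adjust the parser if the actual framing differs.

```python
import json
from typing import Any, Dict, Iterable, Optional


def parse_sse_line(line: str) -> Optional[Dict[str, Any]]:
    """Decode one server-sent-events line into a chunk dict.

    Assumes OpenAI-style framing; returns None for blank lines,
    comment/keep-alive lines, and the final [DONE] marker.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None
    return json.loads(data)


def accumulate_stream(chunks: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Concatenate delta.thinking and delta.content across streamed chunks."""
    thinking, content = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("thinking"):
                thinking.append(delta["thinking"])
            if delta.get("content"):
                content.append(delta["content"])
    return {"thinking": "".join(thinking), "content": "".join(content)}
```

To use it with requests, you might add "stream": True to the payload (assumed to follow the OpenAI convention), pass stream=True to requests.post, decode each line from response.iter_lines() as UTF-8, run it through parse_sse_line, skip None results, and feed the remaining chunks to accumulate_stream.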