This guide provides instructions for using the Codex CLI through the TrueFoundry LLM Gateway.
GPT-5-Codex is now available! You can access GPT-5-Codex through the TrueFoundry AI Gateway. Follow the setup steps below to get started.

What is Codex?

Codex is OpenAI's official command-line interface (CLI), providing a streamlined way to interact with OpenAI's language models directly from your terminal. With the TrueFoundry LLM Gateway integration, you can route your Codex requests through the Gateway.

Key Features of OpenAI Codex CLI

  1. Terminal-Native AI Interactions: Chat with AI models directly from your terminal without switching contexts
  2. Intelligent Code Generation: Generate code snippets, functions, and programs across multiple programming languages using natural language prompts
  3. Streaming and Interactive Sessions: Real-time streaming responses enable dynamic, conversation-like interactions for code development

Prerequisites

Before integrating Codex with TrueFoundry, ensure you have:
  1. TrueFoundry Account: Create a TrueFoundry account with at least one model provider configured, and generate a Personal Access Token by following the instructions in Generating Tokens. For a quick setup guide, see our Gateway Quick Start
  2. Codex Installation: Install the Codex CLI on your system (an example install command follows this list)
  3. Routing Configuration: Set up a routing configuration for your desired models (see the Setup Process section below)
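For reference, the Codex CLI is typically installed globally via npm (a sketch assuming the standard @openai/codex package; check OpenAI's Codex documentation for other install options):
# Install the Codex CLI globally with npm
npm install -g @openai/codex

# Verify the installation
codex --version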

Why Routing Configuration is Necessary

Codex includes internal logic that sends thinking (reasoning) tokens to certain models during processing. This works well with standard OpenAI model names (like gpt-5), but causes compatibility issues with TrueFoundry's fully qualified model names (like openai-main/gpt-5 or azure-openai/gpt-5). When Codex encounters a fully qualified model name directly, it incorrectly sends thinking tokens, which can cause unexpected behavior.
The solution: a routing configuration allows you to:
  1. Use standard model names in your Codex commands (e.g., gpt-5)
  2. Have the TrueFoundry Gateway automatically route requests to the fully qualified target model (e.g., openai-main/gpt-5)
This approach avoids the thinking-token issue while still letting you access any model through the TrueFoundry Gateway.
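In practice, the difference looks like this (a sketch using the model names above; your configured provider prefix may differ):
# Passing the fully qualified name directly can trigger the thinking-token issue
codex chat --model openai-main/gpt-5 "Explain this error message"

# With a routing rule mapping gpt-5 -> openai-main/gpt-5, the standard name
# works, and the Gateway resolves the fully qualified target for you
codex chat --model gpt-5 "Explain this error message"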

Setup Process

1. Configure Environment Variables

To connect Codex with the TrueFoundry LLM Gateway, set these environment variables:
export OPENAI_API_KEY="TFY_API_KEY"
export OPENAI_BASE_URL="https://{controlPlaneUrl}/api/llm"
Replace TFY_API_KEY with your actual TrueFoundry API key and {controlPlaneUrl} with your TrueFoundry control plane URL.
[Screenshot: TrueFoundry playground showing the unified code snippet, with the base URL and model name highlighted for Codex CLI integration]

Get Base URL and Model Name from Unified Code Snippet

Tip: Add these lines to your shell profile (.bashrc, .zshrc, etc.) to make the configuration persistent across terminal sessions.
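Before running Codex, you can verify the variables with a direct request to the Gateway (a quick sanity check, assuming the Gateway exposes an OpenAI-compatible chat completions route under the base URL; the gpt-5 model name assumes the routing configuration from the next step):
# Send a minimal chat completion request through the Gateway
curl -s "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5", "messages": [{"role": "user", "content": "Say hello"}]}'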

2. Set Up Routing Configuration

Create a routing configuration to route your requests to specific model providers:
[Screenshot: TrueFoundry AI Gateway routing configuration showing the virtual model mapping]
Add your desired model name (e.g., gpt-5) as the virtual model, and set the target to the fully qualified model name you want to use (e.g., openai-main/gpt-5). This configuration ensures that when you request gpt-5 through Codex, the request is routed to the openai-main/gpt-5 model with 100% of the traffic weight.
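In YAML form, the rule looks roughly like this (a sketch following the load_balance_targets shape used in the load-balancing example later in this guide; the surrounding schema may differ by Gateway version, so treat the configuration UI as authoritative):
load_balance_targets:
  - target: openai-main/gpt-5   # fully qualified target model
    weight: 100                 # route all gpt-5 traffic here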

Usage Examples

Basic Usage with Load-Balanced Models

Always specify the model defined in your routing configuration to ensure your requests go through the TrueFoundry Gateway:
# Basic chat with the load-balanced gpt-5 model
codex chat --model gpt-5 "Generate a Python function to calculate the Fibonacci sequence"

# All requests using --model gpt-5 will be routed through the TF Gateway
codex chat --model gpt-5 "Explain quantum computing in simple terms"

# Streaming also works with the load-balanced model
codex chat --model gpt-5 --stream "Write a short story about AI"

Advanced Options with Gateway Routing

# Set temperature while using the load-balanced model
codex chat --model gpt-5 --temperature 0.7 "Generate creative marketing slogans"

# Specify max tokens with the gateway-routed model
codex chat --model gpt-5 --max-tokens 500 "Write a detailed explanation of relativity"

# Chain operations through the gateway
cat input.txt | codex chat --model gpt-5 "Translate to French" | codex chat --model gpt-5 "Make this more formal"

Understanding Load Balancing

When you use Codex with --model gpt-5, your request is automatically routed according to your routing configuration. In the example above, all requests to gpt-5 are sent to openai-main/gpt-5 with 100% of the traffic. You can also create more sophisticated routing rules with multiple targets and different weights:
load_balance_targets:
  - target: openai-main/gpt-5
    weight: 70
  - target: azure-openai/gpt-5
    weight: 30
With this configuration, approximately 70% of your requests would go to OpenAI's GPT-5 model and 30% to the Azure OpenAI GPT-5 model.
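One way to sanity-check the split is to send a batch of requests and tally which target served each one. This sketch assumes the Gateway echoes the resolved fully qualified target in the response's model field (if your deployment reports it elsewhere, such as a response header, adjust accordingly) and uses jq for JSON parsing:
# Fire 20 requests and count how many each target served
for i in $(seq 1 20); do
  curl -s "$OPENAI_BASE_URL/chat/completions" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-5", "messages": [{"role": "user", "content": "ping"}]}' \
    | jq -r '.model'
done | sort | uniq -c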