Using Codex

This guide provides instructions for using the Codex CLI through the TrueFoundry LLM Gateway.

What is Codex?

Codex is OpenAI's official command-line interface (CLI), providing a streamlined way to interact with OpenAI's language models directly from your terminal. With the TrueFoundry LLM Gateway integration, you can route your Codex requests through the Gateway.

Prerequisites

  • Codex (the OpenAI CLI) installed on your system
  • A TrueFoundry API key
  • A load balancing configuration set up for your desired models

Why a Load Balancing Configuration Is Necessary

Codex has internal logic that sends "thinking tokens" to certain models during processing. This works well with standard OpenAI model names (such as gpt-4) but causes compatibility issues with TrueFoundry's fully qualified model names (such as openai-main/gpt-4 or azure-openai/gpt-4).

When Codex is given one of these fully qualified model names directly, it incorrectly sends thinking tokens, which can cause unexpected behavior.

The solution: a load balancing configuration lets you:

  1. Call a standard model name in your Codex commands (e.g., gpt-4)
  2. Have the TrueFoundry Gateway automatically route the request to the fully qualified target model (e.g., openai-main/gpt-4)

This approach avoids the thinking-token issue while still letting you reach any model through the TrueFoundry Gateway, as the example below illustrates.
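
For example, a direct request to the Gateway's OpenAI-compatible endpoint can name the plain alias, and the Gateway resolves it to the configured target. This is a minimal sketch using the environment variables configured in the next section; the /chat/completions path is assumed from the standard OpenAI API shape:

# The request names plain "gpt-4"; the Gateway's load balancing rule
# (defined below) routes it to openai-main/gpt-4
curl -s "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'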

Setup Process

1. Configure Environment Variables

To connect Codex to the TrueFoundry LLM Gateway, set these environment variables:

export OPENAI_API_KEY=TFY_API_KEY
export OPENAI_BASE_URL="https://{controlPlaneUrl}/api/llm/api/inference/openai"

Replace TFY_API_KEY with your actual TrueFoundry API key and {controlPlaneUrl} with your TrueFoundry control plane URL.
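
To verify that the variables are set and the Gateway is reachable, you can list the models exposed by the OpenAI-compatible endpoint. This is a quick sanity check; the /models route is assumed from the standard OpenAI API surface, so treat it as an assumption if your Gateway version differs:

# Sanity check: list models through the Gateway
# (assumes the standard OpenAI-compatible /models route is exposed)
curl -s "$OPENAI_BASE_URL/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"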

You can add these lines to your shell profile (.bashrc, .zshrc, etc.) to make the configuration persistent, for example:
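
# Append the Gateway settings to your shell profile (zsh shown; adapt for your shell)
echo 'export OPENAI_API_KEY=TFY_API_KEY' >> ~/.zshrc
echo 'export OPENAI_BASE_URL="https://{controlPlaneUrl}/api/llm/api/inference/openai"' >> ~/.zshrc
source ~/.zshrc   # reload the profile in the current session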

2. Set Up the Load Balancing Configuration

Create a load balancing configuration to route your requests to specific model providers:

name: loadbalancing-config
type: gateway-load-balancing-config
rules:
  - id: codex-load-balancing
    type: weight-based-routing
    when:
      models:
        - gpt-4
    load_balance_targets:
      - target: openai-main/gpt-4
        weight: 100

This configuration ensures that when you request gpt-4 through Codex, your request will be routed to the openai-main/gpt-4 model with 100% of the traffic weight.
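
Once the configuration is applied, a quick smoke test confirms that the alias resolves through the Gateway (this simply reuses the Codex command form shown in the examples below):

# Smoke test: request the plain alias; the Gateway routes it
# to openai-main/gpt-4 per the rule above
codex chat --model gpt-4 "Say hello"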

Usage Examples

Basic Usage with Load Balanced Models

Always specify a model name defined in your load balancing configuration so the TrueFoundry Gateway can route your requests to the intended target:

# Basic chat with the load-balanced gpt-4 model
codex chat --model gpt-4 "Generate a Python function to calculate the Fibonacci sequence"

# All requests using --model gpt-4 will be routed through the TrueFoundry Gateway
codex chat --model gpt-4 "Explain quantum computing in simple terms"

# Streaming also works with the load-balanced model
codex chat --model gpt-4 --stream "Write a short story about AI"

Advanced Options with Gateway Routing

# Set temperature while using the load-balanced model
codex chat --model gpt-4 --temperature 0.7 "Generate creative marketing slogans"

# Specify max tokens with the gateway-routed model
codex chat --model gpt-4 --max-tokens 500 "Write a detailed explanation of relativity"

# Chain operations through the gateway
cat input.txt | codex chat --model gpt-4 "Translate to French" | codex chat --model gpt-4 "Make this more formal"

Understanding Load Balancing

When you use Codex with --model gpt-4, the request is load-balanced according to your configuration. In the example configuration above, every request to gpt-4 is routed to openai-main/gpt-4 with 100% of the traffic.

You can create more sophisticated routing rules with multiple targets and different weights:

load_balance_targets:
  - target: openai-main/gpt-4
    weight: 70
  - target: azure-openai/gpt-4
    weight: 30

With this configuration, approximately 70% of your requests would go to OpenAI's GPT-4 model and 30% to the Azure OpenAI GPT-4 model.
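
To see the split in practice, you can send a batch of identical requests and compare per-target traffic in your Gateway's metrics or request logs. This is a rough sketch; small samples will deviate from the exact 70/30 ratio, and the availability of per-target metrics depends on your Gateway deployment:

# Send 20 identical requests; over a large sample roughly 70%
# should be served by openai-main/gpt-4 and 30% by azure-openai/gpt-4
for i in $(seq 1 20); do
  codex chat --model gpt-4 "Reply with the word ok"
done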