`gpt-4`), but causes compatibility issues with Truefoundry's fully qualified model names (like `openai-main/gpt-4` or `azure-openai/gpt-4`). When Codex encounters these fully qualified model names directly, it incorrectly sends thinking tokens, which can cause unexpected behavior.
**The Solution:** Load balancing configuration allows you to:

- Use standard model names (like `gpt-4`) in Codex
- Route requests to fully qualified models (like `openai-main/gpt-4`) behind the scenes

Replace `TFY_API_KEY` with your actual Truefoundry API key and `{controlPlaneUrl}` with your Truefoundry control plane URL.
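As a minimal sketch, the substitution might look like the following. This assumes Codex is pointed at the gateway through the OpenAI-compatible `OPENAI_BASE_URL` / `OPENAI_API_KEY` environment variables; the exact base URL path is illustrative — use the value shown for your deployment.

```shell
# Illustrative only — substitute your real values.
# {controlPlaneUrl} and the path below are placeholders; copy the exact
# base URL from your Truefoundry deployment.
export OPENAI_BASE_URL="https://{controlPlaneUrl}/api/llm"
export OPENAI_API_KEY="TFY_API_KEY"   # replace with your actual key
```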
Get Base URL and Model Name from Unified Code Snippet
Add these environment variables to your shell profile (`.bashrc`, `.zshrc`, etc.) for persistent configuration:
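For example, a hedged sketch that appends the variables to `~/.zshrc` (paths, URL, and key are placeholders — adjust for your shell and deployment):

```shell
# Persist the gateway settings across shell sessions (zsh shown).
# Values are placeholders; use your real base URL and API key.
echo 'export OPENAI_BASE_URL="https://{controlPlaneUrl}/api/llm"' >> ~/.zshrc
echo 'export OPENAI_API_KEY="TFY_API_KEY"' >> ~/.zshrc

# Reload the profile so the current session picks up the changes.
source ~/.zshrc
```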
Now when you use `gpt-4` through Codex, your request will be routed to the `openai-main/gpt-4` model with 100% of the traffic weight.
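A load balancing configuration expressing this routing could look roughly like the sketch below. The field names (`type`, `rules`, `when`, `load_balance_targets`) are illustrative assumptions — verify them against the schema for your gateway version.

```yaml
# Hypothetical sketch of a weight-based routing rule;
# check your gateway version's schema for exact field names.
name: codex-load-balancing
type: gateway-load-balancing-config
rules:
  - id: route-gpt-4
    when:
      models:
        - gpt-4                       # the standard name Codex sends
    load_balance_targets:
      - target: openai-main/gpt-4     # fully qualified model behind it
        weight: 100                   # all traffic goes to this target
```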
When you run Codex with `--model gpt-4`, your request is load-balanced according to your configuration. In the example configuration above, any request to `gpt-4` will be routed to `openai-main/gpt-4` with 100% of the traffic.
You can create more sophisticated routing rules with multiple targets and different weights:
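For instance, a hedged sketch splitting traffic across two providers (field names are illustrative assumptions — verify against your gateway's schema):

```yaml
# Hypothetical multi-target rule: weights should sum to 100.
rules:
  - id: route-gpt-4-split
    when:
      models:
        - gpt-4
    load_balance_targets:
      - target: openai-main/gpt-4     # 70% of requests
        weight: 70
      - target: azure-openai/gpt-4    # 30% of requests
        weight: 30
```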