This guide provides instructions for integrating OpenAI Swarm with the TrueFoundry AI Gateway.

What is OpenAI Swarm?

OpenAI Swarm is an experimental framework for building, orchestrating, and deploying multi-agent systems. It provides a lightweight, scalable, and highly customizable approach to coordinating multiple AI agents that can work together to solve complex tasks through handoffs and collaboration.

Key Features of OpenAI Swarm

  • Multi-Agent Coordination: Build teams of specialized agents that can transfer tasks between each other and collaborate on complex workflows
  • Lightweight Framework: Minimal overhead with a simple API that makes it easy to define agents, their capabilities, and coordination patterns
  • Function Calling: Agents can be equipped with custom functions and tools to interact with external systems and APIs (see the sketch after this list)
  • Contextual Handoffs: Seamless transfer of context and conversation state between agents based on user needs and agent capabilities
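
As a quick illustration of function calling, the sketch below equips an agent with a simple tool. This is a minimal example, assuming Swarm is installed and your OpenAI client is configured; get_weather is an illustrative stub rather than a real weather API:

from swarm import Swarm, Agent

def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    # Illustrative stub -- a real tool would call an external weather API
    return f"It is currently sunny in {city}."

weather_agent = Agent(
    name="Weather Agent",
    instructions="Answer weather questions using the get_weather tool.",
    functions=[get_weather],  # Swarm converts Python functions into tool schemas
)

client = Swarm()  # uses the default OpenAI client configuration
response = client.run(
    agent=weather_agent,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)
print(response.messages[-1]["content"])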

Prerequisites

Before integrating OpenAI Swarm with TrueFoundry, ensure you have:
  1. TrueFoundry Account: Create a TrueFoundry account and follow our Gateway Quick Start
  2. OpenAI Swarm Installation: Install OpenAI Swarm using pip: pip install git+https://github.com/openai/swarm.git
  3. Load Balance Configuration: Set up a load balancing configuration for your desired models (see the Setup Process section below)

Important: Model Routing Configuration

Routing Configuration Required: OpenAI Swarm uses standard OpenAI model names (such as gpt-4o and gpt-4o-mini) in its internal logic. To route these requests through the TrueFoundry Gateway to your specific model providers, you need to set up a load balancing configuration on TrueFoundry that maps the standard model names to your TrueFoundry model format (e.g., openai-main/gpt-4o). For detailed information about load balancing configurations, see our Load Balancing Documentation.

Setup Process

1. Basic Setup with OpenAI Swarm

You can get your TrueFoundry API key, TrueFoundry Gateway URL, and model name directly from the unified code snippet.
Configure the OpenAI client with TrueFoundry gateway settings and use it with Swarm:
from swarm import Swarm, Agent
import os
from openai import OpenAI

# Read the TrueFoundry API key from the environment, falling back to a placeholder
api_key = os.getenv("OPENAI_API_KEY", "your-truefoundry-api-key")

# Initialize the OpenAI client with the TrueFoundry gateway
openai_client = OpenAI(
    api_key=api_key,
    base_url="your-truefoundry-base-url",
)

# Initialize Swarm with the TrueFoundry-configured OpenAI client
client = Swarm(client=openai_client)

def transfer_to_agent_b():
    """Hand the conversation off to Agent B."""
    return agent_b

agent_a = Agent(
    name="Agent A",
    instructions="You are a helpful agent.",
    functions=[transfer_to_agent_b],
)

agent_b = Agent(
    name="Agent B",
    instructions="Only speak in Haikus.",
)

response = client.run(
    agent=agent_a,
    messages=[{"role": "user", "content": "I want to talk to agent B."}],
)

print(response.messages[-1]["content"])
Replace:
  • your-truefoundry-api-key with your actual TrueFoundry API key
  • your-truefoundry-base-url with your TrueFoundry Gateway URL

2. Configure Model Routing

Create a load balancing configuration to route standard OpenAI model names to your TrueFoundry providers:
name: swarm-load-balancing-config
type: gateway-load-balancing-config
rules:
  - id: swarm-gpt4o-routing
    type: weight-based-routing
    when:
      models:
        - gpt-4o
    load_balance_targets:
      - target: openai-main/gpt-4o
        weight: 100
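
Once this configuration is applied, you can sanity-check the routing with a plain chat completion that uses the standard model name. This is a minimal sketch, assuming the load balancing config above is active and the same placeholder credentials as in the setup step:

from openai import OpenAI

client = OpenAI(
    api_key="your-truefoundry-api-key",
    base_url="your-truefoundry-base-url",
)

# The standard model name "gpt-4o" should be routed to openai-main/gpt-4o
# by the swarm-gpt4o-routing rule defined above
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)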

3. Environment Variables Configuration

For persistent configuration across your Swarm applications, set these environment variables:
export OPENAI_API_KEY="your-truefoundry-api-key"
export OPENAI_BASE_URL="your-truefoundry-base-url"
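
With these variables exported, no explicit client wiring is needed: the OpenAI SDK reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment, and Swarm constructs a default OpenAI client when none is passed. A minimal sketch under that assumption:

from swarm import Swarm, Agent

# Swarm() with no arguments builds a default OpenAI client, which picks up
# OPENAI_API_KEY and OPENAI_BASE_URL from the environment
client = Swarm()

agent = Agent(name="Assistant", instructions="You are a helpful agent.")
response = client.run(
    agent=agent,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.messages[-1]["content"])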

Observability and Governance

Monitor your OpenAI Swarm applications through TrueFoundry's metrics tab. With TrueFoundry's AI Gateway, you can monitor and analyze:
  • Performance Metrics: Track key latency metrics like Request Latency, Time to First Token (TTFT), and Inter-Token Latency (ITL) with P99, P90, and P50 percentiles
  • Cost and Token Usage: Gain visibility into your application’s costs with detailed breakdowns of input/output tokens and the associated expenses for each model
  • Usage Patterns: Understand how your application is being used with detailed analytics on user activity, model distribution, and team-based usage
  • Agent Performance: Monitor individual agent performance and handoff patterns
  • Rate Limiting and Load Balancing: Set up rate limiting, load balancing, and fallbacks for your models
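
If you want team- or app-level breakdowns in these analytics, one option is to tag every request with metadata headers via the OpenAI SDK's default_headers option. The sketch below is an assumption-laden example: the X-TFY-METADATA header name and its JSON payload should be confirmed against the TrueFoundry documentation for your gateway version.

import json
from openai import OpenAI
from swarm import Swarm

# Attach metadata to every request so usage can be sliced per team/app in the
# gateway analytics. The header name is an assumption -- verify it in the
# TrueFoundry docs for your gateway version.
openai_client = OpenAI(
    api_key="your-truefoundry-api-key",
    base_url="your-truefoundry-base-url",
    default_headers={
        "X-TFY-METADATA": json.dumps({"team": "agents", "app": "swarm-demo"}),
    },
)

client = Swarm(client=openai_client)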