Overview
The MCP Server API lets you programmatically interact with your registered MCP Servers through the TrueFoundry AI Gateway. You can invoke LLMs and tools, and integrate with external systems—all via a simple HTTP API.
How to Get the API Code Snippet
You can generate a ready-to-use API code snippet directly from the AI Gateway web UI:
- Go to the Playground or your MCP Server group in the AI Gateway.
- Click the API Code Snippet button.
- Copy the generated code and use it in your application.
(Screenshots: the API Code Snippet button and an example of the generated snippet.)
Authentication
All API requests require a TrueFoundry API token. Set it as an environment variable for convenience:
export TFY_API_TOKEN=your-token-here
API Endpoint
POST https://<tfy-control-plane-base-url>/api/llm/agent/responses
Replace <tfy-control-plane-base-url> with your TrueFoundry control plane URL.
Request Structure
Send a JSON payload with the following fields:
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | - | The LLM model to use (e.g., "gpt-4o") |
| messages | array | ✗ | - | Array of message objects with role and content |
| mcp_servers | array | ✗ | - | Array of MCP Server configurations (see below) |
| max_tokens | number | ✗ | - | Maximum number of tokens to generate |
| temperature | number | ✗ | - | Controls randomness in the response (0.0 to 2.0) |
| top_p | number | ✗ | - | Nucleus sampling parameter (0.0 to 1.0) |
| top_k | number | ✗ | - | Top-k sampling parameter |
| stream | boolean | ✗ | - | Whether to stream responses (only true is supported) |
| iteration_limit | number | ✗ | 5 | Maximum number of tool call iterations (1-20) |
MCP Server Configuration
Each entry in the mcp_servers array should include:
MCP Server Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| integration_fqn | string | ✗* | - | Fully qualified name of the MCP Server integration |
| url | string | ✗* | - | URL of the MCP server (must be a valid URL) |
| headers | object | ✗ | - | HTTP headers to send to the MCP server |
| enable_all_tools | boolean | ✗ | true | Whether to enable all tools for this server |
| tools | array | ✗ | - | Array of specific tools to enable |
*Note: Either integration_fqn or url must be provided, but not both.
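For illustration, here are the two forms written as Python dicts; the URL and header values in the second entry are placeholders, not real endpoints:

# Option 1: reference a registered MCP Server integration by its FQN
mcp_server_by_fqn = {
    "integration_fqn": "common-tools",   # FQN from your AI Gateway configuration
    "enable_all_tools": False,           # expose only the tools listed below
    "tools": [{"name": "search"}],
}

# Option 2: point at an MCP server directly by URL (placeholder values)
mcp_server_by_url = {
    "url": "https://example.com/mcp",    # hypothetical MCP server endpoint
    "headers": {"Authorization": "Bearer <mcp-server-token>"},  # optional headers
    "enable_all_tools": True,            # expose every tool the server offers
}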
Each entry in the tools array should include:
Tool Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | ✓ | The name of the tool as it appears in the MCP server |
Example Request
curl --location 'https://<tfy-control-plane-base-url>/api/llm/agent/responses' \
--header 'Content-Type: application/json' \
--header 'x-tfy-metadata: {"tfy_log_request":"true"}' \
--header "Authorization: Bearer ${TFY_API_TOKEN}" \
--data-raw '{
"temperature": 0.7,
"max_tokens": 500,
"model": "gpt-4o",
"messages": [
{ "role": "user", "content": "hi" }
],
"stream": true,
"mcp_servers": [
{
"integration_fqn": "common-tools",
"enable_all_tools": false,
"tools": [
{ "name": "search" },
{ "name": "code_executor" }
]
},
{
"integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:slack",
"enable_all_tools": false,
"tools": [
{ "name": "findUserByEmail" },
{ "name": "sendMessageToUser" }
]
},
{
"integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:sentry",
"enable_all_tools": false,
"tools": [
{ "name": "getIssueEventDetails" }
]
}
]
}'
Streaming Responses
The MCP Server API uses Server-Sent Events (SSE) to stream responses in real time. This allows you to receive partial responses as they’re generated, including tool calls and their results.
Important: Both assistant content and tool call arguments are streamed incrementally across multiple chunks. You must accumulate these fragments to build complete responses.
Response Structure
Each SSE event contains:
data: {"id": "event_id", "object": "chat.completion.chunk", "choices": [...]}
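As a minimal sketch, the raw SSE stream can be consumed with httpx roughly as follows (the payload values are illustrative; only assistant content is accumulated here, and tool-call and tool-result events are covered below):

import json
import os
import httpx

url = "https://<tfy-control-plane-base-url>/api/llm/agent/responses"  # replace with your control plane URL
payload = {
    "model": "gpt-4o",
    "stream": True,
    "messages": [{"role": "user", "content": "hi"}],
    "mcp_servers": [{"integration_fqn": "common-tools", "enable_all_tools": True}],
}
headers = {
    "Authorization": f"Bearer {os.environ['TFY_API_TOKEN']}",
    "Content-Type": "application/json",
}

answer = ""  # accumulate assistant content fragments here
with httpx.stream("POST", url, json=payload, headers=headers, timeout=None) as response:
    for line in response.iter_lines():
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # stream termination marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                answer += delta["content"]
print(answer)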
Event Types
The streaming response includes several types of events:
1. Content Events
Regular assistant response content streamed over multiple chunks:
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221957,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"role": "assistant",
"content": "",
"refusal": null
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221957,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"content": "User Name"
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221957,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"content": " is a Slack user with"
},
"logprobs": null,
"finish_reason": null
}]
}
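For instance, joining the delta.content fragments from the example chunks above reconstructs the assistant text seen so far (a minimal illustration of the accumulation rule):

fragments = ["", "User Name", " is a Slack user with"]  # delta.content values from the chunks above
assistant_text = "".join(fragments)
print(assistant_text)  # "User Name is a Slack user with"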
2. Tool Call Events
Tool calls are streamed incrementally: the function name arrives first, then the arguments are streamed across multiple chunks (a sketch of merging these fragments follows the example chunks below):
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221956,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"role": "assistant",
"content": null,
"tool_calls": [{
"index": 0,
"id": "call_xxxxxxxxxxxxxxxxxxxx",
"type": "function",
"function": {
"name": "a_getSlackUsers",
"arguments": ""
},
"mcp_server_integration_id": "xxxxxxxxxxxxxxxxxxxxx",
"tool_name": "getSlackUsers"
}],
"refusal": null
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221956,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"tool_calls": [{
"index": 0,
"function": {
"arguments": "{}"
}
}]
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221956,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {},
"logprobs": null,
"finish_reason": "tool_calls"
}]
}
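On the client side, these fragments are typically merged by tool-call index: the id and function name arrive once, and the argument fragments are concatenated until a chunk arrives with finish_reason set to "tool_calls". A minimal sketch, assuming each chunk's choices[0].delta has already been parsed into a dict:

tool_calls = {}  # accumulated tool calls, keyed by their index within the message

def merge_tool_call_delta(delta: dict) -> None:
    """Fold one parsed delta (choices[0].delta from a chunk) into tool_calls."""
    for tc in delta.get("tool_calls", []):
        entry = tool_calls.setdefault(tc["index"], {"id": "", "name": "", "arguments": ""})
        if tc.get("id"):
            entry["id"] = tc["id"]
        fn = tc.get("function", {})
        if fn.get("name"):
            entry["name"] = fn["name"]
        entry["arguments"] += fn.get("arguments", "")

# Once a chunk arrives with finish_reason == "tool_calls", each accumulated
# arguments string is complete JSON, e.g. json.loads(tool_calls[0]["arguments"])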
3. Tool Result Events
Results from tool execution:
{
"id": "tool-call-result-call_xxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"choices": [{
"index": 0,
"delta": {
"tool_call_id": "call_xxxxxxxxxxxxxxxxxxxx",
"role": "tool",
"content": "{\"content\":[{\"type\":\"text\",\"text\":\"[{\\\"id\\\":\\\"UXXXXXXX\\\",\\\"is_bot\\\":false,\\\"real_name\\\":\\\"User Name\\\",\\\"email\\\":\\\"user@example.com\\\"},{\\\"id\\\":\\\"UXXXXXXX\\\",\\\"is_bot\\\":false,\\\"real_name\\\":\\\"Another User\\\",\\\"email\\\":\\\"another@example.com\\\"}]\"}]}",
"mcp_server_integration_id": "xxxxxxxxxxxxxxxxxxxxx",
"tool_name": "getSlackUsers"
}
}]
}
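The content field of a tool result is a JSON-encoded string, and in this Slack example the inner text payload is itself JSON, so clients usually decode it in two steps. A minimal sketch, assuming the result shape shown above:

import json

def extract_text_items(tool_result_content: str) -> list[str]:
    """Decode the outer JSON envelope of a tool result and return its text payloads."""
    envelope = json.loads(tool_result_content)  # {"content": [{"type": "text", "text": "..."}]}
    return [item["text"] for item in envelope.get("content", []) if item.get("type") == "text"]

# For the Slack example above, each text payload is itself a JSON array of users:
# users = json.loads(extract_text_items(delta["content"])[0])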
4. Error Events
When errors occur:
{
"type": "error",
"error": {
"type": "rate_limit_error",
"message": "This request would exceed the rate limit for your organization. Please reduce the prompt length or try again later."
}
}
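Error events carry no choices array, so a client should check for this shape before processing a chunk. A minimal sketch of that guard:

import json

def parse_event(data: str) -> dict:
    """Parse one SSE data payload, surfacing error events explicitly."""
    event = json.loads(data)
    if event.get("type") == "error":
        err = event["error"]
        raise RuntimeError(f"{err.get('type')}: {err.get('message')}")
    return event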
Processing Streaming Responses
OpenAI Client Example
You can use the OpenAI client library with a custom base URL to handle the streaming response:
import httpx
import json
from openai import OpenAI
# Rewrite the client's default /chat/completions path to the gateway's /agent/responses endpoint
def update_base_url(request: httpx.Request) -> None:
    request.url = httpx.URL(str(request.url).replace("chat/completions", "agent/responses"))

client = OpenAI(
    # Point base_url at the gateway's LLM API root, e.g. https://<tfy-control-plane-base-url>/api/llm
    base_url="https://your-gateway-url",
    api_key="your-tfy-api-token",  # your TrueFoundry API token (TFY_API_TOKEN)
    http_client=httpx.Client(
        event_hooks={"request": [update_base_url]}
    )
)
def handle_tool_calls(delta):
"""Handle tool calls from assistant."""
global current_tool_calls
tool_calls = delta.tool_calls
for tool_call in tool_calls:
index = tool_call.index
# Initialize tool call if it's new
if index not in current_tool_calls:
current_tool_calls[index] = {
'id': '',
'function': {'name': '', 'arguments': ''},
'integration_id': '',
'tool_name': '',
'name_printed': False
}
# Update tool call with new data
if hasattr(tool_call, 'id') and tool_call.id:
current_tool_calls[index]['id'] = tool_call.id
if hasattr(tool_call, 'mcp_server_integration_id'):
current_tool_calls[index]['integration_id'] = tool_call.mcp_server_integration_id
if hasattr(tool_call, 'tool_name'):
current_tool_calls[index]['tool_name'] = tool_call.tool_name
if hasattr(tool_call, 'function') and tool_call.function:
function_data = tool_call.function
if hasattr(function_data, 'name') and function_data.name and not current_tool_calls[index]['name_printed']:
current_tool_calls[index]['function']['name'] = function_data.name
current_tool_calls[index]['name_printed'] = True
# Print tool call header
tool_call_id = current_tool_calls[index]['id'] or 'pending'
integration_id = current_tool_calls[index]['integration_id']
tool_name = current_tool_calls[index]['tool_name']
integration_info = f" (Integration: {integration_id}), Tool Name: {tool_name}" if integration_id else ""
print(f"\n[Tool Call: {tool_call_id}:{function_data.name}{integration_info}]")
print("Args: ", end='', flush=True)
if hasattr(function_data, 'arguments') and function_data.arguments:
current_tool_calls[index]['function']['arguments'] += function_data.arguments
print(function_data.arguments, end='', flush=True)
def handle_tool_result(delta):
"""Handle tool result messages."""
integration_id = getattr(delta, 'mcp_server_integration_id', '')
tool_name = getattr(delta, 'tool_name', '')
tool_call_id = getattr(delta, 'tool_call_id', '')
integration_info = f" (Integration: {integration_id}), Tool Name: {tool_name}" if integration_id else ""
print(f"\n[Tool Result: {tool_call_id}:{tool_name}{integration_info}]: ", end='', flush=True)
content = getattr(delta, 'content', '')
if content:
try:
# Try to parse the JSON to extract the actual result
result_json = json.loads(content)
if 'content' in result_json:
for item in result_json['content']:
if item.get('type') == 'text':
text = item.get('text', '')
print(text, end='', flush=True)
else:
# Just print the content as is if we can't extract text
print(content, end='', flush=True)
except json.JSONDecodeError:
# If not valid JSON, just print as is
print(content, end='', flush=True)
print() # Add newline after tool result
# Initialize tracking variables
current_tool_calls = {}
messages = []
# Stream the response
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Your message"}],
stream=True,
extra_body={
"mcp_servers": [
{
"integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:slack",
"enable_all_tools": False,
"tools": [{"name": "getSlackUsers"}, {"name": "findUserByEmail"}]
}
],
"iteration_limit": 10
}
)
print("Assistant: ", end='', flush=True)
for chunk in stream:
if chunk.choices:
choice = chunk.choices[0]
delta = choice.delta
finish_reason = choice.finish_reason
# Handle tool results
if hasattr(delta, 'role') and delta.role == 'tool':
handle_tool_result(delta)
continue
# Handle tool calls
if hasattr(delta, 'tool_calls') and delta.tool_calls:
handle_tool_calls(delta)
# Handle regular content
if hasattr(delta, 'content') and delta.content:
print(delta.content, end='', flush=True)
# Handle message completion
if finish_reason:
if current_tool_calls:
print() # Add newline after tool call arguments
current_tool_calls = {} # Reset for next iteration
print() # Final newline
The streaming API follows this flow when tools are involved:
- Assistant Response Start: Initial content from the LLM (streamed)
- Tool Call Event: Function name, then arguments streamed incrementally
- Tool Execution: The gateway executes the complete tool call
- Tool Result Event: Results are streamed back
- Assistant Follow-up: The assistant processes results and continues
Stream Termination
The stream ends with one of the following:
- A [DONE] message indicating completion
- An error event if something goes wrong
- Client disconnection
Tips
- Use the API Code Snippet button in the UI for a quick start.
- Always keep your API token secure.
- You can enable all tools for a server by setting enable_all_tools to true, or specify only the tools you need.
- For more details on tool names and integration FQNs, check your MCP Server configuration in the AI Gateway.