Overview

The MCP Server API lets you programmatically interact with your registered MCP Servers through the TrueFoundry AI Gateway. You can invoke LLMs and tools, and integrate with external systems—all via a simple HTTP API.


How to Get the API Code Snippet

You can generate a ready-to-use API code snippet directly from the AI Gateway web UI:

  1. Go to the Playground or your MCP Server group in the AI Gateway.
  2. Click the API Code Snippet button.
  3. Copy the generated code and use it in your application.

MCP Server API Code Snippet - Button

MCP Server API Code Snippet - Example


Authentication

All API requests require a TrueFoundry API token. Set it as an environment variable for convenience:

export TFY_API_TOKEN=your-token-here
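
In Python, you can then read the same token from the environment instead of hardcoding it (a minimal sketch):

import os

# Read the token exported in the previous step
tfy_api_token = os.environ["TFY_API_TOKEN"]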

API Endpoint

POST https://<tfy-control-plane-base-url>/api/llm/agent/responses

Replace <tfy-control-plane-base-url> with your TrueFoundry control plane URL.


Request Structure

Send a JSON payload with the following fields:

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | - | The LLM model to use (e.g., "gpt-4o") |
| messages | array | ✓ | - | Array of message objects with role and content |
| mcp_servers | array | ✓ | - | Array of MCP Server configurations (see below) |
| max_tokens | number | ✗ | - | Maximum number of tokens to generate |
| temperature | number | ✗ | - | Controls randomness in the response (0.0 to 2.0) |
| top_p | number | ✗ | - | Nucleus sampling parameter (0.0 to 1.0) |
| top_k | number | ✗ | - | Top-k sampling parameter |
| stream | boolean | ✓ | - | Whether to stream responses (only true is supported) |
| iteration_limit | number | ✗ | 5 | Maximum tool call iterations (1-20) |
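
Taken together, a minimal request body might look like this (the model name and message are placeholders; see the full example request below):

{
  "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "hi" }],
  "stream": true,
  "mcp_servers": [{ "integration_fqn": "common-tools" }]
}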

MCP Server Configuration

Each entry in the mcp_servers array should include:

MCP Server Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| integration_fqn | string | ✗* | - | Fully qualified name of the MCP Server integration |
| url | string | ✗* | - | URL of the MCP server (must be a valid URL) |
| headers | object | ✗ | - | HTTP headers to send to the MCP server |
| enable_all_tools | boolean | ✗ | true | Whether to enable all tools for this server |
| tools | array | ✗ | - | Array of specific tools to enable |

*Note: Either integration_fqn or url must be provided, but not both.
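
For example, a server referenced by url rather than by integration_fqn might be configured like this (the URL and header value are placeholders):

{
  "url": "https://mcp.example.com/mcp",
  "headers": { "Authorization": "Bearer <mcp-server-token>" },
  "enable_all_tools": true
}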

Tool Configuration

Each entry in the tools array should include:

Tool Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | ✓ | The name of the tool as it appears in the MCP server |

Example Request

curl --location 'https://<tfy-control-plane-base-url>/api/llm/agent/responses' \
  --header 'Content-Type: application/json' \
  --header 'x-tfy-metadata: {"tfy_log_request":"true"}' \
  --header "Authorization: Bearer ${TFY_API_TOKEN}" \
  --data-raw '{
    "temperature": 0.7,
    "max_tokens": 500,
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "hi" }
    ],
    "stream": true,
    "mcp_servers": [
      {
        "integration_fqn": "common-tools",
        "enable_all_tools": false,
        "tools": [
          { "name": "search" },
          { "name": "code_executor" }
        ]
      },
      {
        "integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:slack",
        "enable_all_tools": false,
        "tools": [
          { "name": "findUserByEmail" },
          { "name": "sendMessageToUser" }
        ]
      },
      {
        "integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:sentry",
        "enable_all_tools": false,
        "tools": [
          { "name": "getIssueEventDetails" }
        ]
      }
    ]
  }'

Streaming Response Format

The MCP Server API uses Server-Sent Events (SSE) to stream responses in real-time. This allows you to receive partial responses as they’re generated, including tool calls and their results.

Important: Both assistant content and tool call arguments are streamed incrementally across multiple chunks. You must accumulate these fragments to build complete responses.

Response Structure

Each SSE event contains:

data: {"id": "event_id", "object": "chat.completion.chunk", "choices": [...]}

Event Types

The streaming response includes several types of events:

1. Content Events

Regular assistant response content streamed over multiple chunks:

{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
  "object": "chat.completion.chunk",
  "created": 1750221957,
  "model": "gpt-4o-2024-08-06",
  "service_tier": "default",
  "system_fingerprint": "fp_xxxxxxxxxxxx",
  "choices": [{
    "index": 0,
    "delta": {
      "role": "assistant",
      "content": "",
      "refusal": null
    },
    "logprobs": null,
    "finish_reason": null
  }]
}
{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
  "object": "chat.completion.chunk",
  "created": 1750221957,
  "model": "gpt-4o-2024-08-06",
  "service_tier": "default",
  "system_fingerprint": "fp_xxxxxxxxxxxx",
  "choices": [{
    "index": 0,
    "delta": {
      "content": "User Name"
    },
    "logprobs": null,
    "finish_reason": null
  }]
}
{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
  "object": "chat.completion.chunk",
  "created": 1750221957,
  "model": "gpt-4o-2024-08-06",
  "service_tier": "default",
  "system_fingerprint": "fp_xxxxxxxxxxxx",
  "choices": [{
    "index": 0,
    "delta": {
      "content": " is a Slack user with"
    },
    "logprobs": null,
    "finish_reason": null
  }]
}

2. Tool Call Events

Tool calls are streamed incrementally: the function name arrives first, then the arguments are streamed across multiple chunks:

{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
  "object": "chat.completion.chunk",
  "created": 1750221956,
  "model": "gpt-4o-2024-08-06",
  "service_tier": "default",
  "system_fingerprint": "fp_xxxxxxxxxxxx",
  "choices": [{
    "index": 0,
    "delta": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "index": 0,
        "id": "call_xxxxxxxxxxxxxxxxxxxx",
        "type": "function",
        "function": {
          "name": "a_getSlackUsers",
          "arguments": ""
        },
        "mcp_server_integration_id": "xxxxxxxxxxxxxxxxxxxxx",
        "tool_name": "getSlackUsers"
      }],
      "refusal": null
    },
    "logprobs": null,
    "finish_reason": null
  }]
}
{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
  "object": "chat.completion.chunk",
  "created": 1750221956,
  "model": "gpt-4o-2024-08-06",
  "service_tier": "default",
  "system_fingerprint": "fp_xxxxxxxxxxxx",
  "choices": [{
    "index": 0,
    "delta": {
      "tool_calls": [{
        "index": 0,
        "function": {
          "arguments": "{}"
        }
      }]
    },
    "logprobs": null,
    "finish_reason": null
  }]
}
{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
  "object": "chat.completion.chunk",
  "created": 1750221956,
  "model": "gpt-4o-2024-08-06",
  "service_tier": "default",
  "system_fingerprint": "fp_xxxxxxxxxxxx",
  "choices": [{
    "index": 0,
    "delta": {},
    "logprobs": null,
    "finish_reason": "tool_calls"
  }]
}

3. Tool Result Events

Results from tool execution:

{
  "id": "tool-call-result-call_xxxxxxxxxxxxxxxxxxxx",
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "delta": {
      "tool_call_id": "call_xxxxxxxxxxxxxxxxxxxx",
      "role": "tool",
      "content": "{\"content\":[{\"type\":\"text\",\"text\":\"[{\\\"id\\\":\\\"UXXXXXXX\\\",\\\"is_bot\\\":false,\\\"real_name\\\":\\\"User Name\\\",\\\"email\\\":\\\"user@example.com\\\"},{\\\"id\\\":\\\"UXXXXXXX\\\",\\\"is_bot\\\":false,\\\"real_name\\\":\\\"Another User\\\",\\\"email\\\":\\\"another@example.com\\\"}]\"}]}",
      "mcp_server_integration_id": "xxxxxxxxxxxxxxxxxxxxx",
      "tool_name": "getSlackUsers"
    }
  }]
}

4. Error Events

When errors occur:

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "This request would exceed the rate limit for your organization. Please reduce the prompt length or try again later."
  }
}
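
When parsing raw events, you can distinguish error payloads from regular chunks by their top-level type field (a minimal sketch, assuming chunk is the parsed JSON of one event):

def is_error_event(chunk: dict) -> bool:
    # Error events carry a top-level "type": "error" instead of "choices"
    return chunk.get("type") == "error"

# Example usage inside a streaming loop:
# if is_error_event(chunk):
#     raise RuntimeError(chunk["error"]["message"])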

Processing Streaming Responses

OpenAI Client Example

You can use the OpenAI Python client against the gateway by adding an httpx request hook that rewrites the request path from chat/completions to agent/responses:

import httpx
import json
import os
from openai import OpenAI

def update_base_url(request: httpx.Request) -> None:
    # The OpenAI client always targets .../chat/completions; rewrite the
    # path so requests hit the gateway's agent endpoint instead.
    request.url = httpx.URL(str(request.url).replace("chat/completions", "agent/responses"))

client = OpenAI(
    # The client appends /chat/completions to base_url; the hook above
    # rewrites that suffix, yielding .../api/llm/agent/responses
    base_url="https://<tfy-control-plane-base-url>/api/llm",
    api_key=os.environ["TFY_API_TOKEN"],
    http_client=httpx.Client(
        event_hooks={"request": [update_base_url]}
    )
)

def handle_tool_calls(delta):
    """Handle tool calls from assistant."""
    global current_tool_calls

    tool_calls = delta.tool_calls
    for tool_call in tool_calls:
        index = tool_call.index

        # Initialize tool call if it's new
        if index not in current_tool_calls:
            current_tool_calls[index] = {
                'id': '',
                'function': {'name': '', 'arguments': ''},
                'integration_id': '',
                'tool_name': '',
                'name_printed': False
            }

        # Update tool call with new data
        if hasattr(tool_call, 'id') and tool_call.id:
            current_tool_calls[index]['id'] = tool_call.id

        if hasattr(tool_call, 'mcp_server_integration_id'):
            current_tool_calls[index]['integration_id'] = tool_call.mcp_server_integration_id

        if hasattr(tool_call, 'tool_name'):
            current_tool_calls[index]['tool_name'] = tool_call.tool_name

        if hasattr(tool_call, 'function') and tool_call.function:
            function_data = tool_call.function
            if hasattr(function_data, 'name') and function_data.name and not current_tool_calls[index]['name_printed']:
                current_tool_calls[index]['function']['name'] = function_data.name
                current_tool_calls[index]['name_printed'] = True

                # Print tool call header
                tool_call_id = current_tool_calls[index]['id'] or 'pending'
                integration_id = current_tool_calls[index]['integration_id']
                tool_name = current_tool_calls[index]['tool_name']
                integration_info = f" (Integration: {integration_id}), Tool Name: {tool_name}" if integration_id else ""
                print(f"\n[Tool Call: {tool_call_id}:{function_data.name}{integration_info}]")
                print("Args: ", end='', flush=True)

            if hasattr(function_data, 'arguments') and function_data.arguments:
                current_tool_calls[index]['function']['arguments'] += function_data.arguments
                print(function_data.arguments, end='', flush=True)

def handle_tool_result(delta):
    """Handle tool result messages."""
    integration_id = getattr(delta, 'mcp_server_integration_id', '')
    tool_name = getattr(delta, 'tool_name', '')
    tool_call_id = getattr(delta, 'tool_call_id', '')

    integration_info = f" (Integration: {integration_id}), Tool Name: {tool_name}" if integration_id else ""
    print(f"\n[Tool Result: {tool_call_id}:{tool_name}{integration_info}]: ", end='', flush=True)

    content = getattr(delta, 'content', '')
    if content:
        try:
            # Try to parse the JSON to extract the actual result
            result_json = json.loads(content)
            if 'content' in result_json:
                for item in result_json['content']:
                    if item.get('type') == 'text':
                        text = item.get('text', '')
                        print(text, end='', flush=True)
            else:
                # Just print the content as is if we can't extract text
                print(content, end='', flush=True)
        except json.JSONDecodeError:
            # If not valid JSON, just print as is
            print(content, end='', flush=True)

    print()  # Add newline after tool result

# Track in-progress tool calls across streamed chunks
current_tool_calls = {}

# Stream the response
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Your message"}],
    stream=True,
    extra_body={
        "mcp_servers": [
            {
                "integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:slack",
                "enable_all_tools": False,
                "tools": [{"name": "getSlackUsers"}, {"name": "findUserByEmail"}]
            }
        ],
        "iteration_limit": 10
    }
)

print("Assistant: ", end='', flush=True)

for chunk in stream:
    if chunk.choices:
        choice = chunk.choices[0]
        delta = choice.delta
        finish_reason = choice.finish_reason

        # Handle tool results
        if hasattr(delta, 'role') and delta.role == 'tool':
            handle_tool_result(delta)
            continue

        # Handle tool calls
        if hasattr(delta, 'tool_calls') and delta.tool_calls:
            handle_tool_calls(delta)

        # Handle regular content
        if hasattr(delta, 'content') and delta.content:
            print(delta.content, end='', flush=True)

        # Handle message completion
        if finish_reason:
            if current_tool_calls:
                print()  # Add newline after tool call arguments
            current_tool_calls = {}  # Reset for next iteration

print()  # Final newline

Tool Call Flow

The streaming API follows this flow when tools are involved (an abridged transcript follows the list):

  1. Assistant Response Start: Initial content from the LLM (streamed)
  2. Tool Call Event: Function name, then arguments streamed incrementally
  3. Tool Execution: The gateway executes the complete tool call
  4. Tool Result Event: Results are streamed back
  5. Assistant Follow-up: The assistant processes results and continues
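
Putting these steps together, an abridged event sequence for a single tool call might look like this (ids shortened, most fields omitted):

data: {"choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_1","type":"function","function":{"name":"a_getSlackUsers","arguments":""},"tool_name":"getSlackUsers"}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{}"}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
data: {"id":"tool-call-result-call_1","choices":[{"index":0,"delta":{"role":"tool","tool_call_id":"call_1","tool_name":"getSlackUsers","content":"..."}}]}
data: {"choices":[{"index":0,"delta":{"content":"User Name"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]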

Stream Termination

The stream ends with either:

  • A [DONE] message indicating completion
  • An error event if something goes wrong
  • Client disconnection

Tips

  • Use the API Code Snippet button in the UI for a quick start.
  • Always keep your API token secure.
  • You can enable all tools for a server by setting enable_all_tools to true, or specify only the tools you need.
  • For more details on tool names and integration FQNs, check your MCP Server configuration in the AI Gateway.