Overview
The MCP Server API lets you programmatically interact with your registered MCP Servers through the TrueFoundry AI Gateway. You can invoke LLMs and tools, and integrate with external systems—all via a simple HTTP API.
How to Get the API Code Snippet
You can generate a ready-to-use API code snippet directly from the AI Gateway web UI:
- Go to the Playground or your MCP Server group in the AI Gateway.
- Click the API Code Snippet button.
- Copy the generated code and use it in your application.
(Screenshots: the API Code Snippet button and an example of the generated snippet.)
Authentication
All API requests require a TrueFoundry API token. Set it as an environment variable for convenience:
export TFY_API_TOKEN=your-token-here
API Endpoint
POST https://<tfy-control-plane-base-url>/api/llm/agent/responses
Replace <tfy-control-plane-base-url> with your TrueFoundry control plane URL.
Request Structure
Send a JSON payload with the following fields:
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | ✓ | - | The LLM model to use (e.g., "gpt-4o") |
| messages | array | ✗ | - | Array of message objects with role and content |
| mcp_servers | array | ✗ | - | Array of MCP Server configurations (see below) |
| max_tokens | number | ✗ | - | Maximum number of tokens to generate |
| temperature | number | ✗ | - | Controls randomness in the response (0.0 to 2.0) |
| top_p | number | ✗ | - | Nucleus sampling parameter (0.0 to 1.0) |
| top_k | number | ✗ | - | Top-k sampling parameter |
| stream | boolean | ✗ | - | Whether to stream responses (only true is supported) |
| iteration_limit | number | ✗ | 5 | Maximum number of tool call iterations (1-20) |
MCP Server Configuration
Each entry in the mcp_servers array should include:
MCP Server Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| integration_fqn | string | ✗* | - | Fully qualified name of the MCP Server integration |
| url | string | ✗* | - | URL of the MCP server (must be a valid URL) |
| headers | object | ✗ | - | HTTP headers to send to the MCP server |
| enable_all_tools | boolean | ✗ | true | Whether to enable all tools for this server |
| tools | array | ✗ | - | Array of specific tools to enable |
*Note: Either integration_fqn or url must be provided, but not both.
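For illustration, here are the two forms written as Python dicts; the URL and header values in the second entry are placeholders, not real endpoints:

# Option 1: reference a registered MCP Server integration by its FQN
mcp_server_by_fqn = {
    "integration_fqn": "common-tools",   # FQN from your AI Gateway configuration
    "enable_all_tools": False,           # expose only the tools listed below
    "tools": [{"name": "search"}],
}

# Option 2: point at an MCP server directly by URL (placeholder values)
mcp_server_by_url = {
    "url": "https://example.com/mcp",    # hypothetical MCP server endpoint
    "headers": {"Authorization": "Bearer <mcp-server-token>"},  # optional headers
    "enable_all_tools": True,            # expose every tool the server offers
}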
Each entry in the tools array should include:
Tool Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | ✓ | The name of the tool as it appears in the MCP server |
Example Request
curl --location 'https://<tfy-control-plane-base-url>/api/llm/agent/responses' \
--header 'Content-Type: application/json' \
--header 'x-tfy-metadata: {"tfy_log_request":"true"}' \
--header "Authorization: Bearer ${TFY_API_TOKEN}" \
--data-raw '{
"temperature": 0.7,
"max_tokens": 500,
"model": "gpt-4o",
"messages": [
{ "role": "user", "content": "hi" }
],
"stream": true,
"mcp_servers": [
{
"integration_fqn": "common-tools",
"enable_all_tools": false,
"tools": [
{ "name": "search" },
{ "name": "code_executor" }
]
},
{
"integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:slack",
"enable_all_tools": false,
"tools": [
{ "name": "findUserByEmail" },
{ "name": "sendMessageToUser" }
]
},
{
"integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:sentry",
"enable_all_tools": false,
"tools": [
{ "name": "getIssueEventDetails" }
]
}
]
}'
Streaming Responses
The MCP Server API uses Server-Sent Events (SSE) to stream responses in real time. This allows you to receive partial responses as they’re generated, including tool calls and their results.
Important: Both assistant content and tool call arguments are streamed incrementally across multiple chunks. You must accumulate these fragments to build complete responses.
Response Structure
Each SSE event contains:
data: {"id": "event_id", "object": "chat.completion.chunk", "choices": [...]}
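As a minimal sketch, the raw SSE stream can be consumed with httpx roughly as follows (the payload values are illustrative; only assistant content is accumulated here, and tool-call and tool-result events are covered below):

import json
import os
import httpx

url = "https://<tfy-control-plane-base-url>/api/llm/agent/responses"  # replace with your control plane URL
payload = {
    "model": "gpt-4o",
    "stream": True,
    "messages": [{"role": "user", "content": "hi"}],
    "mcp_servers": [{"integration_fqn": "common-tools", "enable_all_tools": True}],
}
headers = {
    "Authorization": f"Bearer {os.environ['TFY_API_TOKEN']}",
    "Content-Type": "application/json",
}

answer = ""  # accumulate assistant content fragments here
with httpx.stream("POST", url, json=payload, headers=headers, timeout=None) as response:
    for line in response.iter_lines():
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # stream termination marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                answer += delta["content"]
print(answer)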
Event Types
The streaming response includes several types of events:
1. Content Events
Regular assistant response content streamed over multiple chunks:
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221957,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"role": "assistant",
"content": "",
"refusal": null
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221957,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"content": "User Name"
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221957,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"content": " is a Slack user with"
},
"logprobs": null,
"finish_reason": null
}]
}
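For instance, joining the delta.content fragments from the example chunks above reconstructs the assistant text seen so far (a minimal illustration of the accumulation rule):

fragments = ["", "User Name", " is a Slack user with"]  # delta.content values from the chunks above
assistant_text = "".join(fragments)
print(assistant_text)  # "User Name is a Slack user with"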
2. Tool Call Events
Tool calls are streamed incrementally: the function name arrives first, then the arguments are streamed across multiple chunks (a sketch of merging these fragments follows the example chunks below):
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221956,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"role": "assistant",
"content": null,
"tool_calls": [{
"index": 0,
"id": "call_xxxxxxxxxxxxxxxxxxxx",
"type": "function",
"function": {
"name": "a_getSlackUsers",
"arguments": ""
},
"mcp_server_integration_id": "xxxxxxxxxxxxxxxxxxxxx",
"tool_name": "getSlackUsers"
}],
"refusal": null
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221956,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {
"tool_calls": [{
"index": 0,
"function": {
"arguments": "{}"
}
}]
},
"logprobs": null,
"finish_reason": null
}]
}
{
"id": "chatcmpl-xxxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"created": 1750221956,
"model": "gpt-4o-2024-08-06",
"service_tier": "default",
"system_fingerprint": "fp_xxxxxxxxxxxx",
"choices": [{
"index": 0,
"delta": {},
"logprobs": null,
"finish_reason": "tool_calls"
}]
}
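On the client side, these fragments are typically merged by tool-call index: the id and function name arrive once, and the argument fragments are concatenated until a chunk arrives with finish_reason set to "tool_calls". A minimal sketch, assuming each chunk's choices[0].delta has already been parsed into a dict:

tool_calls = {}  # accumulated tool calls, keyed by their index within the message

def merge_tool_call_delta(delta: dict) -> None:
    """Fold one parsed delta (choices[0].delta from a chunk) into tool_calls."""
    for tc in delta.get("tool_calls", []):
        entry = tool_calls.setdefault(tc["index"], {"id": "", "name": "", "arguments": ""})
        if tc.get("id"):
            entry["id"] = tc["id"]
        fn = tc.get("function", {})
        if fn.get("name"):
            entry["name"] = fn["name"]
        entry["arguments"] += fn.get("arguments", "")

# Once a chunk arrives with finish_reason == "tool_calls", each accumulated
# arguments string is complete JSON, e.g. json.loads(tool_calls[0]["arguments"])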
3. Tool Result Events
Results from tool execution:
{
"id": "tool-call-result-call_xxxxxxxxxxxxxxxxxxxx",
"object": "chat.completion.chunk",
"choices": [{
"index": 0,
"delta": {
"tool_call_id": "call_xxxxxxxxxxxxxxxxxxxx",
"role": "tool",
"content": "{\"content\":[{\"type\":\"text\",\"text\":\"[{\\\"id\\\":\\\"UXXXXXXX\\\",\\\"is_bot\\\":false,\\\"real_name\\\":\\\"User Name\\\",\\\"email\\\":\\\"user@example.com\\\"},{\\\"id\\\":\\\"UXXXXXXX\\\",\\\"is_bot\\\":false,\\\"real_name\\\":\\\"Another User\\\",\\\"email\\\":\\\"another@example.com\\\"}]\"}]}",
"mcp_server_integration_id": "xxxxxxxxxxxxxxxxxxxxx",
"tool_name": "getSlackUsers"
}
}]
}
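The content field of a tool result is a JSON-encoded string, and in this Slack example the inner text payload is itself JSON, so clients usually decode it in two steps. A minimal sketch, assuming the result shape shown above:

import json

def extract_text_items(tool_result_content: str) -> list[str]:
    """Decode the outer JSON envelope of a tool result and return its text payloads."""
    envelope = json.loads(tool_result_content)  # {"content": [{"type": "text", "text": "..."}]}
    return [item["text"] for item in envelope.get("content", []) if item.get("type") == "text"]

# For the Slack example above, each text payload is itself a JSON array of users:
# users = json.loads(extract_text_items(delta["content"])[0])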
4. Error Events
When errors occur:
{
"type": "error",
"error": {
"type": "rate_limit_error",
"message": "This request would exceed the rate limit for your organization. Please reduce the prompt length or try again later."
}
}
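Error events carry no choices array, so a client should check for this shape before processing a chunk. A minimal sketch of that guard:

import json

def parse_event(data: str) -> dict:
    """Parse one SSE data payload, surfacing error events explicitly."""
    event = json.loads(data)
    if event.get("type") == "error":
        err = event["error"]
        raise RuntimeError(f"{err.get('type')}: {err.get('message')}")
    return event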
Processing Streaming Responses
OpenAI Client Example
You can use the OpenAI client library with a custom base URL to handle the streaming response:
import httpx
import json
from openai import OpenAI
# Rewrite the client's default /chat/completions path to the gateway's /agent/responses endpoint
def update_base_url(request: httpx.Request) -> None:
    request.url = httpx.URL(str(request.url).replace("chat/completions", "agent/responses"))

client = OpenAI(
    # Point base_url at the gateway's LLM API root, e.g. https://<tfy-control-plane-base-url>/api/llm
    base_url="https://your-gateway-url",
    api_key="your-tfy-api-token",  # your TrueFoundry API token (TFY_API_TOKEN)
    http_client=httpx.Client(
        event_hooks={"request": [update_base_url]}
    )
)
def handle_tool_calls(delta):
"""Handle tool calls from assistant."""
global current_tool_calls
tool_calls = delta.tool_calls
for tool_call in tool_calls:
index = tool_call.index
# Initialize tool call if it's new
if index not in current_tool_calls:
current_tool_calls[index] = {
'id': '',
'function': {'name': '', 'arguments': ''},
'integration_id': '',
'tool_name': '',
'name_printed': False
}
# Update tool call with new data
if hasattr(tool_call, 'id') and tool_call.id:
current_tool_calls[index]['id'] = tool_call.id
if hasattr(tool_call, 'mcp_server_integration_id'):
current_tool_calls[index]['integration_id'] = tool_call.mcp_server_integration_id
if hasattr(tool_call, 'tool_name'):
current_tool_calls[index]['tool_name'] = tool_call.tool_name
if hasattr(tool_call, 'function') and tool_call.function:
function_data = tool_call.function
if hasattr(function_data, 'name') and function_data.name and not current_tool_calls[index]['name_printed']:
current_tool_calls[index]['function']['name'] = function_data.name
current_tool_calls[index]['name_printed'] = True
# Print tool call header
tool_call_id = current_tool_calls[index]['id'] or 'pending'
integration_id = current_tool_calls[index]['integration_id']
tool_name = current_tool_calls[index]['tool_name']
integration_info = f" (Integration: {integration_id}), Tool Name: {tool_name}" if integration_id else ""
print(f"\n[Tool Call: {tool_call_id}:{function_data.name}{integration_info}]")
print("Args: ", end='', flush=True)
if hasattr(function_data, 'arguments') and function_data.arguments:
current_tool_calls[index]['function']['arguments'] += function_data.arguments
print(function_data.arguments, end='', flush=True)
def handle_tool_result(delta):
"""Handle tool result messages."""
integration_id = getattr(delta, 'mcp_server_integration_id', '')
tool_name = getattr(delta, 'tool_name', '')
tool_call_id = getattr(delta, 'tool_call_id', '')
integration_info = f" (Integration: {integration_id}), Tool Name: {tool_name}" if integration_id else ""
print(f"\n[Tool Result: {tool_call_id}:{tool_name}{integration_info}]: ", end='', flush=True)
content = getattr(delta, 'content', '')
if content:
try:
# Try to parse the JSON to extract the actual result
result_json = json.loads(content)
if 'content' in result_json:
for item in result_json['content']:
if item.get('type') == 'text':
text = item.get('text', '')
print(text, end='', flush=True)
else:
# Just print the content as is if we can't extract text
print(content, end='', flush=True)
except json.JSONDecodeError:
# If not valid JSON, just print as is
print(content, end='', flush=True)
print() # Add newline after tool result
# Initialize tracking variables
current_tool_calls = {}
messages = []
# Stream the response
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Your message"}],
stream=True,
extra_body={
"mcp_servers": [
{
"integration_fqn": "truefoundry:hosted-mcp-server:hosted-devtest-mcp-servers:mcp-server:slack",
"enable_all_tools": False,
"tools": [{"name": "getSlackUsers"}, {"name": "findUserByEmail"}]
}
],
"iteration_limit": 10
}
)
print("Assistant: ", end='', flush=True)
for chunk in stream:
if chunk.choices:
choice = chunk.choices[0]
delta = choice.delta
finish_reason = choice.finish_reason
# Handle tool results
if hasattr(delta, 'role') and delta.role == 'tool':
handle_tool_result(delta)
continue
# Handle tool calls
if hasattr(delta, 'tool_calls') and delta.tool_calls:
handle_tool_calls(delta)
# Handle regular content
if hasattr(delta, 'content') and delta.content:
print(delta.content, end='', flush=True)
# Handle message completion
if finish_reason:
if current_tool_calls:
print() # Add newline after tool call arguments
current_tool_calls = {} # Reset for next iteration
print() # Final newline
The streaming API follows this flow when tools are involved:
- Assistant Response Start: Initial content from the LLM (streamed)
- Tool Call Event: Function name, then arguments streamed incrementally
- Tool Execution: The gateway executes the complete tool call
- Tool Result Event: Results are streamed back
- Assistant Follow-up: The assistant processes results and continues
Stream Termination
The stream ends with one of the following:
- A [DONE] message indicating completion
- An error event if something goes wrong
- Client disconnection
Tips
- Use the API Code Snippet button in the UI for a quick start.
- Always keep your API token secure.
- You can enable all tools for a server by setting enable_all_tools to true, or specify only the tools you need.
- For more details on tool names and integration FQNs, check your MCP Server configuration in the AI Gateway.