The TrueFoundry control plane is packaged as a Helm chart that can be installed on any Kubernetes cluster. It has the following architecture:

Architecture

Control Plane Components

| Component | Used For | Description |
| --- | --- | --- |
| Dashboard | Essential | The UI for viewing deployments and other resources. |
| Backend Service | Essential | TrueFoundry comprises multiple backend services that handle authorization, deployment control flow, CRUD APIs, interaction with the database and external services, etc. |
| PostgreSQL | Essential | Database that stores user information, deployment information, etc. It can be deployed on Kubernetes in dev mode; for production, we recommend a managed database such as RDS. |
| Controller | Essential | Handles all connections from the tfy-agent components running in the different compute-plane clusters. |
| Queue | Essential | NATS serves as the queueing and caching layer used to process requests and logs from the LLM Gateway. |
| Image Builder | Deployment Only | Docker images for deployments are built on the Kubernetes cluster using BuildKit. |
| AI Gateway | AI Gateway Only | Unifies the request and response format across all LLM providers. |
| Clickhouse | AI Gateway Only | Database that stores the request logs and metrics of the AI Gateway. |
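The "Used For" column determines which components are needed for a given feature selection. As a minimal sketch, the Python snippet below encodes that column and filters on it. The names `COMPONENTS` and `components_for` are hypothetical and exist only for illustration; they are not part of the TrueFoundry chart or any TrueFoundry API.

```python
from __future__ import annotations

# Illustrative only: encodes the "Used For" column of the component table.
# All names here are hypothetical, not part of any TrueFoundry API.
ESSENTIAL = "Essential"
DEPLOYMENT_ONLY = "Deployment Only"
AI_GATEWAY_ONLY = "AI Gateway Only"

COMPONENTS = {
    "Dashboard": ESSENTIAL,
    "Backend Service": ESSENTIAL,
    "PostgreSQL": ESSENTIAL,
    "Controller": ESSENTIAL,
    "Queue": ESSENTIAL,
    "Image Builder": DEPLOYMENT_ONLY,
    "AI Gateway": AI_GATEWAY_ONLY,
    "Clickhouse": AI_GATEWAY_ONLY,
}


def components_for(deployment: bool, ai_gateway: bool) -> list[str]:
    """Return the components needed for the chosen features, in table order."""
    if not (deployment or ai_gateway):
        raise ValueError("enable at least one of deployment or ai_gateway")
    wanted = {ESSENTIAL}
    if deployment:
        wanted.add(DEPLOYMENT_ONLY)
    if ai_gateway:
        wanted.add(AI_GATEWAY_ONLY)
    return [name for name, used_for in COMPONENTS.items() if used_for in wanted]
```

For example, `components_for(deployment=False, ai_gateway=True)` returns the five essential components plus AI Gateway and Clickhouse, and leaves out the Image Builder.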

External Cloud Components

| Component | Description |
| --- | --- |
| Blob Storage | The control plane needs access to one blob storage bucket to store the code uploaded for building Docker images. This can be backed by AWS S3, Azure Blob Storage, GCS, etc. |
| Secret Store | Stores the secrets used by deployments. This can be backed by AWS SSM, Azure Key Vault, Google Secret Manager, etc. |
| Docker registry | Stores the Docker images used by deployments. This can be backed by AWS ECR, Azure Container Registry, GCR, etc. |

Compute Requirements

To install the control plane, you need a Kubernetes cluster and a managed Postgres database. TrueFoundry ships as a Helm chart (https://github.com/truefoundry/infra-charts/tree/main/charts/truefoundry) with configurable options to deploy both the Deployment and AI Gateway features or only one of them, according to your needs. The compute requirements vary with the features enabled and with the number of users and requests.

Here are a few scenarios that you can choose from based on your needs.

The small tier is recommended for development purposes. All components are deployed on Kubernetes in non-HA mode (a single replica of each service), so the setup is not highly available. It is suitable for testing the different features of TrueFoundry, but we do not recommend it for production.

| Component | CPU | Memory | Storage | Min Nodes | Remarks |
| --- | --- | --- | --- | --- | --- |
| Helm chart (AI Deployment + AI Gateway) | 2 vCPU | 8 GB | 60 GB, Persistent Volumes (block storage) on Kubernetes | 2 | Pods should be spread over at least 2 nodes. Cost: ~$120 per month |
| Helm chart (AI Deployment only) | 1 vCPU | 4 GB | 50 GB, Persistent Volumes (block storage) on Kubernetes | 2 | Pods should be spread over at least 2 nodes. Cost: ~$60 per month |
| Helm chart (AI Gateway only) | 2 vCPU | 8 GB | 60 GB, Persistent Volumes (block storage) on Kubernetes | 2 | Pods should be spread over at least 2 nodes. Cost: ~$120 per month |
| Postgres (deployed on Kubernetes) | 0.5 vCPU | 0.5 GB | 5 GB, Persistent Volumes (block storage) on Kubernetes | | PostgreSQL version >= 13 |
| Blob Storage (S3 compatible) | | | 20 GB | | |
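To budget for a small-tier cluster, the chart row for your feature set can be added to the in-cluster Postgres row. The sketch below does that arithmetic in Python. It assumes the rows are additive (that is, the Helm chart figures do not already include Postgres), which the table does not state explicitly. The names `Requirement`, `small_tier_total`, and `postgres_version_ok` are hypothetical helpers written for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    cpu_vcpu: float
    memory_gb: float
    block_storage_gb: float


# Figures copied from the small-tier table; the keys are hypothetical labels.
SMALL_TIER_CHART = {
    "deployment+gateway": Requirement(2, 8, 60),
    "deployment": Requirement(1, 4, 50),
    "gateway": Requirement(2, 8, 60),
}
POSTGRES_ON_K8S = Requirement(0.5, 0.5, 5)
BLOB_STORAGE_GB = 20      # S3-compatible bucket, provisioned outside the cluster
MIN_POSTGRES_MAJOR = 13   # "PostgreSQL version >= 13"


def small_tier_total(features: str) -> Requirement:
    """Chart row plus in-cluster Postgres (assumes the rows are additive)."""
    chart = SMALL_TIER_CHART[features]
    return Requirement(
        chart.cpu_vcpu + POSTGRES_ON_K8S.cpu_vcpu,
        chart.memory_gb + POSTGRES_ON_K8S.memory_gb,
        chart.block_storage_gb + POSTGRES_ON_K8S.block_storage_gb,
    )


def postgres_version_ok(version: str) -> bool:
    """Check the major version of a string such as '15.4' against the minimum."""
    return int(version.split(".")[0]) >= MIN_POSTGRES_MAJOR
```

Under that assumption, `small_tier_total("deployment")` comes to 1.5 vCPU, 4.5 GB of memory, and 55 GB of block storage, with the 20 GB of blob storage counted separately.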

Deploying Control-Plane in your own environment

The provided Terraform code supports the following scenarios. The requirements for each scenario are listed in the section for your cloud provider:

  • New network + New cluster - This is the simplest setup. The TrueFoundry Terraform code spins up and configures everything. Make sure your cloud account meets the requirements listed on your cloud provider page.
  • Existing network + New cluster - In this setup, you bring your own VPC and the TrueFoundry Terraform code creates the cluster inside it. Make sure to meet the existing-VPC requirements listed on your cloud provider page.
  • Existing cluster - In this setup, the TrueFoundry Terraform code reuses your existing cluster and sets up all the integrations the platform needs. Make sure to meet the existing-VPC and existing-cluster requirements listed on your cloud provider page.
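The choice among the three scenarios depends only on what you already have. As a rough sketch, the decision can be written as a small Python function. `pick_scenario` is a hypothetical helper for illustration and is not part of the Terraform code; it assumes that an existing cluster always lives in an existing network.

```python
def pick_scenario(has_network: bool, has_cluster: bool) -> str:
    """Map existing infrastructure to one of the three supported scenarios."""
    if has_cluster:
        # Assumption: an existing cluster implies an existing network (VPC).
        return "Existing cluster"
    if has_network:
        return "Existing network + New cluster"
    return "New network + New cluster"
```

The returned name tells you which set of requirements to check on your cloud provider page before running the Terraform code.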

Deploying in Production Mode

To deploy in production mode, first create the required infrastructure components, then install the control plane. The guides below cover the infrastructure requirements for each cloud provider and the steps to create them:

Provisioning Control Plane Infrastructure on AWS

Provisioning Control Plane Infrastructure on GCP

Provisioning Control Plane Infrastructure on Azure

Once the infrastructure components are set up, install the control plane using the Helm chart: Installing Control Plane using Helm Chart