Architecture

Control Plane Components
Component | Used For | Description |
---|---|---|
Dashboard | Essential | The UI for viewing deployments and other resources. |
Backend Service | Essential | TrueFoundry comprises multiple backend services that handle authorization, deployment control flow, CRUD APIs, and interaction with the database and external services. |
PostgreSQL | Essential | Database that stores user information, deployment information, etc. It can be deployed on Kubernetes in dev mode; however, we recommend using a managed database such as RDS in production. |
Controller | Essential | The controller is responsible for handling all connections from the multiple tfy-agent components in different compute-plane clusters. |
Queue | Essential | We use NATS as a queueing and caching layer to process requests and logs from the LLM Gateway. |
Image Builder | Deployment Only | We build docker images on the Kubernetes cluster using buildkit for our deployments. |
AI Gateway | AI Gateway Only | AI Gateway to unify the request and response format for all LLM providers. |
Clickhouse | AI Gateway Only | Database to store request logs and metrics of AI Gateway. |
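
The "Used For" column determines which components a given installation needs. The following is a minimal Python sketch of that selection, assuming the table above is the complete component list. The `COMPONENTS` mapping and the `components_for` function are illustrative names and are not part of TrueFoundry's tooling.

```python
# Illustrative only: mirrors the "Used For" column of the table above.
COMPONENTS = {
    "Dashboard": "Essential",
    "Backend Service": "Essential",
    "PostgreSQL": "Essential",
    "Controller": "Essential",
    "Queue": "Essential",
    "Image Builder": "Deployment Only",
    "AI Gateway": "AI Gateway Only",
    "Clickhouse": "AI Gateway Only",
}


def components_for(deployment: bool, ai_gateway: bool) -> list[str]:
    """Return the components needed for the selected feature set."""
    wanted = {"Essential"}
    if deployment:
        wanted.add("Deployment Only")
    if ai_gateway:
        wanted.add("AI Gateway Only")
    # Dicts preserve insertion order, so the result follows the table order.
    return [name for name, used_for in COMPONENTS.items() if used_for in wanted]


# Components for an AI Gateway only installation.
print(components_for(deployment=False, ai_gateway=True))
```

Under these assumptions, an AI Gateway only installation skips the Image Builder, and a Deployment only installation skips the AI Gateway and Clickhouse.
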
External Cloud Components
Component | Description |
---|---|
Blob Storage | The control plane needs access to one blob storage bucket to store the code uploaded for building Docker images. This can be backed by AWS S3, Azure Blob Storage, GCS, etc. |
Secret Store | Stores the secrets for deployments. This can be backed by AWS SSM, Azure Key Vault, Google Secret Manager, etc. |
Docker registry | Docker registry to store the docker images for the deployment. This can be backed by AWS ECR, Azure Container Registry, GCR, etc. |
Compute Requirements
To install the control plane, you need a Kubernetes cluster and a managed Postgres database. TrueFoundry ships as a Helm chart (https://github.com/truefoundry/infra-charts/tree/main/charts/truefoundry) with configurable options to deploy both the Deployment and AI Gateway features or just one of them, according to your needs. The compute requirements depend on the set of features enabled and on the number of users and requests. Here are a few scenarios that you can choose from based on your needs.

The small tier is recommended for development purposes. All components are deployed on Kubernetes in non-HA mode (a single replica of each service), so the setup is not highly available. It is suitable for testing the features of TrueFoundry, but we do not recommend it for production.
Component | CPU | Memory | Storage | Min Nodes | Remarks |
---|---|---|---|---|---|
Helm-Chart (AI Deployment + AI Gateway) | 2 vCPU | 8GB | 60GB Persistent Volumes (Block Storage) on Kubernetes | 2 (pods should be spread over at least 2 nodes) | Cost: ~$120 per month |
Helm-Chart (AI Deployment Only) | 1 vCPU | 4GB | 50GB Persistent Volumes (Block Storage) on Kubernetes | 2 (pods should be spread over at least 2 nodes) | Cost: ~$60 per month |
Helm-Chart (AI Gateway Only) | 2 vCPU | 8GB | 60GB Persistent Volumes (Block Storage) on Kubernetes | 2 (pods should be spread over at least 2 nodes) | Cost: ~$120 per month |
Postgres (Deployed on Kubernetes) | 0.5 vCPU | 0.5GB | 5GB Persistent Volumes (Block Storage) on Kubernetes | | PostgreSQL version >= 13 |
Blob Storage (S3 Compatible) | | | 20GB | | |
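
To size a cluster for the small tier, the rows above can be added up. The following is a rough Python sketch, assuming the Helm chart and in-cluster Postgres figures are additive (the table does not state whether the chart figures already include Postgres). The `Resources` type and the variable names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_vcpu: float
    memory_gb: float
    pv_storage_gb: float

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(
            self.cpu_vcpu + other.cpu_vcpu,
            self.memory_gb + other.memory_gb,
            self.pv_storage_gb + other.pv_storage_gb,
        )


# Figures copied from the small-tier table above.
HELM_CHART = {
    "deployment+gateway": Resources(2, 8, 60),
    "deployment": Resources(1, 4, 50),
    "gateway": Resources(2, 8, 60),
}
POSTGRES_ON_K8S = Resources(0.5, 0.5, 5)

# Assumption: chart and Postgres requirements are additive.
total = HELM_CHART["deployment"] + POSTGRES_ON_K8S
print(total)
```

The 20GB of blob storage is provisioned outside the cluster, so it is left out of this total.
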
Deploying Control-Plane in your own environment
The provided Terraform code supports the following scenarios. You can find the requirements for each scenario in the section for your cloud provider:
- New network + New cluster - This is the simplest setup. The TrueFoundry Terraform code takes care of spinning up and setting up everything. Make sure your cloud account meets the requirements listed on your cloud provider page.
- Existing network + New cluster - In this setup, you bring your own VPC and the TrueFoundry Terraform code creates the cluster in that VPC. Make sure to follow the existing-VPC requirements listed on your cloud provider page.
- Existing cluster - In this setup, the TrueFoundry Terraform code reuses the cluster you created and sets up all the integrations the platform needs. Make sure to follow the existing-VPC and existing-cluster requirements listed on your cloud provider page.
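
The choice between the three scenarios comes down to what already exists in your cloud account. The following is a minimal Python sketch of that decision. The `pick_scenario` function is a hypothetical helper for illustration and is not something the Terraform code exposes.

```python
def pick_scenario(has_network: bool, has_cluster: bool) -> str:
    """Map what already exists to one of the supported scenarios."""
    # An existing cluster already lives in an existing network,
    # so the cluster check takes precedence.
    if has_cluster:
        return "Existing cluster"
    if has_network:
        return "Existing network + New cluster"
    return "New network + New cluster"


# Example: a VPC exists, but no Kubernetes cluster has been created yet.
print(pick_scenario(has_network=True, has_cluster=False))
```
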