GCP
This page provides an overview of the architecture, requirements, and steps to install the TrueFoundry compute plane cluster in GCP.
The architecture of a TrueFoundry compute plane is as follows:
Requirements:
The common requirements to set up the compute plane in each of the scenarios are as follows:
- Billing must be enabled for the GCP account.
- The following APIs must be enabled in the project -
- Egress access to container registries - public.ecr.aws, quay.io, ghcr.io, tfy.jfrog.io, docker.io/natsio, nvcr.io, registry.k8s.io - so that we can download the Docker images for ArgoCD, NATS, the GPU operator, Argo Rollouts, Argo Workflows, Istio, Keda, etc.
- We need a domain to map to the service endpoints and a certificate to encrypt the traffic. A wildcard domain like *.services.example.com is preferred. TrueFoundry can do path-based routing like services.example.com/tfy/*; however, many frontend applications do not support this. For the certificate, check this document for more details.
- Enough quota for CPU/GPU instances must be present depending on your use case. You can check and increase quotas at GCP compute quotas.
- Service account key creation should be allowed for the service account used by the platform.
- The new VPC subnet should have a CIDR range of /24 or larger. Secondary ranges for pods (min /20) and services (min /24) are required. The secondary ranges can come from a non-routable range. This ensures capacity for ~250 instances and 4096 pods.
- User/service account to provision the infrastructure.
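The sizing above can be expressed as a single subnet-creation command. The sketch below only prints the command so it can be reviewed before anything is created; the network name, region, and CIDR blocks are illustrative placeholders, not values from this guide.

```shell
# Illustrative values -- replace with your own project, network, region and ranges.
PROJECT_ID="my-gcp-project"
REGION="us-central1"

# Primary range /24, pods secondary /20 (4096 pod IPs), services secondary /24.
SUBNET_CMD="gcloud compute networks subnets create tfy-subnet \
  --project $PROJECT_ID --network tfy-vpc --region $REGION \
  --range 10.10.0.0/24 \
  --secondary-range pods=10.16.0.0/20,services=10.20.0.0/24"

# Print for review; execute with: eval "$SUBNET_CMD"
echo "$SUBNET_CMD"
```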
- The existing VPC subnet should have a CIDR range of /24 or larger. Secondary ranges for pods (min /20) and services (min /24) are required. The secondary ranges can come from a non-routable range. This ensures capacity for ~250 instances and 4096 pods. The secondary range for pods should be named pods and the secondary range for services should be named services.
- The VPC should have a Cloud Router and Cloud NAT for private subnets. Ports 80 and 443 should be open for the load balancer. Allow all traffic between the subnets. Ports 443, 6443, 8443, 9443 and 15017 should be allowed from the GKE control plane.
- User/service account to provision the infrastructure.
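The control-plane port requirement above can be met with one ingress firewall rule. In this sketch the rule name, network name, and the master CIDR used as the source range are assumptions; use the master IPv4 block configured for your own cluster. The command is printed rather than executed so it can be checked first.

```shell
# Illustrative rule allowing the GKE control plane to reach webhook ports.
# Network name and master CIDR (172.16.0.0/28) are placeholders.
FW_CMD="gcloud compute firewall-rules create allow-gke-master-webhooks \
  --project my-gcp-project --network tfy-vpc --direction INGRESS \
  --source-ranges 172.16.0.0/28 \
  --allow tcp:443,tcp:6443,tcp:8443,tcp:9443,tcp:15017"

# Print for review; execute with: eval "$FW_CMD"
echo "$FW_CMD"
```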
- GKE version should be 1.30 or later.
- NAP (node auto-provisioning) should be enabled for the cluster. Ensure min and max limits for GPUs are also set.
- Workload Identity should be enabled for the cluster with workload pool project_id.svc.id.goog.
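For an existing cluster, the version and Workload Identity requirements can be checked with a single describe call. The cluster name, region, and project below are placeholders; the sketch prints the command for review.

```shell
# Sketch: inspect an existing cluster's master version and workload pool.
# my-cluster / us-central1 / my-gcp-project are placeholders.
CHECK_CMD="gcloud container clusters describe my-cluster \
  --region us-central1 --project my-gcp-project \
  --format=value(currentMasterVersion,workloadIdentityConfig.workloadPool)"

# Expect a version of 1.30+ and a workload pool like my-gcp-project.svc.id.goog
echo "$CHECK_CMD"
```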
Setting up compute plane
TrueFoundry compute plane infrastructure is provisioned using terraform. You can download the terraform code for your exact account by filling up your account details and downloading a script that can be executed on your local machine.
Choose to create a new cluster or attach an existing cluster
Go to the platform section in the left panel and click on Clusters. You can click on Create New Cluster or Attach Existing Cluster depending on your use case. Read the requirements and, if everything is satisfied, click on Continue.
Fill up the form to generate the terraform code
A form will be presented with the details for the new cluster to be created. Fill it in with your cluster details. Click Submit when done.
The key fields to fill up here are:
- Region - The region and availability zones where you want to create the cluster.
- Project ID - The project ID where you want to create the cluster.
- Cluster Name - A name for your cluster.
- Cluster Version and Master node IPv4 block - The version of the cluster and the IPv4 block for the master nodes.
- Network Configuration - Choose between New network or Existing network depending on your use case.
- GCS Bucket for Terraform State - Terraform state will be stored in this bucket. It can be a preexisting bucket or a new bucket name; a new bucket will automatically be created by our script.
- Platform Features - This decides which features - Blob Storage, Cluster Integration, Container Registry and Secrets Manager - will be enabled for your cluster. To read more on how these integrations are used in the platform, please refer to the platform features page.
The key fields to fill up here are:
- Region - The region and availability zones where you want to create the cluster.
- Project ID - The project ID where you want to create the cluster.
- Cluster Name - A name for your cluster.
- Cluster Addons - TrueFoundry needs to install addons like ArgoCD, Argo Workflows, Keda, Istio, etc. Please disable the addons that are already installed on your cluster so that the TrueFoundry installation does not override the existing configuration and affect your existing workloads.
- GCS Bucket for Terraform State - Terraform state will be stored in this bucket. It can be a preexisting bucket or a new bucket name; a new bucket will automatically be created by our script.
- Platform Features - This decides which features - Blob Storage, Cluster Integration, Parameter Store, Docker Registry and Secrets Manager - will be enabled for your cluster. To read more on how these integrations are used in the platform, please refer to the platform features page.
Copy the curl command and execute it on your local machine
You will be presented with a curl command to download and execute the script. The script will take care of installing the prerequisites, downloading the terraform code and running it on your local machine to create the cluster. This will take around 40-50 minutes to complete.
Verify the cluster is showing as connected in the platform
Once the script is executed, the cluster will be shown as connected in the platform.
Create DNS Record
We can get the load balancer's IP address by going to the platform section in the bottom left panel under the Clusters section. Under the preferred cluster, you'll see the load balancer IP address under the Base Domain URL section.
Create a DNS record with your DNS provider (for example, Cloud DNS) with the following details:
| Record Type | Record Name | Record Value |
|---|---|---|
| A | *.tfy.example.com | LOADBALANCER_IP_ADDRESS |
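If the domain is hosted in Cloud DNS, the record can be created from the command line. The zone name and IP address below are placeholders; if you use another DNS provider, create the equivalent record there instead. As above, the sketch prints the command for review.

```shell
# Placeholder load balancer IP and Cloud DNS zone name.
LB_IP="203.0.113.10"
DNS_CMD="gcloud dns record-sets create '*.tfy.example.com.' \
  --project my-gcp-project --zone my-zone --type A --ttl 300 \
  --rrdatas $LB_IP"

# Print for review; execute with: eval "$DNS_CMD"
echo "$DNS_CMD"
```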
Setup routing and TLS for deploying workloads to your cluster
Follow the instructions here to setup DNS and TLS for deploying workloads to your cluster.
Start deploying workloads to your cluster
You can start by going here.
Permission required to create the infrastructure
For the compute plane, we will be using the following permissions: — coming soon
Setting up TLS in GCP
There are primarily two ways we can add TLS to the load balancer in GCP:
- Using cert-manager + GCP Cloud DNS (recommended) - with this, certs get renewed automatically
- Using certificate and key files - with this, pre-created certs are added to Istio
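For the recommended cert-manager route, a minimal sketch of a ClusterIssuer that solves ACME dns01 challenges via Cloud DNS is shown below. The issuer name, email, and project are placeholders, and cert-manager must already be installed in the cluster before this is applied.

```shell
# Write a sample ClusterIssuer manifest (placeholders: name, email, project).
cat > clusterissuer.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-dns
    solvers:
      - dns01:
          cloudDNS:
            project: my-gcp-project
EOF
# Apply with: kubectl apply -f clusterissuer.yaml
```

With a dns01 solver, cert-manager can also issue wildcard certificates (e.g. *.tfy.example.com), which is why this option pairs well with the wildcard DNS record created earlier.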