Provisioning Control Plane Infrastructure on GCP

Infrastructure requirements:

| Requirements | Description | Reason for Requirement |
| --- | --- | --- |
| Kubernetes Cluster | Any Kubernetes cluster will work here. You can also install the Truefoundry helm chart on the compute-plane cluster itself. | The Truefoundry helm chart will be installed here. |
| CloudSQL Postgres | Postgres >= 13 | The database is used by the Truefoundry control plane to store all its metadata. |
| GCS bucket | Any GCS bucket reachable from the control-plane. | This is used by the control-plane to store the intermediate code while building the docker image. |
| Egress Access for TruefoundryAuth | Egress access to https://auth.truefoundry.com | This is needed to validate the users logging into Truefoundry so that licensing can be maintained. |
| Egress Access for Docker Registry | public.ecr.aws, quay.io, ghcr.io, docker.io/truefoundrycloud, docker.io/natsio, nvcr.io, registry.k8s.io | This is to download docker images for Truefoundry, ArgoCD, NATS, ArgoRollouts, ArgoWorkflows, Istio. |
| DNS with TLS/SSL | One endpoint pointing to the control plane service (something like platform.example.com, where example.com is your domain). There should also be a certificate for the domain so that it can be accessed over TLS. | The control-plane URL should be reachable from the compute-plane so that the compute-plane cluster can connect to the control-plane. Developers will need to access the Truefoundry UI at the domain provided here. |
| User/ServiceAccount to provision the infrastructure | Cloud SQL Admin, Security Admin, Service Account Admin, Service Account Token Creator, Service Account User, Storage Admin | These are the permissions required by the IAM user in GCP to create all the control plane components. |
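
The egress requirements above can be checked before installation with a small script. The following is a minimal sketch, not an official TrueFoundry tool: it assumes HTTPS on port 443 and only tests that a TCP connection can be opened to each host. The host list is taken from the requirements above; `docker.io` stands in for the two `docker.io/...` repositories, and an actual image pull may contact additional Docker Hub hosts that this check does not cover. The function names are illustrative.

```python
import socket

# Hosts from the requirements above. Port 443 (HTTPS) is an assumption.
REQUIRED_HOSTS = [
    "auth.truefoundry.com",
    "public.ecr.aws",
    "quay.io",
    "ghcr.io",
    "docker.io",
    "nvcr.io",
    "registry.k8s.io",
]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def check_egress(hosts, probe=can_connect):
    """Map each host to True/False using the given probe function."""
    return {host: probe(host) for host in hosts}


def main():
    """Print one OK/FAIL line per required host."""
    for host, ok in check_egress(REQUIRED_HOSTS).items():
        print(f"{'OK  ' if ok else 'FAIL'} {host}")
```

Calling `main()` from a pod or VM inside the control-plane network gives a quick first signal. A `FAIL` line usually points at a firewall or NAT rule to review, while an `OK` line only confirms TCP reachability, not that image pulls will succeed.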

GCP Infra Architecture

Create the infrastructure:

You can follow either of the approaches below to create the infrastructure:

  1. Use OCLI which uses Terraform to spin up the infrastructure
  2. Do it yourself manually using the steps provided below:

Manually Spin up the Infrastructure:

We only recommend this process if you cannot use OCLI for some reason. Please follow the steps below to spin up the infrastructure:

  1. Create a Kubernetes cluster with NAP (node auto-provisioning) enabled.
  2. Spin up a CloudSQL Postgres DB with Postgres version >= 13.
  3. Create an IAM service account with access to the GCS bucket, Artifact Registry and Secret Manager (optional). For the control plane, you can add the role roles/iam.workloadIdentityUser for the identities below:
          "serviceAccount:${var.project_id}.svc.id.goog[truefoundry/servicefoundry-server]",
          "serviceAccount:${var.project_id}.svc.id.goog[truefoundry/mlfoundry-server]",
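
For a manual setup, the two identities in step 3 can be expanded for a concrete project and turned into `gcloud` commands. This is a minimal sketch under stated assumptions: the namespace and Kubernetes service account names come from the member strings above, while the project ID and the service account email in the usage note are hypothetical placeholders. It assumes the usual GKE Workload Identity pattern, where the role is bound on the IAM service account itself. The script only prints the commands and does not run them.

```python
from typing import List

# Namespace and Kubernetes service account names from the member strings above.
NAMESPACE = "truefoundry"
KSA_NAMES = ["servicefoundry-server", "mlfoundry-server"]
ROLE = "roles/iam.workloadIdentityUser"


def workload_identity_members(project_id: str) -> List[str]:
    """Build the Workload Identity member strings for the given project."""
    return [
        f"serviceAccount:{project_id}.svc.id.goog[{NAMESPACE}/{ksa}]"
        for ksa in KSA_NAMES
    ]


def binding_commands(project_id: str, gsa_email: str) -> List[str]:
    """Build one gcloud command per member, binding ROLE on the IAM service account."""
    return [
        f"gcloud iam service-accounts add-iam-policy-binding {gsa_email} "
        f"--project {project_id} --role {ROLE} --member '{member}'"
        for member in workload_identity_members(project_id)
    ]
```

For example, `binding_commands("my-project", "tfy-control-plane@my-project.iam.gserviceaccount.com")` returns two commands to review and run, one per identity. Both argument values are placeholders to replace with your own project ID and service account email.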
    

You can then contact the TrueFoundry team to install TrueFoundry on the cluster.