Create a new GKE cluster using OCLI

This document shows how to create a new GKE cluster using OCLI and connect it to the TrueFoundry platform.

Prerequisites

  1. Install gcloud >= 2.50 and the gke-gcloud-auth-plugin (for example, with gcloud components install gke-gcloud-auth-plugin).

  2. You need a GCP project and a user or service account with admin privileges. Set the project and log in with the following commands:

    # Replace with your GCP project ID
    export PROJECT_ID="your-project-id"
    gcloud config set project "$PROJECT_ID"
    
    # Log in for Application Default Credentials (ADC)
    gcloud auth application-default login
    
  3. Enable the Service Usage API in your project, either with the command below or from the Google Cloud console.

    gcloud auth login \
      && gcloud services enable serviceusage.googleapis.com --project="$PROJECT_ID"
    

Suggestion

We strongly recommend reading the GCP infrastructure requirements carefully before you begin.

Installing OCLI

  1. Download the binary for your platform using one of the commands below.
    # macOS, Apple silicon (arm64)
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_darwin_arm64" -o ocli
    
    # macOS, Intel (amd64)
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_darwin_amd64" -o ocli
    
    # Linux, arm64
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_linux_arm64" -o ocli
    
    # Linux, x86-64 (amd64)
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_linux_amd64" -o ocli
    
  2. Make the binary executable and move it to a directory on your $PATH.
    chmod +x ./ocli
    sudo mv ocli /usr/local/bin/
    
  3. Confirm the installation by running:
    ocli --version
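The four download commands above differ only in the platform suffix. As a convenience, the bash sketch below picks the suffix from uname and downloads the matching build from the same release URLs. The function names ocli_platform and download_ocli are hypothetical and not part of OCLI. The sketch also assumes that only the four builds listed above (darwin or linux, arm64 or amd64) are published.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of OCLI.

# Print the OCLI build suffix for this machine, e.g. "linux_amd64".
# Assumes only darwin/linux x arm64/amd64 builds exist; anything else fails.
ocli_platform() {
  local os arch
  case "$(uname -s)" in
    Darwin) os=darwin ;;
    Linux) os=linux ;;
    *) echo "unsupported OS: $(uname -s)" >&2; return 1 ;;
  esac
  case "$(uname -m)" in
    x86_64 | amd64) arch=amd64 ;;
    arm64 | aarch64) arch=arm64 ;;
    *) echo "unsupported architecture: $(uname -m)" >&2; return 1 ;;
  esac
  echo "${os}_${arch}"
}

# Download the stable OCLI build for this machine to ./ocli,
# using the same release URLs as the manual commands.
download_ocli() {
  local base="https://releases.ocli.truefoundry.tech" platform version
  platform="$(ocli_platform)" || return 1
  # -f fails on HTTP errors; -sS stays quiet but still reports errors.
  version="$(curl -fsS -H 'Cache-Control: max-age=0' "$base/stable.txt")" || return 1
  curl -fsS -H 'Cache-Control: max-age=0' \
    "$base/binaries/ocli_${version}_${platform}" -o ocli
}
```

To use it, source the script and run download_ocli. Then make the binary executable and move it onto your $PATH as in step 2.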
    

Configuring input config file

  1. To create a new cluster, you need your GCP project ID, region, and network details.
  2. Run the following command to fill in the inputs interactively:
    ocli infra-init
    
  3. For networking, there are two possible configurations:
    1. New network (recommended): creates a new VPC network for your new cluster.
    2. Existing network: uses an existing VPC network that you specify.
  4. Once all the inputs are filled in, an input config file named tfy-config.yaml is generated in your current directory. We strongly recommend reviewing the generated file and checking it against your inputs. You can also customize the inputs by editing the file directly. Sample files for an existing network and for a new network are shown below:
# Sample tfy-config.yaml: existing network
aws: null
azure: null
binaries:
  terraform:
    binary_path: null
  terragrunt:
    binary_path: null
gcp:
  cluster:
    name: coolml
  network:
    existing: true
    network_name: existing-vnet
    pod_cidr: ""
    service_cidr: ""
    subnet_cidr: ""
    subnet_id: projects/projectID/regions/us-east1/subnetworks/existing-vnet
  project:
    id: projectID
  region:
    availability_zones:
      - us-east1-b
      - us-east1-c
      - us-east1-d
    name: us-east1
  tags: {}
provider: gcp

# Sample tfy-config.yaml: new network
aws: null
azure: null
binaries:
  terraform:
    binary_path: null
  terragrunt:
    binary_path: null
gcp:
  cluster:
    master_cidr_block: 172.16.0.32/28
    name: CLUSTER_NAME
    pod_range_name: pods
    service_range_name: services
    version: "1.28"
  network:
    additional_ranges: []
    existing: false
    network_name: ""
    pod_cidr: 10.244.0.0/16
    service_cidr: 10.255.0.0/16
    shared_vpc:
      enabled: false
      network_name: ""
      project_id: ""
      subnet_name: ""
    subnet_cidr: 10.10.0.0/16
    subnet_id: ""
  network_tags: []
  project:
    id: PROJECT_ID
  region:
    availability_zones:
      - us-central1-a
      - us-central1-b
    name: us-central1
  tags: {}
  tfy_control_plane:
    enabled: false
provider: gcp
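If you customize the network section, keep the ranges disjoint. GKE requires the subnet's primary range (subnet_cidr) and its secondary Pod and Service ranges (pod_cidr, service_cidr) not to overlap. The control plane range (master_cidr_block) should not overlap them either. The bash sketch below is a hypothetical pre-flight check, not part of OCLI. It does not parse tfy-config.yaml, so you pass the CIDR values in by hand. It only handles IPv4 CIDRs written without leading zeros.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of OCLI.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# Succeed (exit status 0) if two CIDR blocks overlap.
# Two blocks overlap exactly when they agree on the shorter prefix.
cidrs_overlap() {
  local len1="${1#*/}" len2="${2#*/}" n1 n2 min mask
  n1="$(ip_to_int "${1%/*}")"
  n2="$(ip_to_int "${2%/*}")"
  min=$(( len1 < len2 ? len1 : len2 ))
  mask=$(( (0xFFFFFFFF << (32 - min)) & 0xFFFFFFFF ))
  (( (n1 & mask) == (n2 & mask) ))
}

# Check every pair of CIDRs given as arguments; report overlaps on stderr.
# Returns 0 if no pair overlaps, 1 otherwise.
check_ranges() {
  local ranges=("$@") i j status=0
  for ((i = 0; i < ${#ranges[@]}; i++)); do
    for ((j = i + 1; j < ${#ranges[@]}; j++)); do
      if cidrs_overlap "${ranges[i]}" "${ranges[j]}"; then
        echo "overlap: ${ranges[i]} and ${ranges[j]}" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

For the new-network sample above, check_ranges 10.10.0.0/16 10.244.0.0/16 10.255.0.0/16 172.16.0.32/28 prints nothing and returns 0. Those arguments are the subnet, Pod, Service, and control plane ranges.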

Create the cluster

Run the following command to create the GKE cluster.

ocli infra-create --file tfy-config.yaml

This command may take around 30-45 minutes to complete.

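Configure kubeconfig for the cluster

Run the following command to add the cluster's credentials to your kubeconfig file (~/.kube/config by default). If the cluster is regional, pass --region REGION instead of --zone ZONE.

gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE --project PROJECT_ID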

Connecting the cluster

  • Head over to the TrueFoundry platform and log in. If you do not have an account yet, sign up first.

  • Once you have logged in, navigate to the Settings tab in the left panel and create a new API key. Copy the API key, as you will need it in the next commands.

  • Run the following command to create the cluster in the TrueFoundry portal. CONTROL_PLANE_URL is the URL of the TrueFoundry dashboard you are logged in to.

    ocli compute-plane-connect -f tfy-config.yaml --api-key API_KEY --control-plane-url CONTROL_PLANE_URL
    
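  • This command generates a cluster token, which you need in the next step.

  • Create a values.yaml file with the content below. Fill in your tenant name (TENANT), control plane URL (CONTROL_PLANE_URL), cluster name (CLUSTER_NAME), and the token from the previous step (CLUSTER_TOKEN).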

    ## @section Global Parameters
    ## @param tenantName Parameters for tenantName
    ## Tenant Name - This is the same as the name of the organization used to sign up 
    ## on Truefoundry
    ##
    tenantName: "TENANT"
    
    ## @param controlPlaneURL Parameters for controlPlaneURL
    ## URL of the control plane - Same as the URL of the Truefoundry dashboard
    ##
    controlPlaneURL: "CONTROL_PLANE_URL"
    
    ## @param clusterName Name of the cluster
    ## Name of the cluster that you have created on AWS/GCP/Azure
    ##
    clusterName: "CLUSTER_NAME"
    
    ## @section Parameters for argocd
    ## @param argocd.enabled Flag to enable ArgoCD
    ## ArgoCD is mandatory for Truefoundry to work. You can set this to false if ArgoCD is
    ## already installed in your cluster; in that case, make sure the existing ArgoCD
    ## configuration matches the ArgoCD configuration required by Truefoundry.
    argocd:
      enabled: true
    
    ## @section Parameters for argoWorkflows
    ## @param argoWorkflows.enabled Flag to enable Argo Workflows
    ##
    argoWorkflows:
      enabled: true
    
    ## @section Parameters for argoRollouts
    ## @param argoRollouts.enabled Flag to enable Argo Rollouts
    ## Argo Rollouts is mandatory for Truefoundry to work. 
    ##
    argoRollouts:
      enabled: true
    
    ## @section Parameters for notebookController
    ## @param notebookController.enabled Flag to enable Notebook Controller
    ## Notebook Controller is required to power notebooks in Truefoundry
    ##
    notebookController:
      enabled: true
    
    ## @section Parameters for certManager
    ## @param certManager.enabled Flag to enable Cert Manager
    ##
    certManager:
      enabled: false
    
    ## @section Parameters for metricsServer
    ## @param metricsServer.enabled Flag to enable Metrics Server
    ##
    metricsServer:
      enabled: true
    
    ## @section Parameters for gpu
    ## @param gpu.enabled Flag to enable Tfy GPU Operator
    ##
    gpu:
      enabled: true
      ## @param gpu.clusterType Cluster type for Tfy GPU Operator
      ##
      clusterType: gcpGkeStandard
    
    ## @section Parameters for truefoundry
    ## @param truefoundry.enabled Flag to enable TrueFoundry
    ## This installs the Truefoundry control plane helm chart. Set it to true only
    ## if you want to install the Truefoundry control plane.
    ##
    truefoundry:
      enabled: false
    
      ## @param truefoundry.dev Flag to enable TrueFoundry Dev mode
      ##
      dev: true
    
    ## @section Parameters for loki
    ## @param loki.enabled Flag to enable Loki
    ##
    loki:
      enabled: true
    
    ## @section Parameters for istio
    ## @param istio.enabled Flag to enable Istio
    ##
    istio:
      enabled: true
    
    ## @section Parameters for keda
    ## @param keda.enabled Flag to enable Keda
    ##
    keda:
      enabled: true
    
    ## @section Parameters for kubecost
    ## @param kubecost.enabled Flag to enable Kubecost
    ##
    kubecost:
      enabled: true
    
    ## @section Parameters for prometheus
    ## @param prometheus.enabled Flag to enable Prometheus
    ##
    prometheus:
      enabled: true
    
    ## @section Parameters for grafana
    ## @param grafana.enabled Flag to enable Grafana
    ##
    grafana:
      enabled: true
    
    ## @section Parameters for tfyAgent
    ## @param tfyAgent.enabled Flag to enable Tfy Agent
    ##
    tfyAgent:
      enabled: true
      ## @param tfyAgent.clusterToken Parameters for clusterToken
      ## Token for cluster authentication
      ##
      clusterToken: "CLUSTER_TOKEN"
    
  • Install the components on the cluster by running:

    ocli compute-plane-install -f values.yaml --cluster-type gcp-gke-standard
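Instead of replacing the placeholders in values.yaml by hand, you can fill them in before running compute-plane-install. The bash sketch below uses a hypothetical helper, fill_values, which is not part of OCLI. It assumes values.yaml still contains the quoted placeholders "TENANT", "CONTROL_PLANE_URL", "CLUSTER_NAME", and "CLUSTER_TOKEN" exactly as in the sample. It reads the replacement values from environment variables with the same names.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OCLI. Runs in a subshell so the
# shopt change and helper variables do not leak into the caller.
fill_values() (
  # Bash 5.2+ treats "&" in a ${var//pattern/replacement} replacement as the
  # matched text; disable that so tokens containing "&" stay literal.
  # Older bash versions lack the option, so the error is ignored.
  shopt -u patsub_replacement 2>/dev/null || true
  file="$1"
  content="$(<"$file")"
  for key in TENANT CONTROL_PLANE_URL CLUSTER_NAME CLUSTER_TOKEN; do
    value="${!key:-}"   # indirect lookup: the variable named by $key
    if [[ -z "$value" ]]; then
      echo "environment variable $key is not set" >&2
      exit 1
    fi
    pattern="\"$key\""
    replacement="\"$value\""
    content=${content//"$pattern"/$replacement}
  done
  # Rewrite the file in place with the placeholders filled in.
  printf '%s\n' "$content" > "$file"
)
```

For example, export TENANT, CONTROL_PLANE_URL, CLUSTER_NAME, and CLUSTER_TOKEN, then run fill_values values.yaml. The file is rewritten in place, so keep a copy if you want to reuse the template.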
    

Saving the output file

Once the above command finishes, save the output using the command below:

ocli output --file tfy-config.yaml > output.txt