Creating a GKE cluster using onboarding-cli

The Onboarding CLI (ocli) is a command-line tool that deploys a GKE cluster together with the infrastructure it requires. It asks the user for a few key inputs and then automates the rest of the deployment, so cluster creation needs very little manual intervention.

Pre-requisites

  1. Download gcloud >= 2.50

  2. You must have a GCP project, and your user or service account should have full access to it.

  3. Set up application default credentials (ADC) with gcloud so that the CLI can authenticate

    gcloud config set project $PROJECT_ID
    
    # gcloud ADC login
    gcloud auth application-default login
    
  4. Read the GCP Infrastructure requirements carefully.
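
To confirm that step 3 worked before running the CLI, you can check that an ADC file exists. The following is a minimal sketch, not part of ocli. It assumes the default Linux/macOS location used by gcloud (`~/.config/gcloud/application_default_credentials.json`) and honours the `GOOGLE_APPLICATION_CREDENTIALS` variable if it is set. The function names `adc_file` and `check_adc` are hypothetical helpers.

```shell
# Print the path where application default credentials are expected.
# Assumption: default gcloud location on Linux/macOS, unless
# GOOGLE_APPLICATION_CREDENTIALS points somewhere else.
adc_file() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "$GOOGLE_APPLICATION_CREDENTIALS"
  else
    echo "$HOME/.config/gcloud/application_default_credentials.json"
  fi
}

# Succeed if the ADC file exists, otherwise print a hint and fail.
check_adc() {
  f="$(adc_file)"
  if [ -f "$f" ]; then
    echo "ADC found at $f"
  else
    echo "ADC not found; run: gcloud auth application-default login" >&2
    return 1
  fi
}

# Usage (not run here):
#   check_adc && ocli infra init
```

If `check_adc` fails, re-run `gcloud auth application-default login` from step 3.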

Download the CLI

  1. Download the binary for your platform using one of the commands below.
    1. For Apple Silicon macOS
      curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_darwin_arm64 -o ocli
      
    2. For Intel macOS
      curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_darwin_amd64 -o ocli
      
    3. For Linux (arm64)
      curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_linux_arm64 -o ocli
      
    4. For Linux (amd64)
      curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_linux_amd64 -o ocli
      
  2. Make the binary executable and move it to a directory on your $PATH
    sudo chmod +x ./ocli
    sudo mv ocli /usr/local/bin
    
  3. Confirm the installation by running the command
    ocli
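
The platform choice in step 1 can also be derived from `uname`. The sketch below maps the operating system and architecture to the four binary names listed above and builds the matching download URL. The helper names `ocli_asset_name` and `ocli_download_url` are hypothetical, and the mapping of `aarch64` to `arm64` and `x86_64` to `amd64` is an assumption about how your system reports its architecture.

```shell
# Map `uname -s` / `uname -m` values to the ocli binary names listed above.
ocli_asset_name() {
  os="$1"; arch="$2"
  case "$os" in
    Darwin) os_part=darwin ;;
    Linux)  os_part=linux ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  case "$arch" in
    arm64|aarch64) arch_part=arm64 ;;
    x86_64|amd64)  arch_part=amd64 ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  echo "ocli_${os_part}_${arch_part}"
}

# Build the download URL for the machine this runs on.
ocli_download_url() {
  name="$(ocli_asset_name "$(uname -s)" "$(uname -m)")" || return 1
  echo "https://releases.ocli.truefoundry.tech/binaries/${name}"
}

# Usage (not run here):
#   curl -H 'Cache-Control: max-age=0' -s "$(ocli_download_url)" -o ocli
```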
    

🚧

Update to latest version

Always make sure to update ocli to the latest version.

Creating a config file

In this section we will see how to create a config file. A config file is a YAML file that gives the CLI its inputs for the GKE cluster.

  1. A GKE cluster can be created in two ways

    1. Existing Network - If you already have a network where you want to deploy the cluster, the CLI can use it and create the required components inside the subnetwork. Read the existing network requirements to know more about this.
    2. New Network (Recommended) - If you don't have an existing subnetwork or want to deploy the cluster in a new one, the CLI lets you enter the subnet range, an additional range for pods and an additional range for services. Read the new network requirements to know more about this.
  2. Run the following command

    ocli infra init
    
  3. The screen is cleared and you are asked to choose a cloud provider. Select gcp, then enter your GCP project ID

    Truefoundry is a platform that makes it very easy to deploy microservices, ML models training jobs, LLMs on Kubernetes. We will start the process of bootstrapping a Kubernetes cluster. This CLI is useful only if you don't have a Kubernetes cluster. If you already have a cluster, please go to https://docs.truefoundry.com/docs/creating-your-own-kubernetes-cluster
    Let's get started!
    
    1. Cloud Provider
    In which cloud provider you would like to deploy your cluster: :
       aws
       azure
    >  gcp
    gcp
    
    2. Project Details
    2(A) What is your GCP project ID:  project-id-1234
    

❗️

rpc error: code = PermissionDenied desc = The caller does not have permission

This error indicates that you don't have sufficient permissions in your project or that the project ID is incorrect. To complete the CLI flow, you need full access to the project.

❗️

listGCPProjects: Error creating projects client for listing projects: google: could not find default credentials

This indicates that you have not set up application default credentials with gcloud. To fix this, run the following command from the Pre-requisites section:

gcloud auth application-default login
  1. Enter the name of the cluster. You do not need to add a prefix such as tfy, because it is added automatically. For example, if you choose example as the cluster name and us-central1 as the region, all the resources are created with the prefix tfy-example-usce1
    What is the cluster name that you want for your cluster (final name of your cluster will tfy-<NAME>-<SHORT_REGION>):
    
  1. Select the region where you want to deploy your cluster. Make sure you have enough quota in that region to run your workloads. Use the up and down arrow keys to navigate and / to search through the list of regions.
    4. Location Details
    4(A). Regions
    Use the arrow keys to navigate: ↓ ↑ → ←  and / toggles search
    In which region you want to deploy your cluster: ?
      asia-east1
      asia-east2
      asia-northeast1
      asia-northeast2
    ↓ asia-northeast3
    
    1. Enter the number of availability zones in which you want to deploy your cluster. The default is 3, the minimum is 2 and the maximum is the number of availability zones available in that region. Then select the availability zones one by one.
      4(B). Availability Zones
      Enter the number of availability zones (Default 3: 2 <= range <=4):  3
      Use the arrow keys to navigate: ↓ ↑ → ←  and / toggles search
      Select the availabiltity zone 1: 
        us-central1-a
        us-central1-b
        us-central1-c
        us-central1-f
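
The limits on the availability zone count described above can be expressed as a small check. This is only an illustration of the stated rule (minimum 2, maximum the number of zones in the region), not the CLI's own validation code. `valid_az_count` is a hypothetical helper, and the number of available zones is passed in by hand.

```shell
# Return 0 if the requested zone count is a whole number in [2, available].
valid_az_count() {
  count="$1"; available="$2"
  # Reject empty or non-numeric input.
  case "$count" in ''|*[!0-9]*) return 1 ;; esac
  [ "$count" -ge 2 ] && [ "$count" -le "$available" ]
}

# Usage (not run here): us-central1 lists four zones in the prompt above.
#   valid_az_count 3 4 && echo "ok"
```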
      

Existing Network

  1. Select existing when you want to deploy your cluster in an existing network.
    5. Network and VPC
    Use the arrow keys to navigate: ↓ ↑ → ← 
    Do you want to create a new network or reuse an existing network: 
      new
      existing
    
  2. Select your existing VPC from the drop-down list.
    5(A). Network Name
    Use the arrow keys to navigate: ↓ ↑ → ←  and / toggles search
    Select the network where you want to deploy your cluster: 
      default
      example-vpc
    
  3. Select your subnet from the drop-down list. This subnet must have an additional range for pods and an additional range for services. Read the existing network requirements to know more about this.
    5(B). Private Subnetwork ID
    Use the arrow keys to navigate: ↓ ↑ → ←  and / toggles search
    Select the subnetwork name where you want to deploy your cluster: 
      projects/example-project/regions/us-central1/subnetworks/example-subnet-1
      projects/example-project/regions/us-central1/subnetworks/example-subnet-2
      projects/example-project/regions/us-central1/subnetworks/example-subnet-3
      projects/example-project/regions/us-central1/subnetworks/example-subnet-4
    
  4. The generated config file will look something like this.
    aws: null
    azure: null
    binaries:
        terraform:
            binary_path: null
        terragrunt:
            binary_path: null
    gcp:
        cluster:
            name: gordon
        network:
            existing: true
            network_name: example-vpc
            pod_cidr: ""
            service_cidr: ""
            subnet_cidr: ""
            subnet_id: projects/example-project-1234/regions/us-central1/subnetworks/example-subnet-2
        project:
            id: example-project-1234
        region:
            availability_zones:
                - us-central1-a
                - us-central1-b
                - us-central1-c
            name: us-central1
        tags: {}
    provider: gcp
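
Because the chosen subnet must already carry additional ranges for pods and services, it can help to inspect it before running the CLI. The sketch below splits a `subnet_id` of the form shown in the config into project, region and subnet name, and passes them to `gcloud compute networks subnets describe`. The helper names are hypothetical. In the describe output, the additional ranges appear under `secondaryIpRanges`.

```shell
# Split "projects/<project>/regions/<region>/subnetworks/<name>"
# and print "<project> <region> <name>".
parse_subnet_id() {
  old_ifs="$IFS"; IFS=/
  set -- $1            # intentionally unquoted: split on "/"
  IFS="$old_ifs"
  if [ $# -ne 6 ] || [ "$1" != projects ] || [ "$3" != regions ] || [ "$5" != subnetworks ]; then
    echo "unexpected subnet id" >&2
    return 1
  fi
  echo "$2 $4 $6"
}

# Show the additional (secondary) ranges of a subnet. Requires gcloud.
show_secondary_ranges() {
  parts="$(parse_subnet_id "$1")" || return 1
  set -- $parts        # $1=project, $2=region, $3=subnet name
  gcloud compute networks subnets describe "$3" \
    --project "$1" --region "$2" \
    --format="yaml(secondaryIpRanges)"
}

# Usage (not run here):
#   show_secondary_ranges projects/example-project-1234/regions/us-central1/subnetworks/example-subnet-2
```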
    

New Network

If you want to deploy the cluster in a new network, provide the following ranges:

  1. Specify the subnet range - Default value is 10.10.0.0/16
    5(A). Subnet CIDR
    What should be the CIDR for your new subnet (Default: 10.10.0.0/16. Chose a range between /8 and /24):	
    
  2. Specify the pod range - Default value is 10.244.0.0/16
    5(B). Pod CIDR
    What should be the CIDR for your pod (Default: 10.244.0.0/16. Chose a range between /8 and /24):
    
  3. Specify the services range - Default value is 10.255.0.0/16
    5(C). Service CIDR
    What should be the CIDR for your service (Default: 10.255.0.0/16. Chose a range between /8 and /24):
    
  • The generated config file will look something like this.
    aws: null
    azure: null
    binaries:
        terraform:
            binary_path: null
        terragrunt:
            binary_path: null
    gcp:
        cluster:
            name: example-cluster
        network:
            existing: false
            network_name: ""
            pod_cidr: 10.244.0.0/16
            service_cidr: 10.255.0.0/16
            subnet_cidr: 10.10.0.0/16
            subnet_id: ""
        project:
            id: example-project-1234
        region:
            availability_zones:
                - asia-east1-a
                - asia-east1-b
                - asia-east1-c
            name: asia-east1
        tags: {}
    provider: gcp
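
The prompts above accept CIDR ranges with a prefix between /8 and /24. If you edit `subnet_cidr`, `pod_cidr` or `service_cidr` in the config file by hand, a quick sanity check like the one below can catch typos. It is a sketch of that one rule plus a basic IPv4 format check. `valid_cidr` is a hypothetical helper, and it does not check whether the ranges overlap.

```shell
# Return 0 if the argument looks like an IPv4 CIDR with a prefix in /8../24.
valid_cidr() {
  cidr="$1"
  case "$cidr" in */*) ;; *) return 1 ;; esac
  addr="${cidr%/*}"; prefix="${cidr#*/}"
  # Prefix must be numeric and within the range the CLI prompt states.
  case "$prefix" in ''|*[!0-9]*) return 1 ;; esac
  [ "$prefix" -ge 8 ] && [ "$prefix" -le 24 ] || return 1
  # Address must be four numeric octets, each at most 255.
  old_ifs="$IFS"; IFS=.
  set -- $addr         # intentionally unquoted: split on "."
  IFS="$old_ifs"
  [ $# -eq 4 ] || return 1
  for octet in "$@"; do
    case "$octet" in ''|*[!0-9]*) return 1 ;; esac
    [ "$octet" -le 255 ] || return 1
  done
}

# Usage (not run here):
#   valid_cidr 10.244.0.0/16 || echo "pod_cidr is not valid"
```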
    

Running the config file

Once the config file is created, you can run it with the following command:

ocli infra create --file tfy-config.yaml

❗️

Create GCS bucket example-us-central1-tfy-ocli-bucket' unsuccessful after 3 retries

This is an intermittent error. Run the command again to continue.
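
Since re-running the command is the suggested fix, the retry can be wrapped in a small loop. The sketch below is a generic retry helper, not an ocli feature. `retry` is a hypothetical function taking the maximum number of attempts, a delay in seconds and the command to run. It assumes that re-running `ocli infra create` is safe, as the note above implies.

```shell
# retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Usage (not run here):
#   retry 3 10 ocli infra create --file tfy-config.yaml
```

If the command still fails after several attempts, treat it as a real error and read its output rather than increasing the attempt count.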

Post cluster-creation steps

Saving the output

The above process generates outputs that are needed later when deploying some applications. Save them to a file:

ocli infra output --file tfy-config.yaml > output.txt

Downloading the kubeconfig file

  1. The CLI downloads kubectl if it is not already present. However, to connect to the GKE cluster you must also install gke-gcloud-auth-plugin
  2. Export the following variables
    export CLUSTER_NAME=""
    export REGION=""
    
  3. Download the kubeconfig file
    gcloud container clusters get-credentials "$CLUSTER_NAME" --region "$REGION"
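
Before fetching the kubeconfig, it can be useful to confirm that the tools from step 1 are on your `$PATH`. The following sketch uses a hypothetical helper, `missing_tools`, which prints the names of any commands it cannot find. One common way to install the plugin is `gcloud components install gke-gcloud-auth-plugin`, assuming gcloud was installed as the standalone SDK and not through a system package manager.

```shell
# Print the names of the given commands that are not on $PATH,
# separated by spaces. Prints an empty line if none are missing.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "${missing# }"
}

# Usage (not run here):
#   m="$(missing_tools gcloud kubectl gke-gcloud-auth-plugin)"
#   [ -z "$m" ] || echo "install these first: $m"
```

Once the kubeconfig is downloaded, running `kubectl get nodes` is a simple way to confirm that the connection works.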
    

Connecting the cluster to the platform

Follow the Connecting the cluster guide to connect the cluster to TrueFoundry's platform.