Creating an AKS cluster using onboarding-cli
The Onboarding CLI (ocli) is a command-line tool that deploys Azure Kubernetes Service (AKS) clusters along with the resources they require. It automates the deployment process and keeps manual steps to a minimum: the CLI asks the user for a few inputs and then configures the necessary infrastructure.
Pre-requisites
- Install the Azure CLI (az), version 2.50 or later
- Install git
- You must have an Azure subscription and a user that can create resources in it. This user should have
  - Contributor role in the subscription
  - RBAC admin role in the subscription
- Log in to Azure and set the subscription
  # login
  az login
  # setting the subscription
  az account set --subscription $SUBSCRIPTION_ID
- Read the Azure Infrastructure requirements carefully.
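The version requirement above can be checked before going further. The following is a minimal sketch, not part of ocli: version_ge and check_prereqs are hypothetical helper names, and the sketch assumes that az version --query '"azure-cli"' --output tsv prints the installed CLI version.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Fields are compared as numbers, so 2.9.1 counts as older than 2.50.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0   # missing fields count as 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# check_prereqs: hypothetical helper that verifies git and azure-cli >= 2.50.
check_prereqs() {
  local az_version
  command -v git >/dev/null 2>&1 || { echo "git not found" >&2; return 1; }
  command -v az >/dev/null 2>&1 || { echo "az not found" >&2; return 1; }
  # Assumption: this query returns the azure-cli version string.
  az_version=$(az version --query '"azure-cli"' --output tsv) || return 1
  version_ge "$az_version" 2.50 || {
    echo "azure-cli $az_version is older than 2.50" >&2
    return 1
  }
  echo "prerequisites look fine (azure-cli $az_version)"
}
```

Running check_prereqs after sourcing the snippet reports the first missing prerequisite, if any. The comparison is numeric per field on purpose, since a plain string comparison would rank 2.9 above 2.50.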
Download the CLI
- Download the binary for your platform using one of the commands below.
  - For Apple Silicon macOS
    curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_darwin_arm64 -o ocli
  - For Intel macOS
    curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_darwin_amd64 -o ocli
  - For Linux (arm64)
    curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_linux_arm64 -o ocli
  - For Linux (amd64)
    curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/binaries/ocli_linux_amd64 -o ocli
- Make the binary executable and move it to a directory on your $PATH
  sudo chmod +x ./ocli
  sudo mv ocli /usr/local/bin
- Confirm the installation by running
  ocli
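The per-platform choice above can also be scripted. This is a sketch under assumptions: ocli_asset_name and download_ocli are hypothetical helper names, and the mapping assumes uname -m reports arm64 on Apple Silicon macOS, aarch64 on arm Linux, and x86_64 on Intel/AMD machines. The binary names and URL are the ones listed above.

```shell
# ocli_asset_name OS ARCH: map `uname -s` and `uname -m` values to a binary name.
ocli_asset_name() {
  local os arch
  case "$1" in
    Darwin) os=darwin ;;
    Linux)  os=linux ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$2" in
    arm64|aarch64) arch=arm64 ;;
    x86_64|amd64)  arch=amd64 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  echo "ocli_${os}_${arch}"
}

# download_ocli: fetch the matching binary into ./ocli and make it executable.
download_ocli() {
  local asset
  asset=$(ocli_asset_name "$(uname -s)" "$(uname -m)") || return 1
  curl -H 'Cache-Control: max-age=0' -s \
    "https://releases.ocli.truefoundry.tech/binaries/${asset}" -o ocli &&
    chmod +x ./ocli
}
```

After download_ocli succeeds, moving the binary onto your $PATH works exactly as in the manual steps.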
Update to latest version
Always make sure to update ocli to the latest version.
Installation
Creating a config file
- This section covers the options available for configuring the Azure AKS cluster.
- There are two ways to set up the network
  - Existing Network - Use this when you already have a network set up in your Azure environment. The onboarding CLI can use an existing Virtual Network ID and Subnet ID.
  - New Network - Select this if you don't have an existing network or want to deploy the TrueFoundry AKS cluster inside a new Virtual Network. You will be prompted for the network CIDR, which is the CIDR range of the network you want, and then for the subnet CIDR range. If you are not sure, 10.10.0.0/16 will be taken as the default subnet CIDR. You will also be asked for private and public CIDRs.
- Run the command below
  ocli infra init
- The screen is cleared and you are asked to choose a cloud provider. Select azure, and for the next question select the subscription where you want to deploy all your resources.
  Truefoundry is a platform that makes it very easy to deploy microservices, ML models training jobs, LLMs on Kubernetes.
  We will start the process of bootstrapping a Kubernetes cluster. This CLI is useful only if you don't have a Kubernetes cluster.
  If you already have a cluster, please go to https://docs.truefoundry.com/docs/creating-your-own-kubernetes-cluster

  Let's get started!

  1. Cloud Provider
  In which cloud provider you would like to deploy your cluster: :
    aws
  > azure
    gcp

  2. Subscription details
  Which Azure Subscription you want to use ?:
  > Microsoft Azure Sponsorship: xxxxx-xxxxx-xxxxx-xxxxxxxxx
    subscription-name: xxxxx-xxxxx-xxxxx-xxxxxxxxx
failed to acquire a token
This error indicates that azure-cli is not installed on your local machine or that you are not authenticated with it. Make sure azure-cli is installed and that you are logged in using az login.
- Select the location from the drop-down and give your cluster a name. This name is used as a substring of the actual cluster name.
  3. Location
  In which location you want to deploy your cluster:
  > Australia Central
    Australia Central 2

  4. Cluster name
  What should be the name of your cluster:
- Select whether you want to deploy the cluster in an existing resource group or a new resource group.
  5. Resource group
  Do you want to deploy the cluster in an existing resource group or a new resource group: :
  > existing
    new
Existing Resource group
For an existing resource group
- Select the resource group where you want to deploy your cluster
  5(A). In which resource group you want to deploy your cluster:
  > resourceGroup1
    resourceGroup2
- With an existing resource group you can either select an existing network or create a new one
  6. Vnet details
  Do you want to deploy the cluster in an existing vnet or a new vnet: :
  > existing
    new
Existing Network
- Select the network from the drop-down
  6(A). Virtual network
  In which network you want to deploy your cluster: :
  > vnet1
    vnet2
- Select the subnet from the drop-down
  6(B). Subnet details
  In which network you want to deploy your cluster: :
  > vnet1-default-subnet
    frontend-subnet
    backend-subnet
This creates a config file named config.yaml that contains the existing network and subnet IDs
aws: null
azure:
  cluster:
    name: clustername
  location: West Europe
  network:
    existing: true
    subnet_cidr: ""
    subnet_id: /subscriptions/xxxxx-xxxxx-xxxxx-xxxxxxxxx/resourceGroups/resourceGroup1/providers/Microsoft.Network/virtualNetworks/vnet1/subnets/vnet1-default-subnet
    vnet_cidr: ""
    vnet_id: /subscriptions/xxxxx-xxxxx-xxxxx-xxxxxxxxx/resourceGroups/resourceGroup1/providers/Microsoft.Network/virtualNetworks/vnet1
    vnet_name: vnet1
  resource_group:
    existing: true
    name: resourceGroup1
  state:
    container_name: ""
    resource_group: ""
    storage_account_name: ""
  subscription:
    id: xxxxx-xxxxx-xxxxx-xxxxxxxxx
    name: subscription-name
binaries:
  terraform:
    binary_path: null
  terragrunt:
    binary_path: null
gcp: null
provider: azure
New Network (recommended)
- For a new network, supply a CIDR range (default: 10.0.0.0/8)
  6(A). Virtual network CIDR
  What is your expected vnet CIDR (default: 10.0.0.0/8):
- For the subnet, supply a CIDR range inside the network range (default: 10.10.0.0/16)
  6(B). Subnet CIDR
  What is your expected subnet CIDR (default: 10.10.0.0/16):
This creates a YAML file named config.yaml containing the inputs
aws: null
azure:
  cluster:
    name: clustername
  location: West Europe
  network:
    existing: false
    subnet_cidr: 10.10.0.0/16
    subnet_id: ""
    vnet_cidr: 10.0.0.0/8
    vnet_id: ""
    vnet_name: ""
  resource_group:
    existing: true
    name: resourceGroup1
  state:
    container_name: ""
    resource_group: ""
    storage_account_name: ""
  subscription:
    id: xxxxx-xxxxx-xxxxx-xxxxxxxxx
    name: subscription-name
binaries:
  terraform:
    binary_path: null
  terragrunt:
    binary_path: null
gcp: null
provider: azure
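In a new-network config, subnet_cidr has to fall inside vnet_cidr, as it does with the defaults 10.10.0.0/16 and 10.0.0.0/8. The sketch below shows one way to check custom values before entering them. ip_to_int and cidr_contains are hypothetical helpers for illustration only, and nothing here claims that ocli performs this validation itself.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the dotted quad into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# cidr_contains OUTER INNER: succeed when the INNER range lies within OUTER.
cidr_contains() {
  local outer_ip="${1%/*}" outer_len="${1#*/}"
  local inner_ip="${2%/*}" inner_len="${2#*/}"
  local mask outer_net inner_net
  # A range with a shorter prefix is larger, so it cannot fit inside OUTER.
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(( (0xFFFFFFFF << (32 - $outer_len)) & 0xFFFFFFFF ))
  outer_net=$(( $(ip_to_int "$outer_ip") & $mask ))
  inner_net=$(( $(ip_to_int "$inner_ip") & $mask ))
  [ "$outer_net" -eq "$inner_net" ]
}
```

For example, cidr_contains 10.0.0.0/8 10.10.0.0/16 succeeds, while swapping the two arguments fails because a /8 cannot fit inside a /16.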
New Resource Group (recommended)
Choose this option if you want to deploy everything in a new, separate resource group.
- Enter the name of the resource group you want to create.
- A new resource group has no existing network, so a new network is created as well. Supply a CIDR range for it (default: 10.0.0.0/8)
  6(A). Virtual network CIDR
  What is your expected vnet CIDR (default: 10.0.0.0/8):
- For the subnet, supply a CIDR range inside the network range (default: 10.10.0.0/16)
  6(B). Subnet CIDR
  What is your expected subnet CIDR (default: 10.10.0.0/16):
This creates a YAML file named config.yaml with the given inputs
aws: null
azure:
  cluster:
    name: clustername
  location: West Europe
  network:
    existing: false
    subnet_cidr: 10.10.0.0/16
    subnet_id: ""
    vnet_cidr: 10.0.0.0/8
    vnet_id: ""
    vnet_name: ""
  resource_group:
    existing: false
    name: newrg
  state:
    container_name: ""
    resource_group: ""
    storage_account_name: ""
  subscription:
    id: xxxxx-xxxxx-xxxxx-xxxxxxxxx
    name: subscription-name
binaries:
  terraform:
    binary_path: null
  terragrunt:
    binary_path: null
gcp: null
provider: azure
Running the config file
Once the config file is created, run it with the following command
ocli infra create --file config.yaml
Resource group could not be found.
This error can occur if you haven't set the subscription in azure-cli. Make sure to run
az account set --subscription $SUBSCRIPTION_ID
The specified service CIDR 10.0.0.0/16 is conflicted with an existing subnet CIDR
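This can happen when you use an existing network whose subnet range overlaps the service CIDR named in the error (10.0.0.0/16 here). In that case, choose a subnet whose range does not overlap it.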
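Whether an existing subnet would collide with the service CIDR from the error message can be checked up front. This is a small sketch: ip_to_int and cidr_overlaps are hypothetical helpers, and 10.0.0.0/16 is taken from the error text above rather than from any documented ocli setting.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the dotted quad into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# cidr_overlaps A B: succeed when the two IPv4 ranges share any address.
# Two CIDR ranges overlap exactly when they agree on the first N bits,
# where N is the shorter of the two prefix lengths.
cidr_overlaps() {
  local a_ip="${1%/*}" a_len="${1#*/}"
  local b_ip="${2%/*}" b_len="${2#*/}"
  local len mask a_net b_net
  len=$a_len
  if [ "$b_len" -lt "$len" ]; then len=$b_len; fi
  mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  a_net=$(( $(ip_to_int "$a_ip") & $mask ))
  b_net=$(( $(ip_to_int "$b_ip") & $mask ))
  [ "$a_net" -eq "$b_net" ]
}
```

With this, cidr_overlaps 10.0.0.0/16 10.0.1.0/24 succeeds (a conflict), while cidr_overlaps 10.0.0.0/16 10.10.0.0/16 fails, which matches the default subnet not triggering the error.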
Error in creating Agent pool
This error can occur when the quota for spot or regional instances is too low. The error looks like this
╷
│ Error: creating Agent Pool (Subscription: "xxxxx-xxx-x-xxxxxxx"
│ Resource Group Name: "REDACTED"
│ Managed Cluster Name: "REDACTED"
│ Agent Pool Name: "spotpoold307"): agentpools.AgentPoolsClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="PreconditionFailed" Message="Provisioning of resource(s) for Agent Pool spotpoold307 failed. Error: {\n \"code\": \"InvalidTemplateDeployment\",\n \"message\": \"The template deployment '5b1d1959-dca5-425a-a479-963bed76ae3b' is not valid according to the validation procedure. The tracking id is 'd33a7636-891f-4e51-b475-dd85f3e95156'. See inner errors for details.\",\n \"details\": [\n {\n \"code\": \"QuotaExceeded\",\n \"message\": \"Operation could not be completed as it results in exceeding approved LowPriorityCores quota. Additional details - Deployment Model: Resource Manager, Location: LOCATION, Current Limit: 3, Current Usage: 0, Additional Required: 4, (Minimum) New Limit Required: 4. Submit a request for Quota increase at https://aka.ms/ProdportalCRP/#blade/Microsoft_Azure_Capacity/UsageAndQuota.ReactView/Parameters/%7B%22subscriptionId%22:%22c03bdc39-be28-4bb7-8953-1339b663e8d0%22,%22command%22:%22openQuotaApprovalBlade%22,%22quotas%22:[%7B%22location%22:%22westus%22,%22providerId%22:%22Microsoft.Compute%22,%22resourceName%22:%22lowPriorityCores%22,%22quotaRequest%22:%7B%22properties%22:%7B%22limit%22:4,%22unit%22:%22Count%22,%22name%22:%7B%22value%22:%22lowPriorityCores%22%7D%7D%7D%7D]%7D by specifying parameters listed in the ‘Details’ section for deployment to succeed. Please read more about quota limits at https://docs.microsoft.com/en-us/azure/azure-portal/supportability/low-priority-quota\"\n }\n ]\n }"
│
│ with module.aks.azurerm_kubernetes_cluster_node_pool.node_pool["spot"],
│ on .terraform/modules/aks/main.tf line 523, in resource "azurerm_kubernetes_cluster_node_pool" "node_pool":
│ 523: resource "azurerm_kubernetes_cluster_node_pool" "node_pool" {
│
╵
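The figures that matter are buried in the middle of that message: which quota was hit, the current limit, and the minimum new limit to request. The following sketch pulls them out of a saved log. quota_summary is a hypothetical helper, and it assumes the QuotaExceeded sentence arrives on a single line, as in the sample above.

```shell
# quota_summary: read error text on stdin and print the quota name and the
# numbers from a "QuotaExceeded" message. Prints nothing if no match is found.
quota_summary() {
  sed -n 's/.*exceeding approved \([A-Za-z0-9]*\) quota.*Current Limit: \([0-9][0-9]*\), Current Usage: \([0-9][0-9]*\), Additional Required: \([0-9][0-9]*\), (Minimum) New Limit Required: \([0-9][0-9]*\).*/quota=\1 limit=\2 usage=\3 additional=\4 new_limit=\5/p'
}
```

One way to use it is to capture the output of the create command in a file (for example with tee) and then run quota_summary < that file. To see the current limits for a region, az vm list-usage --location with your region and --output table should list them, including the low-priority (spot) cores entry.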
Post cluster configurations
Saving the output
The process above generates outputs that are needed to deploy some applications. Save them to a file
ocli infra output --file config.yaml > output.txt
Downloading the kubeconfig file
- Once the cluster is created, it needs to be attached to the TrueFoundry platform.
- Export the required variables
  export RESOURCE_GROUP=""
  export CLUSTER_NAME=""
- Run the command below to download the cluster's kubeconfig file to your local machine
  az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME
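Because the two variables are exported as empty strings until you fill them in, a small guard avoids running the command with blank values. This is a sketch only: require_vars and fetch_kubeconfig are hypothetical helpers, and the final kubectl get nodes check assumes kubectl is installed locally.

```shell
# require_vars NAME...: fail if any of the named variables is unset or empty.
require_vars() {
  local name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# fetch_kubeconfig: download the kubeconfig, then list nodes as a quick check.
fetch_kubeconfig() {
  require_vars RESOURCE_GROUP CLUSTER_NAME || return 1
  az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" || return 1
  # Assumption: kubectl is available on this machine.
  kubectl get nodes
}
```

Since the name you entered is only a substring of the actual cluster name, az aks list --resource-group "$RESOURCE_GROUP" --query "[].name" --output tsv can help find the exact value for CLUSTER_NAME.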
Connecting the cluster to the platform
Follow the Connecting the cluster guide to connect the cluster to the TrueFoundry platform. Once this is done, a few applications need to be installed in the cluster. For this, share output.txt with the TrueFoundry team.