Azure Infra requirements

Clusters that are to be onboarded must meet certain requirements covering networking, compute (CPU and GPU) and access. These requirements apply regardless of how the cluster is deployed, whether with terraform/terragrunt or the az CLI.

Control plane requirements

The requirements below apply to both workload and control plane clusters. However, requirements that assume an existing AKS cluster do not apply to the control plane. Control plane clusters require setup by the Truefoundry team.

Common requirements

  • If the cluster is to be set up at the client's end using terraform/terragrunt, then the following must be installed
  • If control plane installation is there
  • If terragrunt is used to spin up the infrastructure, the code needs a storage account name and a container in which to store the terraform state.

Network requirements

  • For an existing network, enough IPs must be free
    • Your subnet must not overlap the ranges 10.244.0.0/16 and 10.255.0.0/16, as these are used by the cluster we deploy.
  • For a new network
    • CIDR range - /24 (minimum), /16 (recommended)
  • Security groups
    • Allow node to node connectivity
    • Allow Egress traffic from nodes
    • Allow ingress traffic on ports 80 and 443
  • DNS setup so that endpoints get exposed
    • If Istio is already deployed, make sure the host field in the Istio Gateway is set to the endpoint at which you want to expose your workloads (publicly). This endpoint must then be passed in the workload cluster from the UI.
    • If Istio is not deployed, a load balancer address will come up when Istio is installed in the cluster during onboarding. The load balancer's IP address must be mapped as an A record to the endpoint where your workloads will be hosted (publicly).
    • TLS/SSL termination can happen in the following ways
      • Using cert-manager - cert-manager can be installed in the cluster, where it talks to your DNS provider to complete DNS challenges and creates secrets in the cluster that can then be used by the Istio Gateway.
      • Certificate and key-pair file - A raw certificate file can also be used. For this, a secret containing the TLS certificate must be created. The secret can then be passed in the Istio Gateway configuration.
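
The subnet constraints above can be checked before onboarding. The following is a minimal Python sketch using the standard-library ipaddress module. The function name check_subnet is hypothetical, and the /24 size check reflects the minimum stated for new networks; for an existing network the real constraint is simply that enough IPs are free.

```python
import ipaddress

# Ranges that, per the requirements above, are used by the deployed cluster.
RESERVED_RANGES = [
    ipaddress.ip_network("10.244.0.0/16"),
    ipaddress.ip_network("10.255.0.0/16"),
]


def check_subnet(cidr):
    """Return a list of problems found with a subnet CIDR (empty if none)."""
    subnet = ipaddress.ip_network(cidr, strict=True)
    problems = []
    for reserved in RESERVED_RANGES:
        if subnet.overlaps(reserved):
            problems.append(f"{subnet} overlaps reserved range {reserved}")
    # /24 is the stated minimum size, so the prefix length must not exceed 24.
    if subnet.prefixlen > 24:
        problems.append(f"{subnet} is smaller than the /24 minimum")
    return problems


if __name__ == "__main__":
    for candidate in ["10.10.0.0/16", "10.244.8.0/24", "192.168.1.0/26"]:
        print(candidate, check_subnet(candidate) or "ok")
```

An empty list means the CIDR passed both checks; it does not confirm that the range is actually free in your virtual network.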

Compute requirements

Compute requirements refer to the amount of compute (CPU/GPU/memory) that is available for use in your region. In Azure, this means setting up node pools according to your needs.

  • A minimum of 2 node pools must be created to ensure smooth functioning of the cluster
    • Critical node pool - This node pool should contain at least 2 nodes, each with a minimum of 2 vCPU and 4 GB RAM
      • This node pool should have the taint CriticalAddonsOnly=true:NoSchedule on it
      • This node pool is used to deploy the agent and the critical components that power important parts of the platform and are necessary for its smooth functioning: argocd, argo-rollouts, tfy-agent and istio.
      • Autoscaling is not required for this node pool but can be set to min 2 and max 3 nodes.
    • Spot node pool - This node pool should contain at least 1 node with a minimum of 4 vCPU and 8 GB RAM.
      • This must be a spot node pool, which is used to deploy the remaining components of the platform.
      • These components are compute-heavy and can handle interruptions, so they are deployed on spot nodes.
      • It is recommended to enable the cluster autoscaler for this node pool.
      • A GPU node pool must be attached if GPUs are required in the platform. The GPU node pool must have the below taint attached to it.
        key: nvidia.com/gpu
        value: Present
        effect: NoSchedule
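
The node pool requirements above can be written down as a small pre-flight check. This is a minimal Python sketch, not part of the platform: the NodePool class, the validate function and the field names are hypothetical, and the GPU taint is written in key=value:effect form as an assumption about how taints are recorded in your plan.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    """Hypothetical description of a planned node pool."""
    name: str
    min_nodes: int
    vcpus: int
    memory_gb: int
    spot: bool = False
    taints: list = field(default_factory=list)


CRITICAL_TAINT = "CriticalAddonsOnly=true:NoSchedule"
GPU_TAINT = "nvidia.com/gpu=Present:NoSchedule"


def validate(critical, spot, gpu=None):
    """Return a list of unmet requirements (empty if the plan passes)."""
    problems = []
    if critical.min_nodes < 2 or critical.vcpus < 2 or critical.memory_gb < 4:
        problems.append("critical pool needs at least 2 nodes of 2 vCPU / 4 GB")
    if CRITICAL_TAINT not in critical.taints:
        problems.append("critical pool is missing taint " + CRITICAL_TAINT)
    if not spot.spot:
        problems.append("second pool must be a spot pool")
    if spot.min_nodes < 1 or spot.vcpus < 4 or spot.memory_gb < 8:
        problems.append("spot pool needs at least 1 node of 4 vCPU / 8 GB")
    # The GPU pool is optional; it is only checked when one is planned.
    if gpu is not None and GPU_TAINT not in gpu.taints:
        problems.append("GPU pool is missing taint " + GPU_TAINT)
    return problems


if __name__ == "__main__":
    critical = NodePool("critical", 2, 2, 4, taints=[CRITICAL_TAINT])
    spot = NodePool("spot", 1, 4, 8, spot=True)
    print(validate(critical, spot) or "plan meets the stated minimums")
```

The check only compares a plan against the stated minimums; it does not query Azure for quota or regional availability.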
        

Authentication

To create a Kubernetes cluster and all the required resources, you must meet the following criteria

  • You must have a valid Azure subscription
  • You must have a user with the below permissions
    • Contributor role on the above subscription
    • Role Based Access Control Administrator role on the above subscription

Authentication is required for various applications and users to connect to the cluster. The identity management features below should be enabled in the cluster by default.

  • User assigned identity - User Assigned Identity is an Azure Active Directory (AAD) object that can be created independently and assigned to Azure resources. It allows you to associate an identity with your AKS cluster. With user assigned identities, you can manage and control access to Azure resources, such as Azure Key Vault, without storing any credentials or secrets within your application's code.
  • Managed identity - Managed Identity is a feature of Azure that automatically creates and manages an identity for an Azure resource. In the context of AKS, when you enable managed identity for your AKS cluster, Azure creates an identity for your cluster in the Azure AD tenant. This identity can be used to authenticate and authorize requests made by the AKS cluster to other Azure resources.
  • Workload identity - Workload Identity is a feature in AKS that allows you to assign Azure AD identities to pods running within the cluster. With workload identity, you can authenticate and authorize requests made by your applications running in AKS without the need for additional credentials or tokens. It enables seamless integration with other Azure services that rely on Azure AD authentication, such as Azure Key Vault or Azure Storage.
  • OIDC issuer - OIDC (OpenID Connect) is an identity layer built on top of OAuth 2.0, which allows clients to verify the identity of end-users based on authentication performed by an authorization server. In AKS, enabling the OIDC issuer gives the cluster an issuer URL through which identity providers such as Azure AD can discover the cluster's public signing keys and validate the service account tokens it issues. Workload identity relies on this to exchange those tokens for Azure AD tokens.
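
To illustrate how workload identity is typically wired to a pod, here is a minimal Python sketch that builds the Kubernetes objects as plain dictionaries. It assumes the azure.workload.identity/client-id service account annotation and the azure.workload.identity/use pod label used by Azure AD Workload Identity; the function names, the service account name, the namespace and the client ID are hypothetical placeholders.

```python
import json


def service_account(name, namespace, client_id):
    """Build a ServiceAccount manifest annotated with an identity's client ID."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumed annotation: links the service account to the
            # user assigned identity with this client ID.
            "annotations": {"azure.workload.identity/client-id": client_id},
        },
    }


def pod_labels():
    """Label assumed to opt a pod in to workload identity token injection."""
    return {"azure.workload.identity/use": "true"}


if __name__ == "__main__":
    # Placeholder values; replace with your own identity and namespace.
    sa = service_account("my-app", "default", "00000000-0000-0000-0000-000000000000")
    print(json.dumps(sa, indent=2))
    print(json.dumps(pod_labels(), indent=2))
```

The printed JSON can be applied as a manifest, but the identity also needs a federated credential that trusts the cluster's OIDC issuer URL, which this sketch does not create.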