The following requirements must be met to set up the compute plane in your Azure subscription.

Azure Infra Requirements

New VNet + New Cluster


| Requirements | Description | Reason for requirement |
|---|---|---|
| Azure Subscription | Azure subscription should have billing enabled | Required for provisioning and managing Azure resources |
| Egress access for Docker registry | public.ecr.aws, quay.io, ghcr.io, docker.io/truefoundrycloud, docker.io/natsio, nvcr.io, registry.k8s.io | Needed to download Docker images for TrueFoundry, ArgoCD, NATS, GPU operator, Argo Rollouts, Argo Workflows, Istio, and KEDA. |
| DNS | Domain for service endpoints | Examples: *.internal.example.com, *.external.example.com, tfy.example.com. A wildcard domain is preferred for developer service deployments. |
| Certificate | Certificate for the domains | Required for terminating TLS traffic to the services. Can be managed through cert-manager or custom certificates. Check here for more details. |
| Compute | Quotas must be available to bring up the CPU and GPU machines required for your use case | Required to ensure sufficient resources are available for your workloads. Check Viewing quotas in Azure portal for more details. |
| Microsoft.Storage | Access to create storage accounts and other resources | Ensure that the Microsoft.Storage resource provider is registered. Check this link for more details. |
| Host encryption | Ensure that host encryption is enabled | Host encryption is used to encrypt data at rest. Check this link for more details. |
| Azure AD application | Azure AD application for a service principal with read-only access to the AKS cluster | Used to read the node pools created in the AKS cluster so that workloads can be deployed on them. Check here for more details. |
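
The egress requirement above can be spot-checked from a machine inside the target network. The following is a minimal sketch, not an official TrueFoundry tool: it only tests whether a TCP connection on port 443 can be opened to each registry hostname. The function names `can_connect` and `check_egress` are hypothetical, and the two docker.io repository paths are collapsed to the `docker.io` hostname. A successful connection is only a rough signal, since an image pull may contact additional hosts (for example authentication or CDN endpoints) that this sketch does not cover.

```python
import socket

# Registry hostnames taken from the egress requirement above.
# docker.io/truefoundrycloud and docker.io/natsio share the docker.io host.
REGISTRY_HOSTS = (
    "public.ecr.aws",
    "quay.io",
    "ghcr.io",
    "docker.io",
    "nvcr.io",
    "registry.k8s.io",
)


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_egress(hosts=REGISTRY_HOSTS, probe=can_connect):
    """Map each hostname to the result of the probe function."""
    return {host: probe(host) for host in hosts}
```

Calling `check_egress()` returns a dictionary keyed by hostname; any `False` entry suggests that a firewall, proxy, or missing outbound route is blocking that registry. The `probe` parameter exists so the reachability test can be swapped out, for instance with a proxy-aware check.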

Existing Network

| Requirements | Description | Reason for requirement |
|---|---|---|
| VNet | The existing VNet should have the following available: a minimum CIDR of /24 for the private subnet; a pod CIDR of /16; a service CIDR of /20; networking mode (for an existing cluster) of Azure CNI or Azure CNI Overlay | This is needed to ensure that around 250 instances and 4096 pods can run in the Kubernetes cluster. If you expect a higher scale, increase the subnet range. A NAT gateway (or equivalent outbound connectivity) is required for egress internet access. |
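
The arithmetic behind these CIDR sizes can be checked with Python's standard `ipaddress` module. This is an illustrative sketch only: the example ranges are hypothetical placeholders, and the helper names are not part of any TrueFoundry or Azure tooling. It relies on the fact that Azure reserves 5 addresses in every subnet, and it includes an overlap check on the assumption that the subnet, pod, and service ranges should be disjoint.

```python
import ipaddress
from itertools import combinations

# Azure reserves 5 IP addresses in each subnet.
AZURE_RESERVED_PER_SUBNET = 5


def total_addresses(cidr):
    """Total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


def usable_subnet_addresses(cidr):
    """Addresses left in an Azure subnet after the reserved ones."""
    return max(total_addresses(cidr) - AZURE_RESERVED_PER_SUBNET, 0)


def meets_minimum_size(cidr, min_prefix):
    """True if the block is at least as large as /min_prefix."""
    return ipaddress.ip_network(cidr).prefixlen <= min_prefix


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks that overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


# Hypothetical example ranges; substitute your own.
subnet, pods, services = "10.0.0.0/24", "10.244.0.0/16", "10.0.16.0/20"
print(usable_subnet_addresses(subnet))  # 251
print(total_addresses(pods))            # 65536
print(total_addresses(services))        # 4096
print(find_overlaps([subnet, pods, services]))  # []
```

A /24 subnet leaves 251 usable addresses, which matches the "around 250 instances" figure; a /20 block holds 4096 addresses.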

Existing Cluster

| Requirements | Description | Reason for requirement |
|---|---|---|
| Kubernetes Version | Kubernetes version 1.30 or higher | Required for the latest security features and compatibility |
| Worker Nodes | Minimum 3 worker nodes; each worker node: 4 vCPUs, 16 GB RAM; for GPU workloads: NVIDIA GPU-enabled nodes; Azure CNI or Azure CNI Overlay networking | Required for running core TrueFoundry components and user workloads |
| Compute Quotas | Sufficient quota for on-demand instances (minimum 50 vCPUs); sufficient quota for spot instances if using spot node pools (recommended minimum 24 vCPUs) | Required to ensure sufficient resources are available. Spot instances can help optimize costs for interruptible workloads. |
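
The version and worker-node minimums above can be expressed as a small pre-flight check. The sketch below is a hypothetical helper, not part of TrueFoundry: the `Node` type and `check_cluster` function are invented for illustration, node data is supplied by hand rather than read from the cluster, and the node rule is interpreted as "at least 3 nodes that each have 4 vCPUs and 16 GB RAM or more".

```python
from dataclasses import dataclass

# Minimums taken from the requirements table above.
MIN_K8S_VERSION = (1, 30)
MIN_NODES = 3
MIN_VCPUS = 4
MIN_MEMORY_GB = 16


@dataclass
class Node:
    name: str
    vcpus: int
    memory_gb: float


def parse_version(version):
    """Turn 'v1.30.4' or '1.30' into the tuple (1, 30)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def check_cluster(version, nodes):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if parse_version(version) < MIN_K8S_VERSION:
        problems.append(f"Kubernetes {version} is older than 1.30")
    qualifying = [
        n for n in nodes
        if n.vcpus >= MIN_VCPUS and n.memory_gb >= MIN_MEMORY_GB
    ]
    if len(qualifying) < MIN_NODES:
        problems.append(
            f"only {len(qualifying)} node(s) meet 4 vCPU / 16 GB; need {MIN_NODES}"
        )
    return problems


nodes = [Node("n1", 4, 16), Node("n2", 8, 32), Node("n3", 4, 16)]
print(check_cluster("v1.30.4", nodes))  # []
```

Returning a list of problems instead of raising on the first failure lets all shortfalls be reported in a single pass.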

Permissions required to create the infrastructure

The Azure user creating the infrastructure should have the following permissions:

  • Contributor role on the above subscription

  • Role Based Access Control Administrator role on the above subscription

  • Either Azure AD Administrator or Azure AD Application Developer role to:

    • Create app registrations and service principals
    • Assign Reader role to AD application for read-only AKS cluster access
    • Assign Monitoring Reader role to applications for cluster monitoring (Ref: How to add Azure admin permission