Requirements
Requirements for TrueFoundry installation on Azure
The following requirements must be met to set up the compute plane in your Azure subscription.
Azure Infra Requirements
New VNet + New Cluster
Requirements | Description | Reason for requirement |
---|---|---|
Azure Subscription | Azure subscription should have billing enabled | Required for provisioning and managing Azure resources |
Egress access for Docker registries | public.ecr.aws, quay.io, ghcr.io, docker.io/truefoundrycloud, docker.io/natsio, nvcr.io, registry.k8s.io | Needed to download Docker images for TrueFoundry, ArgoCD, NATS, GPU operator, Argo Rollouts, Argo Workflows, Istio, and KEDA. |
DNS | Domain for service endpoints | Examples: *.internal.example.com, *.external.example.com, tfy.example.com. A wildcard domain is preferred for developer service deployments. |
Certificate | Certificate for the domains | Required for terminating TLS traffic to the services. Can be managed through cert-manager or custom certificates. Check here for more details. |
Compute | Quotas must be available to bring up the CPU and GPU machines required for your use case | Required to ensure sufficient resources are available for your workloads. Check Viewing quotas in Azure portal for more details. |
Microsoft.Storage | Access to create storage accounts and related resources | Ensure that the Microsoft.Storage resource provider is registered. Check this link for more details. |
Host encryption | Ensure that host encryption is enabled | Enable host encryption for data at rest. Check this link for more details. |
Azure AD application | Azure AD application for a service principal with read-only access to the AKS cluster | Used to read the node pools created in the AKS cluster so that workloads can be deployed on them. Check here for more details. |
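
The egress requirement in the table above can be spot-checked before installation. The following is a minimal sketch, not part of the TrueFoundry tooling: the helper names `registry_hosts` and `can_connect` are hypothetical, port 443 is an assumption, and the check only tests TCP reachability of the listed hostnames. Image pulls may be redirected to additional hosts that this sketch does not cover.

```python
import socket

# Registry entries exactly as listed in the egress requirement above.
REGISTRIES = [
    "public.ecr.aws",
    "quay.io",
    "ghcr.io",
    "docker.io/truefoundrycloud",
    "docker.io/natsio",
    "nvcr.io",
    "registry.k8s.io",
]


def registry_hosts(entries):
    """Return the unique hostnames in order, dropping any repository path."""
    hosts = []
    for entry in entries:
        host = entry.split("/", 1)[0]
        if host not in hosts:
            hosts.append(host)
    return hosts


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 443 is an assumption (HTTPS); DNS failures and timeouts count as
    unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect(host)` for each entry of `registry_hosts(REGISTRIES)` from a machine inside the target subnet gives a quick indication of whether a firewall or missing NAT is blocking egress.
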
Existing Network
Requirements | Description | Reason for requirement |
---|---|---|
VNet | The existing VNet should have the following available: a private subnet with a minimum CIDR of /24; a Pod CIDR of /16; a Service CIDR of /20; networking mode (for an existing cluster) of Azure CNI or Azure CNI Overlay | This is needed to ensure around 250 instances and 4096 pods can be run in the Kubernetes cluster. If the expected scale is higher, the subnet range should be increased. A NAT gateway (or an equivalent egress path) is required for egress internet access. |
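
The address arithmetic behind these CIDR sizes can be checked with Python's standard `ipaddress` module. This is a rough sketch: the address ranges are hypothetical examples, and the function names are illustrative. It relies on the fact that Azure reserves 5 addresses in every subnet, which is why a /24 yields roughly 250 usable node addresses.

```python
import ipaddress

# Azure reserves 5 addresses in every subnet (network, gateway,
# two for Azure DNS, and broadcast).
AZURE_RESERVED_PER_SUBNET = 5


def usable_subnet_ips(cidr):
    """Addresses left for nodes in an Azure subnet after reserved ones."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AZURE_RESERVED_PER_SUBNET


def total_addresses(cidr):
    """Total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


# Hypothetical example ranges; substitute your own.
print(usable_subnet_ips("10.0.0.0/24"))  # 251 (private subnet)
print(total_addresses("10.244.0.0/16"))  # 65536 (Pod CIDR)
print(total_addresses("10.0.16.0/20"))   # 4096 (Service CIDR)
```
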
Existing Cluster
Requirements | Description | Reason for requirement |
---|---|---|
Kubernetes Version | Kubernetes version 1.30 or higher | Required for latest security features and compatibility |
Worker Nodes | Minimum 3 worker nodes; each worker node: 4 vCPUs, 16 GB RAM; for GPU workloads: NVIDIA GPU-enabled nodes; Azure CNI or Azure CNI Overlay networking | Required for running core TrueFoundry components and user workloads |
Compute Quotas | Sufficient quota for on-demand instances (minimum 50 vCPUs); sufficient quota for spot instances if using spot node pools (recommended minimum 24 vCPUs) | Required to ensure sufficient resources are available. Spot instances can help optimize costs for interruptible workloads. |
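
The existing-cluster requirements above can be expressed as a small preflight check. The sketch below is illustrative only: `Node`, `parse_version`, and `preflight` are hypothetical names, the inputs are supplied by hand rather than read from Azure, and it assumes the per-node figures (4 vCPUs, 16 GB RAM) are minimums.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements table above.
MIN_K8S_VERSION = (1, 30)
MIN_NODES = 3
MIN_NODE_VCPUS = 4
MIN_NODE_MEMORY_GB = 16
MIN_ON_DEMAND_QUOTA_VCPUS = 50
MIN_SPOT_QUOTA_VCPUS = 24  # recommended, only relevant for spot node pools


@dataclass
class Node:
    vcpus: int
    memory_gb: int


def parse_version(version):
    """Turn '1.30.4' or 'v1.31' into a (major, minor) tuple of ints."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def preflight(version, nodes, on_demand_quota, spot_quota=None):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if parse_version(version) < MIN_K8S_VERSION:
        problems.append(f"Kubernetes {version} is below 1.30")
    # Assumption: the per-node sizes in the table are treated as minimums.
    adequate = [
        n for n in nodes
        if n.vcpus >= MIN_NODE_VCPUS and n.memory_gb >= MIN_NODE_MEMORY_GB
    ]
    if len(adequate) < MIN_NODES:
        problems.append(
            f"only {len(adequate)} worker nodes meet 4 vCPUs / 16 GB RAM"
        )
    if on_demand_quota < MIN_ON_DEMAND_QUOTA_VCPUS:
        problems.append(f"on-demand quota {on_demand_quota} vCPUs is below 50")
    # Pass spot_quota=None when no spot node pools are planned.
    if spot_quota is not None and spot_quota < MIN_SPOT_QUOTA_VCPUS:
        problems.append(f"spot quota {spot_quota} vCPUs is below 24")
    return problems
```

For example, `preflight("1.31.2", [Node(4, 16)] * 3, on_demand_quota=64)` returns an empty list, while a cluster on version 1.29 would produce a version problem.
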
Permissions required to create the infrastructure
The user creating the infrastructure should have the following permissions:
- Contributor role on the above subscription
- Role Based Access Control Administrator role on the above subscription
- Either the Azure AD Administrator or the Azure AD Application Developer role, in order to:
  - Create app registrations and service principals
  - Assign the Reader role to the AD application for read-only AKS cluster access
  - Assign the Monitoring Reader role to applications for cluster monitoring (Ref: How to add Azure admin permission