Architecture & Infrastructure Requirements

This guide describes the architecture diagram, access policies, and infrastructure requirements for setting up the compute plane in your Azure account.

Azure Architecture Diagram


Access Requirements

  • ACR - We use the admin username and password so that the platform can push images to and pull images from ACR.
  • Blob storage - We use a connection string to get access. Blob storage is used to store model artifacts.
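
As an illustration of the connection-string access described above, the following Python sketch splits a storage connection string into its fields and reports any missing credentials. It assumes the common `Key=Value;Key=Value` layout of Azure Storage connection strings. The function names and the sample values are hypothetical placeholders, not real credentials or a platform API.

```python
from typing import Dict, List

# Fields this sketch treats as mandatory (an assumption for illustration).
REQUIRED_FIELDS = ("AccountName", "AccountKey")


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs into a dict.

    Only the first '=' in each segment separates key from value,
    because base64 account keys can end with '=' padding.
    """
    fields: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        fields[key.strip()] = value.strip()
    return fields


def missing_fields(conn_str: str) -> List[str]:
    """Return the required fields that are absent or empty."""
    fields = parse_connection_string(conn_str)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


# Placeholder value only; never commit a real connection string.
EXAMPLE = (
    "DefaultEndpointsProtocol=https;AccountName=examplestore;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)

print(missing_fields(EXAMPLE))
```

A check like this can be run before handing the connection string to the platform, so that a truncated or mis-pasted value is caught early.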

Infrastructure Requirements

The following requirements must be met to set up the compute plane in your Azure account. Each entry gives the requirement, its description, and the reason for the requirement.

Requirement: VPC
Description: Existing VPC (Azure virtual network) with:
  - Minimum CIDR of /24 for the private subnet
  - Pod CIDR: /16
  - Service CIDR: /20
  - Networking mode (for an existing cluster): Azure CNI or Azure CNI Overlay
Reason: These ranges ensure that around 250 instances and 4096 pods can run in the Kubernetes cluster. If you expect a larger scale, increase the subnet range. NAT (for example, an Azure NAT gateway) is required for egress internet access.
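
The sizing above can be checked with a little arithmetic. The sketch below uses Python's standard `ipaddress` module to count the addresses in each range and to confirm that the ranges do not overlap. The concrete ranges are hypothetical examples chosen only for their prefix lengths. The constant of 5 reflects the addresses Azure reserves in every subnet, which is why a /24 leaves 251 usable addresses, in line with the "around 250 instances" figure.

```python
import ipaddress
from itertools import combinations
from typing import Iterable

# Azure reserves 5 addresses in every subnet (network, gateway,
# two for Azure DNS, and broadcast).
AZURE_RESERVED_PER_SUBNET = 5


def address_count(cidr: str) -> int:
    """Total number of addresses in a CIDR range."""
    return ipaddress.ip_network(cidr).num_addresses


def usable_subnet_addresses(cidr: str) -> int:
    """Addresses left for nodes after Azure's per-subnet reservation."""
    return address_count(cidr) - AZURE_RESERVED_PER_SUBNET


def any_overlap(cidrs: Iterable[str]) -> bool:
    """True if any two of the given ranges overlap."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return any(a.overlaps(b) for a, b in combinations(networks, 2))


# Hypothetical example ranges with the minimum sizes listed above.
RANGES = {
    "private subnet": "10.0.0.0/24",
    "pod CIDR": "10.244.0.0/16",
    "service CIDR": "10.255.0.0/20",
}

for name, cidr in RANGES.items():
    print(f"{name}: {cidr} has {address_count(cidr)} addresses")
print("usable node addresses:", usable_subnet_addresses(RANGES["private subnet"]))
print("ranges overlap:", any_overlap(RANGES.values()))
```

With these prefix lengths, the /16 contains 65536 addresses and the /20 contains 4096. Substituting your own ranges shows how much headroom a larger subnet would add.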

Requirement: Egress access for Docker registries
Description:
  1. public.ecr.aws
  2. quay.io
  3. ghcr.io
  4. docker.io/truefoundrycloud
  5. docker.io/natsio
  6. nvcr.io
  7. registry.k8s.io
Reason: This access is needed to download Docker images for TrueFoundry, ArgoCD, NATS, GPU operator, Argo Rollouts, Argo Workflows, Istio, and KEDA.
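
One way to sanity-check egress to the registries listed above is a plain TCP connection test on port 443 from a machine inside the private subnet. The sketch below is a coarse check under stated assumptions: the two docker.io entries are collapsed to the `docker.io` host, the function names are hypothetical, and a successful connection does not guarantee that an image pull will work, since pulls may contact additional hostnames such as authentication or CDN endpoints.

```python
import socket
from typing import Callable, Dict, Iterable

# Registry hosts taken from the list above (repositories collapsed to hosts).
REGISTRY_HOSTS = [
    "public.ecr.aws",
    "quay.io",
    "ghcr.io",
    "docker.io",
    "nvcr.io",
    "registry.k8s.io",
]


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_egress(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = tcp_reachable,
) -> Dict[str, bool]:
    """Map each host to the result of the probe.

    The probe is injectable so the logic can be exercised without network access.
    """
    return {host: probe(host) for host in hosts}
```

Calling `check_egress(REGISTRY_HOSTS)` from a node in the subnet returns a dictionary of host to reachability, and any `False` entry points to a firewall or NAT rule worth reviewing.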

Requirement: IAM user / service account to provision the infrastructure
Description:
  - Azure subscription with billing enabled
  - Contributor role on the above subscription
  - Role Based Access Control Administrator role on the above subscription
For details, read Azure admin permission.

Requirement: DNS with SSL/TLS
Description: A set of endpoints (preferably wildcard) that point to the deployments being made, for example *.internal.example.com and *.external.example.com. The certificate can be generated using cert-manager by creating a few DNS records, or you can bring your own custom certificate.
Reason: When developers deploy their services, they need to access the service endpoints to test them or to call them from other services. A wildcard is preferable because developers can then deploy services at hostnames such as service1.internal.example.com and service2.internal.example.com.
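
To make the wildcard behavior concrete, the following sketch checks whether a service hostname is covered by a wildcard entry, using the rule applied to wildcard TLS certificates: the `*` stands for exactly one leftmost label. The function name is hypothetical and the hostnames are the illustrative ones used above. DNS wildcard records themselves can behave differently for deeper names, so this models certificate coverage only.

```python
def matches_wildcard(hostname: str, pattern: str) -> bool:
    """TLS-certificate-style match: '*' covers exactly one leftmost label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False  # '*' never spans more than one label
    if pattern_labels[0] == "*":
        # The wildcard label must be non-empty; the rest must match exactly.
        return bool(host_labels[0]) and host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


PATTERN = "*.internal.example.com"
for name in (
    "service1.internal.example.com",
    "service2.internal.example.com",
    "api.service1.internal.example.com",
    "internal.example.com",
):
    print(name, matches_wildcard(name, PATTERN))
```

The first two names are covered by the wildcard. The nested name and the bare domain are not, so services that need such hostnames would require an additional certificate entry.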

Requirement: Compute quotas
Description: Quotas must be available to bring up the CPU and GPU machines required for your use case. See Viewing quotas in Azure portal.