Architecture & Infrastructure Requirements
This guide describes the architecture diagram, access policies, and infrastructure requirements for setting up the compute plane in your Azure account.
Azure Architecture Diagram
Access Requirements
- ACR - We use the admin username and password so that the platform can push images to and pull images from ACR.
- Blob storage - We use a connection string to get access. Blob storage is used to store model artifacts.
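
As a rough illustration of the blob storage credential, the sketch below parses a connection string and reports which fields are missing before it is handed to the platform. It assumes the common account-key style of connection string (`AccountName=...;AccountKey=...`); the function names and the sample values are hypothetical and are not part of the platform.

```python
# Minimal sketch: sanity-check an Azure Storage connection string.
# Assumption: the account-key form is used (AccountName + AccountKey).
# All names and sample values here are illustrative only.

REQUIRED_KEYS = {"AccountName", "AccountKey"}


def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs into a dictionary."""
    parts: dict[str, str] = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key] = value
    return parts


def missing_keys(conn_str: str) -> list[str]:
    """Return the required fields that the connection string lacks."""
    return sorted(REQUIRED_KEYS - set(parse_connection_string(conn_str)))


sample = (
    "DefaultEndpointsProtocol=https;AccountName=demoaccount;"
    "AccountKey=ZmFrZWtleQ==;EndpointSuffix=core.windows.net"
)
print(missing_keys(sample))
```

Splitting on only the first `=` matters because account keys are base64-encoded and frequently end with padding characters.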
Infrastructure requirements
The following requirements must be met to set up the compute plane in your Azure account:
Requirements | Description | Reason for requirement |
---|---|---|
Virtual network (VNet) | Existing VNet with: a minimum CIDR of /24 for the private subnet; a Pod CIDR of /16; a Service CIDR of /20; and a networking mode (for an existing cluster) of Azure CNI or Azure CNI Overlay | This ensures that around 250 instances and 4096 pods can run in the Kubernetes cluster. If you expect a larger scale, increase the subnet range. A NAT gateway is required for egress internet access. |
Egress access for Docker registries | 1. public.ecr.aws 2. quay.io 3. ghcr.io 4. docker.io/truefoundrycloud 5. docker.io/natsio 6. nvcr.io 7. registry.k8s.io | This is needed to download Docker images for TrueFoundry, ArgoCD, NATS, GPU operator, Argo Rollouts, Argo Workflows, Istio, and KEDA. |
IAM user / service account to provision the infrastructure | - Azure subscription with billing enabled - Contributor role on the above subscription - Role Based Access Control Administrator role on the above subscription | See Azure admin permissions for details. |
DNS with SSL/TLS | A set of endpoints (preferably wildcard) that point to the deployments being made, for example `*.internal.example.com` and `*.external.example.com`. The certificate can be generated using cert-manager by creating a few DNS records, or you can bring your own custom certificate. | When developers deploy their services, they need to access the service endpoints to test them or to call them from other services. A wildcard is preferable because developers can then deploy services such as service1.internal.example.com and service2.internal.example.com. |
Compute quotas | Quotas need to be present to bring up the CPU and GPU machines required for your use case. | See viewing quotas in the Azure portal. |
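
The CIDR sizes in the table can be checked with a little arithmetic. The sketch below uses Python's standard `ipaddress` module to count the addresses in each range and to confirm that the ranges do not overlap. The concrete address ranges are hypothetical placeholders chosen only to match the prefix lengths above, and the figure of five reserved addresses per subnet reflects Azure's documented behaviour for virtual network subnets.

```python
# Minimal sketch: size and overlap checks for the network ranges.
# The address ranges below are hypothetical examples, not recommendations.
import ipaddress
from itertools import combinations

# Azure reserves five IP addresses in every subnet.
AZURE_RESERVED_PER_SUBNET = 5

EXAMPLE_RANGES = {
    "private_subnet": "10.0.0.0/24",
    "pod_cidr": "10.244.0.0/16",
    "service_cidr": "10.245.0.0/20",
}


def usable_subnet_ips(cidr: str) -> int:
    """Addresses left in a subnet after Azure's reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


def overlapping_pairs(ranges: dict[str, str]) -> list[tuple[str, str]]:
    """Return the names of any two ranges that share addresses."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


for name, cidr in EXAMPLE_RANGES.items():
    print(name, cidr, ipaddress.ip_network(cidr).num_addresses)
print("usable node IPs:", usable_subnet_ips(EXAMPLE_RANGES["private_subnet"]))
print("overlaps:", overlapping_pairs(EXAMPLE_RANGES))
```

A /24 subnet holds 256 addresses, of which 251 remain after the reserved ones, which is where the figure of around 250 instances comes from. A /20 range holds 4096 addresses and a /16 range holds 65536.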