The following requirements must be met to set up the compute plane in your AWS account.

AWS Infra Requirements

New VPC + New Cluster

These are the requirements for a fresh TrueFoundry installation. If you are reusing an existing network or cluster, the sections further below apply in addition to this one.

| Requirements | Required Configuration | Description |
| --- | --- | --- |
| AWS Account | Billing must be enabled | Required for AWS service usage |
| VPC | - VPC CIDR: /20 or larger<br>- Min 2 availability zones<br>- Private subnets: /24 or larger<br>- For custom networking, CGNAT IP address space required in each AZ | Ensures capacity for ~250 instances and 4096 pods. NAT Gateway required for private subnet internet access. For custom networking, CGNAT space and route tables needed. |
| Egress access for Docker registry | Access to:<br>- public.ecr.aws<br>- quay.io<br>- ghcr.io<br>- docker.io/truefoundrycloud<br>- docker.io/natsio<br>- nvcr.io<br>- registry.k8s.io | Required for downloading container images for TrueFoundry, ArgoCD, NATS, GPU operator, ArgoRollouts, ArgoWorkflows, Istio, Keda |
| DNS | Domain for service endpoints | Examples: *.internal.example.com, *.external.example.com, tfy.example.com. Wildcard preferred for developer service deployments |
| Certificate | Certificate ARN for the domains | Required for terminating TLS traffic to the services. Check here for more details |
| Cloud Quotas | GPU (if using):<br>- G and VT Spot/On-demand Instances<br>- P Spot/On-demand Instance Requests<br><br>Inferentia (optional):<br>- Inferentia Spot/On-demand machines | Ensures TrueFoundry can bring up machines as needed. Check and increase quotas at AWS EC2 service quotas |
| User / ServiceAccount | - STS enabled<br>- Permissions listed below | See Enabling STS in a region |
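
The sizing figures in the VPC row can be checked with a little arithmetic. The sketch below uses Python's standard `ipaddress` module; the helper names are hypothetical, and it assumes that "CGNAT space" means the RFC 6598 shared range `100.64.0.0/10`, which the requirements above do not state explicitly. AWS reserves 5 addresses in every subnet, so a /24 leaves 251 usable addresses, and a /20 VPC contains 4096 addresses in total, which appears to be where the "~250 instances and 4096 pods" figures come from.

```python
import ipaddress

# AWS reserves 5 addresses in every subnet (first four and the last one).
AWS_RESERVED_PER_SUBNET = 5

# Assumption: "CGNAT space" refers to the RFC 6598 shared address range.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def usable_ips(cidr: str) -> int:
    """Number of addresses AWS can hand out inside a single subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def meets_minimum_size(cidr: str, max_prefix: int) -> bool:
    """True when the block is /max_prefix or larger (a smaller prefix number)."""
    return ipaddress.ip_network(cidr).prefixlen <= max_prefix


def in_cgnat_space(cidr: str) -> bool:
    """True when the block lies entirely inside 100.64.0.0/10."""
    return ipaddress.ip_network(cidr).subnet_of(CGNAT_RANGE)


if __name__ == "__main__":
    # Example CIDRs are illustrative only.
    print(meets_minimum_size("10.0.0.0/20", 20))   # VPC requirement
    print(usable_ips("10.0.1.0/24"))               # private subnet capacity
    print(in_cgnat_space("100.64.0.0/16"))         # custom networking range
```

A /21 VPC or a /25 private subnet would fail `meets_minimum_size`, since a larger prefix number means a smaller block.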

Existing network

| Requirements | Required Configuration | Description |
| --- | --- | --- |
| VPC | - Min 2 private subnets in different AZs with CIDR /24<br>- For custom networking: CGNAT space subnets<br>- Tags as described below<br>- NAT gateway for private subnets<br>- Min 1 public subnet (/28) for public load balancer<br>- Auto-assign IP address enabled<br>- DNS support and DNS hostnames enabled | Ensures capacity for ~250 instances and 4096 pods. NAT Gateway required for private subnet internet access. For custom networking, CGNAT space and route tables needed. |
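
Before pointing TrueFoundry at an existing VPC, the subnet layout can be sanity-checked offline. The following is a minimal sketch, not TrueFoundry tooling: the `Subnet` class and `check_existing_network` function are hypothetical, and the code assumes that "/24" and "/28" are minimum sizes (so larger blocks also pass). It covers only the subnet count, size, and AZ spread; tags, NAT gateways, and DNS settings are not checked here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Subnet:
    cidr: str     # e.g. "10.0.1.0/24"
    az: str       # availability zone name
    public: bool  # True for a public subnet


def check_existing_network(subnets: list[Subnet]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []

    # Private subnets that are /24 or larger (assumed interpretation).
    private_ok = [
        s for s in subnets
        if not s.public and ipaddress.ip_network(s.cidr).prefixlen <= 24
    ]
    if len({s.az for s in private_ok}) < 2:
        problems.append("need private subnets (/24 or larger) in at least 2 AZs")

    # At least one public subnet of /28 or larger for the public load balancer.
    public_ok = [
        s for s in subnets
        if s.public and ipaddress.ip_network(s.cidr).prefixlen <= 28
    ]
    if not public_ok:
        problems.append("need at least 1 public subnet (/28 or larger)")

    return problems
```

Returning a list of problems instead of a single boolean lets every failing requirement be reported in one pass.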

VPC Tags

Your subnets must have the following tags for the TrueFoundry Terraform code to work with them. You can skip this step if you are creating a new network, in which case these tags are created automatically.

| Resource Type | Required Tags | Description |
| --- | --- | --- |
| Private Subnets | - kubernetes.io/cluster/${clusterName}: "shared"<br>- subnet: "private"<br>- kubernetes.io/role/internal-elb: "1" | Tags required for EKS to properly manage internal load balancers and subnet identification |
| Public Subnets | - kubernetes.io/cluster/${clusterName}: "shared"<br>- subnet: "public"<br>- kubernetes.io/role/elb: "1" | Tags required for EKS to properly manage external load balancers and subnet identification |
| EKS Node Security Group | - karpenter.sh/discovery: "${clusterName}" | This tag is required for Karpenter to discover and manage node provisioning for the cluster |
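
The tag sets above depend only on the cluster name, so they can be generated and compared against what a resource currently carries. The sketch below is illustrative: `required_tags` and `missing_tags` are hypothetical helpers, the resource-type keys are made up for this example, and the `actual` dictionary is assumed to have been fetched separately (for instance from the AWS console or CLI).

```python
def required_tags(cluster_name: str) -> dict[str, dict[str, str]]:
    """Expected tags per resource type, following the table above."""
    cluster_tag = f"kubernetes.io/cluster/{cluster_name}"
    return {
        "private_subnet": {
            cluster_tag: "shared",
            "subnet": "private",
            "kubernetes.io/role/internal-elb": "1",
        },
        "public_subnet": {
            cluster_tag: "shared",
            "subnet": "public",
            "kubernetes.io/role/elb": "1",
        },
        "node_security_group": {
            "karpenter.sh/discovery": cluster_name,
        },
    }


def missing_tags(resource_type: str, cluster_name: str,
                 actual: dict[str, str]) -> dict[str, str]:
    """Return the expected tags that are absent or carry a different value."""
    expected = required_tags(cluster_name)[resource_type]
    return {k: v for k, v in expected.items() if actual.get(k) != v}
```

For example, calling `missing_tags("private_subnet", "my-cluster", {"subnet": "private"})` returns the two Kubernetes tags that still need to be added to that subnet.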

Existing cluster

| Requirements | Required Configuration | Description |
| --- | --- | --- |
| Compute | - All Standard (A, C, D, H, I, M, R, T, Z) Spot/On-demand:<br>- Min 4vCPU and 8 GB RAM<br>- Min 2 nodes for system components | Required for running TrueFoundry components and user workloads |
| EKS Version | Version 1.30 or higher | Required for compatibility with TrueFoundry components and latest security features |
| Storage | - EBS CSI Driver installed<br>- Installation Guide<br>- EFS CSI Driver (if using shared storage)<br>- Installation Guide | Required for persistent volume provisioning and shared storage support |
| Load Balancer | - AWS Load Balancer Controller v2.12.0 or higher<br>- Installation Guide<br>- Appropriate IAM roles for service account (IRSA) | Required for Ingress and Service type LoadBalancer support |
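
The version and node-size minimums above are easy to get wrong when compared as strings ("1.9" sorts after "1.30" alphabetically). A minimal sketch of a numeric comparison follows; the function and constant names are hypothetical, and it assumes plain dotted version strings such as `1.30` or `v2.12.0` with no extra suffix.

```python
# Minimums taken from the table above.
MIN_EKS_VERSION = (1, 30, 0)
MIN_LB_CONTROLLER_VERSION = (2, 12, 0)
MIN_NODE_VCPU = 4
MIN_NODE_MEMORY_GB = 8


def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn '1.30' or 'v2.12.0' into a tuple of ints, padded with zeros."""
    parts = [int(p) for p in text.strip().lstrip("v").split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(version: str, minimum: tuple[int, ...]) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(version) >= minimum


def node_is_large_enough(vcpu: int, memory_gb: float) -> bool:
    """Check a node against the 4 vCPU / 8 GB RAM minimum."""
    return vcpu >= MIN_NODE_VCPU and memory_gb >= MIN_NODE_MEMORY_GB
```

Padding short versions with zeros means that `2.12` is treated the same as `2.12.0` when compared against the minimum.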

IAM permissions

The TrueFoundry Terraform code requires the following IAM permissions.

# will be added soon