The compute plane always runs in the customer's own cloud environment; TrueFoundry does not provide Kubernetes clusters as compute on its own. This ensures that all data and compute stay within the customer's own infrastructure. TrueFoundry can help create a new compute-plane cluster using Terraform (recommended) or use an existing cluster. If you use an existing cluster, please make sure it meets the key requirements listed below.

ArgoCD (Essential)
TrueFoundry relies on ArgoCD to deploy applications to the compute-plane cluster. Infra applications are deployed in the default project in ArgoCD, while user-deployed applications are deployed in the tfy-apps project. If you are using your own ArgoCD, please make sure it meets the following requirements. You can find the ArgoCD configuration file that TrueFoundry installs by default here.
- Ensure ArgoCD has access to create Argo applications in all namespaces. For this, the following things must be set.
- Create a tfy-apps project with the following spec.
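ArgoCD's general mechanism for Applications outside its own namespace is the "Applications in any namespace" feature: the application.namespaces key in the argocd-cmd-params-cm ConfigMap lists allowed namespaces, and the AppProject must also list them under spec.sourceNamespaces. The sketch below, assuming that mechanism (it is not necessarily the exact setting TrueFoundry's spec uses), checks both conditions for a namespace. The example ConfigMap data and the tfy-apps project body are hypothetical, and only glob patterns are evaluated.

```python
from fnmatch import fnmatch
from typing import Dict, List


def _patterns(raw: str) -> List[str]:
    """Split a comma-separated pattern list, dropping empty entries."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def argocd_watches_namespace(cmd_params: Dict[str, str], namespace: str) -> bool:
    """True if argocd-cmd-params-cm allows Applications in `namespace`.

    Only glob patterns are handled; ArgoCD also accepts regex patterns,
    which this sketch ignores.
    """
    raw = cmd_params.get("application.namespaces", "")
    return any(fnmatch(namespace, p) for p in _patterns(raw))


def project_accepts_namespace(app_project: dict, namespace: str) -> bool:
    """True if the AppProject lists `namespace` under spec.sourceNamespaces."""
    patterns = app_project.get("spec", {}).get("sourceNamespaces", [])
    return any(fnmatch(namespace, p) for p in patterns)


# Hypothetical example data; not TrueFoundry's actual configuration.
cmd_params = {"application.namespaces": "*"}
tfy_apps = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "AppProject",
    "metadata": {"name": "tfy-apps", "namespace": "argocd"},
    "spec": {"sourceNamespaces": ["*"]},
}

for ns in ["team-a", "ml-prod"]:
    ok = argocd_watches_namespace(cmd_params, ns) and project_accepts_namespace(tfy_apps, ns)
    print(ns, "ok" if ok else "NOT allowed")
```

Both checks are needed: the ConfigMap setting lets the ArgoCD server and application controller watch the namespace, while the project setting decides whether Applications in that namespace may use the project.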
Prometheus (Essential)
Prometheus powers the metrics feature on the platform. It also powers the autoscaling, autoshutdown and autopilot features. TrueFoundry uses the open-source kube-prometheus-stack to run Prometheus in the cluster. If you are already using kube-prometheus-stack in your cluster, TrueFoundry should be able to work with it after the following configuration changes:
You can find the ArgoCD configuration here.
Istio (Optional)
Istio is a powerful service mesh and ingress controller. TrueFoundry uses Istio as the primary ingress controller in the compute-plane cluster. If you are using any other ingress controller, most platform features will still work, except the ones listed below that specifically rely on the Istio Envoy proxy or Envoy filters. The key features that rely on Istio and will not work otherwise are:
- Request-count-based autoscaling.
- OAuth-based authentication and authorization for Jupyter Notebooks. Without Istio, there will be no authentication and authorization for the notebooks.
- The Intercepts feature, which redirects or mirrors traffic to other applications.
- Authentication for services deployed on the cluster.
We don't inject the sidecar by default; it is only injected where needed for the use cases listed above.
Please ensure that if you have multiple Istio gateways, they do not have the same domains configured. If they do, the gateway to use for the TrueFoundry components must be specified as a variable in the tfy-agent Helm chart.
There are three Istio components that TrueFoundry installs:
- istio-base - The set of CRDs that Istio requires to work. You can find the ArgoCD configuration here.
- istio-discovery - The Pilot (istiod) service, responsible for discovering services in the cluster. You can find the ArgoCD configuration here.
- tfy-istio-ingress - The ingress gateway that routes incoming traffic to services in the cluster. You can find the ArgoCD configuration here.
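To check the multiple-gateway requirement above before installing, one option is to compare the hosts declared on every Istio Gateway resource (spec.servers[].hosts, where each host may carry an optional "namespace/" prefix). The sketch below assumes the Gateway manifests have already been loaded as Python dicts (for example from the JSON output of kubectl); the gateway names and domains are hypothetical. It only flags identical host strings, so overlaps such as "*.example.com" versus "app.example.com" are not detected.

```python
from collections import defaultdict
from typing import Dict, List


def conflicting_gateway_hosts(gateways: List[dict]) -> Dict[str, List[str]]:
    """Return {host: [gateway, ...]} for every host claimed by more than one Gateway.

    Only exact host strings are compared; wildcard overlaps are not detected.
    """
    owners = defaultdict(set)
    for gw in gateways:
        meta = gw["metadata"]
        gw_id = f'{meta.get("namespace", "default")}/{meta["name"]}'
        for server in gw.get("spec", {}).get("servers", []):
            for host in server.get("hosts", []):
                # Drop the optional "namespace/" prefix and compare the DNS part only.
                dns_name = host.split("/", 1)[-1]
                owners[dns_name].add(gw_id)
    return {h: sorted(ids) for h, ids in owners.items() if len(ids) > 1}


# Hypothetical gateways for illustration.
gateways = [
    {"metadata": {"name": "tfy-wildcard", "namespace": "istio-system"},
     "spec": {"servers": [{"hosts": ["*.apps.example.com"]}]}},
    {"metadata": {"name": "team-gateway", "namespace": "istio-system"},
     "spec": {"servers": [{"hosts": ["*.apps.example.com", "internal.example.com"]}]}},
]
print(conflicting_gateway_hosts(gateways))
# {'*.apps.example.com': ['istio-system/team-gateway', 'istio-system/tfy-wildcard']}
```

An empty result means no two gateways declare the same host string; a non-empty result lists the domains for which the TrueFoundry gateway would need to be set explicitly in the tfy-agent chart.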
ArgoRollouts (Optional)
Argo Rollouts powers the canary and blue-green rollout strategies in TrueFoundry. If you are already using Argo Rollouts in your cluster, TrueFoundry should be able to work with it without any additional configuration. You can find the ArgoCD configuration here.
ArgoWorkflows (Optional)
TrueFoundry uses Argo Workflows to power the Jobs feature on the platform. If you are already using Argo Workflows in your cluster, TrueFoundry should be able to work with it using the following configuration:
You can find the ArgoCD configuration here.
Keda (Optional)
KEDA powers the autoscaling feature on the platform. TrueFoundry uses the open-source KEDA for event-driven autoscaling in the cluster. If you are already using KEDA in your cluster, TrueFoundry should be able to work with it without any additional configuration. You can find the ArgoCD configuration here.
TFY Logs (Optional)
VictoriaLogs and Vector power the logs feature on the platform. This component is optional, and you can choose to provide your own logging solution. If you are already using VictoriaLogs in your cluster, TrueFoundry should be able to work with it without any additional configuration. If you are already using Vector to ingest logs, TrueFoundry should be able to work with it using the following configuration:
You can find the ArgoCD configuration here.
Without tfy-logs, the platform cannot show aggregated logs for services.
GPU Operator (Optional)
GPU Operator is used to deploy workloads on GPU nodes. It is a TrueFoundry-provided Helm chart based on NVIDIA's GPU Operator. If you are already using NVIDIA's GPU Operator in your cluster, TrueFoundry should be able to work with it without any additional configuration. You can find the ArgoCD configuration for the following cloud providers:
Grafana (Optional)
Grafana is a monitoring tool that can be installed to view metrics and logs and to create dashboards for the cluster. TrueFoundry doesn't directly use Grafana to power the monitoring dashboard on the platform, but it is available as a separate add-on for viewing additional cluster-level metrics. If you already use Grafana in your cluster, you can use it to monitor the cluster. If you want to use the TrueFoundry-provided Grafana instead, you can install the TrueFoundry Grafana Helm chart, which comes with many built-in dashboards for cluster monitoring. You can find the ArgoCD configuration here.
[AWS Only] Karpenter (Essential)
Karpenter is required for dynamic node provisioning on AWS EKS. If you are already using Karpenter in your cluster, TrueFoundry should be able to work with it with the following additional configuration:
- Install the eks-node-monitoring-agent Helm chart.
- Configure Karpenter to use the eks-node-monitoring-agent.
You can find the Karpenter ArgoCD configuration here. We also install tfy-karpenter-config, another Helm chart that installs the NodePools and NodeClasses. You can find the tfy-karpenter-config ArgoCD configuration here. If you are already using Karpenter in your cluster, TrueFoundry requires the following nodepool types to be present:
Nodepool Type | Configuration | Purpose |
---|---|---|
Critical | amd64 linux on-demand nodepool with taint class.truefoundry.com/component=critical:NoSchedule and label class.truefoundry.com/component=critical | For running TrueFoundry critical workloads like prometheus, victoria-logs and tfy-agent. |
GPU nodepool | amd64 linux on-demand/spot (both) with taint nvidia.com/gpu=true:NoSchedule and label nvidia.com/gpu.deploy.operands=true | For running user deployed GPU applications. |
Default nodepool | amd64 linux on-demand/spot (both) without any taints | For running user deployed CPU applications. |
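If you manage Karpenter yourself, the three nodepool types in the table can be expressed as NodePool manifests. The sketch below builds them as Python dicts, assuming the Karpenter v1 API (karpenter.sh/v1) and an existing EC2NodeClass named "default" (a hypothetical name). The instance-gpu-count requirement on the GPU pool is an assumption added so that it only launches GPU instance types; it is not part of the table. The pool names are illustrative and are not necessarily those that tfy-karpenter-config creates.

```python
import json
from typing import Dict, List, Optional

CRITICAL_KEY = "class.truefoundry.com/component"


def nodepool(name: str, capacity_types: List[str],
             labels: Optional[Dict[str, str]] = None,
             taints: Optional[List[dict]] = None,
             extra_requirements: Optional[List[dict]] = None,
             node_class: str = "default") -> dict:
    """Build a Karpenter v1 NodePool manifest for amd64 Linux nodes."""
    requirements = [
        {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
        {"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]},
        {"key": "karpenter.sh/capacity-type", "operator": "In", "values": capacity_types},
    ] + (extra_requirements or [])
    template_spec = {
        "requirements": requirements,
        # Assumes an EC2NodeClass with this name already exists (hypothetical).
        "nodeClassRef": {"group": "karpenter.k8s.aws", "kind": "EC2NodeClass", "name": node_class},
    }
    if taints:
        template_spec["taints"] = taints
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {"template": {"metadata": {"labels": labels or {}}, "spec": template_spec}},
    }


nodepools = [
    # Critical: on-demand only, tainted so only TrueFoundry critical workloads land here.
    nodepool("critical", ["on-demand"],
             labels={CRITICAL_KEY: "critical"},
             taints=[{"key": CRITICAL_KEY, "value": "critical", "effect": "NoSchedule"}]),
    # GPU: on-demand and spot; the extra requirement (an assumption) limits it to GPU instances.
    nodepool("gpu", ["on-demand", "spot"],
             labels={"nvidia.com/gpu.deploy.operands": "true"},
             taints=[{"key": "nvidia.com/gpu", "value": "true", "effect": "NoSchedule"}],
             extra_requirements=[{"key": "karpenter.k8s.aws/instance-gpu-count",
                                  "operator": "Gt", "values": ["0"]}]),
    # Default: on-demand and spot, no taints, for user CPU workloads.
    nodepool("default", ["on-demand", "spot"]),
]

for pool in nodepools:
    print(json.dumps(pool, indent=2))
```

The printed JSON can be applied with kubectl like YAML; the taints keep user workloads off the critical pool unless they carry the matching toleration.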
[AWS Only] Metrics-Server (Essential)
Metrics-Server is required on AWS EKS clusters for autoscaling. If you are already using Metrics-Server in your cluster, TrueFoundry should be able to work with it without any additional configuration. You can find the ArgoCD configuration here.
[AWS Only] AWS EBS CSI Driver (Essential)
The AWS EBS CSI Driver is required for EBS volumes on the EKS cluster. If you are already using the AWS EBS CSI Driver in your cluster, TrueFoundry should be able to work with it without any additional configuration. We do expect a default storage class to be present in the cluster, preferably gp3 backed by encrypted volumes. You can find the ArgoCD configuration here.
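A default storage class of the kind described above is a StorageClass with provisioner ebs.csi.aws.com, parameters type: gp3 and encrypted: "true", and the storageclass.kubernetes.io/is-default-class annotation. The sketch below builds such a manifest and lists which classes are marked as default, so you can confirm that exactly one is. The class names are hypothetical, and WaitForFirstConsumer binding is a common choice assumed here, not something this page requires.

```python
import json
from typing import List

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def gp3_storage_class(name: str = "gp3", default: bool = True) -> dict:
    """StorageClass for encrypted gp3 volumes provisioned by the EBS CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {
            "name": name,
            "annotations": {DEFAULT_ANNOTATION: "true" if default else "false"},
        },
        "provisioner": "ebs.csi.aws.com",
        "parameters": {"type": "gp3", "encrypted": "true"},
        # Delay provisioning until a pod is scheduled so the volume lands in the pod's AZ.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


def default_classes(classes: List[dict]) -> List[str]:
    """Names of StorageClasses annotated as the cluster default."""
    return [
        sc["metadata"]["name"]
        for sc in classes
        if sc.get("metadata", {}).get("annotations", {}).get(DEFAULT_ANNOTATION) == "true"
    ]


classes = [gp3_storage_class(), gp3_storage_class("gp3-secondary", default=False)]
print(default_classes(classes))  # ['gp3']
print(json.dumps(classes[0], indent=2))
```

Keeping a single default avoids ambiguity for PersistentVolumeClaims that do not set storageClassName.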
[AWS Only] AWS EFS CSI Driver (Optional)
The AWS EFS CSI Driver is required for EFS volumes on the EKS cluster. If you are already using the AWS EFS CSI Driver in your cluster, TrueFoundry should be able to work with it without any additional configuration. We do expect a storage class to be present in the cluster that can be used for mounting EFS volumes. You can find the ArgoCD configuration here.
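A storage class for EFS volumes typically uses the efs.csi.aws.com provisioner with dynamic provisioning through EFS access points (provisioningMode: efs-ap). The sketch below builds such a StorageClass plus a ReadWriteMany PersistentVolumeClaim that uses it. The file system ID, class name, claim name and requested size are hypothetical placeholders; EFS does not enforce the requested size, but Kubernetes requires the field.

```python
import json


def efs_storage_class(file_system_id: str, name: str = "efs-sc") -> dict:
    """StorageClass that provisions one EFS access point per PersistentVolume."""
    if not file_system_id.startswith("fs-"):
        raise ValueError(f"unexpected EFS file system ID: {file_system_id!r}")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "efs.csi.aws.com",
        "parameters": {
            "provisioningMode": "efs-ap",     # dynamic provisioning via access points
            "fileSystemId": file_system_id,   # existing EFS file system (placeholder below)
            "directoryPerms": "700",          # permissions of each access point's root directory
        },
    }


def shared_claim(name: str, storage_class: str, size: str = "5Gi") -> dict:
    """PVC that several pods can mount at once (ReadWriteMany)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


sc = efs_storage_class("fs-0123456789abcdef0")  # hypothetical file system ID
pvc = shared_claim("shared-data", sc["metadata"]["name"])
print(json.dumps([sc, pvc], indent=2))
```

ReadWriteMany is the main reason to choose EFS over EBS here: an EBS volume can normally be attached to only one node at a time.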
[AWS Only] AWS Load Balancer Controller (Essential)
The AWS Load Balancer Controller is required for provisioning load balancers on EKS. If you are already using the AWS Load Balancer Controller in your cluster, TrueFoundry should be able to work with it without any additional configuration. You can find the ArgoCD configuration here.
[AWS Only] TFY Inferentia Operator (Optional)
The TFY Inferentia Operator is required for supporting AWS Inferentia machines on EKS. If you are already using an Inferentia operator in your cluster, TrueFoundry should be able to work with it without any additional configuration. You can find the ArgoCD configuration here.
Cert-Manager (Optional)
Cert-Manager is required for provisioning the certificates used to expose services. On AWS, you can use AWS Certificate Manager to provision the certificates. For more details on how to set up the certificates, please refer to the TrueFoundry documentation.