Connect Existing AWS EKS Cluster

Truefoundry can help connect an existing AWS EKS cluster to the control plane. To do this, install the tfy-k8s-aws-eks-inframold helm chart on the cluster.

This chart installs all the components needed for the Truefoundry compute plane. You can find the default values of this chart here.

🚧

Please make sure to provide all the required fields in the values file before installing the helm chart. Also make sure that you are not overriding any components that are already installed in the cluster.

Installing OCLI

  1. Download the binary for your OS and architecture using the corresponding command below.
    # macOS (Apple Silicon, arm64)
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_darwin_arm64" -o ocli
    
    # macOS (Intel, amd64)
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_darwin_amd64" -o ocli
    
    # Linux (arm64)
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_linux_arm64" -o ocli
    
    # Linux (amd64)
    curl -H 'Cache-Control: max-age=0' -s "https://releases.ocli.truefoundry.tech/binaries/ocli_$(curl -H 'Cache-Control: max-age=0' -s https://releases.ocli.truefoundry.tech/stable.txt)_linux_amd64" -o ocli
    
  2. Make the binary executable and move it to $PATH
    sudo chmod +x ./ocli
    sudo mv ocli /usr/local/bin
    
  3. Confirm by running the command
    ocli --version
    

Download the kubeconfig file

# Replace CLUSTER_NAME, REGION and PROFILE with your EKS cluster name, AWS region and AWS CLI profile
aws eks update-kubeconfig --name CLUSTER_NAME --region REGION --profile PROFILE
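
To confirm kubectl now points at the right cluster, a quick optional check with standard kubectl commands:

kubectl config current-context
kubectl get nodes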

Connecting the existing cluster to TrueFoundry

  • Head over to the TrueFoundry platform and log in. If you don't have an account yet, you can sign up here.

  • Once you have logged in, navigate to the Settings tab in the left panel and create a new API key. Copy the API key, as it will be used in the next set of commands.

  • To connect the cluster we will create two files

    • cluster.yaml - a YAML file that describes your cluster and can be stored in Git
    • integrations.yaml - a YAML file that sets up the ECR, Parameter Store and S3 bucket integrations and can be stored in Git.
  • Format of the cluster.yaml

    name: CLUSTER_NAME
    type: cluster
    cluster_type: aws-eks
    collaborators:
    - role_id: cluster-admin
      user_fqn: user:TRUEFOUNDRY_USERNAME
    environment_names:
      - ENVIRONMENT_ID
    
    • Replace the following in the cluster file
      • CLUSTER_NAME - name of the cluster
      • TRUEFOUNDRY_USERNAME - your truefoundry username
      • ENVIRONMENT_ID - environments are listed under Settings -> Environments. The environment ID is not yet displayed in the UI (this is coming soon); in the meantime, use the value of tenantName-TagName as your environment ID.
    • For a full list of fields supported in the cluster spec, see the Cluster spec section below.
  • Format of the integrations file

    name: CLUSTER_NAME
    type: provider-account/aws
    provider: aws
    auth_data:
      type: assumed-role-based
      assumed_role_arn: ASSUME_ROLE_ARN
    aws_account_id: AWS_ACCOUNT_ID
    integrations:
      - name: CLUSTER_NAME-s3
        type: blob-storage
        region: REGION
        storage_root: s3://BUCKET_NAME
      - name: CLUSTER_NAME-ecr
        type: docker-registry
        registry_url: AWS_ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com
      - name: CLUSTER_NAME-ssm
        type: secret-store
        region: REGION
      - name: CLUSTER_NAME-integration
        type: cluster-integration
        region: REGION
        cluster_name: CLUSTER_NAME
        tfy_cluster_id: CLUSTER_NAME
    
  • Run the following command, replacing API_KEY with the API key you created and CONTROL_PLANE_URL with the TrueFoundry URL you are currently logged in to, including the protocol (https://).

    ocli compute-plane-connect --cluster-file cluster.yaml --integration-file integrations.yaml \
    --api-key API_KEY --control-plane-url CONTROL_PLANE_URL
    
  • You will now see that a cluster has been created on the platform. Next, we will install the agent so that the EKS cluster can connect to the platform.

  • Prepare the following values.yaml file. Fill in the tenantName, controlPlaneURL, clusterName and the cluster token generated by the previous command.

    ## @param tenantName Parameters for tenantName
    ## Name of the tenant
    ##
    tenantName: TENANT_NAME
    
    ## @param controlPlaneURL Parameters for controlPlaneURL
    ## URL of the control plane
    ##
    controlPlaneURL: CONTROL_PLANE_URL
    
    ## @param clusterName Name of the cluster
    ## Name of the cluster
    ##
    clusterName: CLUSTER_NAME
    
    
    ## @section Parameters for AWS
    ## Parameters for AWS
    ##
    aws:
      ## @subsection Parameters for awsLoadBalancerController
      ##
      awsLoadBalancerController:
        ## @param aws.awsLoadBalancerController.roleArn Role ARN for AWS Load Balancer Controller
        ##
        roleArn: AWS_LOAD_BALANCER_CONTROLLER_ARN
    
      ## @subsection Parameters for karpenter
      ##
      karpenter:
        ## @param aws.karpenter.clusterEndpoint Cluster endpoint for Karpenter
        ##
        clusterEndpoint: EKS_CLUSTER_ENDPOINT
        ## @param aws.karpenter.roleArn Role ARN for Karpenter
        ##
        roleArn: KARPENTER_IAM_ROLE_ARN
        ## @param aws.karpenter.instanceProfile Instance profile for Karpenter
        ##
        instanceProfile: KARPENTER_INSTANCE_PROFILE
        ## @param aws.karpenter.defaultZones Default zones for Karpenter
        ##
        defaultZones: [EKS_AVAILABILITY_ZONES]
    
        ## @param aws.karpenter.interruptionQueue Interruption queue name for Karpenter
        ##
        interruptionQueue: KARPENTER_INTERRUPTION_QUEUE_NAME
    
      ## @subsection Parameters for awsEbsCsiDriver
      ##
      awsEbsCsiDriver:
        ## @param aws.awsEbsCsiDriver.roleArn Role ARN for AWS EBS CSI Driver
        ##
        roleArn: AWS_EBS_CSI_DRIVER_ROLE_ARN
    
      ## @subsection Parameters for awsEfsCsiDriver
      ##
      awsEfsCsiDriver:
        ## @param aws.awsEfsCsiDriver.fileSystemId File system ID for AWS EFS CSI Driver
        ##
        fileSystemId: AWS_EFS_FS_ID
        ## @param aws.awsEfsCsiDriver.region Region for AWS EFS CSI Driver
        ##
        region: AWS_REGION
        ## @param aws.awsEfsCsiDriver.roleArn Role ARN for AWS EFS CSI Driver
        ##
        roleArn: AWS_EFS_IAM_ROLE_ARN
    
    
    ## @section Parameters for tfyAgent
    ##
    tfyAgent:
      ## @param tfyAgent.clusterToken Parameters for clusterToken
      ## Token for cluster authentication
      ##
      clusterToken: TOKEN
    
    ## @section istio parameters
    ## @param istio.enabled Flag to enable Istio
    ##
    istio:
      enabled: true
      ## @skip istio.gateway.annotations Annotations for Istio Gateway
      gateway:
        annotations:
          "service.beta.kubernetes.io/aws-load-balancer-name": "CLUSTER_NAME"
          "service.beta.kubernetes.io/aws-load-balancer-type": "external"
          "service.beta.kubernetes.io/aws-load-balancer-scheme": "internet-facing"
          "service.beta.kubernetes.io/aws-load-balancer-ssl-ports": "https"
          "service.beta.kubernetes.io/aws-load-balancer-alpn-policy": "HTTP2Preferred"
          "service.beta.kubernetes.io/aws-load-balancer-backend-protocol": "tcp"
          "service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags": cluster-name=CLUSTER_NAME,  truefoundry.com/managed=true, owner=Truefoundry, application=tfy-istio-ingress
          "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled": "true"
    
      ## @param istio.tfyGateway.httpsRedirect Flag to enable HTTPS redirect for Istio Gateway
      tfyGateway:
        httpsRedirect: true
    
    • A detailed list of all values is available in values.yaml. If you already have some of these components installed, you can choose to disable them.
    • To set up Karpenter, follow the Setup Karpenter on AWS Account section below and then fill in the values above.
    • To set up EBS and EFS, follow the Setup EBS on AWS Account and Setup EFS on AWS Account sections below and then fill in the values above.
    • To set up the AWS load balancer controller - documentation is coming soon
  • Once the values.yaml is filled in, run one of the following commands to install the helm chart

    • Using ocli
      ocli compute-plane-install -f values.yaml --cluster-type aws-eks
      
    • Using helm command
      helm repo add truefoundry https://truefoundry.github.io/infra-charts/
      helm repo update truefoundry
      helm install my-tfy-k8s-aws-eks-inframold truefoundry/tfy-k8s-aws-eks-inframold -f values.yaml
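      
      # Optional: verify the release and watch the component pods come up
      # (standard helm/kubectl commands; the namespaces you see depend on which components you enabled)
      helm list --all-namespaces
      kubectl get pods --all-namespaces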
      

Cluster spec

Example -

name: test-cluster
type: cluster
monitoring:
  kubecost_url: http://kubecost-cost-analyzer.kubecost.svc.cluster.local:9090
  loki_url: http://loki.loki.svc.cluster.local:3100
  prometheus_url: http://prometheus-operated.prometheus.svc.cluster.local:9090
base_domains:
  - '*.example.com'
  - 'example.com'
cluster_type: aws-eks
collaborators:
  - role_id: cluster-admin
    user_fqn: user:user1
  - role_id: cluster-member
    user_fqn: user:user2
environment_ids:
  - test-id-1
  - dev-id
default_registry_fqn: registryFQN
cluster_integration_fqn: clusterIntegrationFQN
ssh_server_config:
  base_domain: ssh.example.com
  port: 80
supported_nodepools:
  - name: nodepoolname

Integration spec

Example -

name: test-cluster
type: provider-account/aws
provider: aws
auth_data:
  type: assumed-role-based
  assumed_role_arn: "arn:aws:iam::123456789123:role/test-cluster-iam-role"
aws_account_id: "123456789123"
integrations:
- name: test-cluster-s3
  type: blob-storage
  region: us-east-1
  storage_root: s3://test-cluster-bucket
- name: test-cluster-ecr
  type: docker-registry
  registry_url: 123456789123.dkr.ecr.us-east-1.amazonaws.com
- name: test-cluster-ssm
  type: secret-store
  region: us-east-1
- name: test-cluster-integration
  type: cluster-integration
  region: us-east-1
  cluster_name: test-cluster
  tfy_cluster_id: test-cluster

Setup Karpenter on AWS Account

Karpenter is essential for cluster autoscaling and dynamic provisioning of nodes. It can scale the cluster up and down without any preset nodepool configuration. The following steps enable Karpenter on an AWS account.
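
Karpenter's controller authenticates via IAM Roles for Service Accounts (IRSA), so the cluster needs an IAM OIDC provider associated with it. As an optional preliminary check (standard AWS CLI calls), print the cluster's OIDC issuer and confirm a matching provider exists in the account:

$ aws eks describe-cluster --name <cluster_name> \
    --query "cluster.identity.oidc.issuer" --output text

$ aws iam list-open-id-connect-providers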

  1. Create and bootstrap the node role which karpenter nodes will use
$ export CLUSTER_NAME=<cluster_name>

$ export AWS_REGION=""

$ echo '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}' > node-trust-policy.json

$ aws iam create-role --role-name karpenter-node-role-${CLUSTER_NAME} \
    --assume-role-policy-document file://node-trust-policy.json

$ aws iam attach-role-policy --role-name karpenter-node-role-${CLUSTER_NAME} \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

$ aws iam attach-role-policy --role-name karpenter-node-role-${CLUSTER_NAME} \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy

$ aws iam attach-role-policy --role-name karpenter-node-role-${CLUSTER_NAME} \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

$ aws iam attach-role-policy --role-name karpenter-node-role-${CLUSTER_NAME} \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

$ aws iam create-instance-profile \
    --instance-profile-name karpenter-instance-profile-${CLUSTER_NAME}

$ aws iam add-role-to-instance-profile \
    --instance-profile-name karpenter-instance-profile-${CLUSTER_NAME} \
    --role-name karpenter-node-role-${CLUSTER_NAME}
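
Optionally, verify that the node role was added to the instance profile:

$ aws iam get-instance-profile \
    --instance-profile-name karpenter-instance-profile-${CLUSTER_NAME} \
    --query "InstanceProfile.Roles[].RoleName"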
  2. Create the IAM role for the karpenter controller service account
$ CLUSTER_ENDPOINT="$(aws eks describe-cluster \
    --name ${CLUSTER_NAME} --query "cluster.endpoint" \
    --output text)"
$ OIDC_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} \
    --query "cluster.identity.oidc.issuer" --output text)"
$ AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' \
    --output text)

$ echo "{
    \"Version\": \"2012-10-17\",
    \"Statement\": [
        {
            \"Effect\": \"Allow\",
            \"Principal\": {
                \"Federated\": \"arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}\"
            },
            \"Action\": \"sts:AssumeRoleWithWebIdentity\",
            \"Condition\": {
                \"StringEquals\": {
                    \"${OIDC_ENDPOINT#*//}:aud\": \"sts.amazonaws.com\",
                    \"${OIDC_ENDPOINT#*//}:sub\": \"system:serviceaccount:karpenter:karpenter\"
                }
            }
        }
    ]
}" > controller-trust-policy.json

$ aws iam create-role --role-name karpenter-controller-role-${CLUSTER_NAME} \
    --assume-role-policy-document file://controller-trust-policy.json

$ echo '{
    "Statement": [
        {
            "Action": [
                "ssm:GetParameter",
                "iam:PassRole",
                "ec2:DescribeImages",
                "ec2:RunInstances",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeInstanceTypeOfferings",
                "ec2:DescribeAvailabilityZones",
                "ec2:DeleteLaunchTemplate",
                "ec2:CreateTags",
                "ec2:CreateLaunchTemplate",
                "ec2:CreateFleet",
                "ec2:DescribeSpotPriceHistory",
                "pricing:GetProducts"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "Karpenter"
        },
        {
            "Action": "ec2:TerminateInstances",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/Name": "*karpenter*"
                }
            },
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "ConditionalEC2Termination"
        }
    ],
    "Version": "2012-10-17"
}' > controller-policy.json

$ aws iam put-role-policy --role-name karpenter-controller-role-${CLUSTER_NAME} \
    --policy-name karpenter-controller-policy-${CLUSTER_NAME} \
    --policy-document file://controller-policy.json
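
Optionally, confirm the controller role and its inline policy were created; the role ARN is printed for reference when filling in the values file:

$ aws iam get-role --role-name karpenter-controller-role-${CLUSTER_NAME} \
    --query "Role.Arn" --output text

$ aws iam list-role-policies --role-name karpenter-controller-role-${CLUSTER_NAME}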
  3. Tag all the subnets where karpenter should create nodes
# This will give you all the subnet ids available. Choose the subnets that karpenter should create nodes in
$ aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.resourcesVpcConfig.subnetIds"

# Execute the following two commands for each of the subnets
$ aws ec2 create-tags --tags "Key=kubernetes.io/cluster/${CLUSTER_NAME},Value=shared" --resources <subnet_id>

$ aws ec2 create-tags --tags "Key=subnet,Value=private" --resources <subnet_id>
  4. Tag the security group that the karpenter nodes will use
$ SECURITY_GROUP_ID=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text)

$ aws ec2 create-tags --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" --resources ${SECURITY_GROUP_ID}
  5. Update the aws-auth configmap so that the karpenter nodes can access the control plane, adding the following section under mapRoles (replace ${AWS_ACCOUNT_ID} and ${CLUSTER_NAME} with their literal values, since kubectl edit does not expand shell variables)
$ kubectl edit configmap aws-auth -n kube-system
- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:aws:iam::${AWS_ACCOUNT_ID}:role/karpenter-node-role-${CLUSTER_NAME}
  username: system:node:{{EC2PrivateDNSName}}
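
After saving, you can optionally confirm that the new entry is present:

$ kubectl get configmap aws-auth -n kube-system -o yaml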
  6. Enable spot instance creation. If the command returns an error, spot instances were already enabled for the account.
$ aws iam create-service-linked-role --aws-service-name spot.amazonaws.com

The outputs from the above steps will need to be provided in the Karpenter section in the values file as mentioned below:

  karpenter:
    enabled: true
    ## Use the value of the CLUSTER_ENDPOINT variable from step 2 above.
    clusterEndpoint: ""
    roleArn: ""
    instanceProfile: ""
    defaultZones: ""

    gpuProvisioner:
      capacityTypes:  ["spot", "on-demand"]
      instanceFamilies: ["p2", "p3", "p4d", "p4de", "p5", "g4dn", "g5"]
      zones: ""

    inferentiaProvisioner:
      capacityTypes: ["spot", "on-demand"]
      instanceFamilies: ["inf1", "inf2"]
      zones: ""

    interruptionQueueName: ""
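
A minimal sketch for collecting these values, assuming the shell variables from the earlier steps are still set (double-check which role ARN the chart expects against its values documentation):

$ echo "clusterEndpoint: ${CLUSTER_ENDPOINT}"
$ echo "instanceProfile: karpenter-instance-profile-${CLUSTER_NAME}"

# Availability zones in the region, as candidates for defaultZones
$ aws ec2 describe-availability-zones --region ${AWS_REGION} \
    --query "AvailabilityZones[].ZoneName" --output text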

Setup EBS on AWS Account

The Amazon Elastic Block Store (Amazon EBS) Container Storage Interface (CSI) driver manages the lifecycle of Amazon EBS volumes as storage for the Kubernetes Volumes that you create. To set up EBS with your cluster, we will need to create an IAM role and provide it in the awsEbsCsiDriver section of the values file.

  1. Substitute the correct values in the script below
export CLUSTER_NAME=""
export AWS_REGION=""
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export OIDC_ENDPOINT=$(aws eks describe-cluster --name ${CLUSTER_NAME} \
    --query "cluster.identity.oidc.issuer" --output text)
  2. Create the following trust policy document
cat > ebs-assume-role-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_ENDPOINT#*//}:aud": "sts.amazonaws.com",
          "${OIDC_ENDPOINT#*//}:sub": "system:serviceaccount:aws-ebs-csi-driver:ebs-csi-controller-sa"
        }
      }
    }
  ]
}
EOF
  3. Create the role and attach the policy using the commands below
# Create the role
aws iam create-role \
  --role-name AmazonEKS_EBS_CSI_DriverRole-${CLUSTER_NAME} \
  --assume-role-policy-document file://"ebs-assume-role-policy.json"
  
# Attach the policy
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --role-name AmazonEKS_EBS_CSI_DriverRole-${CLUSTER_NAME}
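
If you need the role ARN again later (for example, to fill in the values file), it can be retrieved with:

aws iam get-role \
  --role-name AmazonEKS_EBS_CSI_DriverRole-${CLUSTER_NAME} \
  --query 'Role.Arn' --output text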

The role arn output from step 3 should be provided in the values file:

  awsEbsCsiDriver:
    enabled: true
    roleArn: <Put the value from the first command output in step 3>

Setup EFS on AWS Account

This section describes how to set up EFS support in your EKS cluster.

  1. Substitute the correct values in the script below
export CLUSTER_NAME=""
export AWS_REGION=""
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
export OIDC_ENDPOINT=$(aws eks describe-cluster --name ${CLUSTER_NAME} \
    --query "cluster.identity.oidc.issuer" --output text)
export VPC_ID=$(aws eks describe-cluster \
    --name "${CLUSTER_NAME}" \
    --query "cluster.resourcesVpcConfig.vpcId" \
    --region "${AWS_REGION}" \
    --output text)
export VPC_CIDR_RANGE=$(aws ec2 describe-vpcs \
    --vpc-ids "${VPC_ID}" \
    --query "Vpcs[].CidrBlock" \
    --output text \
    --region "${AWS_REGION}")
export CLUSTER_SUBNET_LIST=$(aws eks describe-cluster \
    --name "${CLUSTER_NAME}" \
    --query 'cluster.resourcesVpcConfig.subnetIds' \
    --output text)
  2. Create the IAM role
cat > efs-assume-role-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_ENDPOINT#*//}:sub": "system:serviceaccount:aws-efs-csi-driver:efs-csi-controller-sa"
        }
      }
    }
  ]
}
EOF

export EFS_ROLE_ARN=$(aws iam create-role \
  --role-name "${CLUSTER_NAME}-csi-efs" \
  --assume-role-policy-document file://"efs-assume-role-policy.json" \
  --query 'Role.Arn' --output text)
  3. Attach the policy to the IAM role
aws iam attach-role-policy \
  --policy-arn "arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy" \
  --role-name "${CLUSTER_NAME}-csi-efs"
  4. Create a security group that allows access on port 2049 (NFS) from the VPC
# create a security group
SECURITY_GROUP_ID=$(aws ec2 create-security-group \
    --group-name TfyEfsSecurityGroup \
    --description "Truefoundry EFS security group" \
    --vpc-id "${VPC_ID}" \
    --region "${AWS_REGION}" \
    --output text)

# authorize ingress to the security group from the VPC CIDR. This can be restricted further
# to specific subnet CIDRs if needed
aws ec2 authorize-security-group-ingress \
    --group-id $SECURITY_GROUP_ID \
    --protocol tcp \
    --port 2049 \
    --region "${AWS_REGION}" \
    --cidr "${VPC_CIDR_RANGE}"
  5. Create the EFS file system and create mount targets in the cluster subnets
FILE_SYSTEM_ID=$(aws efs create-file-system \
    --region "${AWS_REGION}" \
    --performance-mode generalPurpose \
    --encrypted \
    --throughput-mode elastic \
    --tags Key=Name,Value="${CLUSTER_NAME}-efs" Key=Created-By,Value=Truefoundry Key=cluster-name,Value=$CLUSTER_NAME \
    --query 'FileSystemId' \
    --output text)

for subnet_id in ${CLUSTER_SUBNET_LIST[@]}; do 
	aws efs create-mount-target \
  --file-system-id "${FILE_SYSTEM_ID}" \
  --subnet-id $subnet_id \
  --security-groups "${SECURITY_GROUP_ID}" \
  --region "${AWS_REGION}"
done
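
Mount targets take a short while to become available. As an optional check, and to print the values that map to the awsEfsCsiDriver section of values.yaml:

# Wait for the mount targets to reach the "available" state
aws efs describe-mount-targets \
  --file-system-id "${FILE_SYSTEM_ID}" \
  --region "${AWS_REGION}" \
  --query 'MountTargets[].LifeCycleState'

# Values for the awsEfsCsiDriver section of the values file
echo "fileSystemId: ${FILE_SYSTEM_ID}"
echo "region: ${AWS_REGION}"
echo "roleArn: ${EFS_ROLE_ARN}"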

Setup AWS load balancer controller

Coming soon