Creating an EKS cluster using onboarding-cli
This document describes how to create a fresh EKS cluster using the onboarding CLI (ocli).
Pre-requisites
- Install the AWS CLI, version 2.x.x
- Install git
- Create a local AWS profile that uses an IAM user with admin access to the AWS account where you want to deploy the cluster.
- Read the AWS Infrastructure requirements carefully.
- Install ocli
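The prerequisites above can be checked before running ocli. The following is a minimal sketch in Python, not part of ocli itself; the function names are hypothetical. It assumes the usual `aws --version` output format (`aws-cli/<version> ...`) and the standard section layout of `~/.aws/config` and `~/.aws/credentials`.

```python
import configparser
import re
import shutil


def aws_cli_major_version(version_output: str):
    """Return the major version from `aws --version` output, or None if unparseable."""
    match = re.search(r"aws-cli/(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def profile_names(config_text: str, credentials_text: str = "") -> list:
    """Collect profile names from the text of ~/.aws/config and ~/.aws/credentials."""
    names = set()
    config = configparser.RawConfigParser()
    config.read_string(config_text)
    for section in config.sections():
        # ~/.aws/config uses "[default]" and "[profile <name>]" section headers.
        if section == "default":
            names.add(section)
        elif section.startswith("profile "):
            names.add(section[len("profile "):].strip())
    credentials = configparser.RawConfigParser()
    credentials.read_string(credentials_text)
    # ~/.aws/credentials uses the bare profile name as the section header.
    names.update(credentials.sections())
    return sorted(names)


def missing_tools(tools=("aws", "git")) -> list:
    """Return the prerequisite executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

To use it, pass the text printed by `aws --version` to `aws_cli_major_version` and the contents of the two AWS files to `profile_names`. A major version other than 2, or an empty profile list, corresponds to the version and profile requirements listed above.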
Installation
Creating a config file
- This section covers the options available for configuring the AWS EKS cluster.
- There are two options available for the AWS network:
  - Existing VPC - This is the case when you already have a network for your existing AWS services. The onboarding CLI can use the existing VPC to deploy the EKS cluster inside it. Read Existing VPC requirements to know more.
  - New VPC - If you don't have an existing VPC, or want to deploy the TrueFoundry EKS cluster inside a new VPC, select this option. You will be prompted for the CIDR range of the VPC you want. If you are not sure, 10.10.0.0/16 will be taken as the default. You will also be asked for the private and public subnet CIDRs. Read New VPC requirements to know more.
- Run the command below:
ocli infra init
- The screen will be cleared and you will be asked for the cloud provider. Select aws here, and for the next question enter your AWS account ID.
Truefoundry is a platform that makes it very easy to deploy microservices, ML models training jobs, LLMs on Kubernetes.
We will start the process of bootstrapping a Kubernetes cluster. This CLI is useful only if you don't have a Kubernetes cluster.
If you already have a cluster, please go to https://docs.truefoundry.com/docs/creating-your-own-kubernetes-cluster
Let's get started!
1. Cloud Provider
In which cloud provider you would like to deploy your cluster: :
> aws
  azure
  gcp
aws
2. Account ID
What is the AWS Account ID where you want to deploy your cluster:
exec: "aws": executable file not found in $PATH
The above error indicates that the AWS CLI is not present on your local machine. Make sure you have installed the AWS CLI.
GetLocalAWSProfiles: exit status 2
The above error indicates that the installed AWS CLI version does not match the required version. For ocli to work, the AWS CLI must be version 2.x.x.
Error: initCmd: aws.InitAws(): InitAWS: Error getting AWS profiles: inputAWSProfile: No profile found, atleast one profile must exist
This error occurs when the CLI cannot find any AWS profile on your local machine.
- Select the right profile from the dropdown list of all the profiles present on your local machine. You can use the up and down arrow keys, and use / to search the list by keyword. If you used aws configure to set up credentials, your profile name will be shown as default.
3. AWS profiles
Use the arrow keys to navigate: ↓ ↑ → ← and / toggles search
3(A). Which AWS profile you want to use ?
- Enter the name for your EKS cluster. A prefix of tfy and the region in short form will be added before all the resources that are created in the cluster, so you can avoid adding tfy in the cluster name itself.
4. AWS cluster name
What is the cluster name that you want for your cluster (final name of your cluster will tfy-<SHORT_REGION>-<NAME>)
- Select the region from the dropdown list. You can use the up and down arrow keys, along with / to toggle keyword search.
5. Region
Use the arrow keys to navigate: ↓ ↑ → ← and / toggles search
5(A). In which region you want to deploy your cluster:
eu-west-1
me-central-1
eu-central-2
ap-northeast-3
↓ us-east-2
- Enter the number of availability zones and select the zones. The default (recommended) value is 3 and the minimum is 2. In the example below, 2 is entered as the count, so the CLI asks for a zone twice, each time choosing from the remaining available zones.
5(B). Avilability Zones
Enter the number of availability zones (Default 3: 2 <= range <=3): 2
Use the arrow keys to navigate: ↓ ↑ → ← and / toggles search
Select the availabiltity zone 1:
eu-west-1a
eu-west-1b
eu-west-1c
# After selecting eu-west-1a
Use the arrow keys to navigate: ↓ ↑ → ← and / toggles search
Select the availabiltity zone 2:
eu-west-1b
eu-west-1c
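The availability zone rules from the last step (between 2 and 3 zones, each picked once) can be expressed as a small check. This is a minimal sketch in Python, not ocli's own validation; `validate_zone_selection` is a hypothetical name, and it assumes standard zone names of the form region name plus one letter.

```python
def validate_zone_selection(region: str, zones: list,
                            min_count: int = 2, max_count: int = 3) -> list:
    """Return a list of problems with the selected zones (empty when valid)."""
    problems = []
    if not min_count <= len(zones) <= max_count:
        problems.append(
            f"expected {min_count} to {max_count} zones, got {len(zones)}"
        )
    if len(set(zones)) != len(zones):
        problems.append("zones must be distinct")
    for zone in zones:
        # Assumption: a standard zone name is the region plus one letter,
        # for example eu-west-1a for region eu-west-1.
        is_standard = (
            zone.startswith(region)
            and len(zone) == len(region) + 1
            and zone[-1].isalpha()
        )
        if not is_standard:
            problems.append(f"{zone} does not belong to region {region}")
    return problems
```

For the example above, `validate_zone_selection("eu-west-1", ["eu-west-1a", "eu-west-1b"])` returns an empty list.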
Existing VPC
- Select existing when you want to deploy the cluster in an existing VPC, then provide the VPC ID. Make sure the subnets have enough IP addresses; ideally they should be in the range of less than /20 blocks.
6. Network and VPC
Do you want to create a new network or reuse an existing network: existing
- Select the VPC ID from the dropdown list.
6(A). VPC ID
Use the arrow keys to navigate: ↓ ↑ → ← and / toggles search
6(A). What is your existing VPC ID ??
vpc-xxxxxxxxxxxxxxxxx
vpc-xxxxxxxxxxxxxxxxx
- Select the private subnet IDs. The number of private subnet IDs must equal the number of availability zones entered earlier.
Private subnets must be equal to the no of availability zones: 2. Skipping input for no of private subnets ...
6(B). Private Subnet IDs
Use the arrow keys to navigate: ↓ ↑ → ← and / toggles search
Select the availabiltity zone 1:
subnet-xxxxxxxxxxxxxxxxx
subnet-xxxxxxxxxxxxxxxxx
subnet-xxxxxxxxxxxxxxxxx
subnet-xxxxxxxxxxxxxxxxx
↓ subnet-xxxxxxxxxxxxxxxxx
- The number of public subnet IDs can be different. You can set it as low as zero if you don't want the EKS load balancer to be created in a public subnet. Here we select 1, as we want to deploy a load balancer to host our external endpoints.
6(C). Public Subnet IDs
Enter the no of public subnets you want (Default: 2): 1
- Next, enter the subnet IDs of both the private and public subnets.
6(A). VPC ID
What is you existing VPC ID: vpc-029827189eaa2c22e
vpc-029827189eaa2c22e
Below we will ask you to enter the subnet ID details for your existing VPC. We need total of 2 subnets, private and public each
6(B). Private Subnet IDs
Enter the ID private subnet 1: subnet-0be5bd498c2869c67
Enter the ID private subnet 2: subnet-0321f13d89fce5bdf
"subnet-0be5bd498c2869c67" "subnet-0321f13d89fce5bdf"
6(B). Public Subnet IDs
Enter the ID of public subnet 1: subnet-0da043d78612040f3
Enter the ID of public subnet 2: subnet-0cc42609184649379
"subnet-0da043d78612040f3" "subnet-0cc42609184649379"
- The config file will look something like this:
aws:
  account:
    id: "xxxxxxxxxxxx"
  cluster:
    name: newcl
    public_access:
      cidrs:
        - 0.0.0.0/0
      enabled: true
    version: "1.28"
  iam_role:
    assume_role_arns:
      - arn:aws:iam::416964291864:role/tfy-ctl-euwe1-production-truefoundry-deps
    ecr:
      enabled: true
    enabled: true
    role_enable_override: false
    role_override_name: ""
    s3:
      bucket_enable_override: false
      bucket_override_name: ""
      enabled: true
    ssm:
      enabled: true
  network:
    existing: true
    private_subnets_cidrs: []
    private_subnets_ids:
      - subnet-xxxxxxxxxxxx
      - subnet-xxxxxxxxxxxx
      - subnet-xxxxxxxxxxxx
    public_subnets_cidrs: []
    public_subnets_ids:
      - subnet-xxxxxxxxxxxx
      - subnet-xxxxxxxxxxxx
    vpc_cidr: ""
    vpc_id: vpc-xxxxxxxxxxxx
  profile:
    name: administrator-devtest
  region:
    availability_zones:
      - us-east-1a
      - us-east-1b
      - us-east-1c
    name: us-east-1
  tags: {}
azure: null
binaries:
  terraform:
    binary_path: null
  terragrunt:
    binary_path: null
gcp: null
provider: aws
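The existing-VPC rules described in this section (a VPC ID is required, and the number of private subnets must equal the number of availability zones, while the number of public subnets may differ or be zero) can be checked against the config before creating the cluster. This is a minimal sketch in Python, not ocli's own validation; `check_existing_network` is a hypothetical helper that takes the `aws` section of the config as an already-parsed dictionary (loading the YAML file would need a third-party parser such as PyYAML).

```python
def check_existing_network(aws_config: dict) -> list:
    """Return a list of problems found in an existing-VPC config (empty when valid)."""
    network = aws_config["network"]
    zones = aws_config["region"]["availability_zones"]
    if not network.get("existing"):
        return ["network.existing is not true"]

    problems = []
    if not network.get("vpc_id"):
        problems.append("network.vpc_id is empty")

    private_ids = network.get("private_subnets_ids") or []
    public_ids = network.get("public_subnets_ids") or []

    # One private subnet per availability zone, as the CLI prompt requires.
    if len(private_ids) != len(zones):
        problems.append(
            f"{len(private_ids)} private subnets for {len(zones)} availability zones"
        )
    if len(set(private_ids)) != len(private_ids):
        problems.append("duplicate private subnet IDs")
    # A subnet listed as both private and public is almost certainly a typo.
    if set(private_ids) & set(public_ids):
        problems.append("a subnet ID is listed as both private and public")
    return problems
```

The public subnet count is deliberately not checked, since the guide allows any number including zero.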
New VPC (Recommended)
- Select new when you want to deploy the cluster in a new VPC, then enter the CIDR range you want. If you press Enter, 10.10.0.0/16 will be selected as the default and the subnets will be selected automatically.
- If you choose a different CIDR range for your VPC, you have to enter the subnet CIDRs explicitly.
6(A). VPC CIDR
What should be the CIDR for your new VPC (Default: 10.10.0.0/16. Chose a range between /8 and /24): 10.20.0.0/16
10.20.0.0/16
Below we will ask you to enter the subnet CIDR details for your new VPC. We need to create total of 3 subnets for each availability zones
6(B). Private Subnet CIDRS
Enter the CIDR of private subnet 1: 10.20.0.0/20
Enter the CIDR of private subnet 2: 10.20.16.0/20
Enter the CIDR of private subnet 3: 10.20.32.0/20
"10.20.0.0/20" "10.20.16.0/20" "10.20.32.0/20"
6(C). Public Subnet CIDRS
Enter the CIDR of public subnet 1: 10.20.128.0/20
Enter the CIDR of public subnet 2: 10.20.144.0/20
Enter the CIDR of public subnet 3: 10.20.160.0/20
- For a new VPC, the config file will look something like this:
aws:
  account:
    id: "xxxxxxxxxxxx"
  cluster:
    name: clusterxyz
    public_access:
      cidrs:
        - 0.0.0.0/0
      enabled: true
    version: "1.28"
  iam_role:
    assume_role_arns:
      - arn:aws:iam::416964291864:role/tfy-ctl-euwe1-production-truefoundry-deps
    ecr:
      enabled: true
    enabled: true
    role_enable_override: false
    role_override_name: ""
    s3:
      bucket_enable_override: false
      bucket_override_name: ""
      enabled: true
    ssm:
      enabled: true
  network:
    existing: false
    private_subnets_cidrs:
      - 10.10.0.0/20
      - 10.10.16.0/20
      - 10.10.32.0/20
    private_subnets_ids: []
    public_subnets_cidrs:
      - 10.10.176.0/20
      - 10.10.192.0/20
      - 10.10.208.0/20
    public_subnets_ids: []
    vpc_cidr: 10.10.0.0/16
    vpc_id: ""
  profile:
    name: admin
  region:
    availability_zones:
      - us-east-1a
      - us-east-1b
      - us-east-1c
    name: us-east-1
  tags: {}
azure: null
binaries:
  terraform:
    binary_path: null
  terragrunt:
    binary_path: null
gcp: null
provider: aws
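When you choose a custom VPC CIDR, the subnet CIDRs you enter have to fall inside the VPC range and must not overlap each other. The sketch below uses Python's standard `ipaddress` module to propose /20 blocks and to check a set of entries; the function names are hypothetical, and the choice of which blocks to use for public subnets is only an illustration, not how ocli picks its defaults.

```python
import ipaddress
import itertools


def propose_subnets(vpc_cidr: str, count: int, new_prefix: int = 20, skip: int = 0) -> list:
    """Return `count` consecutive /new_prefix blocks of the VPC, after skipping `skip` blocks."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = itertools.islice(vpc.subnets(new_prefix=new_prefix), skip, skip + count)
    chosen = [str(block) for block in blocks]
    if len(chosen) < count:
        raise ValueError(f"{vpc_cidr} does not hold {skip + count} /{new_prefix} blocks")
    return chosen


def check_subnets(vpc_cidr: str, subnet_cidrs: list) -> list:
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside {vpc}")
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems
```

For the 10.20.0.0/16 example above, `propose_subnets("10.20.0.0/16", 3)` gives the three private CIDRs entered in the prompt, and `propose_subnets("10.20.0.0/16", 3, skip=8)` gives the three public ones.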
IAM Role section (new)
Following security best practices, we no longer create an IAM user with access to your account's ECR, S3 bucket and SSM. It has been replaced by a cross-account IAM role, which is password-less.
We now create an IAM role that can be assumed by TrueFoundry's production IAM role arn:aws:iam::416964291864:role/tfy-ctl-euwe1-production-truefoundry-deps. If you are using TrueFoundry's control plane, you can leave this IAM role ARN as it is. However, if you are using your own control plane, you can add the control plane IAM roles in the aws.iam_role.assume_role_arns section.
To override the name of the IAM role that will be created in your account, change the following settings in your tfy-config.yaml file:
role_enable_override: true
role_override_name: "<your-preferred-role-name>"
You can disable the role's access to ECR, SSM or the S3 bucket. If S3 is disabled, the bucket will not be created either. The following changes disable ECR and the S3 bucket but keep SSM enabled:
iam_role:
  assume_role_arns:
    - arn:aws:iam::416964291864:role/tfy-ctl-euwe1-production-truefoundry-deps
  ecr:
    enabled: false
  enabled: true
  role_enable_override: false
  role_override_name: ""
  s3:
    bucket_enable_override: false
    bucket_override_name: ""
    enabled: false
  ssm:
    enabled: true
For the S3 bucket, a default name is assigned, which can be overridden using the following parameters in the aws.iam_role.s3 section:
s3:
  bucket_enable_override: true
  bucket_override_name: "<your-preferred-bucket-name>"
  enabled: true
You can entirely disable creation of the IAM role by setting aws.iam_role.enabled to false.
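To make the interaction of these flags concrete, the sketch below summarizes what an `iam_role` section would grant. It is only an illustrative reading of the settings described above, written in Python with hypothetical function names; it does not reflect how ocli evaluates the config internally.

```python
SERVICES = ("ecr", "s3", "ssm")


def granted_services(iam_role: dict) -> list:
    """List the services the role gets access to, under this reading of the flags."""
    if not iam_role.get("enabled", False):
        # aws.iam_role.enabled: false means no role is created at all.
        return []
    return [name for name in SERVICES if iam_role.get(name, {}).get("enabled", False)]


def role_name_override(iam_role: dict):
    """Return the custom role name if the override is switched on, else None."""
    if iam_role.get("role_enable_override") and iam_role.get("role_override_name"):
        return iam_role["role_override_name"]
    return None
```

Applied to the example above with ECR and S3 disabled, `granted_services` returns only `ssm`.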
Applying tags
Customers commonly need to tag all the resources created by TrueFoundry. For this, the tags: {} section lets you apply key-value pairs to all the resources deployed by TrueFoundry. An example of this section:
tags:
  Owner: TrueFoundry
  Email: [email protected]
  Purpose: LLM
  Risk: Low
Running the config file
- Create the cluster by passing the config file to the CLI:
ocli infra create --file tfy-config.yaml
Post cluster-creation steps
Saving the output
The above process generates outputs that are needed when deploying some applications. Save them to a file:
ocli infra output --file tfy-config.yaml > output.txt
Downloading the kubeconfig file
- The CLI downloads kubectl by default if it is not present. We use the aws CLI to download the kubeconfig file. First set the following variables:
export AWS_REGION=""
export CLUSTER_NAME=""
export AWS_PROFILE=""
- Download the kubeconfig file
aws eks --region $AWS_REGION update-kubeconfig --name $CLUSTER_NAME --profile $AWS_PROFILE
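If you prefer to script this step, the same command can be assembled and run from Python. This is a minimal sketch with hypothetical function names; it builds exactly the `aws eks update-kubeconfig` invocation shown above from the three environment variables and refuses to run when any of them is empty.

```python
import os
import subprocess


def update_kubeconfig_command(region: str, cluster_name: str, profile: str) -> list:
    """Build the argument list for `aws eks update-kubeconfig`."""
    for label, value in (("AWS_REGION", region),
                         ("CLUSTER_NAME", cluster_name),
                         ("AWS_PROFILE", profile)):
        if not value:
            raise ValueError(f"{label} is not set")
    return [
        "aws", "eks", "--region", region,
        "update-kubeconfig", "--name", cluster_name,
        "--profile", profile,
    ]


def update_kubeconfig_from_env(env=os.environ) -> None:
    """Read the three variables from the environment and run the command."""
    command = update_kubeconfig_command(
        env.get("AWS_REGION", ""),
        env.get("CLUSTER_NAME", ""),
        env.get("AWS_PROFILE", ""),
    )
    # check=True raises CalledProcessError if the aws CLI exits with an error.
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues with profile or cluster names.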
Connecting the cluster to the platform
Once your cluster is created, a second step is needed to install the truefoundry-agent, which connects the cluster to the control plane.
- If you haven't registered for TrueFoundry yet, follow the doc to register your company.
- Once you have logged in, head over to the Bring Your Own Cluster section to add the cluster details.
- Copy the bash script given in the tab and execute it. This part will be replaced by an ocli command soon.