Infra Set-Up for Workflows

Setting up workflows in a Workload Cluster (already connected to truefoundry) requires the following configuration to be done:

Requirements:

  • Cloud Storage Bucket (S3/GCS/AzureBlob)
  • Service Account (or Key based access) with "Admin" permission to the bucket.

Steps

  1. Create an Integration of Blob Storage on the platform ( from Integrations section).
    Ignore this step if you already have the integration added.
  2. In the "Clusters" section, Edit your cluster and Update the "Workflow Storage Integration" with the integration from Step 1.
  3. Install Workflow Propeller in the cluster( Dataplane Component of Flyte ) by following the instruction below (Depending on the Cloud Provider).
    Note: Workflow propeller requires access to the blob storage via a serviceaccount.

Note: Ideally the Blob Storage and the cluster should be in the same Region.


Setting Up tfy-workflow-propeller

To install tfy-workflow-propeller, follow the following steps:

  1. Create a workspace in your cluster with the name: tfy-workflow-propeller[ Workspaces -> New Workspace]

  2. Open the deployments page and click on New Deployment

  3. Select the workspace as tfy-workflow-propeller and chose the deployment type as Helm as shown in image below

  4. Once you click on next, fill the following values in the form as shown below:

Name: tfy-workflow-propeller
Helm repository URL: https://truefoundry.github.io/infra-charts/
Chart Name: tfy-workflow-propeller
Version: 0.0.2
Values: <Cloud Specific, refer to the next section of docs>
  1. Now, you need fill the values section of tfy-workflow-propeller. These are described in the next section (depending on the cloud)

Values of Workflow Propeller

AWS

Steps:

  1. Create an S3 bucket (or use an already existing bucket). This should be in same region as your cluster.
  2. Create a Role which has Admin access to the above S3 bucket.
  3. Add policy to ensure that this role can be assumed by the following ServiceAccount:
    flytepropeller in tfy-workflow-propellernamespace [ No need to create service account, it will be created by the helm chart itself]
  4. Fill the values in the file below and deploy the tfy-workflow-propeller.
global:
  tenantName: <Enter your tenant name>
  controlPlaneUrl: <Enter the full control plane url e.g. https://xyz.truefoundry.tech>
flyte-core:
  common:
    ingress:
      enabled: false
  secrets:
    adminOauthClientCredentials:
      enabled: true
  storage:
    type: s3
    limits:
      maxDownloadMBs: 2
    bucketName: <Enter your S3 bucket name>
    connection:
      region: <Enter your AWS region>
      auth-type: iam
    enable-multicontainer: true
  webhook:
    enabled: false
  configmap:
    k8s:
      plugins:
        k8s:
          default-env-vars:
            - TFY_INTERNAL_SIGNED_URL_SERVER_HOST: >-
                http://tfy-signed-url-server.tfy-workflow-propeller.svc.cluster.local:3001
    core:
      propeller:
        leader-election:
          enabled: true
          retry-period: 2s
          lease-duration: 15s
          renew-deadline: 10s
          lock-config-map:
            name: propeller-leader
            namespace: tfy-workflow-propeller
        metadata-prefix: s3://<Enter your S3 bucket name>/tfy-workflow-propeller/metatdata
        rawoutput-prefix: s3://<Enter your S3 bucket name>/tfy-workflow-propeller/raw_data
        publish-k8s-events: true
    admin:
      admin:
        Command:
          - echo
          - <Enter your cluster token (can be found in tfy-agent helm chart>
        AuthType: ExternalCommand
        # endpoint: app.truefoundry.com:443
        endpoint: <Enter your control plane host (without https://)>:443
        insecure: false
        AuthorizationHeader: authorization
      event:
        rate: 500
        type: admin
        capacity: 100
    logger:
      level: 5
      show-source: true
  flyteadmin:
    enabled: false
    serviceAccount:
      alwaysCreate: true
  datacatalog:
    enabled: false
  flyteconsole:
    enabled: false
  flytepropeller:
    enabled: true
    serviceAccount:
      # service account is created with name `flytepropeller`
      create: true
      annotations:
        eks.amazonaws.com/role-arn: >-
          <Enter AWS role ARN with access to storage bucket>
  workflow_scheduler:
    enabled: false
  workflow_notifications:
    enabled: false
  cluster_resource_manager:
    enabled: false
tfySignedURLServer:
  env:
    AWS_REGION: <Enter your AWS region here>
    S3_BUCKET_NAME: s3://<Enter your s3 bucket name here>
    DEFAULT_CLOUD_PROVIDER: aws
  enabled: true

GCP

Steps:

  1. Create a GCS bucket (or use an already existing bucket). This should be in same region as your cluster.
  2. Create a Google Cloud Service Account which has Admin access to the above GCS bucket.
  3. Add policy to ensure that this role can be assumed by the following Kubernetes ServiceAccount:
    flytepropeller in tfy-workflow-propellernamespace [ No need to create service account, it will be created by the helm chart itself]
    You may use the script below to do the following:
#!/bin/bash

PROJECT=<Enter your GCP PROJECT ID>
CLUSTER_NAME=<Enter your GCP Cluster Name>
REGION=<Enter your gcp region>
BUCKET_NAME=<Enter your GCP bucket name without gs:// prefix>
TARGET_NAMESPACE=tfy-workflow-propeller

# Clip bucket name to 20 characters
CLIPPED_BUCKET_NAME="${BUCKET_NAME:0:20}"

K8S_SA_NAME="flytepropeller"
GCP_SA_NAME="$CLIPPED_BUCKET_NAME-flyte-sa"

GCP_SA_ID="$GCP_SA_NAME@$PROJECT.iam.gserviceaccount.com"


gcloud iam service-accounts create $GCP_SA_NAME  --project=$PROJECT
gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME \
    --member "serviceAccount:$GCP_SA_ID" \
    --role "roles/storage.objectAdmin" \
    --project $PROJECT

gcloud iam service-accounts add-iam-policy-binding $GCP_SA_ID \
    --role "roles/iam.serviceAccountTokenCreator" \
    --member "serviceAccount:$GCP_SA_ID" \
    --project $PROJECT
    

gcloud iam service-accounts add-iam-policy-binding $GCP_SA_ID \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$PROJECT.svc.id.goog[$TARGET_NAMESPACE/$K8S_SA_NAME]" \
    --project $PROJECT

gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME \
    --member "serviceAccount:$GCP_SA_ID" \
    --role "roles/storage.legacyBucketReader" \
    --project $PROJECT
  1. Fill the values in the file below and deploy the tfy-workflow-propeller.
global:
  tenantName: <Enter your tenant name>
  controlPlaneUrl: <Enter the full control plane url e.g. https://xyz.truefoundry.tech>
flyte-core:
  common:
    ingress:
      enabled: false
  secrets:
    adminOauthClientCredentials:
      enabled: true
  storage:
    gcs:
      projectId: <Enter GCP Project Id here>
    type: gcs
    limits:
      maxDownloadMBs: 2
    bucketName: <Enter GCS bucket name>
  webhook:
    enabled: false
  configmap:
    k8s:
      plugins:
        k8s:
          default-env-vars:
            - TFY_INTERNAL_SIGNED_URL_SERVER_HOST: >-
                http://tfy-signed-url-server.tfy-workflow-propeller.svc.cluster.local:3001
    core:
      propeller:
        leader-election:
          enabled: true
          retry-period: 2s
          lease-duration: 15s
          renew-deadline: 10s
          lock-config-map:
            name: propeller-leader
            namespace: tfy-workflow-propeller
        metadata-prefix: gs://<Enter GCS bucket name>/tfy-workflow-propeller/metadata
        rawoutput-prefix: gs://<Enter GCS bucket name>/tfy-workflow-propeller/raw_data
        publish-k8s-events: true
    admin:
      admin:
        Command:
          - echo
          - <Enter your cluster token (can be found in tfy-agent helm chart>
        AuthType: ExternalCommand
        # endpoint: app.truefoundry.com:443
        endpoint: <Enter your control plane host (without https://)>:443
        insecure: false
        AuthorizationHeader: authorization
      event:
        rate: 500
        type: admin
        capacity: 100
    logger:
      level: 5
      show-source: true
  flyteadmin:
    enabled: false
  datacatalog:
    enabled: false
  flyteconsole:
    enabled: false
  flytepropeller:
    enabled: true
    serviceAccount:
      create: true
      annotations:
        iam.gke.io/gcp-service-account: >-
          <Enter your GCP Service Account Email>
  workflow_scheduler:
    enabled: false
  workflow_notifications:
    enabled: false
  cluster_resource_manager:
    enabled: false
tfySignedURLServer:
  env:
    GS_BUCKET_NAME: <Enter you you GCS bucket name (without gs:// prefix)>
    DEFAULT_CLOUD_PROVIDER: gcp
  enabled: true

Azure

  1. Create a Azure Blob Storage Account and Container (or use an already existing container). This should be in same region as your cluster.
  2. Create a Storage Account Key (Can be found in Connection String) with Admin access to the storage account.
  3. Fill the values in the file below and deploy the tfy-workflow-propeller.
flyte-core:
  common:
    ingress:
      enabled: false
  secrets:
    adminOauthClientCredentials:
      enabled: true
  storage:
    type: custom
    custom:
      stow:
        kind: azure
        config:
          key: <Enter your Azure Storage Account Key>
          account: <Enter your Storage Account Name>
      type: stow
      container: <Enter name of the Storage Container>
      connection: {}
      enable-multicontainer: true
    limits:
      maxDownloadMBs: 2
  webhook:
    enabled: false
  configmap:
    k8s:
      plugins:
        k8s:
          default-env-vars:
            - AZURE_STORAGE_ACCOUNT_NAME: <Enter your Storage Account Name>
            - AZURE_STORAGE_ACCOUNT_KEY: <Enter your Azure Storage Account Key>
    core:
      propeller:
        leader-election:
          enabled: true
          retry-period: 2s
          lease-duration: 15s
          renew-deadline: 10s
          lock-config-map:
            name: propeller-leader
            namespace: tfy-workflow-propeller
        metadata-prefix: abfs://<Enter name of the Storage Container>/tfy-workflow-propeller/metadata
        rawoutput-prefix: abfs://<Enter name of the Storage Container>/tfy-workflow-propeller/raw_data
        publish-k8s-events: true
    admin:
      admin:
        Command:
          - echo
          - <Enter your cluster token (can be found in tfy-agent helm chart>
        AuthType: ExternalCommand
        # endpoint: app.truefoundry.com:443
        endpoint: <Enter your control plane host (without https://)>:443
        insecure: false
        AuthorizationHeader: authorization
      event:
        rate: 500
        type: admin
        capacity: 100
    logger:
      level: 5
      show-source: true
  flyteadmin:
    enabled: false
  datacatalog:
    enabled: false
  flyteconsole:
    enabled: false
  flytepropeller:
    enabled: true
    serviceAccount:
      create: true
  workflow_scheduler:
    enabled: false
  workflow_notifications:
    enabled: false
  cluster_resource_manager:
    enabled: false