Creating a workflow with different container images

This guide shows how to create a workflow whose tasks use different task configs and different container images. This is useful, for example, when steps that require GPUs need a different configuration (resources and container image) from steps that do not.

In this example, we will write a simple workflow that gets the latest prices of Bitcoin, Ethereum, and Litecoin. The main aim is to understand the different types of configs, so we will create two of them: one for GPU tasks and one for CPU tasks.

Prerequisites

Before you proceed with the guide, make sure you have the following:

  • TrueFoundry CLI: Set up and configure the TrueFoundry CLI tool on your local machine by following the Setup for CLI guide.
  • Workspace: To deploy your workflow, you'll need a workspace. If you don't have one, you can create it using this guide: Creating a Workspace or seek assistance from your cluster administrator.

Creating the workflow

Create a workflow.py file containing the workflow code and place it in the project root folder, alongside any other dependent files and requirements.txt.

workflow.py

from truefoundry.workflow import (
    task,
    workflow,
    PythonTaskConfig,
    TaskPythonBuild,
)
from truefoundry.deploy import Resources, NvidiaGPU
import requests
import json

cpu_config = PythonTaskConfig(
    image=TaskPythonBuild(
        python_version="3.9",
        pip_packages=["truefoundry[workflow]==0.4.8"],
        # requirements_path="requirements.txt"
    ),
    resources=Resources(cpu_request=0.5)
)

gpu_config = PythonTaskConfig(
    image=TaskPythonBuild(
        python_version="3.9",
        pip_packages=["truefoundry[workflow]==0.4.8", "pynvml==11.5.0"],
        # requirements_path="requirements.txt",
        cuda_version="11.5-cudnn8",
    ),
    env={
        "NVIDIA_DRIVER_CAPABILITIES": "compute,utility",
        "NVIDIA_VISIBLE_DEVICES": "all",
    },
    resources=Resources(cpu_request=1.5, devices=[NvidiaGPU(name="T4", count=1)])
)


# Python Task: Fetch real-time prices for multiple cryptocurrencies from the CoinGecko API
@task(task_config=cpu_config)
def fetch_crypto_data() -> str:
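    response = requests.get(
        "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin,ethereum,litecoin&vs_currencies=usd",
        timeout=10,  # avoid hanging the task if the API is unreachable
    )
    # Fail the task early on an HTTP error instead of passing an error body downstream
    response.raise_for_status()
    return response.text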


# Python Task: Process the data to extract prices
@task(task_config=gpu_config)
def extract_prices(data: str) -> dict:
    from pynvml import nvmlDeviceGetCount, nvmlInit

    nvmlInit()
    assert nvmlDeviceGetCount() > 0

    json_data = json.loads(data)
    prices = {
        "bitcoin": json_data["bitcoin"]["usd"],
        "ethereum": json_data["ethereum"]["usd"],
        "litecoin": json_data["litecoin"]["usd"],
    }
    return prices


# Workflow: Combine all tasks
@workflow
def crypto_workflow() -> dict:
    data = fetch_crypto_data()
    prices = extract_prices(data=data)
    return prices

  • The fetch_crypto_data task uses cpu_config, which requests no GPU, whereas the extract_prices task uses gpu_config, which requests a GPU and sets the NVIDIA environment variables. Inside extract_prices, the pynvml package verifies that a GPU is present by asserting that nvmlDeviceGetCount() > 0.
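
The GPU check in extract_prices fails with an exception on a machine without NVML. For trying the same check outside the GPU container, a more forgiving variant can be sketched as below. The helper name gpu_count is hypothetical and not part of TrueFoundry; the sketch assumes only the pynvml calls nvmlInit, nvmlDeviceGetCount, and nvmlShutdown, and treats a missing package or driver as "zero GPUs".

```python
def gpu_count() -> int:
    """Return the number of NVIDIA GPUs visible through NVML, or 0 if NVML is unavailable.

    Hypothetical helper for illustration; not part of the TrueFoundry SDK.
    """
    try:
        import pynvml  # only installed in the GPU image in this guide
    except ImportError:
        return 0

    try:
        pynvml.nvmlInit()
        try:
            return int(pynvml.nvmlDeviceGetCount())
        finally:
            # Release NVML resources once the count has been read
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        # Raised, for example, when the NVIDIA driver library cannot be found
        return 0


if __name__ == "__main__":
    print(f"Visible GPUs: {gpu_count()}")
```

Inside the GPU task, the strict assertion from the workflow is still the better choice, because a missing GPU should fail the task rather than be silently ignored.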

Now run the command below in the terminal to deploy your workflow. Replace the --workspace_fqn value with your workspace FQN, which you can find in the UI.

tfy deploy workflow \
  --name multi-image-workflow \
  --file workflow.py \
  --workspace_fqn "Paste your workspace FQN here"
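
The price-extraction logic can also be exercised locally, without a cluster or a GPU, as a quick sanity check. The sketch below mirrors the parsing step of extract_prices in plain Python. The function name extract_prices_local and the sample payload are assumptions made for illustration: the payload follows the shape the workflow code reads (a "usd" key per coin), and its values are invented, not real prices.

```python
import json


def extract_prices_local(data: str, coins=("bitcoin", "ethereum", "litecoin")) -> dict:
    """Parse a price payload and return a {coin: usd_price} mapping.

    Hypothetical local stand-in for the parsing done in extract_prices.
    """
    payload = json.loads(data)
    return {coin: payload[coin]["usd"] for coin in coins}


# Made-up sample payload in the shape the workflow expects; values are not real prices
sample = json.dumps(
    {
        "bitcoin": {"usd": 100.0},
        "ethereum": {"usd": 10.0},
        "litecoin": {"usd": 1.0},
    }
)

if __name__ == "__main__":
    print(extract_prices_local(sample))
```

A missing coin in the payload raises a KeyError here, which matches how the workflow task behaves when the API response is incomplete.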