Truefoundry Docs


TrueFoundry workflows provide mechanisms for retrying failed tasks and for handling failures at the workflow level. Here’s how you can implement these features:

Task Retries

You can configure automatic retries for individual tasks using the retries parameter in the @task decorator:

from truefoundry.workflow import task

# task_config is assumed to be defined elsewhere.
@task(task_config=task_config, retries=3)
def my_task():
    ...

With this configuration, the task is retried up to 3 additional times if it fails.

Note: These are user retries, which apply when the task fails because of an error in the code. Infrastructure issues, such as spot interruptions or OOM-killed errors, are treated as infra failures. Retries for those are controlled by max-node-retries-system-failures, a cluster-level setting whose default value is 3. Contact your system admin to change this value.
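To make the retry count concrete, here is a minimal plain-Python sketch of the semantics described above: one initial attempt plus up to `retries` additional attempts. It does not use TrueFoundry and is not how the platform implements retries. The names `run_with_retries` and `flaky_step` are hypothetical and exist only for illustration.

```python
def run_with_retries(fn, retries):
    """Call fn once, then up to `retries` more times if it raises.

    Illustration only: a hypothetical helper, not a TrueFoundry API.
    Returns (result, number_of_attempts_used).
    """
    attempts = 0
    last_error = None
    for _ in range(retries + 1):  # 1 initial attempt + `retries` retries
        attempts += 1
        try:
            return fn(), attempts
        except Exception as exc:  # stands in for a failure caused by a code error
            last_error = exc
    raise RuntimeError(f"failed after {attempts} attempts") from last_error


calls = {"count": 0}


def flaky_step():
    """Hypothetical task body that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("transient code error")
    return "ok"


result, attempts = run_with_retries(flaky_step, retries=3)
print(result, attempts)  # ok 3
```

With `retries=3`, a task that never succeeds would run 4 times in total before the failure is reported.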

Workflow Failure Handling

To handle failures at the workflow level, you can define a failure handler task and specify it using the on_failure parameter in the @workflow decorator:

from truefoundry.workflow import task, workflow

# task_config is assumed to be defined elsewhere.
@task(task_config=task_config)
def handle_failure():
    print("Handling Failure/Sending Notification")
    ...

@workflow(on_failure=handle_failure)
def data_pipeline():
    ...

If your workflow fails, the handle_failure task runs at the end of the workflow. You can use it to clean up resources, database entries, or files, and to send an alert notification.
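The control flow of a failure handler can be sketched in plain Python as follows. This is an illustration under stated assumptions, not TrueFoundry's implementation: `run_workflow` is a hypothetical helper, the handler takes no arguments here, and the sketch assumes the workflow is still reported as failed after the handler has run.

```python
def run_workflow(workflow_fn, on_failure=None):
    """Run workflow_fn; if it raises, run on_failure, then re-raise.

    Hypothetical helper for illustration only, not a TrueFoundry API.
    """
    try:
        return workflow_fn()
    except Exception:
        if on_failure is not None:
            on_failure()
        raise  # assumption: the failure is still surfaced after cleanup


events = []


def handle_failure():
    # Stand-ins for real cleanup and alerting logic.
    events.append("cleanup")
    events.append("alert sent")


def data_pipeline():
    events.append("step 1 done")
    raise RuntimeError("step 2 failed")


try:
    run_workflow(data_pipeline, on_failure=handle_failure)
except RuntimeError as exc:
    events.append(f"workflow failed: {exc}")

print(events)
# ['step 1 done', 'cleanup', 'alert sent', 'workflow failed: step 2 failed']
```

The handler runs only on the failure path. A workflow that completes normally returns its result without calling it.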
