Introduction to a Job

TrueFoundry Jobs let you run task-oriented workloads: a Job runs until its task is complete, then terminates and releases its resources.

Here are some scenarios where Jobs are particularly well-suited:

  • Model Training: Train machine learning models on large datasets; the resources are freed once training completes.
  • Maintenance and Cleanup: Schedule routine maintenance tasks, such as data backups, model retraining, and report generation.
  • Batch Inference: Run large-scale batch inference, such as processing large volumes of data with trained models, taking advantage of a Job's ability to handle parallel workloads efficiently.

Key considerations when building a Job

Consider the following when deploying Jobs:

  1. Dockerize the code to be deployed.
  2. Schedule the Job to specify when it should run.
  3. Define the resource requirements for your Job - Define Resources. While that documentation is written for Services, the same settings apply to Jobs.
  4. Parameterize the Job so that argument values can be changed easily.
  5. [Optional] Define environment variables and secrets to be injected into the code - Environment Variables and Secrets.
  6. Set retries and a timeout for your Job in case it gets stuck or fails.
  7. [Optional] Set Concurrency Limit to specify how many instances of a Job can run at once.
  8. [Optional] Mount files or volumes to your job - Mount File or Volumes.
  9. Access data from S3 or other cloud storage.
  10. Update, Rollback, Promote your Job: While the documentation is specifically for Services, the Update, Rollback, and Promote process follows a similar flow for Jobs.
  11. Setting up CI/CD for your Job: While the documentation is specifically for Services, the CI/CD setup process follows a similar flow for Jobs.
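
Several of the considerations above (parameterization, environment variables, and retries) shape how the Job's own code is written. The following is a minimal, generic Python sketch of a Job entry point, not TrueFoundry's API: the argument names `--input-path` and `--batch-size` and the variable `EXAMPLE_API_KEY` are hypothetical, and the sketch assumes the platform decides whether to retry based on the process exit code.

```python
import argparse
import os
import sys


def parse_args(argv=None):
    # Hypothetical parameters; a real Job would define its own.
    parser = argparse.ArgumentParser(description="Example batch job")
    parser.add_argument("--input-path", required=True)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser.parse_args(argv)


def run(args, env=None):
    # Read configuration injected as environment variables.
    env = os.environ if env is None else env
    api_key = env.get("EXAMPLE_API_KEY")  # hypothetical secret name
    if not api_key:
        raise RuntimeError("EXAMPLE_API_KEY is not set")
    # Placeholder for the actual work (training, inference, cleanup).
    return f"processed {args.input_path} in batches of {args.batch_size}"


def main(argv=None, env=None):
    # Return 0 on success and 1 on failure so the caller can
    # pass the result to sys.exit().
    try:
        print(run(parse_args(argv), env))
    except Exception as exc:
        print(f"job failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

In a container, the entry point would typically end with `sys.exit(main())`, so that a failure surfaces as a non-zero exit code rather than a silent success.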

Running your First Job

To run your first job, choose one of the following guides based on the location of your job code: