TrueFoundry Jobs let you run task-oriented workloads that execute for a finite duration, complete their task, and then terminate and release their resources.
Here are some scenarios where Jobs are particularly well-suited:
- Model Training: Train machine learning models on large datasets; the resources are released once training is complete.
- Maintenance and Cleanup: Schedule routine maintenance tasks such as data backups, model retraining, and report generation.
- Batch Inference: Run large-scale batch inference, processing large volumes of data with trained models and using a Job's ability to handle parallel workloads efficiently.
Consider the following when deploying a Job:
- Dockerize the code to be deployed.
- Schedule the Job to specify when it should run.
- Parameterize the Job so that argument values are easy to change.
- [Optional] Define environment variables and secrets to be injected into the code - Environment Variables and Secrets.
- Set retries and a timeout for your Job in case it gets stuck or fails.
- [Optional] Set a Concurrency Limit to specify how many instances of a Job can run at once.
- [Optional] Mount files or volumes to your Job - Mount File or Volumes.
- Access data from S3 or other clouds.
- Update, Rollback, and Promote your Job: Although the documentation is written for Services, the update, rollback, and promote process follows a similar flow for Jobs.
- Set up CI/CD for your Job: Although the documentation is written for Services, the CI/CD setup process follows a similar flow for Jobs.
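To make the points about parameters, environment variables, and failure handling concrete, here is a minimal sketch of what a Job's entrypoint script might look like. It is plain Python and does not use the TrueFoundry SDK. The argument names (`--epochs`, `--learning-rate`) and the `DATA_URI` environment variable are hypothetical, chosen only for illustration. It also assumes that a non-zero exit code is how the script signals failure, which is the usual convention for run-to-completion workloads.

```python
import argparse
import os
import sys


def parse_args(argv=None):
    """Parse the job's parameters. The names here are illustrative only."""
    parser = argparse.ArgumentParser(description="Example parameterized job")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    return parser.parse_args(argv)


def main(argv=None, env=None):
    """Run the task once and return an exit code (0 means success)."""
    args = parse_args(argv)
    env = os.environ if env is None else env

    # DATA_URI is a hypothetical variable that would be injected as an
    # environment variable or secret when the job is deployed.
    data_uri = env.get("DATA_URI")
    if not data_uri:
        print("DATA_URI is not set", file=sys.stderr)
        return 1  # assumed convention: non-zero exit marks the run as failed

    print(
        f"Training on {data_uri} for {args.epochs} epochs "
        f"at learning rate {args.learning_rate}"
    )
    return 0


# Local dry run with explicit values instead of real CLI arguments and
# the real environment.
exit_code = main(["--epochs", "2"], env={"DATA_URI": "s3://example-bucket/train.csv"})
print("exit code:", exit_code)
```

In a real entrypoint, the dry run at the bottom would typically be replaced by `sys.exit(main())` so that the process exit code reflects the outcome. Keeping the parameters in `argparse` and the configuration in environment variables means the same image can be rerun with different values without changing the code.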
To run your first job, choose one of the following guides based on the location of your job code: