Introduction to a Job
TrueFoundry Jobs let you run task-oriented workloads that run until a task is complete, then terminate and release their resources.
Here are some scenarios where Jobs are particularly well-suited:
- Model Training: Train machine learning models on large datasets, with resources freed as soon as training completes.
- Maintenance and Cleanup: Schedule routine maintenance tasks, such as data backups, model retraining, and report generation.
- Batch Inference: Run large-scale batch inference, such as processing large volumes of data with trained models, taking advantage of a Job's ability to handle parallel workloads efficiently.
Key considerations when building a Job
Consider the following when deploying a Job:
- Dockerize the code to be deployed.
- Schedule the Job to specify when it should run.
- Define the resource requirements for your Job - Define Resources - While that documentation is written for Services, the same requirements apply to Jobs.
- Parameterize the Job so that argument values are easy to change between runs.
- [Optional] Define environment variables and secrets to be injected into the code - Environment Variables and Secrets.
- Set retries and a timeout for your Job in case it gets stuck or fails.
- [Optional] Set Concurrency Limit to specify how many instances of a Job can run at once.
- [Optional] Mount files or volumes to your job - Mount File or Volumes.
- Access data from S3 or other cloud storage.
- Update, Rollback, Promote your Job: While the documentation is specifically for Services, the Update, Rollback, and Promote process follows a similar flow for Jobs.
- Setting up CI/CD for your Job: While the documentation is specifically for Services, the CI/CD setup process follows a similar flow for Jobs.
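Several of the considerations above (parameters, environment variables, and failure handling) shape how the job code itself is written. The following is a minimal sketch of a job entry point using only the Python standard library, not TrueFoundry's SDK. The parameter names (`--epochs`, `--learning-rate`) and the `DATA_PATH` environment variable are hypothetical, and the sketch assumes that a non-zero exit code is what marks a run as failed, as is common for containerized workloads.

```python
import argparse
import os
import sys


def parse_args(argv=None):
    """Parse job parameters; defaults let the job run with no arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical training job")
    # Hypothetical parameters, chosen only for illustration.
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    return parser.parse_args(argv)


def main(argv=None, env=None):
    """Run the task once and return a process exit code."""
    args = parse_args(argv)
    env = os.environ if env is None else env

    # DATA_PATH is a hypothetical environment variable name.
    data_path = env.get("DATA_PATH")
    if not data_path:
        print("DATA_PATH is not set", file=sys.stderr)
        return 1  # non-zero: assumed to mark the run as failed

    print(
        f"Training on {data_path} for {args.epochs} epochs "
        f"at learning rate {args.learning_rate}"
    )
    return 0  # zero: the task completed and the process can terminate
```

In a real script, the last line would typically be `sys.exit(main())` under an `if __name__ == "__main__":` guard, so the process exit code reflects the outcome. Keeping configuration in arguments and environment variables means the same image can be rerun with different values without a code change.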
Running your First Job
To run your first job, choose one of the following guides based on the location of your job code: