Introduction to Jobs

Jobs play a major role in Kubernetes (K8s). They provide a way to run short-lived batch tasks, either in parallel or sequentially, within the cluster on behalf of your application.

A job ensures that a specified number of pods complete their tasks successfully before the job is considered finished. Jobs offer mechanisms for task parallelism, completion tracking, and automatic retries, which makes them well suited to one-time or on-demand tasks.
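
As a rough illustration of these settings, the sketch below builds a Kubernetes Job manifest as a Python dictionary. The fields completions, parallelism, and backoffLimit belong to the Kubernetes batch/v1 Job API. The function name, job name, image, and command are hypothetical placeholders, and the manifest on your platform may look different.

```python
import json


def build_job_manifest(name, image, command,
                       completions=1, parallelism=1, backoff_limit=3):
    """Return a batch/v1 Job manifest as a plain dictionary."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of pods that must succeed before the job is complete.
            "completions": completions,
            # Maximum number of pods running at the same time.
            "parallelism": parallelism,
            # Number of retries before the job is marked as failed.
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                    # Job pods must use "Never" or "OnFailure".
                    "restartPolicy": "Never",
                }
            },
        },
    }


# Hypothetical values, for illustration only.
manifest = build_job_manifest(
    name="example-batch",
    image="python:3.11-slim",
    command=["python", "-c", "print('processing one work item')"],
    completions=3,
    parallelism=2,
)
print(json.dumps(manifest, indent=2))
```

With completions=3 and parallelism=2, at most two pods run at once and the job is complete after three pods have succeeded. The printed JSON could be saved to a file and submitted with kubectl apply -f, which accepts JSON as well as YAML.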

Once a job completes, its compute and memory resources are released, so a finished job incurs no further cost.

Job Lifecycle

  • FINISHED: A job executes the code once, and if it completes successfully, the job is marked as FINISHED.
  • FAILED: A job can be configured to retry several times on failure. The job is marked as FAILED if it does not successfully finish even after the configured number of retries.
  • TERMINATED: A job that is terminated manually before completion is marked as TERMINATED.
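
The three outcomes above can be sketched as a small retry loop. This is a simplified model and not the platform's actual implementation: the names JobStatus, run_job, max_retries, and stop_requested are hypothetical, and manual termination is only checked between attempts here, whereas a real system can also stop a job that is already running.

```python
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    TERMINATED = "TERMINATED"


def run_job(task: Callable[[], None],
            max_retries: int = 0,
            stop_requested: Callable[[], bool] = lambda: False) -> JobStatus:
    """Run task once, retrying up to max_retries times on failure."""
    for _ in range(1 + max_retries):
        # A manual stop before completion ends the job as TERMINATED.
        if stop_requested():
            return JobStatus.TERMINATED
        try:
            task()
        except Exception:
            # This attempt failed; retry if any attempts remain.
            continue
        return JobStatus.FINISHED
    # Every allowed attempt failed.
    return JobStatus.FAILED


# Example: a task that fails twice and then succeeds.
attempts = {"count": 0}


def flaky_task() -> None:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")


print(run_job(flaky_task, max_retries=2).value)
```

With two retries the flaky task gets three attempts in total, so the third attempt succeeds and the job ends as FINISHED. With max_retries=1 the same task would end as FAILED.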

Running a Job

Jobs can be run or triggered in multiple ways:

  • Manual: A job can be triggered manually, which suits ad hoc use cases. An example is a model training job that is run whenever it is needed.
  • Schedule: A job can be triggered on a schedule, such as daily, weekly, or at 9 AM every Monday. An example is a batch inference job that runs every morning at 8 AM on the previous day's incoming data.
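
To make the scheduled case more concrete, the sketch below writes the two schedules mentioned above as standard five-field cron expressions and works out when a daily 8 AM job would next run and which day's data it would process. It assumes the scheduler accepts cron syntax, as the Kubernetes CronJob resource does. The names SCHEDULES, next_daily_run, and data_date_for are hypothetical.

```python
from datetime import date, datetime, timedelta

# Cron fields: minute, hour, day of month, month, day of week (1 = Monday).
SCHEDULES = {
    "daily_8am": "0 8 * * *",
    "monday_9am": "0 9 * * 1",
}


def next_daily_run(now: datetime, hour: int) -> datetime:
    """Return the next time a job scheduled daily at the given hour runs."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


def data_date_for(run_time: datetime) -> date:
    """Return the previous day, whose data the batch job processes."""
    return (run_time - timedelta(days=1)).date()


now = datetime(2024, 1, 1, 9, 30)   # example timestamp, after 8 AM
run_time = next_daily_run(now, hour=8)
print(run_time)                     # 2024-01-02 08:00:00
print(data_date_for(run_time))      # 2024-01-01
```

Because 9:30 AM is already past the 8 AM slot, the next run falls on the following morning and processes the data that arrived on January 1.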