Deploy a Cron Job

πŸ‘

What you'll learn

  • Deploying our training code as a job via servicefoundry

This is a guide to deploying training code as a cron job via servicefoundry.

After you complete the guide, you will have a successfully deployed job, visible on your Jobs deployment dashboard.

Cron Jobs

A cron job runs the defined job on a repeating schedule. This can be useful to retrain a model periodically, generate reports, and more.

Understanding the cron format

In what follows, we will use the cron format to specify our job's schedule.

The job schedule is a cron expression. It consists of five fields representing the time at which to execute a specified command.

* * * * *
| | | | |
| | | | |___ day of week (0-6) (Sunday is 0)
| | | |_____ month (1-12)
| | |_______ day of month (1-31)
| |_________ hour (0-23)
|___________ minute (0-59)

We can use a site like https://crontab.guru/ to get a human-readable description of the cron expression.
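To make the five fields concrete, here is a minimal sketch (not part of servicefoundry) that validates a cron expression against the ranges in the diagram above. It handles `*`, plain numbers, and comma-separated lists; ranges and step values are omitted for brevity.

```python
# Minimal cron-expression validator: checks the five fields against
# the ranges shown in the diagram above. Supports "*", single numbers,
# and comma-separated numbers only (no ranges or steps).

FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 6),
]

def validate_cron(expr: str) -> bool:
    parts = expr.split()
    if len(parts) != len(FIELDS):
        return False
    for part, (_name, lo, hi) in zip(parts, FIELDS):
        if part == "*":
            continue
        for value in part.split(","):
            if not value.isdigit() or not lo <= int(value) <= hi:
                return False
    return True

print(validate_cron("0 8 1 * *"))   # True: 08:00 on the 1st of every month
print(validate_cron("0 25 * * *"))  # False: hour 25 is out of range
```

The expression `"0 8 1 * *"` used later in this guide means "at 08:00 on day 1 of every month".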

Concurrency for Cron Jobs

For cron jobs, it is possible that the previous run of the job hasn't completed by the time the schedule fires again. This can happen if, for example, we schedule a job to run every 10 minutes and one instance of the job takes longer than 10 minutes. At this point, we have three options:

  1. Start the new instance of the job even if the previous one is still running.
  2. Skip this job run, since the previous job is still running.
  3. Terminate the currently running job and start the new one.

The desired behavior depends on the use case, but you can achieve all three scenarios using the concurrency_policy setting. The possible options are:

  1. Forbid: This is the default. Do not allow concurrent runs.
  2. Allow: Allow jobs to run concurrently.
  3. Replace: Replace the current job with the new one.

The concurrency policy doesn't apply to manually triggered jobs; a manual trigger always creates a new job run.
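As a toy illustration (an assumption-laden simulation, not servicefoundry internals), suppose a job is scheduled every 10 minutes but the run starting at t=0 takes 25 minutes. We can compute which scheduled runs actually start under each policy:

```python
# Toy simulation of the three concurrency policies. The run starting at
# t=0 takes 25 minutes; for simplicity, every later run finishes instantly.

def simulate(policy, schedule_times, long_run_end=25):
    started = []          # start times of runs that actually ran
    running_until = None  # end time of the currently running job, if any
    for t in schedule_times:
        busy = running_until is not None and t < running_until
        if busy and policy == "Forbid":
            continue                  # skip this scheduled run
        if busy and policy == "Replace":
            running_until = None      # terminate the running job first
        # the first run is the long one; later runs finish instantly here
        running_until = long_run_end if not started else t
        started.append(t)
    return started

times = [0, 10, 20, 30]
print(simulate("Forbid", times))   # [0, 30]: the runs at t=10 and t=20 are skipped
print(simulate("Allow", times))    # [0, 10, 20, 30]: every run starts
print(simulate("Replace", times))  # [0, 10, 20, 30]: the long run is replaced at t=10
```

Under `Forbid`, the long-running instance blocks the t=10 and t=20 runs; under `Allow` they start alongside it; under `Replace` the long run is killed so the new one can start.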

Project structure

To complete this guide, you are going to create the following files:

  • train.py : contains our training code
  • requirements.txt : contains our dependencies
  • deploy.py / deploy.yaml : contains our deployment code or deployment configuration, depending on whether you choose to use our Python SDK or create a YAML file

Your final file structure is going to look like this:

.
β”œβ”€β”€ train.py
β”œβ”€β”€ deploy.py / deploy.yaml
└── requirements.txt

As you can see, all of these files are created in the same folder/directory.

Step 1: Implement the training code

The first step is to create a job that trains a scikit-learn model on the iris dataset.

We start with a train.py containing our training code and requirements.txt with our dependencies.

.
β”œβ”€β”€ train.py
└── requirements.txt

train.py

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X, y = load_iris(as_frame=True, return_X_y=True)
X = X.rename(columns={
        "sepal length (cm)": "sepal_length",
        "sepal width (cm)": "sepal_width",
        "petal length (cm)": "petal_length",
        "petal width (cm)": "petal_width",
})

# NOTE: You can pass these configurations via command-line
# arguments, a config file, or environment variables.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Initialize the model
clf = LogisticRegression(solver="liblinear")
# Fit the model
clf.fit(X_train, y_train)

preds = clf.predict(X_test)
print(classification_report(y_true=y_test, y_pred=preds))


requirements.txt

pandas
numpy
scikit-learn

# for deploying our job deployments
servicefoundry

Step 2: Deploying as a cron job

You can deploy jobs on TrueFoundry programmatically, either using our Python SDK or via a YAML file.

So you can choose between creating a deploy.py file, which uses our Python SDK, or creating a deploy.yaml configuration file and using the servicefoundry deploy command.

Via python SDK

File Structure

.
β”œβ”€β”€ train.py
β”œβ”€β”€ deploy.py
└── requirements.txt

deploy.py

🚧

In the code below, the workspace FQN is not hard-coded; pass your WORKSPACE_FQN via the --workspace_fqn command-line argument when running the script.

import argparse
import logging
from servicefoundry import Build, Job, PythonBuild, Schedule

logging.basicConfig(level=logging.INFO)

parser = argparse.ArgumentParser()
parser.add_argument("--workspace_fqn", required=True, type=str)
args = parser.parse_args()

# First we define how to build our code into a Docker image
image = Build(
    build_spec=PythonBuild(
        command="python train.py",
        requirements_path="requirements.txt",
    )
)
job = Job(
    name="iris-train-cron-job",
    image=image,
    trigger=Schedule(
      schedule="0 8 1 * *",
      concurrency_policy="Forbid" # Any one of ["Forbid", "Allow", "Replace"]
    )
)
job.deploy(workspace_fqn=args.workspace_fqn)

To deploy the job using the Python SDK, run:

python deploy.py --workspace_fqn <YOUR WORKSPACE FQN HERE>

Via YAML file

File Structure

.
β”œβ”€β”€ train.py
β”œβ”€β”€ deploy.yaml
└── requirements.txt

deploy.yaml

name: iris-train-cron-job
type: job
image:
  type: build
  build_source:
    type: local
  build_spec:
    type: tfy-python-buildpack
    command: python train.py
    requirements_path: requirements.txt
trigger:
  type: scheduled
  schedule: "0 8 1 * *"
  concurrency_policy: "Forbid"

To deploy the job using the YAML file, run:

servicefoundry deploy --workspace-fqn YOUR_WORKSPACE_FQN --file deploy.yaml

Run the above command from the same directory that contains the train.py, requirements.txt, and deploy.yaml files.

πŸ“˜

.tfyignore files

If there are any files you don't want copied to the workspace, such as a data file or other redundant files, you can list them in a .tfyignore file.
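For example, a .tfyignore file might look like this (assuming gitignore-style patterns; the file names below are purely illustrative):

```
# sample .tfyignore
data/
*.csv
*.ipynb
__pycache__/
```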

End result

On successful deployment, the Job will be created and run immediately.

We can now visit the Applications page to check the Build status, Build Logs, and Runs History, and to monitor the progress of runs.
See the Monitoring and Debugging guide for more details.

See Also