Training and logging your job metadata

Agenda

In this guide, we will:

  1. Train a scikit-learn model
  2. Log the model and its metadata via mlfoundry
  3. Deploy the training code as a job via servicefoundry

Prerequisites

Before we start, we will need:

  1. A Workspace FQN - We can use an existing workspace or create one from the Workspaces page. Copy and note down the Workspace FQN.

  2. A Secret holding our Truefoundry API Key - Since we are pushing our model to the Truefoundry Model Registry, we need to add our Truefoundry API Key as a Secret:

    1. Create and copy an API Key from the Settings page.

    2. Visit the Secrets dashboard and create a new Secret Group.

    3. Create a new Secret in this Secret Group and paste the API Key from step 1.

    4. Once saved, note down the Secret FQN by clicking the Copy button beside the value. It looks like this: <username>:<secret-group-name>:<secret-name> (e.g. user:iris-train-job:MLF_API_KEY)
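
Because the Secret FQN is pasted by hand, it can help to check its shape before using it. The snippet below is a minimal sketch that only validates the three-part <username>:<secret-group-name>:<secret-name> format described above; the names SecretFQN and parse_secret_fqn are hypothetical helpers, not part of any Truefoundry library.

```python
from typing import NamedTuple


class SecretFQN(NamedTuple):
    """The three colon-separated parts of a Secret FQN."""
    username: str
    secret_group_name: str
    secret_name: str


def parse_secret_fqn(fqn: str) -> SecretFQN:
    """Split a Secret FQN into its parts, rejecting malformed input."""
    parts = fqn.strip().split(":")
    # Exactly three non-empty parts are expected.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <username>:<secret-group-name>:<secret-name>, got {fqn!r}"
        )
    return SecretFQN(*parts)


if __name__ == "__main__":
    print(parse_secret_fqn("user:iris-train-job:MLF_API_KEY"))
```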

NOTE: A workspace is a resource-bound (CPU, memory) environment where we deploy jobs and services.

File Structure

We need to create the following files for this guide:

  • train.py: contains our training code
  • requirements.txt: contains our dependencies
  • deploy.py: contains our deployment code

The final file structure will look like this:

.
├── train.py
├── requirements.txt
└── deploy.py

Training Code

requirements.txt

The file contains our dependencies.

pandas
numpy
scikit-learn
pickle-mixin

# for experiment tracking and model registry
mlfoundry

# for deploying our job
servicefoundry

train.py

This file fetches the data, trains the model, and pushes it to the Model Registry.
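
As an illustration of what train.py might contain, here is a minimal sketch rather than the guide's actual recipe. It assumes the iris dataset bundled with scikit-learn (hinted at by the example Secret name), a logistic regression classifier, and the mlfoundry calls get_client, create_run, log_params, log_metrics, log_model and end. The project name "iris-classification" and model name "iris-classifier" are illustrative, and the create_run argument name may differ between mlfoundry releases.

```python
import os

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hyperparameters kept in one place so the same values are trained with and logged.
PARAMS = {"max_iter": 1000, "test_size": 0.2, "random_state": 42}


def train():
    """Fit a classifier on the iris data and return it with its test metrics."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=PARAMS["test_size"],
        random_state=PARAMS["random_state"],
        stratify=y,
    )
    model = LogisticRegression(max_iter=PARAMS["max_iter"])
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, {"accuracy": float(accuracy)}


def log_to_registry(model, metrics):
    """Log params, metrics and the model via mlfoundry (names are illustrative)."""
    # Imported here so the training step can be dry-run without mlfoundry.
    import mlfoundry

    # Assumption: the client picks up the API key from the MLF_API_KEY variable.
    client = mlfoundry.get_client()
    run = client.create_run(project_name="iris-classification")
    run.log_params(PARAMS)
    run.log_metrics(metrics)
    run.log_model(name="iris-classifier", model=model, framework="sklearn")
    run.end()


def main():
    model, metrics = train()
    print(f"metrics: {metrics}")
    # Skip the registry step when no API key is available (e.g. a local dry run).
    if os.environ.get("MLF_API_KEY"):
        log_to_registry(model, metrics)
    else:
        print("MLF_API_KEY not set; skipping the registry step")


if __name__ == "__main__":
    main()
```

Keeping training and logging in separate functions means the model can be checked locally before any credentials are configured; inside the job, the API Key Secret supplies MLF_API_KEY and the registry step runs.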

Running the Training as a Job

Now we will deploy the training code as a job.

A job runs the code once, to completion.
Running the training as a job lets us use our workspace's resources instead of our local environment.
This is beneficial when training requires more resources. The compute and memory resources are released once the job completes, so we incur no further cost after that.

deploy.py

This file deploys our training code as a job.
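
As an illustration of what deploy.py might contain, here is a minimal sketch rather than the guide's actual recipe. It assumes the servicefoundry classes Job, Build and PythonBuild, and that a Secret is referenced in the job's environment with a tfy-secret:// prefix followed by the Secret FQN. The environment variable names WORKSPACE_FQN and SECRET_FQN and the job name "iris-train-job" are hypothetical choices made for this example.

```python
import os


def secret_reference(secret_fqn: str) -> str:
    """Build the env value that points at a stored Secret.

    Assumption: Secrets are referenced as tfy-secret://<secret-fqn>.
    """
    return f"tfy-secret://{secret_fqn}"


def build_job(secret_fqn: str):
    """Describe a job that installs requirements.txt and runs train.py."""
    # Imported here so the helper above works without servicefoundry installed.
    from servicefoundry import Build, Job, PythonBuild

    return Job(
        name="iris-train-job",  # illustrative job name
        image=Build(
            build_spec=PythonBuild(
                command="python train.py",
                requirements_path="requirements.txt",
            )
        ),
        # The API Key reaches train.py as the MLF_API_KEY environment variable.
        env={"MLF_API_KEY": secret_reference(secret_fqn)},
    )


def main():
    # Hypothetical variable names; set them to the FQNs noted in Prerequisites.
    workspace_fqn = os.environ.get("WORKSPACE_FQN")
    secret_fqn = os.environ.get("SECRET_FQN")
    if not workspace_fqn or not secret_fqn:
        print("Set WORKSPACE_FQN and SECRET_FQN before running deploy.py")
        return
    build_job(secret_fqn).deploy(workspace_fqn=workspace_fqn)


if __name__ == "__main__":
    main()
```

Passing the key as a Secret reference instead of a literal value keeps the API Key out of the source files that get uploaded with the job.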

Now run the following command in your terminal:

python deploy.py

This deploys your training code as a job to be executed in your workspace.

On successful deployment, the Job is created and runs immediately.

We can now visit the Applications page to check the build status, build logs and run history, and to monitor the progress of runs.