A run represents a single invocation of a job, a script, or an ML experiment. You can create a run at the beginning of your script or notebook; log parameters, metrics, artifacts, models, and tags; and finally end the run. This gives you an easy way to keep track of all data related to job runs or ML experiments. A quick code snippet to create a run and end it:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
# Your code here.
run.end()
You can organize multiple runs under a single ml_repo. For example, the run svm-model will be created under the ml_repo iris-demo. You can view these runs in the TrueFoundry dashboard.

TrueFoundry Dashboard

You can attach tags to a run as key-value pairs using the set_tags method:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.set_tags({"env": "development", "task": "classification"})
# Your code here.
run.end()
You can view the tags from the dashboard and also create new tags.
Parameters are used to store the configuration of a run. These can be the inputs to your script or the hyperparameters of your model during training, like learning_rate or cache_size. Parameter values are stringified before storing. You can log parameters using the log_params method as shown below:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")

run.log_params({"cache_size": 200.0, "kernel": "linear"})

run.end()
Parameters are immutable; you cannot change the value of a parameter once it is logged. If you need to change a parameter's value, you are effectively changing your input configuration, and it is best to create a new run for that.
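For instance, when sweeping over hyperparameters, this guidance translates to creating one run per configuration. A minimal sketch (the kernel values and run names here are illustrative, not required by the API):
from truefoundry.ml import get_client

client = get_client()

# Hypothetical sweep: one run per hyperparameter configuration
for kernel in ["linear", "rbf"]:
    run = client.create_run(ml_repo="iris-demo", run_name=f"svm-{kernel}")
    run.log_params({"cache_size": 200.0, "kernel": kernel})
    # Train and evaluate with this configuration here.
    run.end()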

Viewing logged parameters in the dashboard

Filtering runs based on parameter value

To filter runs, use the filter option in the top right corner of the screen and apply the required filter.

Capturing command-line arguments

You can capture command-line arguments directly from the argparse.Namespace object:
import argparse
from truefoundry.ml import get_client

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, required=True)
args = parser.parse_args()

client = get_client()
run = client.create_run(ml_repo="iris-demo")

run.log_params(args)

run.end()
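Here log_params receives the argparse.Namespace directly. If you want to drop some arguments before logging, converting the namespace to a dictionary with Python's built-in vars() works as well. A minimal sketch, continuing the snippet above (the verbose argument is hypothetical):
# Convert the namespace to a plain dict and drop arguments you don't want logged
params = vars(args).copy()
params.pop("verbose", None)  # hypothetical argument we skip
run.log_params(params)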
Metrics are values that help you evaluate and compare different runs, for example accuracy or F1 score. You can log any output of your script as a metric. You can capture metrics using the log_metrics method.
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.log_params({"cache_size": 200.0, "kernel": "linear"})
run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6})

run.end()
These metrics can be seen in the TrueFoundry dashboard. You can filter runs based on metric values, as shown in the figure.

Metrics Overview

Filtering runs based on metric values

Step-wise metric logging

You can capture step-wise metrics too using the step argument.
for global_step in range(1000):
    run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6}, step=global_step)
The step-wise metrics can be visualized as graphs in the dashboard.

Step-wise metrics

Should I use epoch or global step as a value for the step argument?

If available, you should use the global step as the value for the step argument. To capture epoch-level metric aggregates, you can use the following pattern:
run.log_metrics(
    metric_dict={"epoch/train_accuracy": 0.7, "epoch": epoch},
    step=global_step,
)
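To make the pattern concrete, here is a minimal sketch of a training loop that logs a batch-level metric at every global step and an epoch-level aggregate at the end of each epoch (steps_per_epoch and the metric values are placeholders for your training code):
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")

steps_per_epoch = 100  # placeholder for your data loader length

for epoch in range(5):
    for batch_idx in range(steps_per_epoch):
        global_step = epoch * steps_per_epoch + batch_idx
        # train_loss would come from your training code; 0.6 is a placeholder
        run.log_metrics(metric_dict={"train_loss": 0.6}, step=global_step)
    # Log the epoch-level aggregate on the same global-step axis
    run.log_metrics(
        metric_dict={"epoch/train_accuracy": 0.7, "epoch": epoch},
        step=global_step,
    )

run.end()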
You can log files and folders as artifacts under a run using the log_artifact method:
import os
from truefoundry.ml import get_client
from truefoundry.ml import ArtifactPath

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.log_params({"cache_size": 200.0, "kernel": "linear"})
run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6})

# Creating sample files to log as artifacts
os.makedirs("my-folder", exist_ok=True)
with open("my-folder/file-inside-folder.txt", "w") as f:
    f.write("Hello!")

with open("just-a-file.txt", "w") as f:
    f.write("Hello from file!")

artifact_version = run.log_artifact(
    name="my-artifact",
    artifact_paths=[
        # Add files and folders here, `ArtifactPath` takes source and destination
        # source can be single file path or folder path
        # destination can be file path or folder path
        # Note: When source is a folder path, destination is always interpreted as folder path
        ArtifactPath(src="just-a-file.txt"),
        ArtifactPath(src="my-folder/", dest="cool-dir"),
        ArtifactPath(src="just-a-file.txt", dest="cool-dir/copied-file.txt")
    ],
    description="This is a sample artifact",
    metadata={"created_by": "my-username"}
)
print(artifact_version.fqn)
run.end()
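The printed FQN can later be used to fetch the artifact outside the run. A minimal sketch, assuming the client exposes get_artifact_version_by_fqn and the returned version supports download (check the SDK reference for the exact signatures):
from truefoundry.ml import get_client

client = get_client()
# Use the FQN printed when the artifact was logged (placeholder below)
artifact_version = client.get_artifact_version_by_fqn("<artifact-version-fqn>")
download_path = artifact_version.download(path="downloaded-artifact")
print(download_path)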
You can log models under a run using the log_model method:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.log_params({"cache_size": 200.0, "kernel": "linear"})
run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6})

model_version = run.log_model(
    name="name-for-the-model",
    model_file_or_folder="path/to/model/file/or/folder/on/disk",
    framework=None,  # None or a supported Framework instance
)
run.end()
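Putting it together, here is a minimal end-to-end sketch that trains a scikit-learn model, serializes it with joblib, and logs the file (the serialization format and the names used are choices of this example, not requirements of log_model):
import joblib
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")

X, y = load_iris(return_X_y=True)
model = SVC(kernel="linear", cache_size=200.0)
model.fit(X, y)

# Serialize the trained model to disk, then log the file
joblib.dump(model, "svm-model.joblib")
model_version = run.log_model(
    name="iris-svm",
    model_file_or_folder="svm-model.joblib",
    framework=None,
)
print(model_version.fqn)
run.end()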
You can also log images at different steps in a run. Images can be associated with a step number, in case you are running multiple epochs in training and want to log images at different steps. The PIL package is needed to log images. To install it, run
pip install pillow
Here is the sample code to log images from different sources:
from truefoundry.ml import get_client, Image
import numpy as np
import PIL.Image

client = get_client()
run = client.create_run(
    ml_repo="my-classification-project",
)

imarray = np.random.randint(low=0, high=256, size=(100, 100, 3))
im = PIL.Image.fromarray(imarray.astype("uint8")).convert("RGB")
im.save("result_image.jpeg")

images_to_log = {
    "logged-image-array": Image(data_or_path=imarray),
    "logged-pil-image": Image(data_or_path=im),
    "logged-image-from-path": Image(data_or_path="result_image.jpeg"),
}

run.log_images(images_to_log, step=1)
run.end()
Images are represented and logged using the truefoundry.ml.Image class. You can initialize an Image either with a local path or with a numpy array / PIL.Image object. You can also log a caption and the actual and predicted values for an image, as shown in the examples below.

Logging images with caption and a class label

from keras.datasets import mnist
from truefoundry.ml import get_client, Image
import time
import numpy as np

data = mnist.load_data()
(X_train, y_train), (X_test, y_test) = data

client = get_client()
run = client.create_run("mnist-sample")

actuals = list(y_test)
predictions = list(np.random.randint(9, size=10))

img_dict = {}
for i in range(10):
    img_dict[str(i)] = Image(
        data_or_path=X_train[i],
        caption="mnist sample",
        class_groups={
            "actuals": str(actuals[i]),
            "predictions": str(predictions[i])
            },
    )

run.log_images(img_dict)
run.end()
The logged images can be visualized in the TrueFoundry dashboard. You can also log images for multi-label classification problems.
images_to_log = {
    "logged-image-array": truefoundry.ml.Image(
        data_or_path=imarray,
        caption="testing image logging",
        class_groups={"actuals": ["dog", "human"], "predictions": ["cat", "human"]},
    ),
}

run.log_images(images_to_log, step=1)
You can also log plots in a run and visualize them in the TrueFoundry dashboard. You can associate a plot with a step number, in case you are running multiple epochs in training and want to log plots at different steps. You can log custom matplotlib, seaborn, and plotly plots as shown in the examples below:
  • Matplotlib Plot
  • Seaborn Plot
  • Plotly Plot
from truefoundry.ml import get_client
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

client = get_client()
run = client.create_run(
    ml_repo="my-classification-project",
)

ConfusionMatrixDisplay.from_predictions(["spam", "ham"], ["ham", "ham"])

run.log_plots({"confusion_matrix": plt}, step=1)
run.end()
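The tabs above also list seaborn and plotly examples. As a sketch, a plotly figure can be passed to log_plots the same way (the figure contents here are hypothetical):
from truefoundry.ml import get_client
import plotly.express as px

client = get_client()
run = client.create_run(ml_repo="my-classification-project")

# A hypothetical plotly figure
fig = px.scatter(x=[0, 1, 2, 3], y=[0, 1, 4, 9], title="sample-scatter")

run.log_plots({"sample-scatter": fig}, step=1)
run.end()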
You can visualize the logged plots in the TrueFoundry Dashboard.

Accessing Runs in TrueFoundry

To interact with runs in TrueFoundry, you can use the methods available on the client returned by get_client. Here are the different ways to access runs:
To retrieve an existing run by its ID, use the get_run_by_id method:
from truefoundry.ml import get_client

client = get_client()
run = client.get_run_by_id("run_id_here")
If you have the fully qualified name (FQN) of a run, which follows the pattern tenant_name/ml_repo/run_name, you can use the get_run_by_fqn method:
client = get_client()
run = client.get_run_by_fqn("tenant_name/ml_repo/run_name")
To retrieve all the runs’ names and IDs for a project, use the get_all_runs method:
client = get_client()
runs_df = client.get_all_runs(ml_repo="project_name_here")
You can search for runs that match specific criteria using the search_runs method:
client = get_client()
runs = client.search_runs(
    ml_repo="project_name_here",
    filter_string="metrics.accuracy > 0.75",
    order_by=["metrics.accuracy DESC"],
)
for run in runs:
    print(run)
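Assuming the filter syntax follows the same metrics./params. convention shown above, you can also filter on logged parameters. A minimal sketch, continuing with the client from above (the filter values are illustrative):
runs = client.search_runs(
    ml_repo="project_name_here",
    filter_string="params.kernel = 'linear' and metrics.accuracy > 0.75",
)
for run in runs:
    print(run.run_name)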
To retrieve the tags of a run, use the get_tags method. It returns a dictionary.
from truefoundry.ml import get_client

client = get_client()
run = client.get_run("run-id-of-the-run")

print(run.get_tags())
To retrieve the parameters of a run, use the get_params method. It returns a dictionary.
from truefoundry.ml import get_client

client = get_client()
run = client.get_run("run-id-of-the-run")

print(run.get_params())
To retrieve the metrics of a run, use the get_metrics method. It returns a dictionary mapping each metric name to its history.
from truefoundry.ml import get_client

client = get_client()
run = client.get_run("run-id-of-the-run")

metrics = run.get_metrics()

for metric_name, metric_history in metrics.items():
    print(f"logged metrics for metric {metric_name}:")
    for metric in metric_history:
        print(f"value: {metric.value}")
        print(f"step: {metric.step}")
        print(f"timestamp_ms: {metric.timestamp}")
        print("--")

run.end()

FAQs

What role do I need to create a run?
You need at least the Project Editor role to create a run under an ml_repo; the Project Viewer role does not have permission to create runs.
Can I use a run as a context manager?
Yes, runs can be used as context managers. A run is automatically ended when execution exits the with block.
from truefoundry.ml import get_client

client = get_client()
client.create_ml_repo("iris-demo")

run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
with run:
    # Your code here.
    ...

# No need to call run.end()
Are run names unique?
Yes, run names under an ml_repo are unique. If a run name already exists, a suffix is added to make it unique.
What happens if I don't pass a run name?
If you do not pass a run name while creating a run, a random name is generated.
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo")

print(run.run_name)
run.end()
How are runs identified?
Runs are identified by their id.
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo")

print(run.run_id)
run.end()