A run represents a single invocation of a job, a script, or an ML experiment. You can create a run at the beginning of your script or notebook; log parameters, metrics, artifacts, models, and tags; and finally end the run. This gives you an easy way to keep track of all data related to job runs or ML experiments. A quick code snippet to create a run and end it:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
# Your code here.
run.end()
You can organize multiple runs under a single ml_repo. For example, the run svm-model will be created under the ml_repo iris-demo. You can view these runs in the TrueFoundry dashboard.

TrueFoundry Dashboard

You can attach tags to a run as key-value pairs using the set_tags method:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.set_tags({"env": "development", "task": "classification"})
# Your code here.
run.end()
You can view the tags from the dashboard and also create new tags.
Parameters are used to store the configuration of a run. These can be the inputs to your script or the hyperparameters of your model during training, like learning_rate or cache_size. Parameter values are stringified before storing. You can log parameters using the log_params method as shown below:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")

run.log_params({"cache_size": 200.0, "kernel": "linear"})

run.end()
Parameters are immutable; you cannot change the value of a parameter once it is logged. If you need to change a parameter's value, you are effectively changing your input configuration, and it is best to create a new run for that.
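For instance, when sweeping over hyperparameters, this guidance translates to creating one run per configuration. A minimal sketch (the kernel values and run names here are illustrative, not required by the API):
from truefoundry.ml import get_client

client = get_client()

# Hypothetical sweep: one run per hyperparameter configuration
for kernel in ["linear", "rbf"]:
    run = client.create_run(ml_repo="iris-demo", run_name=f"svm-{kernel}")
    run.log_params({"cache_size": 200.0, "kernel": kernel})
    # Train and evaluate with this configuration here.
    run.end()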

Viewing logged parameters in the dashboard

Filtering runs based on parameter value

To filter runs, use the filter option in the top right corner of the screen and apply the required filter.

Capturing command-line arguments

You can capture command-line arguments directly from the argparse.Namespace object:
import argparse
from truefoundry.ml import get_client

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, required=True)
args = parser.parse_args()

client = get_client()
run = client.create_run(ml_repo="iris-demo")

run.log_params(args)

run.end()
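Here log_params receives the argparse.Namespace directly. If you want to drop some arguments before logging, converting the namespace to a dictionary with Python's built-in vars() works as well. A minimal sketch, continuing the snippet above (the verbose argument is hypothetical):
# Convert the namespace to a plain dict and drop arguments you don't want logged
params = vars(args).copy()
params.pop("verbose", None)  # hypothetical argument we skip
run.log_params(params)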
Metrics are values that help you evaluate and compare different runs, for example accuracy or F1 score. You can log any output of your script as a metric. You can capture metrics using the log_metrics method.
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.log_params({"cache_size": 200.0, "kernel": "linear"})
run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6})

run.end()
These metrics can be seen in the TrueFoundry dashboard. You can filter runs based on metric values, as shown in the figure.

Metrics Overview

Filtering runs based on metric values

Step-wise metric logging

You can capture step-wise metrics too using the step argument.
for global_step in range(1000):
    run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6}, step=global_step)
The step-wise metrics can be visualized as graphs in the dashboard.

Step-wise metrics

Should I use epoch or global step as a value for the step argument?

If available, you should use the global step as the value for the step argument. To capture epoch-level metric aggregates, you can use the following pattern:
run.log_metrics(
    metric_dict={"epoch/train_accuracy": 0.7, "epoch": epoch},
    step=global_step,
)
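To make the pattern concrete, here is a minimal sketch of a training loop that logs a batch-level metric at every global step and an epoch-level aggregate at the end of each epoch (steps_per_epoch and the metric values are placeholders for your training code):
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")

steps_per_epoch = 100  # placeholder for your data loader length

for epoch in range(5):
    for batch_idx in range(steps_per_epoch):
        global_step = epoch * steps_per_epoch + batch_idx
        # train_loss would come from your training code; 0.6 is a placeholder
        run.log_metrics(metric_dict={"train_loss": 0.6}, step=global_step)
    # Log the epoch-level aggregate on the same global-step axis
    run.log_metrics(
        metric_dict={"epoch/train_accuracy": 0.7, "epoch": epoch},
        step=global_step,
    )

run.end()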
You can log files and folders as artifacts under a run using the log_artifact method:
import os
from truefoundry.ml import get_client
from truefoundry.ml import ArtifactPath

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.log_params({"cache_size": 200.0, "kernel": "linear"})
run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6})

# Creating sample files to log as artifacts
os.makedirs("my-folder", exist_ok=True)
with open("my-folder/file-inside-folder.txt", "w") as f:
    f.write("Hello!")

with open("just-a-file.txt", "w") as f:
    f.write("Hello from file!")

artifact_version = run.log_artifact(
    name="my-artifact",
    artifact_paths=[
        # Add files and folders here, `ArtifactPath` takes source and destination
        # source can be single file path or folder path
        # destination can be file path or folder path
        # Note: When source is a folder path, destination is always interpreted as folder path
        ArtifactPath(src="just-a-file.txt"),
        ArtifactPath(src="my-folder/", dest="cool-dir"),
        ArtifactPath(src="just-a-file.txt", dest="cool-dir/copied-file.txt")
    ],
    description="This is a sample artifact",
    metadata={"created_by": "my-username"}
)
print(artifact_version.fqn)
run.end()
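The printed FQN can later be used to fetch the artifact outside the run. A minimal sketch, assuming the client exposes get_artifact_version_by_fqn and the returned version supports download (check the SDK reference for the exact signatures):
from truefoundry.ml import get_client

client = get_client()
# Use the FQN printed when the artifact was logged (placeholder below)
artifact_version = client.get_artifact_version_by_fqn("<artifact-version-fqn>")
download_path = artifact_version.download(path="downloaded-artifact")
print(download_path)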
You can log models under a run using the log_model method:
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
run.log_params({"cache_size": 200.0, "kernel": "linear"})
run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6})

model_version = run.log_model(
    name="name-for-the-model",
    model_file_or_folder="path/to/model/file/or/folder/on/disk",
    framework=None,  # None or a supported Framework instance
)
run.end()
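Putting it together, here is a minimal end-to-end sketch that trains a scikit-learn model, serializes it with joblib, and logs the file (the serialization format and the names used are choices of this example, not requirements of log_model):
import joblib
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo", run_name="svm-model")

X, y = load_iris(return_X_y=True)
model = SVC(kernel="linear", cache_size=200.0)
model.fit(X, y)

# Serialize the trained model to disk, then log the file
joblib.dump(model, "svm-model.joblib")
model_version = run.log_model(
    name="iris-svm",
    model_file_or_folder="svm-model.joblib",
    framework=None,
)
print(model_version.fqn)
run.end()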
You can also log images at different steps in a run. Images can be associated with a step number, in case you are running multiple epochs in training and want to log images at different steps. The PIL package is needed to log images. To install it, run
pip install pillow
Here is the sample code to log images from different sources:
from truefoundry.ml import get_client, Image
import numpy as np
import PIL.Image

client = get_client()
run = client.create_run(
    ml_repo="my-classification-project",
)

imarray = np.random.randint(low=0, high=256, size=(100, 100, 3))
im = PIL.Image.fromarray(imarray.astype("uint8")).convert("RGB")
im.save("result_image.jpeg")

images_to_log = {
    "logged-image-array": Image(data_or_path=imarray),
    "logged-pil-image": Image(data_or_path=im),
    "logged-image-from-path": Image(data_or_path="result_image.jpeg"),
}

run.log_images(images_to_log, step=1)
run.end()
Images are represented and logged using the truefoundry.ml.Image class. You can initialize an Image either with a local path or with a numpy array / PIL.Image object. You can also log a caption and the actual and predicted values for an image, as shown in the examples below.

Logging images with caption and a class label

from keras.datasets import mnist
from truefoundry.ml import get_client, Image
import time
import numpy as np

data = mnist.load_data()
(X_train, y_train), (X_test, y_test) = data

client = get_client()
run = client.create_run("mnist-sample")

actuals = list(y_test)
predictions = list(np.random.randint(9, size=10))

img_dict = {}
for i in range(10):
    img_dict[str(i)] = Image(
        data_or_path=X_train[i],
        caption="mnist sample",
        class_groups={
            "actuals": str(actuals[i]),
            "predictions": str(predictions[i])
            },
    )

run.log_images(img_dict)
run.end()
The logged images can be visualized in the TrueFoundry dashboard. You can also log images for multi-label classification problems.
images_to_log = {
    "logged-image-array": truefoundry.ml.Image(
        data_or_path=imarray,
        caption="testing image logging",
        class_groups={"actuals": ["dog", "human"], "predictions": ["cat", "human"]},
    ),
}

run.log_images(images_to_log, step=1)
You can also log plots in a run and visualize them in the TrueFoundry dashboard. You can associate a plot with a step number, in case you are running multiple epochs in training and want to log plots at different steps. You can log custom matplotlib, seaborn, and plotly plots as shown in the examples below:
  • Matplotlib Plot
  • Seaborn Plot
  • Plotly Plot
from truefoundry.ml import get_client
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

client = get_client()
run = client.create_run(
    ml_repo="my-classification-project",
)

ConfusionMatrixDisplay.from_predictions(["spam", "ham"], ["ham", "ham"])

run.log_plots({"confusion_matrix": plt}, step=1)
run.end()
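The tabs above also list seaborn and plotly examples. As a sketch, a plotly figure can be passed to log_plots the same way (the figure contents here are hypothetical):
from truefoundry.ml import get_client
import plotly.express as px

client = get_client()
run = client.create_run(ml_repo="my-classification-project")

# A hypothetical plotly figure
fig = px.scatter(x=[0, 1, 2, 3], y=[0, 1, 4, 9], title="sample-scatter")

run.log_plots({"sample-scatter": fig}, step=1)
run.end()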
You can visualize the logged plots in the TrueFoundry Dashboard.

Accessing Runs in TrueFoundry

To interact with runs in TrueFoundry, you can use the methods available on the client returned by get_client. Here are the different ways to access runs:
To retrieve an existing run by its ID, use the get_run_by_id method:
from truefoundry.ml import get_client

client = get_client()
run = client.get_run_by_id("run_id_here")
If you have the fully qualified name (FQN) of a run, which follows the pattern tenant_name/ml_repo/run_name, you can use the get_run_by_fqn method:
client = get_client()
run = client.get_run_by_fqn("tenant_name/ml_repo/run_name")
To retrieve all the runs’ names and IDs for a project, use the get_all_runs method:
client = get_client()
runs_df = client.get_all_runs(ml_repo="project_name_here")
You can search for runs that match specific criteria using the search_runs method:
client = get_client()
runs = client.search_runs(
    ml_repo="project_name_here",
    filter_string="metrics.accuracy > 0.75",
    order_by=["metrics.accuracy DESC"],
)
for run in runs:
    print(run)
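Assuming the filter syntax follows the same metrics./params. convention shown above, you can also filter on logged parameters. A minimal sketch, continuing with the client from above (the filter values are illustrative):
runs = client.search_runs(
    ml_repo="project_name_here",
    filter_string="params.kernel = 'linear' and metrics.accuracy > 0.75",
)
for run in runs:
    print(run.run_name)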
To retrieve the tags of a run, use the get_tags method. It returns a dictionary.
from truefoundry.ml import get_client

client = get_client()
run = client.get_run("run-id-of-the-run")

print(run.get_tags())
To retrieve the parameters of a run, use the get_params method. It returns a dictionary.
from truefoundry.ml import get_client

client = get_client()
run = client.get_run("run-id-of-the-run")

print(run.get_params())
To retrieve the metrics of a run, use the get_metrics method. It returns a dictionary mapping each metric name to its history.
from truefoundry.ml import get_client

client = get_client()
run = client.get_run("run-id-of-the-run")

metrics = run.get_metrics()

for metric_name, metric_history in metrics.items():
    print(f"logged metrics for metric {metric_name}:")
    for metric in metric_history:
        print(f"value: {metric.value}")
        print(f"step: {metric.step}")
        print(f"timestamp_ms: {metric.timestamp}")
        print("--")

run.end()

FAQs

What role do I need to create a run?
You need at least the Project Editor role to create a run under an ml_repo; the Project Viewer role does not have permission to create runs.
Can I use a run as a context manager?
Yes, runs can be used as context managers. A run is automatically ended when execution exits the with block.
from truefoundry.ml import get_client

client = get_client()
client.create_ml_repo("iris-demo")

run = client.create_run(ml_repo="iris-demo", run_name="svm-model")
with run:
    # Your code here.
    ...

# No need to call run.end()
Are run names unique?
Yes, run names under an ml_repo are unique. If a run name already exists, a suffix is added to make it unique.
What happens if I don't pass a run name?
If you do not pass a run name while creating a run, a random name is generated.
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo")

print(run.run_name)
run.end()
How are runs identified?
Runs are identified by their id.
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo")

print(run.run_id)
run.end()