# Log and Get Metrics
Metrics are values that help you evaluate and compare different runs, for example, accuracy or F1 score.
## Capturing metrics
You can capture metrics using the `log_metrics` method.
```python
from truefoundry.ml import get_client

client = get_client()
run = client.create_run(ml_repo="iris-demo")
run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6})
run.end()
```
These metrics can be seen in the MLFoundry dashboard, where you can filter runs by metric values. They also appear in the Overview section of each run.
## Accessing the metrics for a run
You can use the `get_metrics` method. It returns a dictionary that maps each metric name to its logged history.
```python
from truefoundry.ml import get_client

client = get_client()
run = client.get_run("run-id-of-the-run")
metrics = run.get_metrics()
for metric_name, metric_history in metrics.items():
    print(f"logged metrics for metric {metric_name}:")
    # each history entry carries the logged value, the step, and a timestamp
    for metric in metric_history:
        print(f"value: {metric.value}")
        print(f"step: {metric.step}")
        print(f"timestamp_ms: {metric.timestamp}")
        print("--")
run.end()
```
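For example, to pull the most recent value of a single metric from the returned history, you could do something like the following minimal sketch. It picks the entry with the highest step rather than assuming the history list is ordered:

```python
# latest logged "accuracy" entry, picked by highest step
history = metrics.get("accuracy", [])
if history:
    latest = max(history, key=lambda m: m.step)
    print(f"latest accuracy: {latest.value} at step {latest.step}")
```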
## Step-wise metric logging
You can also capture step-wise metrics using the `step` argument.
```python
for global_step in range(1000):
    run.log_metrics(metric_dict={"accuracy": 0.7, "loss": 0.6}, step=global_step)
```
The step-wise metrics can be visualized as graphs in the dashboard.
**Should I use epoch or global step as a value for the `step` argument?**

If available, you should use the global step as the value for the `step` argument.
To capture epoch-level metric aggregates, you can use the following pattern.
```python
run.log_metrics(
    metric_dict={"epoch/train_accuracy": 0.7, "epoch": epoch}, step=global_step
)
```
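As a fuller illustration, here is a minimal sketch of how this pattern could sit inside a training loop. The `train_one_batch` and `evaluate_epoch` helpers, along with `num_epochs` and `train_batches`, are hypothetical placeholders, not part of the TrueFoundry API:

```python
global_step = 0
for epoch in range(num_epochs):  # num_epochs: hypothetical placeholder
    for batch in train_batches:  # train_batches: hypothetical placeholder
        loss, accuracy = train_one_batch(batch)  # hypothetical training helper
        # batch-level metrics are logged against the global step
        run.log_metrics(
            metric_dict={"accuracy": accuracy, "loss": loss}, step=global_step
        )
        global_step += 1
    # epoch-level aggregate, also logged against the current global step
    epoch_accuracy = evaluate_epoch()  # hypothetical evaluation helper
    run.log_metrics(
        metric_dict={"epoch/train_accuracy": epoch_accuracy, "epoch": epoch},
        step=global_step,
    )
```

Logging the epoch aggregate against the global step keeps all metrics on a single, shared x-axis, while the `"epoch"` entry lets you recover which epoch each aggregate belongs to.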