module mlfoundry
function get_client

Initializes and returns the mlfoundry client.

Args:
disable_analytics (bool, optional): Pass True to turn off usage analytics collection. Defaults to False.

Returns:
MlFoundry: An instance of the MlFoundry class, which represents the client.
Examples:
- Get client
import mlfoundry
client = mlfoundry.get_client()
class MlFoundry

function create_ml_repo

Creates an ML Repository.

Args:
ml_repo (str, optional): The name of the repository to create. If not given, a name is generated automatically.
storage_integration_fqn (str, optional): The storage integration FQN to use for the experiment for saving artifacts.
Examples:
- Create Repository
import mlfoundry
client = mlfoundry.get_client()
ml_repo = client.create_ml_repo(ml_repo="my-repo")
function create_run

Initializes a run.

In a machine learning experiment, a run represents a single experiment conducted under a project.

Args:
ml_repo (str): The name of the project under which the run will be created. ml_repo should only contain alphanumerics (a-z, A-Z, 0-9) or hyphens (-). The user must have ADMIN or WRITE access to this project.
run_name (Optional[str], optional): The name of the run. If not passed, a randomly generated name is assigned to the run. Under a project, all runs should have a unique name. If the passed run_name is already used under a project, the run_name will be de-duplicated by adding a suffix. run_name should only contain alphanumerics (a-z, A-Z, 0-9) or hyphens (-).
tags (Optional[Dict[str, Any]], optional): Optional tags to attach to this run. Tags are key-value pairs.
kwargs: Additional keyword arguments.

Returns:
MlFoundryRun: An instance of the MlFoundryRun class, which represents a run.
Examples:
- Create a run under the current user.
import mlfoundry
client = mlfoundry.get_client()
tags = {"model_type": "svm"}
run = client.create_run(
ml_repo="my-classification-project", run_name="svm-with-rbf-kernel", tags=tags
)
run.end()
- Create a run using a context manager.
import mlfoundry
client = mlfoundry.get_client()
with client.create_run(
ml_repo="my-classification-project", run_name="svm-with-rbf-kernel"
) as run:
# ...
# Model training code
...
# `run` will be automatically marked as `FINISHED` or `FAILED`.
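The automatic status handling shown above can be sketched in plain Python. `SketchRun` below is a hypothetical stand-in used only for illustration, not the real MlFoundryRun class:

```python
# Conceptual sketch (not the actual MlFoundryRun implementation): a run used
# as a context manager marks itself FINISHED when the block exits normally
# and FAILED when the block raises an exception.

class SketchRun:
    def __init__(self):
        self.status = "RUNNING"

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # exc_type is None only when the with-block completed without error
        self.status = "FAILED" if exc_type is not None else "FINISHED"
        return False  # propagate any exception to the caller

with SketchRun() as ok_run:
    pass  # training succeeded

print(ok_run.status)  # FINISHED
```

The real client additionally persists the final status to the tracking server; the sketch only shows the control flow.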
- Create a run in a project owned by a different user.
import mlfoundry
client = mlfoundry.get_client()
tags = {"model_type": "svm"}
run = client.create_run(
ml_repo="my-classification-project",
run_name="svm-with-rbf-kernel",
tags=tags,
)
run.end()
function get_all_runs

Returns the name and ID of every run under a project.

The user must have READ access to the project.

Args:
ml_repo (str): Name of the project.

Returns:
pd.DataFrame: A dataframe with two columns: run_id and run_name.
Examples:
- Get all the runs from an ML Repo

import mlfoundry
client = mlfoundry.get_client()
runs_df = client.get_all_runs(ml_repo='my-repo')
function get_artifact

Gets an artifact version, whose contents can then be downloaded.

Args:
fqn (str): Fully qualified name of the artifact version.

Returns:
ArtifactVersion: An ArtifactVersion instance of the artifact.
Examples:
import tempfile
import mlfoundry
client = mlfoundry.get_client()
artifact_version = client.get_artifact(fqn="artifact:truefoundry/user/my-classification-project/sklearn-artifact:1")
# download the artifact to disk
temp = tempfile.TemporaryDirectory()
download_info = artifact_version.download(path=temp.name)
print(download_info)
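The fqn argument used above has a recognizable shape. As a rough illustration, it can be split into its parts; `parse_artifact_fqn` is a hypothetical helper written for this sketch, not part of the mlfoundry API:

```python
# Illustrative sketch: the structure of an artifact version fqn such as
# "artifact:truefoundry/user/my-classification-project/sklearn-artifact:1".
# parse_artifact_fqn is a hypothetical helper, not part of mlfoundry.

def parse_artifact_fqn(fqn: str) -> dict:
    artifact_type, rest = fqn.split(":", 1)   # leading type, e.g. "artifact"
    path, version = rest.rsplit(":", 1)       # trailing version number
    return {"type": artifact_type, "path": path, "version": int(version)}

parsed = parse_artifact_fqn(
    "artifact:truefoundry/user/my-classification-project/sklearn-artifact:1"
)
print(parsed["type"], parsed["version"])  # artifact 1
```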
function get_artifact_version

Gets an artifact version, whose contents can then be downloaded.

Args:
ml_repo (str): ML Repo to which the artifact is logged.
artifact_name (str): Artifact name.
artifact_type (str): The type of artifact to fetch (acceptable values: "artifact", "model", "plot", "image").
version (str | int): Artifact version to fetch (defaults to the latest version).

Returns:
ArtifactVersion: An ArtifactVersion instance of the artifact.
Examples:
import tempfile
import mlfoundry
client = mlfoundry.get_client()
artifact_version = client.get_artifact_version(ml_repo="ml-repo-name", artifact_name="artifact-name", version=1)
# download the artifact to disk
temp = tempfile.TemporaryDirectory()
download_info = artifact_version.download(path=temp.name)
print(download_info)
function get_artifact_version_by_fqn

Gets an artifact version by its fqn, whose contents can then be downloaded.

Args:
fqn (str): Fully qualified name of the artifact version.

Returns:
ArtifactVersion: An ArtifactVersion instance of the artifact.
Examples:
import tempfile
import mlfoundry
client = mlfoundry.get_client()
artifact_version = client.get_artifact_version_by_fqn(fqn="artifact:truefoundry/user/my-classification-project/sklearn-artifact:1")
# download the artifact to disk
temp = tempfile.TemporaryDirectory()
download_info = artifact_version.download(path=temp.name)
print(download_info)
function get_model

Gets a model version, whose contents can then be downloaded or loaded in memory.

Args:
fqn (str): Fully qualified name of the model version.

Returns:
ModelVersion: The ModelVersion instance of the model.
Examples:
import tempfile
import mlfoundry
client = mlfoundry.get_client()
model_version = client.get_model(fqn="model:truefoundry/user/my-classification-project/my-sklearn-model:1")
# load the model into memory
clf = model_version.load()
# download the model to disk
temp = tempfile.TemporaryDirectory()
download_info = model_version.download(path=temp.name)
print(download_info)
function get_model_version

Gets a model version, whose contents can then be downloaded or loaded in memory.

Args:
ml_repo (str): ML Repo to which the model is logged.
name (str): Model name.
version (str | int): Model version to fetch (defaults to the latest version).

Returns:
ModelVersion: The ModelVersion instance of the model.
Examples:
import tempfile
import mlfoundry
client = mlfoundry.get_client()
model_version = client.get_model_version(ml_repo="ml-repo-name", name="model-name", version=1)
# load the model into memory
clf = model_version.load()
# download the model to disk
temp = tempfile.TemporaryDirectory()
download_info = model_version.download(path=temp.name)
print(download_info)
function get_model_version_by_fqn

Gets a model version by its fqn, whose contents can then be downloaded or loaded in memory.

Args:
fqn (str): Fully qualified name of the model version.

Returns:
ModelVersion: The ModelVersion instance of the model.
Examples:
import tempfile
import mlfoundry
client = mlfoundry.get_client()
model_version = client.get_model_version_by_fqn(fqn="model:truefoundry/user/my-classification-project/my-sklearn-model:1")
# load the model into memory
clf = model_version.load()
# download the model to disk
temp = tempfile.TemporaryDirectory()
download_info = model_version.download(path=temp.name)
print(download_info)
function get_run_by_fqn

Gets an existing run by its fqn.

fqn stands for Fully Qualified Name. A run fqn has the following pattern: tenant_name/ml_repo/run_name. For example, for a run svm under the project cat-classifier in the truefoundry tenant, the fqn will be truefoundry/cat-classifier/svm.

Args:
run_fqn (str): fqn of an existing run.

Returns:
MlFoundryRun: An instance of the MlFoundryRun class, which represents a run.
Examples:
- Get a run by its fqn
import mlfoundry
client = mlfoundry.get_client()
run = client.get_run_by_fqn(run_fqn='truefoundry/my-repo/svm')
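The tenant_name/ml_repo/run_name pattern can be illustrated with a small sketch; `parse_run_fqn` is a hypothetical helper written for this example, not part of the mlfoundry API:

```python
# Illustrative sketch of the tenant_name/ml_repo/run_name fqn pattern
# described above. parse_run_fqn is a hypothetical helper, not part of
# the mlfoundry API.

def parse_run_fqn(fqn: str) -> tuple:
    parts = fqn.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected tenant_name/ml_repo/run_name, got {fqn!r}")
    return tuple(parts)

tenant, ml_repo, run_name = parse_run_fqn("truefoundry/cat-classifier/svm")
print(tenant, ml_repo, run_name)  # truefoundry cat-classifier svm
```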
function get_run_by_id

Gets an existing run by its run_id.

Args:
run_id (str): run_id or fqn of an existing run.

Returns:
MlFoundryRun: An instance of the MlFoundryRun class, which represents a run.
Examples:
- Get run by the run id
import mlfoundry
client = mlfoundry.get_client()
run = client.get_run_by_id(run_id='a8f6dafd70aa4baf9437a33c52d7ee90')
function get_run_by_name

Gets an existing run by its run_name.

Args:
ml_repo (str): Name of the project the run is part of.
run_name (str): The name of the required run.

Returns:
MlFoundryRun: An instance of the MlFoundryRun class, which represents a run.
Examples:
- Get a run by its name
import mlfoundry
client = mlfoundry.get_client()
run = client.get_run_by_name(run_name='svm', ml_repo='my-repo')
function get_tracking_uri

Gets the current tracking URI.

Returns:
The tracking URI.
Examples:
import mlfoundry
client = mlfoundry.get_client()
tracking_uri = client.get_tracking_uri()
print("Current tracking uri: {}".format(tracking_uri))
function list_artifact_versions

Gets all the versions of an artifact, whose contents can then be downloaded or loaded in memory.

Args:
ml_repo (str): Repository in which the artifact is stored.
name (str): Name of the artifact whose versions are required.
artifact_type (ArtifactType): Type of the artifact, for example model, image, etc.

Returns:
Iterator[ArtifactVersion]: An iterator that yields the non-deleted versions of an artifact under the given ml_repo, sorted in descending order of version number.
Examples:
import mlfoundry
client = mlfoundry.get_client()
artifact_versions = client.list_artifact_versions(ml_repo="my-repo", name="artifact-name")
for artifact_version in artifact_versions:
print(artifact_version)
function list_artifact_versions_by_fqn

Lists versions for a given artifact.

Args:
artifact_fqn: FQN of the artifact to list versions for. An artifact_fqn looks like {artifact_type}:{org}/{user}/{project}/{artifact_name} or {artifact_type}:{user}/{project}/{artifact_name}, where artifact_type can be one of ("model", "image", "plot").

Returns:
Iterator[ArtifactVersion]: An iterator that yields the non-deleted artifact versions under the given artifact_fqn, sorted in descending order of version number.

Yields:
ArtifactVersion: An instance of mlfoundry.ArtifactVersion.
Examples:
import mlfoundry
mlfoundry.login(tracking_uri="https://your.truefoundry.site.com")
client = mlfoundry.get_client()
artifact_fqn = "artifact:org/user/my-project/my-artifact"
for av in client.list_artifact_versions_by_fqn(artifact_fqn=artifact_fqn):
print(av.name, av.version, av.description)
function list_ml_repos

Returns a list of names of the ML Repos accessible by the current user.

Returns:
List[str]: A list of ML Repo names.
function list_model_versions

Gets all the versions of a model, whose contents can then be downloaded or loaded in memory.

Args:
ml_repo (str): Repository in which the model is stored.
name (str): Name of the model whose versions are required.

Returns:
Iterator[ModelVersion]: An iterator that yields the non-deleted versions of a model under the given ml_repo, sorted in descending order of version number.
Examples:
import mlfoundry
client = mlfoundry.get_client()
model_versions = client.list_model_versions(ml_repo="my-repo", name="svm")
for model_version in model_versions:
print(model_version)
function list_model_versions_by_fqn

Lists versions for a given model.

Args:
model_fqn: FQN of the model to list versions for. A model_fqn looks like model:{org}/{user}/{project}/{model_name} or model:{user}/{project}/{model_name}.

Returns:
Iterator[ModelVersion]: An iterator that yields the non-deleted model versions under the given model_fqn, sorted in descending order of version number.

Yields:
ModelVersion: An instance of mlfoundry.ModelVersion.
Examples:
import mlfoundry
mlfoundry.login(tracking_uri="https://your.truefoundry.site.com")
client = mlfoundry.get_client()
model_fqn = "model:org/user/my-project/my-model"
for mv in client.list_model_versions_by_fqn(model_fqn=model_fqn):
print(mv.name, mv.version, mv.description)
function log_artifact

Logs an artifact for the current ml_repo.

An artifact is a list of local files and directories. This function packs the files and directories mentioned in artifact_paths and uploads them to remote storage linked to the ml_repo.

Args:
ml_repo (str): Name of the ML Repo to which the artifact is to be logged.
name (str): Name of the artifact. If an artifact with this name already exists under the current ml_repo, the logged artifact will be added as a new version under that name. If no artifact exists with the given name, the given artifact will be logged as version 1.
artifact_paths (List[mlfoundry.ArtifactPath], optional): A list of pairs of (source path, destination path) to add files and folders to the artifact version contents. The first member of the pair should be a file or directory path and the second member should be the path inside the artifact contents to upload to.
E.g.
>>> client.log_artifact(
...     ml_repo="sample-repo",
...     name="xyz",
...     artifact_paths=[
...         mlfoundry.ArtifactPath("foo.txt", "foo/bar/foo.txt"),
...         mlfoundry.ArtifactPath("tokenizer/", "foo/tokenizer/"),
...         mlfoundry.ArtifactPath("bar.txt"),
...         ("bar.txt", ),
...         ("foo.txt", "a/foo.txt"),
...     ]
... )
would result in a layout like

.
├── foo/
│   ├── bar/
│   │   └── foo.txt
│   └── tokenizer/
│       └── # contents of tokenizer/ directory will be uploaded here
├── bar.txt
└── a/
    └── foo.txt
description (Optional[str], optional): Arbitrary text of up to 1024 characters to store as a description. This field can be updated at any time after logging. Defaults to None.
metadata (Optional[Dict[str, Any]], optional): Arbitrary JSON-serializable dictionary to store metadata. For example, you can use this to store metrics, params, or notes. This field can be updated at any time after logging. Defaults to None.

Returns:
mlfoundry.ArtifactVersion: An instance of ArtifactVersion that can be used to download the files, or update attributes like description or metadata.
Examples:
import os
import mlfoundry
with open("artifact.txt", "w") as f:
f.write("hello-world")
client = mlfoundry.get_client()
ml_repo = "sample-repo"
client.create_ml_repo(ml_repo=ml_repo)
client.log_artifact(
ml_repo=ml_repo,
name="hello-world-file",
artifact_paths=[mlfoundry.ArtifactPath('artifact.txt', 'a/b/')]
)
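How the (source path, destination path) pairs map onto the uploaded layout can be sketched in plain Python. `destination_tree` is a hypothetical illustration of the mapping described in Args, not mlfoundry's actual upload logic, and the assumption that a pair without a destination lands at the root under its own base name is ours:

```python
# Conceptual sketch (not mlfoundry's actual upload code) of how
# (source, destination) pairs in artifact_paths map onto the uploaded
# directory layout. Pairs without a destination are assumed to land at
# the root under their own base name.

def destination_tree(artifact_paths):
    """Build a nested dict mirroring the artifact contents layout."""
    tree = {}
    for pair in artifact_paths:
        source = pair[0]
        # Fall back to the source's base name when no destination is given.
        dest = pair[1] if len(pair) > 1 and pair[1] else source.rstrip("/").split("/")[-1]
        parts = [p for p in dest.split("/") if p]
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = source
    return tree

tree = destination_tree([
    ("foo.txt", "foo/bar/foo.txt"),
    ("tokenizer/", "foo/tokenizer/"),
    ("bar.txt",),
])
print(tree)
```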
function log_model

Serializes and logs a versioned model under the current ml_repo. Each logged model generates a new version associated with the given name and linked to the current run. Multiple versions of the model can be logged as separate versions under the same name.

Args:
ml_repo (str): Name of the ML Repo to which the model is to be logged.
name (str): Name of the model. If a model with this name already exists under the current ml_repo, the logged model will be added as a new version under that name. If no model exists with the given name, the given model will be logged as version 1.
model (Any): Model instance of any one of the supported frameworks under mlfoundry.enums.ModelFramework. Can also be None, which can be useful to create a reference entry without uploading any model files.
framework (Union[enums.ModelFramework, str]): Model framework, e.g. pytorch, sklearn, tensorflow, etc. The full list of supported frameworks can be found in mlfoundry.enums.ModelFramework. Can also be None when model is None.
model_save_kwargs (Optional[Dict[str, Any]], optional): Keyword arguments to pass to the model serializer. Defaults to None.
additional_files (Sequence[Tuple[Union[str, Path], Optional[str]]], optional): A list of pairs of (source path, destination path) to add additional files and folders to the model version contents. The first member of the pair should be a file or directory path and the second member should be the path inside the model version contents to upload to. The model version contents are arranged as follows:

.
├── model/
│   └── # model files are serialized here
└── # any additional files and folders can be added here

You can also add additional files to the model/ subdirectory by specifying the destination path as model/.
E.g.
>>> client.log_model(
...     ml_repo="sample-repo", name="xyz", model=clf, framework="sklearn",
...     additional_files=[("foo.txt", "foo/bar/foo.txt"), ("tokenizer/", "foo/tokenizer/")]
... )
would result in

.
├── model/
│   └── # model files are serialized here e.g. model.joblib
└── foo/
    ├── bar/
    │   └── foo.txt
    └── tokenizer/
        └── # contents of tokenizer/ directory will be uploaded here
description (Optional[str], optional): Arbitrary text of up to 1024 characters to store as a description. This field can be updated at any time after logging. Defaults to None.
metadata (Optional[Dict[str, Any]], optional): Arbitrary JSON-serializable dictionary to store metadata. For example, you can use this to store metrics, params, or notes. This field can be updated at any time after logging. Defaults to None.
model_schema (Optional[Union[Dict[str, Any], ModelSchema]], optional): Instance of mlfoundry.ModelSchema. This schema needs to be consistent with older versions of the model under the given name, i.e. a feature's value type and the model's prediction type cannot be changed in the schema of a new version. Features can be removed or added between versions.

E.g. if there exists a v1 with
schema = {"features": {"feat1": "int"}, "prediction": "categorical"}, then
schema = {"features": {"feat1": "string"}, "prediction": "categorical"} or
schema = {"features": {"feat1": "int"}, "prediction": "numerical"}
are invalid because they change the types of existing features and the prediction,
while
schema = {"features": {"feat1": "int", "feat2": "string"}, "prediction": "categorical"} or
schema = {"features": {"feat2": "string"}, "prediction": "categorical"}
are valid.

This field can be updated at any time after logging. Defaults to None.
custom_metrics (Optional[Union[List[Dict[str, Any]], CustomMetric]], optional): List of instances of mlfoundry.CustomMetric. The custom metrics must be added according to the prediction type of the schema, e.g. custom_metrics = [{"name": "mean_square_error", "type": "metric", "value_type": "float"}].
Returns:
mlfoundry.ModelVersion: An instance of ModelVersion that can be used to download the files, load the model, or update attributes like description, metadata, or schema.
Examples:
- sklearn
import mlfoundry
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
client = mlfoundry.get_client()
run = client.create_run(
ml_repo="my-classification-project"
)
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf.fit(X, y)
model_version = run.log_model(
name="my-sklearn-model",
model=clf,
framework="sklearn"
)
print(model_version.fqn)
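The schema compatibility rule described under model_schema can be sketched as a small check. `schema_is_compatible` is a hypothetical helper written for this illustration, not part of mlfoundry:

```python
# Conceptual sketch of the model_schema compatibility rule: existing feature
# value types and the prediction type must not change between versions, while
# features may be added or removed. schema_is_compatible is a hypothetical
# helper, not part of the mlfoundry API.

def schema_is_compatible(old: dict, new: dict) -> bool:
    if old["prediction"] != new["prediction"]:
        return False  # the prediction type cannot change
    for feature, value_type in new["features"].items():
        # A feature absent from the old schema is a newly added one, which
        # is allowed; an existing feature must keep its value type.
        if old["features"].get(feature, value_type) != value_type:
            return False
    return True

v1 = {"features": {"feat1": "int"}, "prediction": "categorical"}
# adding a feature is fine
print(schema_is_compatible(v1, {"features": {"feat1": "int", "feat2": "string"}, "prediction": "categorical"}))  # True
# changing an existing feature's type is not
print(schema_is_compatible(v1, {"features": {"feat1": "string"}, "prediction": "categorical"}))  # False
```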
function search_runs

Returns a generator that yields an MlFoundryRun on each next call. All runs under a project that match the filter string and the run_view_type are returned. The user must have READ access to the project.

Args:
ml_repo (str): Name of the project.
filter_string (str, optional): Filter query string; defaults to searching all runs. An identifier is required on the LHS of a search expression and signifies the entity to compare against. An identifier has two parts separated by a period: the type of the entity and the name of the entity. The type of the entity is metrics, params, attributes, or tags. The entity name can contain alphanumeric characters and special characters. You can search using two run attributes: status and artifact_uri; both have string values. When a metric, parameter, or tag name contains a special character like a hyphen, space, or period, enclose the entity name in double quotes or backticks, e.g. params."model-type" or params.`model-type`.
run_view_type (str, optional): One of "ACTIVE_ONLY", "DELETED_ONLY", or "ALL".
order_by (List[str], optional): List of columns to order by (e.g., "metrics.rmse"). Currently supported values are metric.key, parameter.key, tag.key, and attribute.key. The order_by column can contain an optional DESC or ASC value; the default is ASC. The default ordering is start_time DESC.
job_run_name (str): Name of the job associated with the runs to fetch.
max_results (int): Maximum number of runs to yield through the filter.
Examples:
import mlfoundry as mlf
client = mlf.get_client()
with client.create_run(ml_repo="my-project", run_name="run-1") as run1:
run1.log_metrics(metric_dict={"accuracy": 0.74, "loss": 0.6})
run1.log_params({"model": "LogisticRegression", "lambda": "0.001"})
with client.create_run(ml_repo="my-project", run_name="run-2") as run2:
run2.log_metrics(metric_dict={"accuracy": 0.8, "loss": 0.4})
run2.log_params({"model": "SVM"})
# Search for the subset of runs with logged accuracy metric greater than 0.75
filter_string = "metrics.accuracy > 0.75"
runs = client.search_runs(ml_repo="my-project", filter_string=filter_string)
# Search for the subset of runs with logged accuracy metric greater than 0.7
filter_string = "metrics.accuracy > 0.7"
runs = client.search_runs(ml_repo="my-project", filter_string=filter_string)
# Search for the subset of runs with logged accuracy metric greater than 0.7 and model="LogisticRegression"
filter_string = "metrics.accuracy > 0.7 and params.model = 'LogisticRegression'"
runs = client.search_runs(ml_repo="my-project", filter_string=filter_string)
# Search for the subset of runs with logged accuracy metric greater than 0.7 and order by accuracy in Descending order
filter_string = "metrics.accuracy > 0.7"
order_by = ["metric.accuracy DESC"]
runs = client.search_runs(
ml_repo="my-project", filter_string=filter_string, order_by=order_by
)
# Search for runs associated with a particular job run
filter_string = "metrics.accuracy > 0.7"
runs = client.search_runs(
ml_repo="transformers", order_by=order_by, job_run_name='job_run_name', filter_string=filter_string
)
# Limit the number of returned runs with max_results
order_by = ["metric.accuracy DESC"]
runs = client.search_runs(
ml_repo="my-project", filter_string=filter_string, order_by=order_by, max_results=10
)
Returns:
Generator[MlFoundryRun, None, None]: An iterator over MlFoundryRun objects matching the search query.
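To make the filter string semantics concrete, here is a rough sketch of what a single comparison clause expresses. The real filtering happens on the tracking server; `matches` is a hypothetical illustration, not mlfoundry's parser, and it handles only one clause (no and-combinations, quoting, or backticks):

```python
# Conceptual sketch of a single `entity.name OP value` filter clause, e.g.
# "metrics.accuracy > 0.75". matches is a hypothetical helper, not part of
# the mlfoundry API.
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq, "!=": operator.ne}

def matches(run: dict, filter_string: str) -> bool:
    """Evaluate one comparison clause against a run-like dict."""
    identifier, op, raw_value = filter_string.split(maxsplit=2)
    entity_type, entity_name = identifier.split(".", 1)  # e.g. metrics, accuracy
    actual = run[entity_type][entity_name]
    value = raw_value.strip("'\"")
    if isinstance(actual, (int, float)):
        value = float(value)  # compare metrics numerically
    return OPS[op](actual, value)

run = {"metrics": {"accuracy": 0.8}, "params": {"model": "SVM"}}
print(matches(run, "metrics.accuracy > 0.75"))             # True
print(matches(run, "params.model = 'LogisticRegression'")) # False
```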