Log and Get Models

A Model comprises a model file or folder along with some metadata. Each Model can have multiple versions. In essence, Models are just Artifacts with the special type model.

Log Model Version

You can automatically save and version model files or folders using the log_model method.

The basic usage looks as follows:

from truefoundry.ml import get_client, ModelFramework

client = get_client()
run = client.create_run(...)
model_version = run.log_model(
    name="name-for-the-model",
    model_file_or_folder="path/to/model/file/or/folder/on/disk",
    framework=<None or ModelFramework member>
)

📘

Framework Agnostic

Any file or folder can be saved as a model by passing it in model_file_or_folder; framework can be set to None.
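For example, an arbitrary file can be logged without any framework metadata. A minimal sketch, assuming a run has already been created as shown above; the name and path are placeholders:

model_version = run.log_model(
    name="my-custom-model",
    model_file_or_folder="path/to/any/file/or/folder",
    framework=None,  # framework-agnostic logging
)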

Here is an example of storing an sklearn model. To log a model, we start a run, give our model a name, and pass in the model saved on disk along with the framework name.

from truefoundry.ml import get_client, ModelFramework

import joblib
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf.fit(X, y)
joblib.dump(clf, "sklearn-pipeline.joblib")

client = get_client()
client.create_ml_repo(  # This is only required once
    ml_repo="my-classification-project",
    # This controls which bucket is used.
    # You can get this from Integrations > Blob Storage. `None` picks the default
    storage_integration_fqn=None
)
run = client.create_run(
    ml_repo="my-classification-project"
)
model_version = run.log_model(  # You can also directly call client.log_model
    name="my-sklearn-model",
    model_file_or_folder="sklearn-pipeline.joblib",
    framework=ModelFramework.SKLEARN,
    metadata={"accuracy": 0.99, "f1": 0.80},
    step=1,  # step number, useful when using iterative algorithms like SGD
)
print(model_version.fqn)
Similarly, here is an example of logging a transformers model:

from truefoundry.ml import get_client, ModelFramework

import torch
from transformers import pipeline

pln = pipeline(
    "text-generation",
    model="EleutherAI/pythia-70m",
    tokenizer="EleutherAI/pythia-70m",
    torch_dtype=torch.float16
)
pln.model.save_pretrained("my-transformers-model")
pln.tokenizer.save_pretrained("my-transformers-model")

client = get_client()
client.create_ml_repo(  # This is only required once
    ml_repo="my-llm-project",
    # This controls which bucket is used.
    # You can get this from Integrations > Blob Storage. `None` picks the default
    storage_integration_fqn=None
)
run = client.create_run(
    ml_repo="my-llm-project"
)
model_version = run.log_model(
    name="my-transformers-model",
    model_file_or_folder="my-transformers-model/",
    framework=ModelFramework.TRANSFORMERS
)
print(model_version.fqn)

Running the sklearn example will create a new model my-sklearn-model under the ml_repo along with its first version v1. Once created, a model version is immutable.

Once created, a model version has a fqn (fully qualified name) which can be used to retrieve it later, e.g. model:truefoundry/my-classification-project/my-sklearn-model:1.

Any subsequent calls to log_model with the same name would create a new version of this model - v2, v3 and so on.
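For instance, logging the same sklearn model again under the same name creates the next version. A minimal sketch, reusing the run and file from the example above:

# A second call with the same name creates v2 of my-sklearn-model
model_version_2 = run.log_model(
    name="my-sklearn-model",
    model_file_or_folder="sklearn-pipeline.joblib",
    framework=ModelFramework.SKLEARN,
)
print(model_version_2.fqn)  # the fqn now ends in :2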

The logged model can be found in the dashboard in the Models tab under your ml_repo.

You can view the details of each model version from there on.

📘

Logging Model Version without a Run

It is also possible to log a model without creating a run at all. See MlFoundry.log_model
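A minimal sketch of this, assuming client.log_model accepts an ml_repo argument in place of a run (as hinted by the comment in the sklearn example above); refer to the MlFoundry.log_model documentation for the exact signature:

from truefoundry.ml import get_client, ModelFramework

client = get_client()
model_version = client.log_model(
    ml_repo="my-classification-project",  # assumed parameter, see MlFoundry.log_model
    name="my-sklearn-model",
    model_file_or_folder="sklearn-pipeline.joblib",
    framework=ModelFramework.SKLEARN,
)
print(model_version.fqn)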

Get Model Version and Download

You can first get the model version using its fqn and then download the logged model using the download() method. From there on, you can access the downloaded files at download_info.model_dir.

import os
import tempfile

import joblib
from truefoundry.ml import get_client


client = get_client()
model_version = client.get_model_version_by_fqn(
     fqn="model:truefoundry/my-classification-project/my-sklearn-model:1"
)

# Download the model to disk
temp = tempfile.TemporaryDirectory()
download_info = model_version.download(path=temp.name)
print(download_info.model_dir, download_info.model_filename)

# Deserialize and Load
model = joblib.load(
     os.path.join(download_info.model_dir, download_info.model_filename)
)
Similarly, for the transformers model:

import tempfile

import torch
from transformers import pipeline

from truefoundry.ml import get_client

client = get_client()
model_version = client.get_model_version_by_fqn(
     fqn="model:truefoundry/my-llm-project/my-transformers-model:1"
)
# Download the model to disk
temp = tempfile.TemporaryDirectory()
download_info = model_version.download(path=temp.name)
print(download_info.model_dir)

# Deserialize and Load
pln = pipeline("text-generation", model=download_info.model_dir, torch_dtype=torch.float16)

FAQs

What are the frameworks supported by the log_model method?

This method supports "sklearn", "tensorflow", "pytorch", "keras", "xgboost", "lightgbm", "fastai", "h2o", "spacy", "statsmodels", "gluon", "paddle", and "transformers". These options are also available as an enum - truefoundry.ml.ModelFramework
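For example, the enum members used in the snippets above can be referenced directly:

from truefoundry.ml import ModelFramework

# Enum members correspond to the framework strings listed above
print(ModelFramework.SKLEARN)
print(ModelFramework.TRANSFORMERS)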