Scikit-learn

Logging and Deploying Sklearn Models in Truefoundry

To generate a deployment package, we need to know some information about the model you are logging:

  • To load the model:
    • The serialization format (joblib, cloudpickle, etc.) and the model file name.
  • To generate the inference script and wrap it with a model server:
    • The inference method name (predict, predict_proba, etc.).
    • The input and output schema of the inference method.
  • To deploy and run:
    • The Python version along with the pip package dependencies (e.g., numpy, scikit-learn).

Log a deployable Sklearn Model

Below is an example of logging a model trained using Scikit-learn:

from truefoundry.ml import get_client, SklearnFramework, sklearn_infer_schema
import joblib
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Define training data
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

# Create and train the model
clf = make_pipeline(StandardScaler(), SVC(gamma="auto"))
model = clf.fit(X, y)

# Save the model
joblib.dump(clf, "sklearn-pipeline.joblib")

# Initialize the Truefoundry client
client = get_client()

# Infer model schema
model_schema = sklearn_infer_schema(
    model_input=X, model=model, infer_method_name="predict"
)

# Log the model
model_version = client.log_model(
    ml_repo="my-classification-project",
    name="my-sklearn-model",
    model_file_or_folder="sklearn-pipeline.joblib",
    # To make the model deployable and generate the inference script, the model file and schema (with the inference method name) are required.
    framework=SklearnFramework(
        model_filepath="sklearn-pipeline.joblib",
        model_schema=model_schema,
    ),
    # Auto-captures the current environment details (e.g., python_version, pip_packages) if not provided, based on the framework.
)

# Output the model's Fully Qualified Name (FQN)
print(f"Model version logged successfully: {model_version.fqn}")

After logging, you can:

  • View and manage recently logged models in the ML Repos.
  • Access framework details such as the serialization format, model schema, and inference method.
  • Access environment details such as the Python version and the list of pip packages required for a specific model version.
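
For example, you can fetch a logged model version and inspect these details programmatically. This is a minimal sketch: the FQN is a placeholder to replace with the one printed while logging, and it assumes the framework and environment attributes shown in the troubleshooting snippet below are readable on the returned model version.

from truefoundry.ml import get_client

# Initialize the Truefoundry client
client = get_client()

# Replace with the FQN printed when the model was logged
model_version = client.get_model_version_by_fqn(
    "model:truefoundry/my-classification-project/my-sklearn-model:1"
)

# Framework details: serialization format, model schema, inference method
print(model_version.framework)

# Environment details: Python version and pip packages captured at logging time
print(model_version.environment)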

Deploy the model

Once the model is deployable, you can start the deployment flow directly using the CLI.

Navigate to the Model Registry

  • Locate the desired model in the list and click on the Deploy button.
  • Select the workspace for deployment, then click the copy icon to copy the generated CLI command, which initializes the model deployment package.


Common Model Deployment Issues and Troubleshooting Guide

  • Fix an incomplete model manifest to make an existing logged model deployable

    Deploying a logged model may fail due to an incomplete model manifest, causing errors like:

    - Model framework is not supported for deployment
    - Model filename not found, please save model filename while logging the model
    - Model schema not found, please save schema while logging the model
    - Serialization format not found, please save serialization format while logging the model

    Here’s an example code snippet that resolves the incomplete model manifest by adding the required fields and updating the model version:

    from truefoundry.ml import get_client, ModelVersionEnvironment, SklearnFramework, sklearn_infer_schema
    import joblib
    import numpy as np
    
    # Replace with your model version FQN
    model_version_fqn = "model:truefoundry/my-classification-project/my-sklearn-model:1"
    
    client = get_client()
    model_version = client.get_model_version_by_fqn(model_version_fqn)
    model_version.download(path=".")
    
    # Replace with your model file path
    model_file_path = "./sklearn-pipeline.joblib"
    model = joblib.load(model_file_path)
    
    # Update the model input example as per your model
    X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
    model_schema = sklearn_infer_schema(model_input=X, model=model, infer_method_name="predict")
    
    # To make the model deployable and generate the inference script, the model file and schema (with the inference method name) are required.
    model_version.framework = SklearnFramework(
        model_filepath="sklearn-pipeline.joblib",
        serialization_format="joblib",
        model_schema=model_schema,
    )
    model_version.environment = ModelVersionEnvironment(
        python_version="3.11",
        pip_packages=[
            "joblib==1.4.2",
            "numpy==1.26.4",
            "pandas==2.1.4",
            "scikit-learn==1.5.2",
        ],
    )
    model_version.update()

  • Python versions < 3.8 or > 3.12 are not supported for Triton deployment

    The Triton deployment depends on the nvidia-pytriton library (https://pypi.org/project/nvidia-pytriton), which supports Python versions >=3.8 and <=3.12. If you need a Python version outside this range, consider FastAPI as an alternative framework for serving the model; otherwise, pin a supported Python version in the model version's environment (see the sketch at the end of this section).


  • Numpy version must be specified and must be less than 2.0.0 for Triton deployment

    The nvidia-pytriton library does not support numpy versions >= 2.0.0, so the numpy version must be pinned explicitly in the model version's environment (see the sketch below). If you need numpy >= 2.0.0, consider FastAPI as an alternative framework for serving the model.
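
    The sketch below shows one way to satisfy both constraints by updating the logged model version's environment to pin a supported Python version and a numpy release below 2.0.0. It is a minimal sketch based on the manifest-fix snippet above; the FQN and package versions are placeholders to adjust for your project.

    from truefoundry.ml import get_client, ModelVersionEnvironment

    # Replace with your model version FQN
    model_version_fqn = "model:truefoundry/my-classification-project/my-sklearn-model:1"

    client = get_client()
    model_version = client.get_model_version_by_fqn(model_version_fqn)

    # Pin a Python version in the supported range (>=3.8, <=3.12)
    # and a numpy version below 2.0.0
    model_version.environment = ModelVersionEnvironment(
        python_version="3.11",
        pip_packages=[
            "joblib==1.4.2",
            "numpy==1.26.4",
            "scikit-learn==1.5.2",
        ],
    )
    model_version.update()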