Migrate Pytorch Sagemaker Endpoints To TrueFoundry Platform

TrueFoundry provides a straightforward migration path for Pytorch endpoints already running in Sagemaker using a custom inference script to the TrueFoundry platform. We will start with an inference script and follow a workflow to implement the expected changes.

Existing Code

A Sagemaker deployment typically contains code in the form of the following file tree -

code/  
├── inference.py
└── requirements.txt

inference.py - This is the inference handler that implements the Sagemaker functions like model_fn, input_fn, predict_fn, output_fn etc
requirements.txt - This contains any additional Python packages needed by the inference handler

Apart from these, there are also -

Model artifacts - Generated model files (e.g. model.pth). These may reside on your S3 buckets.
Sagemaker deployment code (e.g. sagemaker_deploy.py) - Code to call Sagemaker to deploy the model as an endpoint

Examples:

`inference.py`

# defining model and loading weights to it.
import json
import torch

class Net(torch.nn.Module):
  ...

 
def model_fn(model_dir):
    model = Net()
    with open(os.path.join(model_dir, "model.pth"), "rb") as f:
        model.load_state_dict(torch.load(f))
    model.to(device).eval()
    return model


# data preprocessing
def input_fn(request_body, request_content_type):
    assert request_content_type == "application/json"
    data = json.loads(request_body)["inputs"]
    data = torch.tensor(data, dtype=torch.float32, device=device)
    return data


# inference
def predict_fn(input_object, model):
    with torch.no_grad():
        prediction = model(input_object)
    return prediction


# postprocess
def output_fn(predictions, content_type):
    assert content_type == "application/json"
    res = predictions.cpu().numpy().tolist()
    return json.dumps(res)

`sagemaker_deploy.py`

from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer


pytorch_model = PyTorchModel(
    model_data = s3_model_uri,
    role=role,                  
    entry_point=<"inference.py">,
    framework_version = "1.12.1",
    py_version='py38',
    name = <"model_name">,
)

predictor = pytorch_model.deploy(
    serializer = JSONSerializer(),
    deserializer = JSONDeserializer(),
    endpoint_name = <"endpoint_name">,
    instance_type = "ml.g4dn.xlarge",
    initial_instance_count = 2,
)

Migration Workflow

Broadly speaking these are the things we shall do -

Enclose the inference handler within a Docker container containing torchserve to support pytorch-based models
Upload the model artifact as a TrueFoundry Artifact to make it accessible from the running container
Launch a TrueFoundry deployment utilizing the above two pieces

Steps

Upload the Pytorch model artifacts to the TrueFoundry Model Registry.

For reference, your model on disk may look like this

model/
└── model.pth

Add a upload_model.py

Create an ML repo (docs) and provide access to that repo from the workspace we need to deploy in (docs). Change ml_repo arguments
Change model_file_or_folder to point to your model folder

The code prints a model fqn at the end which we will use in later steps.

import mlfoundry

client = mlfoundry.get_client()

model_version = client.log_model(
    ml_repo=<"ml_repo_name">,
    name=<"my-pytorch-model">,
    model_file_or_folder=<"model/">,  # Path to directory containing the model
    framework=mlfoundry.ModelFramework.PYTORCH,
    description="TorchServe + Sagemaker compatible model artifact",
)
print("Model Version FQN:", model_version.fqn)

We will write a Python script called main.py that could launch the torchserve process at startup -

import os
# We are reusing sagemaker's open source toolkit to manage torchserve
from sagemaker_pytorch_serving_container import serving
import shutil

def main():
    home_dir = os.getenv("HOME")
    # MODEL_DIR is populated by truefoundry artifacts. Model will be downloaded here
    artifact_model_dir = os.getenv("MODEL_DIR")
    if not artifact_model_dir:
        raise ValueError("`MODEL_DIR` must be set in environment and point to a model directory")
    # Base dir where models are supposed to be present
    base_dir = os.getenv("SAGEMAKER_BASE_DIR", os.path.join(home_dir, "model-store"))
    model_dir = os.path.join(base_dir, "model")
    # Copying over the model artifacts to the model dir
    shutil.copytree(src=artifact_model_dir, dst=model_dir)
    # Copying over the code to model dir
    shutil.copytree(src=os.path.join(home_dir, "code"), dst=os.path.join(model_dir, "code"))
    # Launching the torchserve process
    serving.main()


if __name__ == '__main__':
    main()

Next, we’ll write a Dockerfile that can create the TrueFoundry application

# We pick a gpu enabled torchserve image as base. Others can be found here - https://hub.docker.com/r/pytorch/torchserve/tags
FROM --platform=linux/amd64 pytorch/torchserve:0.9.0-gpu
ENV SAGEMAKER_BASE_DIR=/home/model-server/model-store
RUN pip install -U pip setuptools wheel && \
    pip install --no-cache-dir \
        sagemaker-pytorch-inference==2.0.22 \
        scikit-learn \
        pandas \
        scipy==1.10.1
USER root
RUN touch /etc/sagemaker-ts.properties && \
    chown model-server:model-server /etc/sagemaker-ts.properties
USER model-server
WORKDIR /home/model-server/
# Assuming code is the directory where inference code and requirements.txt is present
COPY code ./code
COPY main.py ./
EXPOSE 8080

Now let’s go ahead and write a deploy.py script that can be used with TrueFoundry to get a service deployed. Here you’ll need to change the following
- Service Name - Name for the service we’ll deploy
- Entrypoint Script Name (value for SAGEMAKER_PROGRAM) - The code file name containing model_fn, input_fn, predict_fn and output_fn
- Model Version FQN - The FQN obtained from upload_model.py

import argparse
import logging
from typing import Optional

from servicefoundry import (
    Service, Build, LocalSource,
    DockerFileBuild, Port, ArtifactsDownload,
    TruefoundryArtifactSource,
    ArtifactsCacheVolume, Resources, GPUType, NvidiaGPU,
)

logging.basicConfig(level=logging.INFO, format=logging.BASIC_FORMAT)


def main():
    service = Service(
        name=<"service_name">,
        image=Build(
            build_source=LocalSource(local_build=False),
            build_spec=DockerFileBuild(
                dockerfile_path="./Dockerfile",
                command="python main.py"
            )
        ),
        ports=[Port(port=8080, host=<"host.app.example.com">, path=<"/">)],
        env={
            # This should be the `entry_point` argument, the code file containing model_fn, predict_fn, etc
            "SAGEMAKER_PROGRAM": <"inference.py">
        },
        artifacts_download=ArtifactsDownload(
            artifacts=[              	
                TruefoundryArtifactSource(
                    # This should be the model version fqn obtained by running `upload_model.py`
                    artifact_version_fqn=<"model_version_fqn">
                    download_path_env_variable="MODEL_DIR",
                )
            ],
        ),
        resources=Resources(
            cpu_request=1,
            cpu_limit=4,
            memory_request=8000,
            memory_limit=16000,
            ephemeral_storage_request=10000,
            ephemeral_storage_limit=16000,
            devices=[
                NvidiaGPU(name=GPUType.T4, count=1)
            ]
        ),
        liveness_probe=HealthProbe(
            config=HttpProbe(path="/ping", port=8080),
            initial_delay_seconds=30,
            period_seconds=10,
            timeout_seconds=1,
            success_threshold=1,
            failure_threshold=5,
        ),
        readiness_probe=HealthProbe(
            config=HttpProbe(path="/ping", port=8080),
            initial_delay_seconds=30,
            period_seconds=10,
            timeout_seconds=1,
            success_threshold=1,
            failure_threshold=5,
        ),
    )
    service.deploy(workspace_fqn=<"workspace_fqn">, wait=False)


if __name__ == '__main__':
    main()

Deploy using servicefoundry

$ python deploy.py

Once the deployment has gone through, it can be tested using this script -

if __name__ == "__main__":
    # The URL of the endpoint you're sending the request to
    url = 'https://<host.app.example.com>/predictions/model'

    # Your JSON payload
    data = {"inputs": <serialized_input>}

    # Send the POST request with the JSON payload
    response = requests.post(url, json=data)

    # Check the response
    if response.status_code == 200:
        print("Request successful.")
        # Process response data if needed
        print(response.json())
    else:
        print("Request failed.", response.status_code)
        print(response.text)

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

Secret Management

Access Control

Platform

Deploying On Your Own Cloud

HOW TOs

Migrate Pytorch Sagemaker Endpoints To TrueFoundry Platform

Existing Code

`inference.py`

`sagemaker_deploy.py`

Migration Workflow

Steps

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

Secret Management

Access Control

Platform

Deploying On Your Own Cloud

HOW TOs

​Existing Code

​inference.py

​sagemaker_deploy.py

​Migration Workflow

​Steps

Existing Code

`inference.py`

`sagemaker_deploy.py`

Migration Workflow

Steps