Model Deployment

📘

Note

While using the ServiceFoundry Python SDK, the `type` field is not required in any of the imported classes.

ModelDeployment

Description

Describes the configuration for the model deployment

Schema

{
  "type": "string",
  "name": "string",
  "model_source": {},
  "resources": {
    "cpu_request": 0.2,
    "cpu_limit": 0.5,
    "memory_request": 200,
    "memory_limit": 500,
    "ephemeral_storage_request": 1000,
    "ephemeral_storage_limit": 2000,
    "instance_family": [
      "string"
    ]
  },
  "grpc": false
}

Properties

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | true | none |
| name | string | true | Name of the model deployment. This uniquely identifies this deployment in the workspace. The name can only contain alphanumeric characters and '-' and can be at most 20 characters long. |
| model_source | object | true | The model to be deployed. See Model Source below. |
| resources | Resources | false | The resource constraints for the application, used to schedule it on the cluster. See the Resources documentation to learn more. |
| grpc | boolean | true | Use gRPC for serving the model. |

Python Examples

# Deploy a model from the Truefoundry model registry
from servicefoundry import (
    ModelDeployment, TruefoundryModelRegistry, Resources
)

model = ModelDeployment(
    name="my-service",
    model_source=TruefoundryModelRegistry(
        model_version_fqn="..."
    ),
    resources=Resources(
        cpu_request=1,
        memory_request=1000,  # in Megabytes
        ephemeral_storage_request=1000,  # in Megabytes
        cpu_limit=4,
        memory_limit=4000,
        ephemeral_storage_limit=10000,
        instance_family=["c6i", "t3", "m4"],
    ),
)

deployment = model.deploy(workspace_fqn='...')

# Deploy a model from the Huggingface model hub
from servicefoundry import (
    ModelDeployment, HuggingfaceModelHub, Resources
)

model = ModelDeployment(
    name="my-service",
    model_source=HuggingfaceModelHub(
        model_id="...",
        pipeline="..."
    ),
    resources=Resources(
        cpu_request=1,
        memory_request=1000,  # in Megabytes
        ephemeral_storage_request=1000,  # in Megabytes
        cpu_limit=4,
        memory_limit=4000,
        ephemeral_storage_limit=10000,
        instance_family=["c6i", "t3", "m4"],
    ),
)

deployment = model.deploy(workspace_fqn='...')
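The grpc flag from the schema above is passed the same way as the other fields. A minimal sketch, assuming ModelDeployment accepts grpc as a keyword argument mirroring the schema field:

from servicefoundry import ModelDeployment, TruefoundryModelRegistry

model = ModelDeployment(
    name="my-service",
    model_source=TruefoundryModelRegistry(
        model_version_fqn="..."
    ),
    grpc=True,  # per the schema above, this defaults to false
)

deployment = model.deploy(workspace_fqn='...')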

Model Source

The modules below help define the model source for the ModelServer

TruefoundryModelRegistry

Description

Deploy a model from the Truefoundry (TFY) model registry.

Schema

{
  "type": "string",
  "model_version_fqn": "string"
}

Properties

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | true | none |
| model_version_fqn | string | true | The FQN of the model version that you want to deploy. |

Python Examples

from servicefoundry import (
    ModelDeployment, TruefoundryModelRegistry, Resources
)

model = ModelDeployment(
    ...
    model_source=TruefoundryModelRegistry(
        model_version_fqn="..."
    ),
    ...
)

HuggingfaceModelHub

Description

Deploy a model from the Huggingface (HF) model hub.

Schema

{
  "type": "string",
  "model_id": "string",
  "pipeline": "string"
}

Properties

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | true | none |
| model_id | string | true | The name of the model that you want to deploy. Examples: t5-small, philschmid/bart-large-cnn-samsum. |
| pipeline | string | false | Pipeline to use for inference. If not set, we will try to infer the task name for the given model. Examples: summarization, text-generation, text-classification, etc. You can read more about HF pipelines at https://huggingface.co/docs/transformers/pipeline_tutorial and find a list of tasks at https://huggingface.co/docs/transformers/v4.24.0/en/main_classes/pipelines#transformers.pipeline.task |

Python Examples

from servicefoundry import (
    ModelDeployment, HuggingfaceModelHub, Resources
)

model = ModelDeployment(
    ...
    model_source=HuggingfaceModelHub(
        model_id="...",
        pipeline="..."
    ),
    ...
)
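For instance, filling in the example values from the properties table above gives a complete sketch. The deployment name here is a hypothetical placeholder; the workspace FQN is left elided:

from servicefoundry import ModelDeployment, HuggingfaceModelHub

# t5-small and the summarization pipeline are the example values from the
# properties table above; the name is a hypothetical placeholder.
model = ModelDeployment(
    name="t5-small-svc",
    model_source=HuggingfaceModelHub(
        model_id="t5-small",
        pipeline="summarization",
    ),
)

deployment = model.deploy(workspace_fqn="...")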