👍
What you'll learn

How to deploy a Hugging Face model as a Model Deployment through the TrueFoundry user interface.

After you complete this guide, you will have deployed a Hugging Face model. Your model deployment dashboard will look similar to this:

Deploying your Hugging Face model

Step 1: Get the model details

Before you start the deployment, gather a few details about the Hugging Face model you want to deploy.

In general, you will need the following for the model:

Model Library: the library the model runs with, one of transformers or sentence-transformers
Id on Hub: the model's repository id (e.g. philschmid/bart-large-cnn-samsum)
Pipeline Tag: the pipeline to run the model with (e.g. summarization, text-generation, text-classification)

For this example, we will be deploying the BART-Large CNN SamSum model.

Model Library: transformers
Id: philschmid/bart-large-cnn-samsum
Pipeline: summarization
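
These three values can also be read from the model's public metadata instead of the model page. The following is a minimal sketch, assuming the Hub's public endpoint https://huggingface.co/api/models/<id> returns JSON with `id`, `library_name` and `pipeline_tag` keys. The helper names `fetch_metadata` and `extract_details` are hypothetical and are not part of TrueFoundry.

```python
import json
import urllib.request

# The two model libraries this guide lists as options.
SUPPORTED_LIBRARIES = {"transformers", "sentence-transformers"}


def fetch_metadata(repo_id: str) -> dict:
    """Download public model metadata from the Hub (assumed endpoint, needs network)."""
    url = f"https://huggingface.co/api/models/{repo_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def extract_details(metadata: dict) -> dict:
    """Pick out the three values the deployment form asks for."""
    library = metadata.get("library_name")
    if library not in SUPPORTED_LIBRARIES:
        raise ValueError(f"unsupported or missing model library: {library!r}")
    return {
        "model_library": library,
        "id": metadata["id"],
        "pipeline": metadata.get("pipeline_tag"),
    }


# Offline example using the values from this guide instead of a live request.
sample = {
    "id": "philschmid/bart-large-cnn-samsum",
    "library_name": "transformers",
    "pipeline_tag": "summarization",
}
details = extract_details(sample)
print(details)
```

With network access, `extract_details(fetch_metadata("philschmid/bart-large-cnn-samsum"))` would perform the live lookup; the values on the model page remain the authoritative source.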

Step 2: Initiating Deployment via UI

On the left panel of the TrueFoundry user interface, click the "Deployments" tab, then open the "Models" tab in the top bar.
Click the "New Deployment" button, and a form will appear with various deployment options. Click the "Model" button to select model deployment (it will be pre-selected).

In the same form, find the "Where would you like to deploy?" field. Click the "Search Workspace" bar and select the workspace for your deployment.

📘
Note:
If you don't have a workspace yet, you can create one by clicking the Create New Workspace link and following this documentation, or contact your cluster admin if you run into issues.

Once you have selected a workspace, click the "Next Step" button to continue with the deployment.

Step 3: Configure Deployment

Now let's go through the configuration options in the deployment form. Each option affects how your model is deployed and served.

For each option, we explain what it does and what to enter in the field.

Name

Enter a name for your model deployment.

Model to Deploy

The Model to Deploy option lets you choose between deploying a model from the TrueFoundry model registry and deploying a model from the Hugging Face Model Hub. Choose "Huggingface model hub" and enter the model details collected in Step 1.

For our example, we enter:

Model Library: transformers
Model repository name: philschmid/bart-large-cnn-samsum
Pipeline: summarization

📘
Deploying sentence-transformers embedding models
If you are deploying a model with sentence-transformers, enter feature-extraction in the Pipeline field.

Resources

Allocate computing resources (CPU, memory, storage) for your model using the Resources option.

CPU refers to the computing power available to your application.
Memory refers to how much space your application has to hold and work with data while it is running.
Ephemeral storage is where your application can temporarily store files and data.

Requests and Limits:

Request is the amount of a resource your application asks for up front so that it can start and run properly.
Limit is the maximum amount of a resource (such as CPU or memory) your application is allowed to use.

For each type of resource, you specify both a Request and a Limit.

For our example, we set:

CPU Request: 1
CPU Limit: 1
Memory Request: 4000 (MB)
Memory Limit: 6000 (MB)
Storage Request: 8000 (MB)
Storage Limit: 8000 (MB)
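
To make the relationship between Request and Limit concrete, here is a minimal sketch that models the example values and checks that no request exceeds its limit. The `ResourceSpec` class and its field names are hypothetical and only mirror the form fields. The rule that a request should not exceed its limit is an assumption based on how Kubernetes-style schedulers generally treat these settings, not something this guide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSpec:
    """One resource type with its request and limit (hypothetical helper)."""

    name: str
    request: float
    limit: float
    unit: str = ""

    def validate(self) -> None:
        # A request is what the application asks for up front.
        if self.request <= 0:
            raise ValueError(f"{self.name}: request must be positive")
        # Assumed rule: the limit is a ceiling, so the request must fit under it.
        if self.request > self.limit:
            raise ValueError(
                f"{self.name}: request {self.request} exceeds limit {self.limit}"
            )


# The example values entered in the form above.
example_resources = [
    ResourceSpec("cpu", request=1, limit=1),
    ResourceSpec("memory", request=4000, limit=6000, unit="MB"),
    ResourceSpec("ephemeral_storage", request=8000, limit=8000, unit="MB"),
]

for spec in example_resources:
    spec.validate()
    print(f"{spec.name}: request={spec.request}{spec.unit} limit={spec.limit}{spec.unit}")
```

Setting the request equal to the limit, as done here for CPU and storage, gives the application a fixed allocation; a higher limit, as for memory, leaves headroom for spikes.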

Step 4: Submit the form

Now that all the deployment options are filled in, click the "Create" button to start the deployment. After you click "Create", your Models Dashboard will resemble the following:

While the deployment is in progress, you can hover over the spinner icon to check its status. After a short while, the deployment should become active, and your Models Dashboard will look like this:

Interacting with the model

OpenAPI Spec

To make sending requests easier, go to the OpenAPI tab and click the Infer Simple button.

Here you can try out your model directly by changing the request body and clicking the Send API Request button.

Within a few seconds of clicking the button, you will see your model's response.

You can open the logs and metrics of your model via the button in the top bar of your Model dashboard.

There you can see what happened when you made the request.

Code Snippet

Scroll down to find code snippets that show how to send requests to your deployed model programmatically.
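
As a rough illustration of what such a snippet might look like, here is a minimal sketch using only the Python standard library. The endpoint URL and the request body shape (`{"inputs": ...}`) are placeholders and not TrueFoundry's actual API; take the real URL and body from the snippets shown in your dashboard.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the endpoint shown in your dashboard.
ENDPOINT = "https://your-model-endpoint.example.com/predict"


def build_request(endpoint: str, text: str) -> urllib.request.Request:
    """Build a JSON POST request carrying the text to summarize (assumed body shape)."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(endpoint: str, text: str) -> object:
    """Send the request and decode the JSON response (needs a live deployment)."""
    with urllib.request.urlopen(build_request(endpoint, text), timeout=60) as response:
        return json.load(response)


# Build the request without sending it, so the sketch runs offline.
request = build_request(ENDPOINT, "Jeff: Can I train a model on my laptop?")
print(request.get_method(), request.full_url)
```

Once `ENDPOINT` points at your deployment, calling `summarize(ENDPOINT, text)` would send the request; the response format depends on the pipeline you deployed.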
