Deploy a model with FastAPI
FastAPI is a modern, high-performance web framework for building APIs with Python. Serving a model with FastAPI is one of the simplest ways to deploy it: we essentially wrap our inference function in a FastAPI app.
Here we will work with a very simple scikit-learn iris classification model. You can find the FastAPI code for this model here.
The key files are:
- `iris_classifier.joblib`: The model file.
- `server.py`: The main FastAPI code that loads the model and provides a REST API to serve it.
- `requirements.txt`: Contains the dependencies.
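The model file itself is just a serialized scikit-learn estimator. As a rough, hypothetical sketch of how a file like `iris_classifier.joblib` could be produced (the actual training code may differ):

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a simple classifier on the iris dataset
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# Serialize the fitted model to disk
joblib.dump(clf, "iris_classifier.joblib")
```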
How to write the inference function in FastAPI
Here’s an explanation of the code in `server.py`:
Load the model file from the path specified in the `MODEL_PATH` environment variable
We read the model path from the `MODEL_PATH` environment variable. This is useful when we want to load the model from a different path, where it can be downloaded and cached separately. See the Cache Models and Artifacts guide for more details.
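For example, the model-loading part of `server.py` might look roughly like this (the fallback path is an illustrative assumption for local testing):

```python
import os

import joblib

# Read the model location from the MODEL_PATH environment variable;
# the fallback to a local file is an assumption for local testing.
MODEL_PATH = os.environ.get("MODEL_PATH", "iris_classifier.joblib")

# Load the serialized scikit-learn model once at startup
model = joblib.load(MODEL_PATH)
```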
`async def` vs `def`
FastAPI is designed to be an async web framework. This means that the server can handle multiple requests concurrently.
When using `async def`, it is the developer’s responsibility to make sure the function does not block the event loop.
However, most ML inference is compute bound and blocks the event loop. Use `def` instead of `async def` for inference functions.
Generally, if you are not sure, just use `def`. FastAPI (mainly Starlette, which is the foundation of FastAPI) automatically runs sync functions in a thread pool so they do not block the event loop.
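A minimal sketch of such a sync endpoint, assuming the `model` object loaded above and a hypothetical `/predict` route with these input fields:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class IrisInput(BaseModel):
    # The four iris measurements expected by the classifier
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


@app.post("/predict")
def predict(data: IrisInput):
    # A plain `def` endpoint: FastAPI runs it in a thread pool,
    # so the compute-bound model call does not block the event loop.
    # `model` is the classifier loaded via joblib above.
    features = [[data.sepal_length, data.sepal_width, data.petal_length, data.petal_width]]
    prediction = model.predict(features)
    return {"prediction": int(prediction[0])}
```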
Running the server locally
We can run the server locally to make sure everything works before deploying.
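One way to do this is to run `gunicorn` with a uvicorn worker; the sketch below assumes the FastAPI app object in `server.py` is named `app`, and the worker class and worker count are assumptions to adjust for your code:

```
gunicorn server:app --workers 1 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

On startup, `gunicorn` logs its version, the address it is listening on, and the worker processes it boots.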
We can open the browser and navigate to http://localhost:8000/docs to try out our API.
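We can also call the API directly from Python; this sketch assumes the hypothetical `/predict` route and input fields shown above:

```python
import requests

# Call the (hypothetical) /predict endpoint sketched above
response = requests.post(
    "http://localhost:8000/predict",
    json={
        "sepal_length": 5.1,
        "sepal_width": 3.5,
        "petal_length": 1.4,
        "petal_width": 0.2,
    },
)
print(response.json())  # e.g. {"prediction": 0}
```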
Deploying with TrueFoundry
To deploy the model, we need to package both the model file and the code. To do this, we can follow the steps below:
Log the model to the Model Registry
Logging the model to the registry is not mandatory to deploy the model, but is highly recommended. You can follow the guide here to log the model to the registry.
Push the code to a Git repository or deploy directly from your local machine
Once you have tested your code locally, we highly recommend pushing it to a Git repository. This allows you to version control the code and also makes the deployment process much easier. However, if you don’t have access to a Git repository, or your Git repositories are not integrated with TrueFoundry, you can deploy directly from your local machine.
You can follow the guide here to deploy your code. A few key things to note:
Binding to `0.0.0.0` in the command
Please make sure you put `--bind 0.0.0.0:8000` in the command. By default, `gunicorn` binds to `127.0.0.1`, which is only reachable from inside the container. To make the server accessible from outside, we need to bind to `0.0.0.0`.
The port number in the `ports` section should match the port your model server listens on. For example, in this case `gunicorn` is told to bind to port `8000` in the command, hence we use `8000` in `ports`.
Download the model from the Model Registry in the deployment configuration
If you logged the model to the registry in Step 1, TrueFoundry can automatically download it into the deployed service at the path specified in the `MODEL_PATH` environment variable. To enable this, modify the deployment configuration to download the logged model version and point the `MODEL_PATH` environment variable at the download location.
View the deployment, logs and metrics
Once the deployment goes through, you can view the deployment, the pods, logs, metrics and events to debug any issues.