Deploy Model As API
LitServe
LitServe is a lightweight, fast inference server for machine learning models. It is a good alternative to FastAPI if you need built-in micro-batching support.
In this example, we will deploy a simple Whisper model using LitServe. You can find the code for this example here. Clone the repository and read through the
whisper_server.py
file.
To run the server locally, follow the steps below:
- Install the dependencies
- Run the server
- Test the server
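The three steps above can be sketched as shell commands. The package names, script name, port, and request schema are assumptions based on a typical LitServe setup; check the repository's README for the exact instructions.

```shell
# 1. Install the dependencies (assumed packages; the repo may pin versions).
pip install litserve openai-whisper

# 2. Run the server (whisper_server.py from the cloned repository).
python whisper_server.py

# 3. Test the server from another terminal; the /predict route is LitServe's
#    default, and the "audio_path" field is an assumed request schema.
curl -X POST http://localhost:8000/predict \
     -H "Content-Type: application/json" \
     -d '{"audio_path": "sample.wav"}'
```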