LitServe is a lightweight, fast inference server for machine learning models. It is a good alternative to plain FastAPI when you need built-in micro-batching support.

In this example, we will deploy a simple Whisper model using LitServe. You can find the code for this example here.

You can clone the repository and read through the whisper_server.py file.

To run the server locally, follow the steps below:

1. Install the dependencies:

   ```shell
   pip install -r requirements.txt
   ```

2. Run the server:

   ```shell
   python whisper_server.py
   ```

3. Test the server (adjust the payload to match the request schema in whisper_server.py):

   ```shell
   curl -X POST http://localhost:8000/predict -H "Content-Type: application/json" -d '{"text": "Hello, world!"}'
   ```
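The same request can be issued from Python using only the standard library. The URL and payload below mirror the curl command above and should be adjusted to whatever schema whisper_server.py actually expects.

```python
import json
import urllib.request


def build_request(url="http://localhost:8000/predict", payload=None):
    # Mirrors the curl call: a POST with a JSON body. The payload keys
    # must match the server's decode_request implementation.
    data = json.dumps(payload or {"text": "Hello, world!"}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


if __name__ == "__main__":
    # Requires the server from step 2 to be running.
    with urllib.request.urlopen(build_request()) as resp:
        print(json.loads(resp.read()))
```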

Examples