Benchmarking your deployed service using Locust

This guide will help you learn how to benchmark your deployed service.

Before deploying a service to production, you need to understand how it performs under load. Key questions include:

  • How many requests per second can the service handle?
  • How many concurrent requests, and therefore simultaneous users, can it serve?
  • How does latency change as traffic increases?

These questions matter because their answers drive decisions about service configuration and operations:

  • How many replicas the service needs.
  • Whether auto-scaling is required.
  • If auto-scaling is required:
    • Which metric should trigger it (e.g., requests per second, CPU utilization, or a custom metric).
    • What threshold to set for that metric.

Answering them helps you meet your service level agreements (SLAs) while keeping costs under control.
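As a rough illustration of how a benchmark result feeds the replica decision, the sketch below divides the expected peak traffic by the throughput one replica sustained in the benchmark. The function name `replicas_needed`, the `headroom` parameter, and the sample numbers are assumptions made for this example, not part of Locust or of any platform API.

```python
import math


def replicas_needed(target_rps: float, rps_per_replica: float, headroom: float = 0.2) -> int:
    """Estimate how many replicas are needed to serve target_rps.

    target_rps: peak traffic you expect, in requests per second.
    rps_per_replica: throughput a single replica sustained in the benchmark
        while still meeting your latency target.
    headroom: fraction of each replica's capacity kept free for spikes
        (20% here is an assumed default, tune it for your service).
    """
    if rps_per_replica <= 0 or not 0 <= headroom < 1:
        raise ValueError("rps_per_replica must be > 0 and headroom in [0, 1)")
    usable_rps = rps_per_replica * (1 - headroom)
    return max(1, math.ceil(target_rps / usable_rps))


# Hypothetical numbers: 100 RPS expected, 25 RPS measured per replica, no headroom.
print(replicas_needed(100, 25, headroom=0.0))  # 4
```

The same measured throughput can also serve as a starting point for an auto-scaling threshold, for example scaling out before a replica reaches the request rate at which latency started to degrade.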

In this guide, we will use Locust, an open-source tool for benchmarking services.


You can set up Locust in any environment that has Python installed by running the following command:

pip install locust

Writing the Locust File

To benchmark your service with Locust, you need to write a locust file. It defines which API endpoint (path) to benchmark and a sample request to send to it.

Here is a small example that benchmarks a deployed LLM. You can adapt this script to benchmark any service, not just LLMs.

Note: Replace the model name with the name of your deployed service.

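from locust import FastHttpUser, task

class HelloWorldUser(FastHttpUser):
    @task  # each simulated user runs this task repeatedly
    def hello_world(self):
        # Assumed path (OpenAI-style completions); replace with your endpoint path.
        self.client.post(
            "/v1/completions",
            json={
                "model": "<Add your deployed service name>",
                "prompt": ["This is a test prompt"],
            },
        )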

Running the benchmarks

You can start Locust with the following command:

locust -f <path to your locust file>

Once you run this, you will find a service running on port 8089.

Now, open http://localhost:8089 in a browser window. You will find the UI like this:

In the Host field, paste the endpoint of your service, which you can copy from the deployed service as shown below:

Once you paste the link, set the following parameters and then click "Start Swarming":

  • Number of Users: The number of concurrent users that will send requests to your service.
  • Spawn Rate: The number of new users started per second until the target number of users is reached (1 is a reasonable default).
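To make the interplay between the two parameters concrete, the sketch below estimates how long the ramp-up takes and how many users are active at a given moment. It assumes users are started at a steady rate; the helper names `ramp_up_seconds` and `active_users` are made up for this example and are not Locust functions.

```python
def ramp_up_seconds(total_users: int, spawn_rate: float) -> float:
    """Seconds until all users are running, assuming a steady spawn rate."""
    if spawn_rate <= 0:
        raise ValueError("spawn_rate must be positive")
    return total_users / spawn_rate


def active_users(elapsed_s: float, total_users: int, spawn_rate: float) -> int:
    """Approximate number of users running elapsed_s seconds after the start."""
    if spawn_rate <= 0:
        raise ValueError("spawn_rate must be positive")
    return min(total_users, int(elapsed_s * spawn_rate))


print(ramp_up_seconds(100, 10))  # 10.0
print(active_users(3, 100, 10))  # 30
```

Numbers recorded during the ramp-up mix different load levels, so it is usually safer to read throughput and latency only after the target number of users has been reached.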

Once you start swarming, you can see the results on the dashboard:

Once this is set up, you can change the number of users by clicking Edit at the top.
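As you raise the number of users, response-time percentiles such as the median and the 95th percentile are the figures to watch. The sketch below shows what such a percentile means using the nearest-rank method. The latency values are made up for illustration, and Locust's own calculation may differ in detail.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


# Made-up latencies in milliseconds, for illustration only.
latencies_ms = [120, 95, 110, 400, 105, 98, 130, 102, 99, 101]
print(percentile(latencies_ms, 50))  # 102
print(percentile(latencies_ms, 95))  # 400
```

A single slow request barely moves the median but dominates the 95th percentile, which is why SLAs are often stated in terms of a high percentile rather than the average.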

You can view detailed charts by clicking the "Charts" tab, as shown in the picture. The results look something like this:

Deploying this Locust Script as a Service

You can run this script locally, but your internet speed and differences between users' local setups can affect the results.

To avoid this, you can deploy the script as a service on Truefoundry.

You will need two files:

  • The locust file (written above)
  • requirements.txt

The contents of requirements.txt are:

locust
# add other dependencies if used in your locust script

Once these files are ready, go to the Truefoundry UI and follow these steps:

  1. Click the New Deployment button at the top right of your screen.
  2. Select your workspace.
  3. Select Code from Laptop.
  4. Click Next.
  5. Follow the guide in the UI.

Steps 2 to 4 are illustrated in the image below:

Once you complete all the steps and deploy your service, you can open the deployed Locust benchmarking script from the UI by clicking its "Endpoint", as shown below: