Add health checks to deployments


What you'll learn

  • What are Health Checks
  • How to add liveness and readiness probes to your deployed services

This guide shows how to add health checks to your deployed services.

Health Checks

Health checks let you detect whether a service is healthy. This makes it possible to route incoming traffic only to healthy instances and to restart or terminate containers that are not healthy. Two types of health checks can currently be configured: the liveness probe and the readiness probe.

Liveness Probe

The liveness probe checks whether the service is currently healthy by making a request to one of its endpoints. If the service is found to be unhealthy, the container is terminated and a new one is started in its place. All parameters of the liveness probe can be configured to suit your service; they are described in the parameter list below.
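To make the restart rule concrete, here is a minimal sketch of what a liveness checker does conceptually: wait initial_delay_seconds, probe every period_seconds, and trigger a restart once failure_threshold consecutive probes have failed. It is an illustration, not the platform's implementation; `watch_liveness`, `check` and `restart` are hypothetical names, and `sleep` is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def watch_liveness(
    check: Callable[[], bool],
    restart: Callable[[], None],
    max_probes: int,
    initial_delay_seconds: float = 0,
    period_seconds: float = 10,
    failure_threshold: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run up to `max_probes` liveness probes and return how many restarts were triggered."""
    restarts = 0
    consecutive_failures = 0
    sleep(initial_delay_seconds)  # wait before the first probe
    for i in range(max_probes):
        if check():
            consecutive_failures = 0  # any success resets the failure count
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                restart()  # hypothetical: terminate the container and start a new one
                restarts += 1
                # Simplification: the replacement starts with a clean count and
                # this sketch does not re-apply initial_delay_seconds.
                consecutive_failures = 0
        if i < max_probes - 1:
            sleep(period_seconds)  # wait period_seconds between probes
    return restarts


# Example: three consecutive failures with the default failure_threshold of 3.
events = []
outcomes = iter([True, False, False, False, True])
n = watch_liveness(lambda: next(outcomes), lambda: events.append("restart"), max_probes=5, sleep=lambda _s: None)
print(n, events)  # 1 ['restart']
```

A single failed probe is not enough to restart the container; only an unbroken run of failures is, which is why failure_threshold guards against restarting on one slow or dropped request.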

Readiness Probe

The readiness probe checks whether the container is ready to receive traffic. No incoming traffic is routed to the container until the readiness probe succeeds. As with the liveness probe, this is done by sending a request to an endpoint and checking that it returns a successful response (any HTTP status code >= 200 and < 400). This is especially useful when the service does heavy start-up work, such as loading a model, that can take significant time: during this period no traffic should be routed to the container, because the model is not loaded yet.
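The model-loading case can be sketched with the standard library alone: load the model in a background thread and report 503 from the readiness check until loading finishes, so the probe (which only accepts 200 to 399) keeps failing until then. `ReadinessGate`, `is_probe_success` and `fake_load_model` are hypothetical names; in the FastAPI service, a /readyz handler could return the status code this gate reports.

```python
import threading
from http import HTTPStatus
from typing import Callable, Optional


def is_probe_success(status: int) -> bool:
    """A probe succeeds when the endpoint answers with 200 <= status < 400."""
    return 200 <= status < 400


class ReadinessGate:
    """Run a slow start-up task in the background and report readiness."""

    def __init__(self, load: Callable[[], object]) -> None:
        self.model: object = None
        self._ready = threading.Event()
        threading.Thread(target=self._run, args=(load,), daemon=True).start()

    def _run(self, load: Callable[[], object]) -> None:
        self.model = load()  # e.g. load model weights from disk
        self._ready.set()    # only now may traffic be routed here

    def wait(self, timeout: Optional[float] = None) -> bool:
        return self._ready.wait(timeout)

    def readyz_status(self) -> int:
        """Status code a /readyz handler could return: 200 once loaded, 503 before."""
        if self._ready.is_set():
            return int(HTTPStatus.OK)
        return int(HTTPStatus.SERVICE_UNAVAILABLE)


release = threading.Event()


def fake_load_model() -> str:
    release.wait()  # stand-in for a slow model load; finishes once `release` is set
    return "model"


gate = ReadinessGate(fake_load_model)
print(gate.readyz_status(), is_probe_success(gate.readyz_status()))  # 503 False
release.set()
gate.wait(timeout=1)
print(gate.readyz_status(), is_probe_success(gate.readyz_status()))  # 200 True
```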

Health checks can be configured using the following parameters:

  • HTTP request configuration (the response is considered successful if the HTTP status code is >= 200 and < 400):
    • port: The port to send the HTTP request to.
    • path: The endpoint path to send the request to.
  • initial_delay_seconds: Number of seconds after the container has started before the first probe is initiated. Defaults to 0.
  • period_seconds: How often, in seconds, to execute the probe. Defaults to 10.
  • timeout_seconds: Number of seconds after which the probe times out. Defaults to 1.
  • success_threshold: Minimum number of consecutive successes for the probe to be considered successful after having failed. Defaults to 1.
  • failure_threshold: Number of consecutive failures after which the container is considered not alive (liveness probe) or not ready (readiness probe). Defaults to 3.
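One way to read success_threshold and failure_threshold is as a small state machine over consecutive probe results. The sketch below illustrates that interpretation; it is not the platform's code. `ProbeTracker` and `approx_worst_case_detection_seconds` are hypothetical names, and the detection-time formula is an approximation that assumes each probe starts period_seconds after the previous one.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeTracker:
    """Apply success_threshold / failure_threshold to a stream of probe results.

    `passing` starts as False, like a readiness probe that has not succeeded yet.
    """

    success_threshold: int = 1
    failure_threshold: int = 3
    passing: bool = False
    _successes: int = field(default=0, init=False, repr=False)
    _failures: int = field(default=0, init=False, repr=False)

    def record(self, ok: bool) -> bool:
        """Record one probe result and return whether the probe is now passing."""
        if ok:
            self._successes += 1
            self._failures = 0
            if not self.passing and self._successes >= self.success_threshold:
                self.passing = True
        else:
            self._failures += 1
            self._successes = 0
            if self.passing and self._failures >= self.failure_threshold:
                self.passing = False
        return self.passing


def approx_worst_case_detection_seconds(
    period_seconds: float = 10, timeout_seconds: float = 1, failure_threshold: int = 3
) -> float:
    """Rough time to notice a failure that starts right after a successful probe."""
    return failure_threshold * period_seconds + timeout_seconds


tracker = ProbeTracker(success_threshold=2, failure_threshold=3)
results = [True, True, False, False, True, False, False, False, True, True]
print([tracker.record(r) for r in results])
# [False, True, True, True, True, True, True, False, False, True]
print(approx_worst_case_detection_seconds())  # 31
```

In the trace, the single success in the middle resets the failure count, so only the later run of three failures flips the probe to failing, and two successes are then needed to pass again. With the defaults, a failure is noticed after roughly 3 * 10 + 1 = 31 seconds.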

Step 1: Implement the service code

First, create a FastAPI service with two health check routes:

  • /livez, for the liveness probe
  • /readyz, for the readiness probe

File Structure:

.
└── main.py

main.py

from fastapi import FastAPI

app = FastAPI()


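@app.get("/livez")
def liveness():
    # Liveness: any 200-399 response tells the platform the process is alive.
    return True


@app.get("/readyz")
def readiness():
    # Readiness: this example is always ready; a real service would return
    # a non-2xx/3xx status (e.g. 503) while start-up work is still running.
    return True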


@app.get("/")
async def root():
    return {"message": "Hello World"}
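Before deploying, you can check locally that the routes behave the way the probes expect. The following is a minimal standard-library sketch, assuming the app is running locally on port 8000 (the port and the hypothetical helpers `probe` and `check_service` are assumptions): it sends one GET per route and applies the same success rule as the probes, treating connection errors and timeouts as failures.

```python
import http.client
from typing import Dict


def probe(host: str, port: int, path: str, timeout_seconds: float = 1.0) -> bool:
    """Send one HTTP GET and apply the probe rule: success iff 200 <= status < 400.

    Connection errors and timeouts count as failures. http.client does not
    follow redirects, so a 3xx response is itself treated as a success.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
    return 200 <= status < 400


def check_service(host: str = "127.0.0.1", port: int = 8000) -> Dict[str, bool]:
    """Probe the routes defined in main.py (port 8000 is an assumption)."""
    return {path: probe(host, port, path) for path in ("/livez", "/readyz", "/")}


if __name__ == "__main__":
    print(check_service())
```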

Step 2: Add the health checks

Depending on whether you deploy via the Python SDK or with a YAML configuration file, follow the corresponding recipe:

via Python SDK

File Structure:

.
├── main.py
└── deploy.py

deploy.py

via YAML

File Structure:

.
├── main.py
└── deploy.yaml

deploy.yaml
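Whichever route you choose, the probe settings come down to the parameters listed in the Health Checks section. As a hedged sketch, they can be assembled and sanity-checked in plain Python before being written into deploy.py or deploy.yaml. The `health_probe` helper, the dictionary layout (an `http` block holding `port` and `path`), and port 8000 are assumptions rather than the platform's schema, and the lower bounds are assumed to mirror Kubernetes probe limits.

```python
from typing import Any, Dict

# Assumed lower bounds for each tunable parameter (mirroring Kubernetes probe limits).
_MINIMUMS = {
    "initial_delay_seconds": 0,
    "period_seconds": 1,
    "timeout_seconds": 1,
    "success_threshold": 1,
    "failure_threshold": 1,
}


def health_probe(port: int, path: str, **overrides: int) -> Dict[str, Any]:
    """Build probe settings (hypothetical layout) starting from the documented defaults."""
    settings: Dict[str, Any] = {
        "http": {"port": port, "path": path},  # assumed nesting for the HTTP request configuration
        "initial_delay_seconds": 0,
        "period_seconds": 10,
        "timeout_seconds": 1,
        "success_threshold": 1,
        "failure_threshold": 3,
    }
    unknown = set(overrides) - set(_MINIMUMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    settings.update(overrides)
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    for name, minimum in _MINIMUMS.items():
        if settings[name] < minimum:
            raise ValueError(f"{name} must be >= {minimum}")
    return settings


# Hypothetical settings for the two routes in main.py, assuming the app listens on port 8000.
liveness = health_probe(8000, "/livez")
readiness = health_probe(8000, "/readyz", initial_delay_seconds=5, period_seconds=5)
```

The port in each probe must be the port the FastAPI app actually listens on; otherwise every probe fails and the liveness probe will keep restarting the container.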