We will configure Prometheus to scrape all the metrics exposed at the /metrics endpoint of your server.
Our server exposes a /predict endpoint. Now we would like to track how many times this API has been called and how long each prediction took, in other words, the latency of this API.
Let us start by installing the Python client library for Prometheus:
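```bash
pip install prometheus-client
```

Below is a minimal sketch of how the /predict endpoint can be instrumented. It assumes a FastAPI app and uses a hypothetical stand-in for the model call; the Counter and Histogram names are chosen to match the queries used later in this guide (prometheus_client appends the _total suffix to counters on exposition), but your actual handler and model will differ.

```python
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()

# Exposed as request_count_total and request_latency_seconds_bucket respectively
REQUEST_COUNT = Counter("request_count", "Number of calls to /predict")
REQUEST_LATENCY = Histogram("request_latency_seconds", "Latency of /predict in seconds")

# Serve all registered metrics at /metrics
app.mount("/metrics", make_asgi_app())

@app.post("/predict")
def predict(features: list[float]):
    REQUEST_COUNT.inc()  # count every call to this API
    with REQUEST_LATENCY.time():  # observe wall-clock latency of the prediction
        prediction = sum(features)  # hypothetical stand-in for model.predict(features)
    return {"prediction": prediction}
```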
uvicorn app:app --port 8000 --host 0.0.0.0
Now, you can check the exposed metrics at http://localhost:8000/metrics
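To spot-check from the command line (assuming the metric names from the sketch above):

```bash
curl -s http://localhost:8000/metrics | grep request_
```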
Apply the following kustomize patch to your service while deploying. This will add the necessary annotations to your service Pods for Prometheus to scrape metrics. Please fill in the placeholders with the correct service-name and service-port-number.
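A representative strategic-merge patch is sketched below. It assumes your Prometheus is configured to honor the common prometheus.io/* pod annotations and that the workload is a Deployment; adjust the kind and annotation keys to match your cluster's scrape configuration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <service-name>
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "<service-port-number>"
```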
This assumes Prometheus and Grafana are installed in your cluster. Once the service is deployed, you can query the total request count in Grafana with request_count_total{container=~"iris-inference",namespace=~"demo-ws"}, where container is the service_name and namespace is the workspace_name.
To chart latency, add queries with different percentiles in the Query section. For example, round(histogram_quantile(0.99, sum(rate(request_latency_seconds_bucket{namespace=~"demo-ws", container=~"iris-inference"}[$__rate_interval])) by (le)), 0.001) represents p99, and similarly round(histogram_quantile(0.90, sum(rate(request_latency_seconds_bucket{namespace=~"demo-ws", container=~"iris-inference"}[$__rate_interval])) by (le)), 0.001) represents p90, where container is the service_name and namespace is the workspace_name.
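Note that histogram_quantile estimates percentiles from the cumulative bucket counters, so its accuracy depends on the Histogram's configured bucket boundaries. If you also want a throughput panel alongside the percentiles, a per-second request rate can be charted with the same labels (assuming the metric names above):

```promql
sum(rate(request_count_total{namespace=~"demo-ws", container=~"iris-inference"}[$__rate_interval]))
```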