Alerts help you monitor the health of your applications and notify you when something goes wrong. Truefoundry makes it easy to set up alerts for all your applications, including Service, Async Service, Job, Helm Deployment, Volume, Notebook and SSH server. You can set up notification channels in Email, Slack or PagerDuty to be notified of alerts.

Truefoundry uses Prometheus Alertmanager to power the alerts.

Key Components of an Alert

An alert comprises three components:

  1. Alert Rule: A rule, expressed as a PromQL query, that is evaluated periodically. If the condition holds for a configured duration, the alert is triggered.

Truefoundry provides the PromQL expressions for the most commonly used alerts, which should suffice for most use cases, so you don’t necessarily need to learn PromQL to set up alerts.

  2. Severity: Indicates whether the alert needs immediate attention. It can be either warning or critical. PagerDuty can use the severity to route the alert to the proper channel.
  3. Notification Channel: The channels to which alerts are sent once they are triggered.
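
The "true for a configured duration" behaviour of an alert rule can be sketched as a small state machine. This is an illustrative model only, not Truefoundry's or Prometheus's implementation; the class name `AlertState`, the field `trigger_after_s`, and the 30-second evaluation interval are assumptions made for the example. The state names follow Prometheus's convention, where an alert is pending while its condition is true but has not yet held for the full duration, and firing afterwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Tracks how long a rule's condition has been continuously true."""
    trigger_after_s: float                 # hypothetical name for the configured duration
    pending_since: Optional[float] = None  # time at which the condition first became true

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Return 'inactive', 'pending', or 'firing' for one evaluation cycle."""
        if not condition_true:
            # Any false evaluation resets the timer.
            self.pending_since = None
            return "inactive"
        if self.pending_since is None:
            self.pending_since = now
        if now - self.pending_since >= self.trigger_after_s:
            return "firing"
        return "pending"


# Evaluate every 30 s (assumed interval) with a 60 s trigger duration.
state = AlertState(trigger_after_s=60)
samples = [(0, True), (30, True), (60, True), (90, False), (120, True)]
history = [state.evaluate(cond, now) for now, cond in samples]
print(history)
```

In this model a single false evaluation returns the alert to inactive, so a condition that flaps on and off never reaches the firing state unless it stays true for the whole duration.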

Setting up alerts in Truefoundry Services

To set up alerts, follow the steps below:

1. Set Up Notification Channels

You need to create an integration with Slack, Email or PagerDuty before setting up a notification channel. If you have not done this yet, refer to the Adding slack, email, and pagerduty integrations documentation.

Before setting up alerts, configure one or more notification channels to send them to.

Configure Notification Channels

You can choose Email, SlackBot or PagerDuty as the notification channel for your alerts.

Email Channel
You can add multiple notification channels to send alerts to.

2. Create Alert

You can choose from the alerts that are already available or create your own custom alert. In most cases, the existing alerts should suffice for your use case. Here are a few of the alerts already available:

Create Alert Rule

If you need something beyond the alerts above, you can create your own using the New Alert Rule form in the UI. We recommend writing and testing the PromQL query in the Grafana UI first. The key fields to fill in on the form are:

  • Name: A descriptive name for your alert.

  • Description: (Optional) Briefly describe what this alert monitors.

  • Prometheus Expression: Enter the Prometheus query that defines the alert condition. For example:

    sum(rate(http_requests_total{status!="2xx"}[5m])) by (service) > 5
    

    This triggers when, for any service, the rate of HTTP responses whose status label is not 2xx averages more than 5 per second over the last 5 minutes.

  • Trigger After (seconds): How long the condition must be true before triggering the alert.

  • Severity: Choose between Warning and Critical.

  • Notification Enabled: Enable or disable notifications for this rule.
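
To make the threshold in the example expression concrete: `rate()` returns a per-second average, so `> 5` over a `[5m]` window corresponds to roughly 1,500 matching responses in five minutes. The sketch below works through that arithmetic. It is a simplification, since Prometheus's `rate()` also handles counter resets and extrapolates to the window boundaries; the function name `simple_rate` and the counter readings are hypothetical.

```python
def simple_rate(first_value: float, last_value: float, window_s: float) -> float:
    """Average per-second increase of a counter over a window.

    Simplification: ignores counter resets and boundary extrapolation,
    both of which Prometheus's rate() accounts for.
    """
    return (last_value - first_value) / window_s


WINDOW_S = 5 * 60  # the [5m] range in the example expression
THRESHOLD = 5      # the "> 5" comparison, in responses per second

# Hypothetical counter readings for one service's non-2xx responses.
quiet = simple_rate(first_value=10_000, last_value=10_006, window_s=WINDOW_S)
noisy = simple_rate(first_value=10_000, last_value=11_800, window_s=WINDOW_S)

print(quiet > THRESHOLD)     # 6 responses in 5 minutes is 0.02 per second
print(noisy > THRESHOLD)     # 1800 responses in 5 minutes is 6.0 per second
print(THRESHOLD * WINDOW_S)  # responses needed in the window to reach the threshold
```

If you want to alert on a count per window rather than a per-second rate, adjust the threshold accordingly when writing the expression.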

Create Alert Rule Form

Applying AlertRules via YAML in GitOps

You can also apply AlertRules as YAML through GitOps. To get the YAML, click the Code icon on the Alerts page and copy it.