Notifications

Introduction to Notifications

Sveltos uses ClusterProfiles/Profiles to automatically track matching clusters and deploy specified add-ons (like Helm charts or Kubernetes resources). It can then assess the cluster health (ensuring all add-ons are ready) and send notifications. These notifications allow external tools to trigger further workflows, like CI/CD pipelines, only once the cluster is confirmed healthy and stable.

ClusterHealthCheck

ClusterHealthCheck is the CRD used to:

Define the cluster health checks;
Instruct Sveltos when and how to send notifications.

Cluster Selection

The clusterSelector field is a Kubernetes label selector. Sveltos uses it to select the clusters whose health it assesses and for which it sends notifications.

LivenessChecks

The livenessChecks field is a list of cluster liveness checks to be evaluated.

The supported types are:

Addons: the Addons type instructs Sveltos to evaluate the state of add-on deployment in each matching cluster;
HealthCheck: the HealthCheck type lets you define a custom health check for any Kubernetes resource type.

Notifications

The notifications field is a list of all notifications to be sent when the liveness check state changes.

The supported types are:

Slack
Webex
Teams
Discord
Telegram
SMTP
Kubernetes events (reason=ClusterHealthCheck)
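
As a minimal sketch of how clusterSelector, livenessChecks, and notifications fit together, the manifest below checks add-on deployment and emits a Kubernetes event when the state changes. The resource name, the env: fv label, and the referenced ClusterProfile deploy-kyverno are hypothetical. The KubernetesEvent type string and the use of livenessSourceRef to point at a ClusterProfile are assumptions that should be verified against the Sveltos API reference for the version in use.

```yaml
# Sketch only: names and label values are placeholders.
apiVersion: lib.projectsveltos.io/v1beta1
kind: ClusterHealthCheck
metadata:
  name: addons-health
spec:
  # Clusters carrying this label are assessed
  clusterSelector:
    matchLabels:
      env: fv
  livenessChecks:
  - name: addons
    type: Addons
    # Assumption: the Addons check points at the ClusterProfile
    # whose add-ons must be deployed
    livenessSourceRef:
      kind: ClusterProfile
      apiVersion: config.projectsveltos.io/v1beta1
      name: deploy-kyverno
  notifications:
  - name: event
    # Assumption: type string used for Kubernetes events
    type: KubernetesEvent
```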

HealthCheck CRD

The HealthCheck resource defines a custom health assessment by first selecting Kubernetes resources and then applying custom evaluation logic to determine their collective health.

resourceSelectors
- Purpose: Resource Selection
- Details: An array of ResourceSelector objects. These define the Kubernetes resources to monitor by specifying their Group, Version, Kind, Namespace, and Name.
resourceSelectors[*].LabelFilters
- Purpose: Filtering by Label
- Details: Filters the selected resources using standard label operations: Equal, Different, Has, or DoesNotHave.
resourceSelectors[*].Evaluate
- Purpose: Lua Pre-Filter (Optional)
- Details: An optional Lua script used to additionally filter resources before the main health check is performed.
resourceSelectors[*].EvaluateCEL
- Purpose: CEL Pre-Filter (Optional)
- Details: An optional list of Common Expression Language (CEL) rules used to additionally filter resources.
evaluateHealth
- Purpose: Custom Health Evaluation
- Details: A mandatory Lua script that performs the core health check logic on all the final, filtered resources.
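
To illustrate the selection fields above, the fragment below narrows a check to Deployments in one namespace that carry a given label. It is a sketch: the namespace, label key, and label value are hypothetical, and the lowercase field names (labelFilters, key, operation, value) are assumed to follow the usual Kubernetes YAML casing. The mandatory evaluateHealth script is omitted for brevity.

```yaml
# Fragment of a HealthCheck spec; evaluateHealth is omitted here.
spec:
  resourceSelectors:
  - group: "apps"
    version: v1
    kind: Deployment
    namespace: production      # hypothetical namespace
    labelFilters:
    - key: app                 # hypothetical label key
      operation: Equal         # one of Equal, Different, Has, DoesNotHave
      value: nginx             # hypothetical label value
```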

The Spec.evaluateHealth field must contain a Lua script with a function named evaluate().

Input Access: The function accesses all Kubernetes resources selected by resourceSelectors using the global Lua variable: resources.

Required Output: It must return a table whose resources field is an array of tables (structured instances), one per evaluated resource, with the following required and optional fields:

resource
- Type: Object
- Description: The specific Kubernetes resource that was evaluated.
healthStatus
- Type: String
- Description: The assessment of the resource's health. Must be one of: Healthy, Progressing, Degraded, or Suspended.
message
- Type: String
- Description: Optional, an informative message providing context for the status.
reEvaluate
- Type: Boolean
- Description: Optional. If set to true, the health check will be automatically re-evaluated in 10 seconds.
ignore
- Type: Boolean
- Description: Optional. If set to true, Sveltos will ignore this resource's result during the overall health calculation.

Example: ConfigMap HealthCheck

In the following example¹, we create a HealthCheck that watches all ConfigMap Kubernetes resources.

hs is the health status object we return to Sveltos. Its resources field lists one entry per evaluated ConfigMap. Each entry contains a status attribute, which indicates whether the resource is Healthy, Progressing, Degraded or Suspended, and optionally a message. ConfigMaps that are not OPA policies are skipped, as we do not want to affect the status of other, non-OPA ConfigMaps.

In this example, we want to identify whether the ConfigMap is an OPA policy or another kind of ConfigMap. If it is an OPA policy, we retrieve the value of the openpolicyagent.org/policy-status annotation. The annotation is set to {"status":"ok"} if the policy loaded successfully. If errors occurred during loading (e.g., the policy contained a syntax error), the cause is reported in the annotation. Depending on the value of the annotation, we set the status and message attributes appropriately.

At the end, we return the hs object to Sveltos.

Example - HealthCheck Definition

---
apiVersion: lib.projectsveltos.io/v1beta1
kind: HealthCheck
metadata:
  name: opa-configmaps
spec:
  resourceSelectors:
  - group: ""
    version: v1
    kind: ConfigMap
  evaluateHealth: |
    function evaluate()
      local statuses = {}

      local opa_annotation = "openpolicyagent.org/policy-status"

      for _, resource in ipairs(resources) do
        -- Only ConfigMaps carrying the OPA annotation are evaluated;
        -- all other ConfigMaps are skipped.
        local annotations = resource.metadata.annotations
        if annotations ~= nil and annotations[opa_annotation] ~= nil then
          local status = "Healthy"
          local message = "Policy loaded successfully"
          if annotations[opa_annotation] ~= '{"status":"ok"}' then
            -- Report the annotation content, which carries the load error
            status = "Degraded"
            message = annotations[opa_annotation]
          end
          table.insert(statuses, {resource=resource, status = status, message = message})
        end
      end

      -- Return the collected results only when at least one policy was found
      local hs = {}
      if #statuses > 0 then
        hs.resources = statuses
      end
      return hs
    end

The ClusterHealthCheck resource below sends a Webex message as a notification if a ConfigMap with an incorrect OPA policy is detected.

---
apiVersion: lib.projectsveltos.io/v1beta1
kind: ClusterHealthCheck
metadata:
  name: hc
spec:
  clusterSelector:
    matchLabels:
      env: fv
  livenessChecks:
  - name: deployment
    type: HealthCheck
    livenessSourceRef:
      kind: HealthCheck
      apiVersion: lib.projectsveltos.io/v1beta1
      name: opa-configmaps
  notifications:
  - name: webex
    type: Webex
    notificationRef:
      apiVersion: v1
      kind: Secret
      name: webex
      namespace: default

Notifications and multi-tenancy

If the labels below are set on the HealthCheck instance created by the tenant admin:

projectsveltos.io/serviceaccount-name: <service account name>
projectsveltos.io/serviceaccount-namespace: <service account namespace>

Sveltos ensures that the tenant admin can define notifications only for the resources the platform admin has authorized them to access.

Sveltos suggests using the below Kyverno ClusterPolicy, which takes care of adding proper labels to each HealthCheck at creation time.

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-labels
  annotations:
    policies.kyverno.io/title: Add Labels
    policies.kyverno.io/description: >-
      Adds the projectsveltos.io/serviceaccount-name and
      projectsveltos.io/serviceaccount-namespace labels on each
      HealthCheck created by a tenant admin. It assumes each tenant
      admin is represented in the management cluster by a ServiceAccount.
spec:
  background: false
  rules:
  - exclude:
      any:
      - clusterRoles:
        - cluster-admin
    match:
      all:
      - resources:
          kinds:
          - HealthCheck
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            +(projectsveltos.io/serviceaccount-name): '{{serviceAccountName}}'
            +(projectsveltos.io/serviceaccount-namespace): '{{serviceAccountNamespace}}'
    name: add-labels
  validationFailureAction: enforce

¹ Credit for this example to https://blog.cubieserver.de/2022/argocd-health-checks-for-opa-rules/