vec2checkd

This program/daemon executes PromQL queries via the Prometheus HTTP API, translates the results into passive check results, and sends them to the Icinga HTTP API in order to update certain host or service objects. These mappings (PromQL query result -> passive check result) are applied regularly at a user-defined interval, similar to an active service check executed by an Icinga satellite or agent.
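To make the data flow concrete, here is a minimal Python sketch of one mapping run. It is an illustration only, not vec2checkd's actual (Rust) implementation. It assumes a response body shaped like the one returned by the Prometheus instant-query endpoint (`/api/v1/query`) and builds a JSON body for the Icinga 2 `process-check-result` action. The function names, the sample response, and the host and service names are hypothetical, and the HTTP calls themselves are left out so the sketch runs offline.

```python
import json


def extract_value(prom_response):
    """Return the value of the first sample of an instant-vector result."""
    if prom_response.get("status") != "success":
        raise ValueError("Prometheus reported a failed query")
    data = prom_response["data"]
    if data["resultType"] != "vector" or not data["result"]:
        raise ValueError("expected a non-empty instant vector")
    # Each sample looks like {"metric": {...}, "value": [<unix_ts>, "<value>"]}.
    _timestamp, raw_value = data["result"][0]["value"]
    return float(raw_value)


def build_check_result(host, service, exit_status, plugin_output, perf_data):
    """Build a request body for POST /v1/actions/process-check-result."""
    return {
        "type": "Service",
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": exit_status,
        "plugin_output": plugin_output,
        "performance_data": perf_data,
    }


# Hypothetical response body, as Prometheus would return it for a scalar-like sum().
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1700000000.0, "0.25"]}],
    },
}

value = extract_value(sample_response)
payload = build_check_result(
    host="example-host",
    service="example-service",
    exit_status=0,  # 0 = OK, since no thresholds are evaluated in this sketch
    plugin_output=f"[OK] 'example-service' is {value}",
    perf_data=[f"'example-service'={value}"],
)
print(json.dumps(payload, indent=2))
```

In a real deployment the payload would be sent to the Icinga API with the authentication configured for the daemon; that step is omitted here on purpose.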

The obvious choice for monitoring anything that exports time-series data scraped by Prometheus et al. is Grafana and/or Alertmanager. However, this little tool might come in handy when you or your organization rely primarily on Icinga2 for infrastructure monitoring but want to integrate some services that export time-series data into the mix. In this case, setting up Grafana and/or Alertmanager alongside Icinga2 may be overkill (or not ... I guess it "depends").

Installation

Prerequisites: Rust >= v1.58.0

.deb package

To install the .deb package, either download it from GitHub and install it with

dpkg -i <path_to_package>

OR

Build it yourself by first installing the cargo deb command:

cargo install cargo-deb

and then building the package:

cargo deb

The .deb package also provides a systemd unit file and a default configuration file in /etc/vec2checkd/config.yaml.

Binary only

cargo install vec2checkd

Configuration

vec2checkd reads its configuration from a single YAML file (default is /etc/vec2checkd/config.yaml). A custom location may be provided using the --config command-line argument. See the documentation on configuration for a detailed explanation of the contents of the YAML file.

Examples

Below is an example configuration that starts out pretty simple, relying primarily on defaults set by vec2checkd.

```yaml
# Connects to 'http://localhost:9090' by default.
prometheus: {}
icinga:
  host: 'https://my-satellite.exmaple.com:5665'
  authentication:
    method: 'x509'
    client_cert: '/var/lib/vec2checkd/ssl/kubernetes-monitoring.crt'
    client_key: '/var/lib/vec2checkd/ssl/kubernetes-monitoring.key'
mappings:
  'Failed ingress requests':
    query: 'sum(rate(nginx_ingress_controller_requests{cluster="production",status!~"2.."}[5m]))'
    host: 'Kubernetes Production'
    service: 'Failed ingress requests'
...
```

As per the defaults that come into play here, this mapping will execute the PromQL query every 60 seconds and send the following default plugin output and performance data to Icinga2 in order to update the service object "Failed ingress requests" on host "Kubernetes Production". The status will be 0 (OK) as no thresholds have been defined.

```
# plugin output
[OK] 'Failed ingress requests' is 34.43

# performance data
'Failed ingress requests'=34.4393348197696023;;;;
```
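The per-mapping interval can be pictured as a simple priority queue of due times. The following Python sketch simulates this on a virtual clock. It is not how vec2checkd is necessarily implemented: the `schedule` function, the second mapping name, and the choice to run every mapping once at time 0 are assumptions made for illustration.

```python
import heapq


def schedule(intervals, horizon):
    """Simulate which mapping runs at which second, up to `horizon` seconds.

    `intervals` maps a mapping name to its interval in seconds.
    Assumption: every mapping runs once at t=0 and then once per interval.
    """
    queue = [(0, name) for name in sorted(intervals)]
    heapq.heapify(queue)
    runs = []
    while queue and queue[0][0] <= horizon:
        due, name = heapq.heappop(queue)
        runs.append((due, name))
        # Re-queue the mapping for its next due time.
        heapq.heappush(queue, (due + intervals[name], name))
    return runs


runs = schedule(
    {
        "Failed ingress requests": 60,     # default interval from the text above
        "hypothetical slow mapping": 300,  # made-up mapping with a custom interval
    },
    horizon=300,
)
for due, name in runs:
    print(f"t={due:>3}s  {name}")
```

Within the first 300 simulated seconds the 60-second mapping is due six times (at 0, 60, ..., 300) and the 300-second mapping twice (at 0 and 300).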

Now extend the configuration with another example mapping that builds on the existing one.

```yaml
prometheus: {}
icinga:
  host: 'https://my-satellite.exmaple.com:5665'
  authentication:
    method: 'x509'
    client_cert: '/var/lib/vec2checkd/ssl/kubernetes-monitoring.crt'
    client_key: '/var/lib/vec2checkd/ssl/kubernetes-monitoring.key'
mappings:
  'Failed ingress requests':
    query: 'sum(rate(nginx_ingress_controller_requests{cluster="production",status!~"2.."}[5m]))'
    host: 'Kubernetes Production'
    service: 'Failed ingress requests'

  'Successful ingress requests':
    query: 'sum(rate(nginx_ingress_controller_requests{cluster="production",status=~"2.."}[5m]))'
    host: 'Kubernetes Production'
    service: 'Successful ingress requests'
    interval: 300
    # In words: "WARN if the value dips below 200 or CRIT when the value dips below 100".
    thresholds:
      warning: '@200'
      critical: '@100'
    plugin_output: '[$state] Nginx ingress controller processes $value requests per second (HTTP 2xx)'
    performance_data:
      enabled: true
      label: 'requests'
...
```

The second mapping will only be applied every 300 seconds. The warning and critical thresholds are also considered before the final check result is sent to Icinga2. Given the PromQL query evaluates to a value of "130.0", vec2checkd sends status 1 (WARNING) and the following plugin output and performance data to the API.

```
# plugin output
[WARNING] Nginx ingress controller processes 130 requests per second (HTTP 2xx)

# performance data
'requests'=130.0;@0:200;@0:100;;
```
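The `@200` and `@100` thresholds look like Nagios-style ranges, which the `@0:200` notation in the performance data also suggests. Under that assumption, the following Python sketch shows how a value of 130.0 ends up as WARNING. The `Range`, `parse_range`, and `evaluate` names are hypothetical and the parser only covers the common range forms; the project's own threshold documentation remains authoritative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: float
    end: float
    inside: bool  # True when the spec was prefixed with '@'

    def alerts(self, value):
        """'@' ranges alert inside [start, end]; plain ranges alert outside."""
        within = self.start <= value <= self.end
        return within if self.inside else not within


def parse_range(spec):
    """Parse specs such as '10', '10:', '~:10', '10:20', '@200'."""
    inside = spec.startswith("@")
    body = spec[1:] if inside else spec
    if ":" in body:
        low, high = body.split(":", 1)
    else:
        low, high = "0", body  # a bare number N means the range 0..N
    if low == "~":
        start = -math.inf
    else:
        start = float(low) if low else 0.0
    end = float(high) if high else math.inf
    if start > end:
        raise ValueError(f"invalid range: {spec!r}")
    return Range(start, end, inside)


def evaluate(value, warning=None, critical=None):
    """Return (exit status, state name); critical takes precedence."""
    if critical is not None and parse_range(critical).alerts(value):
        return 2, "CRITICAL"
    if warning is not None and parse_range(warning).alerts(value):
        return 1, "WARNING"
    return 0, "OK"


print(evaluate(130.0, warning="@200", critical="@100"))  # (1, 'WARNING')
print(evaluate(50.0, warning="@200", critical="@100"))   # (2, 'CRITICAL')
print(evaluate(250.0, warning="@200", critical="@100"))  # (0, 'OK')
```

With these ranges, 130.0 lies inside 0..200 but outside 0..100, so only the warning threshold fires.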

There is a little more going on here, so check the documentation for details on the placeholders in the plugin_output field, the thresholds, the performance_data object etc.
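As a rough illustration of the placeholder substitution and the performance data string from the example above, consider the following Python sketch. It relies on Python's `string.Template`, whose `$name` syntax happens to match the placeholders shown here. The helper names and the value formatting (130.0 rendered as `130` in the plugin output) are assumptions, not a description of vec2checkd's internals.

```python
from string import Template


def render_plugin_output(template, state, value):
    """Fill the $state and $value placeholders of a plugin output template."""
    # Assumption: the value is rendered compactly, e.g. 130.0 -> "130".
    return Template(template).safe_substitute(state=state, value=format(value, "g"))


def render_perf_data(label, value, warning="", critical=""):
    """Build a perf data item: 'label'=value;warn;crit;min;max (min/max left empty)."""
    return f"'{label}'={value};{warning};{critical};;"


output = render_plugin_output(
    "[$state] Nginx ingress controller processes $value requests per second (HTTP 2xx)",
    state="WARNING",
    value=130.0,
)
perf = render_perf_data("requests", 130.0, warning="@0:200", critical="@0:100")

print(output)  # [WARNING] Nginx ingress controller processes 130 requests per second (HTTP 2xx)
print(perf)    # 'requests'=130.0;@0:200;@0:100;;
```

`safe_substitute` is used so that an unknown placeholder is left in the text instead of raising an error; whether vec2checkd behaves the same way is not covered here.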

Limitations

ToDos