MinMon - an opinionated minimal monitoring and alarming tool (for Linux)

This tool is just a single binary and a config file. No database, no GUI, no graphs. Just monitoring and alarms. I wrote this because the existing alternatives I could find were too heavy, mainly focused on nice GUIs with graphs (not on alarming), too complex to set up, or targeted at cloud/multi-instance setups.


Checks

Actions

Report

The absence of alarms can mean two things: everything is okay or the monitoring/alarming failed altogether. That's why MinMon can trigger regular report events to let you know that it's up and running.

Design decisions

Config file

The config file uses the TOML format and has the following sections:
- log
- report
- actions
- checks

Architecture

System overview

```mermaid
graph TD
A(Config file) --> B(Main loop)
B -->|interval| C(Check 1)
B -.-> D(Check 2..n)
C -->|data| E(Alarm 1)
C -.-> F(Alarm 2..m)
E -->|cycles, repeat_cycles| G(Action)
E -->|recover_cycles| H(Recover action)
E -->|error_repeat_cycles| I(Error action)

style C fill:green;
style D fill:green;
style E fill:red;
style F fill:red;
style G fill:blue;
style H fill:blue;
style I fill:blue;

```

Alarm state machine

Each alarm has three possible states: "Good", "Bad" and "Error".

It takes `cycles` consecutive bad data points to trigger the transition from "Good" to "Bad" and `recover_cycles` consecutive good ones to go back. These transitions trigger `action` and `recover_action`, respectively. While in the "Bad" state, `action` is triggered again every `repeat_cycles` cycles (if `repeat_cycles` is not 0).

The "Error" state is a bit special as it only "shadows" the other states. An error means that there is no data available at all, e.g. the filesystem usage for /home could not be determined. Since this should rarely happen, the transition to the error state always triggers `error_action` on the first cycle. While the error persists, `error_action` is repeated every `error_repeat_cycles` cycles. If there is valid data on the next cycle, the state machine continues as if the error state did not exist.

```mermaid
stateDiagram-v2
direction LR

[*] --> Good
Good --> Good
Good --> Bad: action/cycles
Good --> Error: error_action

Bad --> Good: recover_action/recover_cycles
Bad --> Bad: repeat_action/repeat_cycles
Bad --> Error: error_action

Error --> Good
Error --> Bad
Error --> Error: error_repeat_action/error_repeat_cycles

```
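The state machine described above can be sketched in a few lines of Python. This is only an illustration, not MinMon's actual implementation: the `Alarm` class, its `step` method and the internal counters are hypothetical, and the sketch assumes that error cycles neither advance nor reset the good/bad streaks and that the repeat counter only advances on cycles with valid data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alarm:
    """Hypothetical model of one alarm; field names mirror the config keys."""
    cycles: int
    recover_cycles: int
    repeat_cycles: int = 0        # 0 disables repeating `action`
    error_repeat_cycles: int = 0  # assumed: 0 disables repeating `error_action`
    state: str = "Good"           # underlying state: "Good" or "Bad"
    in_error: bool = False        # "Error" only shadows the state above
    bad_streak: int = 0
    good_streak: int = 0
    since_action: int = 0
    since_error: int = 0

    def step(self, bad: Optional[bool]) -> List[str]:
        """Feed one data point (None means no data); return triggered actions."""
        if bad is None:
            if not self.in_error:
                # Entering the error state always triggers on the first cycle.
                self.in_error, self.since_error = True, 0
                return ["error_action"]
            self.since_error += 1
            if self.error_repeat_cycles and self.since_error % self.error_repeat_cycles == 0:
                return ["error_action"]
            return []
        # Valid data: continue as if the error state did not exist.
        self.in_error = False
        if self.state == "Good":
            self.bad_streak = self.bad_streak + 1 if bad else 0
            if self.bad_streak >= self.cycles:
                self.state, self.good_streak, self.since_action = "Bad", 0, 0
                return ["action"]
            return []
        # Underlying state is "Bad".
        self.good_streak = 0 if bad else self.good_streak + 1
        if self.good_streak >= self.recover_cycles:
            self.state, self.bad_streak = "Good", 0
            return ["recover_action"]
        self.since_action += 1
        if self.repeat_cycles and self.since_action % self.repeat_cycles == 0:
            return ["action"]
        return []


alarm = Alarm(cycles=3, recover_cycles=2, repeat_cycles=2)
data = [True, True, True, True, True, None, False, False]
log = [alarm.step(point) for point in data]
```

Here `True` stands for a bad data point, `False` for a good one and `None` for a cycle without data. Keeping the error flag separate from the underlying "Good"/"Bad" state is one way to model the "shadowing" behaviour.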

Example

Check the mountpoint at /home every minute. If the usage level exceeds 70% for 3 consecutive cycles (i.e. 3 minutes), the "Warning" alarm triggers the "Webhook 1" action. The action repeats every 100 cycles until the "Warning" alarm recovers. This happens after 5 consecutive cycles below 70% which also triggers the "Webhook 1" action. If there is an error while checking the filesystem usage, the "Log error" action is triggered. This is repeated every 200 cycles.
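As a rough worked example, the cycle counts above translate into wall-clock time as follows. The sketch simply multiplies cycles by the check interval, following the "3 cycles, i.e. 3 minutes" reading used in this section; the helper `seconds` and the dictionary keys are illustrative names only.

```python
INTERVAL_S = 60  # the check runs every minute


def seconds(n_cycles: int, interval_s: int = INTERVAL_S) -> int:
    """Convert a number of check cycles into seconds."""
    return n_cycles * interval_s


timings = {
    "alarm_after_s": seconds(3),           # cycles = 3
    "alarm_repeat_every_s": seconds(100),  # repeat_cycles = 100
    "recover_after_s": seconds(5),         # recover_cycles = 5
    "error_repeat_every_s": seconds(200),  # error_repeat_cycles = 200
}

for name, value in timings.items():
    print(f"{name}: {value} s ({value / 60:g} min)")
```

So with a 60-second interval, the alarm repeats roughly every 100 minutes and a persistent error is logged again roughly every 200 minutes.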

Config

```toml
[[checks]]
interval = 60
name = "Filesystem usage"
type = "FilesystemUsage"
mountpoints = ["/home"]

[[checks.alarms]]
name = "Warning"
level = 70
cycles = 3
repeat_cycles = 100
action = "Webhook 1"
recover_cycles = 5
recover_action = "Webhook 1"
error_repeat_cycles = 200
error_action = "Log error"

[[actions]]
name = "Webhook 1"
type = "Webhook"
url = "https://example.com/hook1"
body = """{"text": "{{alarm_name}}: {{check_name}} on mountpoint '{{alarm_id}}' reached {{level}}%."}"""
headers = {"Content-Type" = "application/json"}

[[actions]]
name = "Log error"
type = "Log"
level = "Error"
template = """{{check_name}} check didn't have valid data for alarm '{{alarm_name}}' and id '{{alarm_id}}'."""
```

The webhook text will be rendered into something like "Warning: Filesystem usage on mountpoint '/home' reached 70%."

Diagram

```mermaid
graph TD
A(example.toml) --> B(Main loop)
B -->|every 60 seconds| C(FilesystemUsage 1: '/home')
C -->|level '/home': 60%| D(LevelAlarm 1: 70%)
D -->|cycles: 3, repeat_cycles: 100| E(Action: Webhook 1)
D -->|recover_cycles: 5| F(Recover action: Webhook 1)
D -->|error_repeat_cycles: 200| G(Error action: Log error)

style C fill:green;
style D fill:red;
style E fill:blue;
style F fill:blue;
style G fill:blue;

```

Some (more exotic) ideas

Just to give some ideas of what's possible:
- Run it locally on your workstation and let it send notifications to your desktop environment using the Process action and notify-send when the filesystem fills up.
- Use the report in combination with the Webhook action and telepush and let it send you "I'm still alive, since {{minmon_uptime}} seconds!" once a week to your Telegram messenger for peace of mind.

Placeholders

To improve the reusability of the actions, it's possible to define custom placeholders for the report, events, checks, alarms and actions. When an action is triggered, the placeholders (generic and custom) are merged into the final placeholder map. Inside the action (depending on the type of the action) the placeholders can be used in one or more config fields using the {{placeholder_name}} syntax. There are also some generic placeholders that are always available and some that are specific to the check that triggered the action. Placeholders that don't have a value available when the action is triggered will be replaced by an empty string.
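The merge-and-substitute behaviour described above can be illustrated with a small Python sketch. It is not MinMon's implementation: the functions `merge` and `render` are hypothetical, the merge order (later maps override earlier ones) is an assumption, and the placeholder names `alarm_name` and `check_name` are assumed spellings. The only behaviour taken directly from the text is the `{{placeholder_name}}` syntax and the rule that placeholders without a value become an empty string.

```python
import re

# Matches {{placeholder_name}} with a word-character name.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def merge(*maps: dict) -> dict:
    """Merge placeholder maps; later maps win on conflicts (an assumption)."""
    merged: dict = {}
    for placeholder_map in maps:
        merged.update(placeholder_map)
    return merged


def render(template: str, placeholders: dict) -> str:
    """Replace each placeholder; unknown ones become an empty string."""
    return PLACEHOLDER.sub(
        lambda match: str(placeholders.get(match.group(1), "")), template
    )


generic = {"check_name": "Filesystem usage", "alarm_name": "Warning"}
check_specific = {"alarm_id": "/home", "level": 70}
template = "{{alarm_name}}: {{check_name}} on mountpoint '{{alarm_id}}' reached {{level}}%."

message = render(template, merge(generic, check_specific))
print(message)
```

With these values the template renders to the same text as the webhook example, and a template such as `up for {{minmon_uptime}} seconds` rendered with an empty map simply loses the placeholder.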

Installation

Docker image

To pull the Docker image, run `docker pull ghcr.io/flo-at/minmon:latest` or use the example docker-compose.yml file.

In both cases, mount your config file read-only at /etc/minmon.toml.

Build and install using cargo

Make sure cargo is correctly installed on your local machine. You can either install MinMon from crates.io using `cargo install --all-features minmon` or, if you have already checked out the repository, build and install your local copy with `cargo install --all-features --path .`.

Copy the systemd.minmon.service file to /etc/systemd/system/minmon.service and place your config file at /etc/minmon.toml. You can enable and start the service with `systemctl daemon-reload && systemctl enable --now minmon.service`.

If you don't want to include the systemd integration, leave out the `--all-features` option.

Install from the AUR (Arch Linux)

Use your package manager of choice to install the minmon package from the AUR.

Place your config file at /etc/minmon.toml. You can enable and start the service with `systemctl daemon-reload && systemctl enable --now minmon.service`.

systemd integration (optional)

Roadmap

Check ideas

General ideas

Contributions

Contributions are very welcome! Right now MinMon is pretty basic but it's also super easy to extend. Even if it's just a typo in the documentation, I'll be happy to merge your PR. If you're looking for a new check or action type, just open a new issue (if it doesn't exist yet) and tag it with the "enhancement" label.