Monitoring service¶

The beegfs-mon service collects statistics from the system and stores them in a time series database (InfluxDB). For visualization, beegfs-mon provides predefined Grafana panels that can be used out of the box, but any other tool that can read from the database works as well.

Installation¶

The service and the Grafana panels are contained in the optional beegfs-mon package. The package is available from the general BeeGFS repository.

Additionally, a working and reachable InfluxDB setup is required. Installing InfluxDB should be simple in most cases, since prebuilt packages are available for all distributions supported by BeeGFS. Installation instructions for InfluxDB version 1.8 and version 2.x can be found in the official InfluxDB documentation.

InfluxDB can be installed on the same host as beegfs-mon, but an existing installation on another host can be used as well. Just make sure beegfs-mon can access it via HTTP.
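One way to confirm that InfluxDB is reachable from the beegfs-mon host is to query its /ping endpoint, which answers with HTTP status 204 when the server is up. The following is a minimal sketch that assumes curl is installed. The helper names influx_ping_url and check_influxdb are made up for this example and are not part of BeeGFS or InfluxDB.

```shell
# Build the URL of the InfluxDB /ping endpoint (hypothetical helper).
# $1 = host, $2 = port (defaults to 8086, the InfluxDB default HTTP port)
influx_ping_url() {
    printf 'http://%s:%s/ping\n' "$1" "${2:-8086}"
}

# Print only the HTTP status code returned by InfluxDB (hypothetical helper).
# A reachable instance is expected to answer with 204.
check_influxdb() {
    curl -sS -o /dev/null -w '%{http_code}\n' "$(influx_ping_url "$@")"
}

# Run this on the host where beegfs-mon is installed, for example:
#   check_influxdb localhost 8086
```

Any other status code, or a connection error from curl, suggests a wrong host or port, a firewall rule, or an InfluxDB instance that is not running.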

If you want to use the prebuilt Grafana panels (or create your own), you also need Grafana. It does not need to run on the same host either; it only needs HTTP access to the InfluxDB instance. For installation instructions, please refer to the official Grafana website.

To combine BeeGFS monitoring with system monitoring through Telegraf, Telegraf needs to be installed on all nodes. If Telegraf is available in your distribution’s package manager, it is recommended to install it from there. However, if Telegraf is not available or you prefer to get the most recent version directly from upstream, you can download it from the official website.

Install Telegraf on your management, meta, and storage servers following the installation instructions provided on the website. After the installation, locate the Telegraf configuration directory, which is typically /etc/telegraf/.

Create a new configuration file named beegfs_mon_telegraf.conf in the telegraf.d subdirectory of the Telegraf configuration directory (/etc/telegraf/telegraf.d/).

For InfluxDB version 1.x:

$ vim /etc/telegraf/telegraf.d/beegfs_mon_telegraf.conf

[[outputs.influxdb]]
urls = ["http://localhost:8086"] # Replace with the actual InfluxDB URL
database = "your_database_name" # Replace with your desired InfluxDB database name
username = "your_username" # Replace with your InfluxDB username
password = "your_password" # Replace with your InfluxDB password

[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false

[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.system]]

For InfluxDB version 2.x:

$ vim /etc/telegraf/telegraf.d/beegfs_mon_telegraf.conf

[[outputs.influxdb_v2]]
urls = ["http://localhost:8086"] # Replace with the actual InfluxDB URL
token = "your_influxdb_token" # Replace with your InfluxDB 2.x token
organization = "your_organization" # Replace with your InfluxDB 2.x organization
bucket = "your_bucket" # Replace with your InfluxDB 2.x bucket

[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false

[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.system]]

Save the beegfs_mon_telegraf.conf file.

Make sure you have installed and configured Telegraf on all servers (management, meta and storage). Then start the service with

$ systemctl start telegraf
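Before relying on the service, it can help to run Telegraf once in test mode against the new file. With the --test flag, Telegraf gathers the configured inputs a single time, prints the metrics to stdout, and exits without writing anything to InfluxDB. The sketch below wraps this in a helper; the function name telegraf_dry_run is an assumption made for this example.

```shell
# Run Telegraf once with only the BeeGFS monitoring config and print the
# gathered metrics to stdout. Outputs (InfluxDB) are not contacted.
# $1 = config file (defaults to the path used in this section)
telegraf_dry_run() {
    telegraf --config "${1:-/etc/telegraf/telegraf.d/beegfs_mon_telegraf.conf}" --test
}

# Example:
#   telegraf_dry_run
```

If the file contains a syntax error or an unknown option, Telegraf should report it here instead of failing later inside the service.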

Configuration¶

Before running beegfs-mon, edit the two files /etc/beegfs/beegfs-mon.auth and /etc/beegfs/beegfs-mon.conf. In /etc/beegfs/beegfs-mon.auth, enter the InfluxDB username and password if you are using InfluxDB version 1.x. For InfluxDB version 2.x, enter the organization and token instead.

Next, modify /etc/beegfs/beegfs-mon.conf. If everything is installed on the same host, you only need to specify the management host (sysMgmtdHost). If you are using InfluxDB version 2.x, change the value of dbType from influxdb to influxdb2, set dbBucket to the bucket name, and set dbAuthFile to /etc/beegfs/beegfs-mon.auth. If you are using connection authentication, copy your shared secret to the path defined by connAuthFile; otherwise, set connDisableAuthentication to true to disable connection authentication.

If your InfluxDB is installed on another host or you need to use a different database name, you must also modify the corresponding entries (dbHostName, dbHostPort, dbDatabase).
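To review the options mentioned above in one place, the active (uncommented) lines can be filtered out of the configuration file. This is only a sketch; the function name show_mon_settings is hypothetical, and the list of option names is taken from this section rather than from a complete reference.

```shell
# Print the uncommented lines of a beegfs-mon config file that set one of
# the options discussed in this section.
# $1 = config file (defaults to /etc/beegfs/beegfs-mon.conf)
show_mon_settings() {
    grep -E '^[[:space:]]*(sysMgmtdHost|dbType|dbHostName|dbHostPort|dbDatabase|dbBucket|dbAuthFile|connAuthFile|connDisableAuthentication)[[:space:]]*=' \
        "${1:-/etc/beegfs/beegfs-mon.conf}"
}

# Example:
#   show_mon_settings
```

Lines that start with a comment character are skipped, so the output reflects only what beegfs-mon will actually read for these options.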

After editing the configuration, you can start the service with

$ systemctl start beegfs-mon
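After starting the service, its state and recent log output can be checked with the standard systemd tools. The helper below is a small sketch; the name mon_status is made up for this example.

```shell
# Report whether beegfs-mon is active and show its most recent journal lines.
# $1 = number of log lines to show (defaults to 20)
mon_status() {
    systemctl is-active beegfs-mon
    journalctl -u beegfs-mon -n "${1:-20}" --no-pager
}

# Example:
#   mon_status 50
```

Connection problems with the management service or with InfluxDB would typically be expected to show up in these log lines.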

Grafana panels: Default installation¶

A set of Grafana panels for use with BeeGFS is provided by the beegfs-mon-grafana package. Once the package is installed, the panels can be imported using the script /opt/beegfs/scripts/grafana/import-dashboards. For the out-of-the-box setup with InfluxDB and Grafana on the same host, just use

$ cd /opt/beegfs/scripts/grafana
$ ./import-dashboards default

Grafana panels: Custom installation¶

In any other case, either provide the script with the URLs of InfluxDB and Grafana (call the script without arguments for usage instructions) or install the panels manually. The latter can be done from within Grafana’s web interface:

First, the data source must be defined. In the main menu, click on Data Sources and then Add Data Source. Enter a name for the data source and the hostname and port where your InfluxDB is running. Set the name of the database (default: beegfs_mon). Save.

To add the dashboards, select Dashboards/Import from the main menu. Choose one of the dashboard .json files located at /opt/beegfs/scripts/grafana/. Depending on your InfluxDB version, select either a file ending with influxdbv1.json (e.g. beegfs_overview_influxdbv1.json) for InfluxDB version 1 or a file ending with influxdbv2.json (e.g. beegfs_overview_influxdbv2.json) for InfluxDB version 2. If you are using Telegraf, select the Telegraf variant for your InfluxDB version instead (e.g. beegfs_overview_telegraf_influxdbv1.json or beegfs_overview_telegraf_influxdbv2.json). Select the data source you created previously from the dropdown menu and click Import. Repeat for the rest of the panels.

You can now click on Dashboards in the main menu and then on the button to the right of it. A list of the installed dashboards should pop up, from which you can select the one you want to view. If your BeeGFS setup, the beegfs-mon service, and InfluxDB are already running and configured properly, you should already see some data being collected.
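When importing manually, it can be handy to list only the dashboard files that match your InfluxDB version. The sketch below relies on the file naming scheme described above; the function name list_dashboards is an assumption made for this example.

```shell
# List the dashboard files for one InfluxDB major version, including the
# Telegraf variants, based on the influxdbv<N>.json file name suffix.
# $1 = InfluxDB major version (1 or 2)
# $2 = dashboard directory (defaults to /opt/beegfs/scripts/grafana)
list_dashboards() {
    ls "${2:-/opt/beegfs/scripts/grafana}"/*influxdbv"$1".json
}

# Example: show all dashboards intended for InfluxDB 2.x
#   list_dashboards 2
```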

You can also update a dashboard in Grafana by deleting the existing one and importing a new version. To do so, follow these steps:

Log in to your Grafana instance.
Navigate to the dashboard you want to update.
Click on the Settings (gear icon) in the top toolbar to access the dashboard settings.
Scroll down to the bottom of the settings page and click on the “Delete” button.
Confirm the deletion when prompted. This will delete the existing dashboard from Grafana.
Once the dashboard is deleted, go to the Grafana homepage or the Dashboards section.
Click on the “Import” button to import a new dashboard.

The new dashboard will be imported and saved in Grafana, replacing the previous version.

For more documentation and help in using Grafana, please visit the official website http://docs.grafana.org.

Grafana Alerts setup¶

To take advantage of the new alerting feature, you need to run the alerting script provided in the beegfs-mon-grafana package. The script, located at /opt/beegfs/scripts/grafana/import-alerts, sets up preconfigured BeeGFS alerts, including an email template, a contact point, and notification policies. After running the script, replace the placeholder email address in the contact point configuration with your own email address to receive alerts. By default, all alerts are paused; you can unpause them from the Grafana UI.

Steps to unpause alert evaluation¶

In the left-side menu, click on “Alerting”
Click on “BeeGFS-Alert” to see the list of existing alerts
Identify the alert you wish to unpause, then click “Edit” (the pen icon)
Scroll down to find the “Pause evaluation” option. Click the button to unpause the alert
Save your changes and exit editing mode

By completing these steps, you’ve successfully unpaused the specified alert, allowing it to resume evaluation based on the configured conditions.

Steps to edit a contact point¶

In the left-side menu, click “Alerting”
Click “Contact points” to view a list of existing contact points
Find the BeeGFS email contact point to edit, and then click “Edit” (the pen icon)
Replace the entry in the “Addresses” section with your email address and click “Save contact point”

Note

For email alerts to work, the SMTP settings must be configured in the Grafana configuration file /etc/grafana/grafana.ini. Without them, Grafana cannot send email notifications when an alert condition is met.
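In grafana.ini, the mail settings live in the [smtp] section, which contains keys such as enabled, host, user, password and from_address. To see what is currently configured, the section can be printed on its own. The helper name show_smtp_section below is hypothetical, and the sketch assumes the default file location mentioned in the note.

```shell
# Print the [smtp] section of a Grafana configuration file, from its header
# up to (but not including) the next section header.
# $1 = config file (defaults to /etc/grafana/grafana.ini)
show_smtp_section() {
    awk '/^\[/ { in_smtp = ($0 == "[smtp]") } in_smtp' "${1:-/etc/grafana/grafana.ini}"
}

# Example:
#   show_smtp_section
```

Lines that begin with a semicolon are commented out and have no effect. Grafana reads its configuration at startup, so a restart of the Grafana service is expected to be necessary after changing these values.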

Alert customization for system-specific requirements¶

Users can customize their alert preferences to align with their specific configurations. The following are elements they can modify:

Pending period

Setting a pending period helps avoid unnecessary alerts for short-lived issues. The pending period is the length of time for which an alert rule may be in breach of its condition before the alert fires.

Alert condition

An alert condition is the query or expression that determines whether the alert will fire or not depending on the value it yields. There can be only one condition which will determine the triggering of the alert.

For more details see the Grafana documentation on alert rules.

Steps to edit an alert rule¶

In the left-side menu, click “Alerting”
Click on “BeeGFS-Alert” to see the list of existing alerts
Identify the alert you wish to edit, then click on “Edit” (the pen icon)
Adjust the alert condition threshold or pending period as per your requirements

For more details see the Grafana documentation on queries and conditions.

Using Telegraf with BeeGFS Monitoring¶

If you’re using Telegraf for monitoring, you’ll need to update the configuration file located at /etc/telegraf/telegraf.d/beegfs_mon_telegraf.conf to enable service status alerts. Follow these steps:

$ vim /etc/telegraf/telegraf.d/beegfs_mon_telegraf.conf

At the end of the file, add the procstat input plugin for systemd service monitoring.

[[inputs.procstat]]
systemd_unit = "beegfs-service-name"

For example, if both the BeeGFS meta and storage services are running on the same machine, update the Telegraf configuration file to

[[inputs.procstat]]
systemd_unit = "beegfs-meta.service"
[[inputs.procstat]]
systemd_unit = "beegfs-storage.service"

If BeeGFS components are distributed across multiple machines, update the Telegraf configuration file on each relevant machine. Then restart Telegraf to apply the changes:

$ systemctl restart telegraf

After updating the Telegraf configuration files of all servers, you can proceed to unpause the service alert.
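A mistyped unit name in a systemd_unit entry is easy to miss, because Telegraf would simply have no matching process to report on. As a sanity check, the unit names can be extracted from the configuration file and handed to systemctl. This is a sketch only; the function name configured_units is made up for this example.

```shell
# Print the unit names found in systemd_unit = "..." lines of a Telegraf
# configuration file, one per line.
# $1 = config file (defaults to the path used in this section)
configured_units() {
    sed -n 's/^[[:space:]]*systemd_unit[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' \
        "${1:-/etc/telegraf/telegraf.d/beegfs_mon_telegraf.conf}"
}

# Example: check that every configured unit exists and is active
#   for unit in $(configured_units); do
#       systemctl is-active "$unit"
#   done
```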

Usage¶

You can connect to Grafana using your web browser. If you installed the predefined panels, you will find five of them: one for the BeeGFS overview, one for meta service statistics, one for storage, one for storage targets, and one for client operations. You can change the node shown using the drop-down menu in the upper left corner.

If you want to write your own Grafana panels or use other software to process the collected data, you can access InfluxDB using one of its APIs. Please refer to the InfluxDB documentation for details. See also the reference of the fields and tags used in the database.
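As a starting point for exploring the collected data, the InfluxDB 1.x HTTP API can be queried with curl. The sketch below lists the measurements in the database and assumes InfluxDB 1.x on localhost with the default database name beegfs_mon. The function name show_measurements is hypothetical, and InfluxDB 2.x setups generally use a different query API with token authentication.

```shell
# List the measurements stored in an InfluxDB 1.x database via the /query
# endpoint of the HTTP API. The result is returned as JSON.
# $1 = database name (defaults to beegfs_mon)
# $2 = InfluxDB base URL (defaults to http://localhost:8086)
show_measurements() {
    curl -sS -G "${2:-http://localhost:8086}/query" \
        --data-urlencode "db=${1:-beegfs_mon}" \
        --data-urlencode 'q=SHOW MEASUREMENTS'
}

# Example:
#   show_measurements
```

If authentication is enabled on the InfluxDB instance, credentials have to be passed as well, for example with curl's -u option.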

Apache Cassandra Support¶

beegfs-mon supports the use of an Apache Cassandra database as database backend. Unless you already have a Cassandra installation you want to use or have other reasons to specifically use Cassandra, we recommend using InfluxDB. It is more lightweight and easier to handle. Also, there are no Grafana panels available for Cassandra.

To use Cassandra, you need to install a third-party library: https://github.com/datastax/cpp-driver. For BeeGFS version 7.1, it has to be version 2.9. Make sure the dynamic library is located in the standard library path so that it can be loaded by the service. To load the library and use Cassandra, change the corresponding line in the beegfs-mon configuration file from influxdb to cassandra. Cassandra uses slightly different configuration options, as you can see in that file, but you can achieve the same functionality as with InfluxDB. Please refer to the configuration file documentation for details.
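Whether the dynamic linker can find the driver can be checked by searching the linker cache. The sketch below assumes the driver was installed under its usual library name libcassandra. The function name cassandra_lib_in_cache is made up for this example.

```shell
# Search the dynamic linker cache for the DataStax C/C++ driver library.
# Prints the matching cache entries; returns non-zero if none are found.
cassandra_lib_in_cache() {
    ldconfig -p | grep -F 'libcassandra'
}

# Example:
#   cassandra_lib_in_cache || echo "libcassandra not found in linker cache"
```

If the library was just installed into a standard path and does not show up, running ldconfig as root to refresh the cache may be required.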