Filesystem Modification Events

BeeGFS modification events provide a way for external applications to be informed about changes happening inside the file system (sometimes referred to as a file system changelog). Similar functionality is provided by tools such as inotify, but since BeeGFS is designed for parallel access by many clients, inotify would need to be deployed across all clients to have a complete picture of everything happening inside the file system. Modification event logging provides an efficient way to record changes at the source and simple mechanisms for external applications to subscribe to events.

Overview

Before setting up event logging, be aware of a few key parts of the system:

  • To keep the event logging system from interfering with file system performance, when a change happens in BeeGFS that triggers an event, metadata servers log the event to an on-disk “event queue” and immediately return a response to the client.

    • The event queue is a fixed-size “ring buffer”, meaning that as new events are added to the back of the queue, the oldest events are dropped automatically. The size of the queue is configurable, so each metadata server can retain a different number of events.

  • Metadata servers allow an external application known as a “listener” to interact with their event queue using a Unix socket and a low level “File Event” protocol.

    • This protocol allows listeners to start streaming from the most recent event or any event in the queue based on its sequence number.

  • Generally, users are not expected to implement their own listeners. Instead, BeeGFS provides two options, Watch and the legacy beegfs-event-listener, to forward events to external applications referred to as “subscribers”.

    • Watch is the recommended listener starting in BeeGFS 8. Watch handles streaming events over the network to one or more subscribers using gRPC. Each subscriber can be reading from a different point inside the event queue, and Watch gracefully handles subscribers disconnecting, avoiding sending duplicate events or dropping events as long as the subscriber reconnects while the event is still available. Watch and/or the metadata server it is listening to can restart, and Watch will automatically resume sending events to subscribers from the point where they left off.

    • The legacy beegfs-event-listener provides the same functionality as it did in BeeGFS 7. It prints JSON formatted events to stdout for a single subscriber. It does not provide a way for subscribers to resume from a particular event after a restart and will always resume from the most recent event in the queue. It is provided mainly to maintain compatibility with subscribers that rely on the pre-BeeGFS 8 functionality. It is recommended all new applications use Watch instead. Existing applications should consider migrating from beegfs-event-listener to Watch.
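
The queue behavior described above can be illustrated with a small in-memory model. This is only a conceptual sketch, not the BeeGFS on-disk format or the File Event protocol: the class name EventRing, its methods, and a capacity counted in events rather than bytes are all assumptions made for illustration.

```ruby
# Conceptual model of a fixed-size event queue (NOT the BeeGFS implementation).
# Capacity is counted in events here; the real queue is sized in bytes.
class EventRing
  def initialize(capacity)
    @capacity = capacity
    @events = []   # oldest event first
    @next_seq = 1  # sequence number assigned to the next event
  end

  # Append an event; once the ring is full the oldest event is dropped.
  def push(payload)
    @events.shift if @events.size == @capacity
    @events << { seq: @next_seq, payload: payload }
    @next_seq += 1
  end

  # Sequence number of the oldest event still retained (nil if empty).
  def oldest_seq
    @events.first && @events.first[:seq]
  end

  # Return the retained events starting at from_seq, plus how many of the
  # requested events were already overwritten and can no longer be delivered.
  def read_from(from_seq)
    return { missed: 0, events: [] } if @events.empty?

    missed = [oldest_seq - from_seq, 0].max
    { missed: missed, events: @events.select { |e| e[:seq] >= from_seq } }
  end
end

ring = EventRing.new(3)
5.times { |i| ring.push("event-#{i + 1}") }
result = ring.read_from(2)
puts "missed=#{result[:missed]} delivered=#{result[:events].map { |e| e[:seq] }.inspect}"
```

With a capacity of three and five events pushed, a reader asking for sequence number 2 is told that one event is gone and receives events 3 to 5. A subscriber can apply the same idea to the sequence IDs it receives to notice dropped events.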

Configuration

Filesystem Modification Events must be enabled on all client and metadata nodes to work correctly. If you are using metadata mirroring, events are only emitted from the primary node. Logging must therefore also be enabled on all secondary nodes, as they will become the primary in case of a failover.

Clients

The metadata server relies on the client to forward some information for certain operations so that it can complete the modification event messages. The events of interest can be selected in the client configuration file:

sysFileEventLogMask = flush,close,trunc,setattr,link-op,open-read,open-write,open-readwrite

For complete coverage of all possible events, enable everything, as shown above. If you only need a subset of event types, the others can be removed from the list to reduce the performance overhead. This is usually not worthwhile, however, since the overhead is very small.
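
When the mask is trimmed by hand it is easy to drop an event type by accident. The following sketch reports which event types from the full list above are missing from a client configuration. The helper name missing_event_types is hypothetical, and the parsing assumes simple "key = value" lines with "#" comments, which may not cover every detail of the real configuration format.

```ruby
# Event types taken from the full mask shown above.
ALL_EVENT_TYPES = %w[flush close trunc setattr link-op open-read open-write open-readwrite].freeze

# Return the event types absent from sysFileEventLogMask in the given text.
# Assumes simple "key = value" lines; lines starting with '#' are skipped.
def missing_event_types(config_text)
  line = config_text.each_line.map(&:strip).find do |l|
    !l.start_with?("#") && l.match?(/\AsysFileEventLogMask\s*=/)
  end
  return ALL_EVENT_TYPES.dup if line.nil?

  enabled = line.split("=", 2).last.split(",").map(&:strip)
  ALL_EVENT_TYPES - enabled
end

SAMPLE_CLIENT_CONF = "sysFileEventLogMask = flush,close,trunc,setattr\n"
puts missing_event_types(SAMPLE_CLIENT_CONF).join(",")
```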

Metadata Servers

To enable the event stream, specify the path for the UNIX socket to use in the configuration file of the metadata server. For example:

sysFileEventLogTarget = unix:/run/beegfs/eventlog

If sysFileEventLogTarget is set, the server will try to write to this socket every time a filesystem event occurs that is related to this metadata server. The receiving application has to open the socket at that path. It is recommended to start the receiving application before the metadata server to ensure timely processing of all events.

Warning

To capture all events of the file system, sysFileEventLogMask must be configured on all clients and sysFileEventLogTarget must be configured on all metadata servers, each with its own local UNIX socket and event listener. Merging the multiple streams is left as a task for the listening application.
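
One possible way for an application to merge the per-server streams is to tag each event with the metadata server it came from and order the combined list by timestamp. This is a minimal sketch under several assumptions: the events are already parsed into hashes with the "Timestamp" key seen in the JSON output, the :meta_id tag and the helper name merge_event_streams are hypothetical, and the ordering across servers is only as good as the clock synchronization between them.

```ruby
# Merge per-server event lists into one list ordered by event timestamp.
# streams: { metadata_id => [event_hash, ...] }. The :meta_id tag is added
# by this sketch so the origin of each event is not lost after merging.
def merge_event_streams(streams)
  tagged = streams.flat_map do |meta_id, events|
    events.map { |event| event.merge(meta_id: meta_id) }
  end
  # Ties are broken by metadata ID so the result is deterministic.
  tagged.sort_by { |event| [event["Timestamp"], event[:meta_id]] }
end

# Made-up events with shortened timestamps, for illustration only.
SAMPLE_STREAMS = {
  1 => [{ "Type" => "Create", "Timestamp" => 100 }, { "Type" => "Flush", "Timestamp" => 300 }],
  2 => [{ "Type" => "MKdir", "Timestamp" => 200 }]
}.freeze

merge_event_streams(SAMPLE_STREAMS).each do |event|
  puts "#{event['Timestamp']} meta=#{event[:meta_id]} #{event['Type']}"
end
```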

If the receiving application is NOT listening when the metadata service tries to write an event to the socket, for example if it stops for some reason, the metadata service will only queue the event internally using the on-disk “eventq” and will resend any events from the point requested by the listener when it reconnects. By default the event queue is persisted in a sub-directory eventq/ of the metadata target (storeMetaDirectory). This can be customized if you are concerned about the modification events interfering with your metadata disk performance:

sysFileEventPersistDirectory = <path>

By default the metadata service will use a few gigabytes of space to store these events, which is typically sufficient if your listener is only ever expected to be offline briefly. If you are concerned about losing events you could also increase the amount of space used to buffer events when the listener is not available:

sysFileEventPersistSize = <size> // Size accepts suffix M for megabytes or G for gigabytes.

The actual size of each event varies by the event type and the length of the paths. Events stored in the metadata event queue can range in size from ~72 bytes (e.g., creating a file in the root directory with a single letter name) up to ~8,287 bytes (e.g., renaming a file with a 4096 character name, the max in Linux). For example, a 1 GiB (1,048,576 KiB or 1,073,741,824 bytes) on-disk metadata event queue could store between 129,569 and 14,913,080 events depending on their size. Set the size based on the average/expected number of requests per second this metadata node will handle, the average event size, and how long you wish to be able to tolerate the listener/subscribers being offline.
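
The capacity figures above follow from integer division of the queue size by the event size, and the same arithmetic gives a rough estimate of how much listener downtime a queue can absorb. The sketch below is only an estimate: the helper names are hypothetical, the workload numbers are made up, and any bookkeeping overhead inside the queue is ignored.

```ruby
# Event size bounds quoted in the text above (bytes).
MIN_EVENT_BYTES = 72
MAX_EVENT_BYTES = 8_287
GIB = 1_073_741_824

# Number of events of a given size that fit in a queue of queue_bytes.
def events_that_fit(queue_bytes, event_bytes)
  queue_bytes / event_bytes # integer division: partial events do not count
end

# Seconds of listener downtime a queue can absorb at a steady event rate.
def tolerated_downtime_seconds(queue_bytes, avg_event_bytes, events_per_second)
  events_that_fit(queue_bytes, avg_event_bytes) / events_per_second.to_f
end

puts events_that_fit(GIB, MAX_EVENT_BYTES) # 129569
puts events_that_fit(GIB, MIN_EVENT_BYTES) # 14913080
# Hypothetical workload: 256-byte average events at 5,000 events per second.
puts tolerated_downtime_seconds(GIB, 256, 5_000).round # 839
```

Under these assumed workload numbers a 1 GiB queue covers roughly 14 minutes of downtime, so a longer maintenance window would call for a proportionally larger sysFileEventPersistSize.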

Warning

If sysFileEventPersistDirectory changes, any previous events that have not been sent to the receiving application will be lost. To migrate the eventq to a new device, copy the original eventq directory to the new location, then update sysFileEventPersistDirectory to point to the new path. Currently the sysFileEventPersistSize parameter cannot be changed after the metadata service first starts without deleting the eventq directory and allowing it to be recreated, which will lose all outstanding events. Recreating the eventq also causes the event sequence IDs generated by this metadata node to reset, which may cause problems for some subscribers depending on how they rely on the sequence IDs.

Listeners

Watch

Deploy Subscribers

Before deploying Watch you will typically want to deploy one or more subscribers where Watch should send events. While Watch is designed to support multiple subscriber types, currently the only supported subscriber type is “gRPC”, which uses Protocol Buffers as the Interface Definition Language (IDL). The provided protocol buffer definitions for Watch make it simple to integrate BeeGFS modification events with applications written in a variety of languages including C++, Go, Rust, Python, and more. To get started, refer to the README.md file included with the Watch source code and the provided subscriber example written in Go.

In the future Watch may be extended to support more subscriber types and provide more pre-built subscribers to allow for out-of-the-box integration with popular applications based on user demand.

Install/Configure the Watch Service

Note

Watch is an enterprise feature, and when it first starts up it will check in with the management service to verify its licensing status. This check only happens at startup, so the event stream is not disrupted if the management service is briefly unavailable later. Before you begin, run beegfs license and ensure the io.beegfs.watch feature is licensed.

Use the following steps to install/configure Watch on all of your metadata servers. Currently Watch can only listen to events from a single metadata service. If you have a multi-mode deployment with multiple metadata services on the same server, please consult with support for how to adapt this procedure for your environment.

  1. If needed add the BeeGFS package repositories to your package manager. This has most likely already been done if the metadata service is already installed.

  2. Install the beegfs-watch package using your distribution’s package manager.

  3. Using your preferred text editor, edit the /etc/beegfs/beegfs-watch.toml file. Available settings are documented in the file. The following is an overview of the minimum configuration required to run the Watch service:

    • The [management] section controls how Watch connects to the management service. Update the address to the IP address/hostname and port where the management service is listening for gRPC traffic. If you are using connection based authentication download the same shared secret as used for your BeeGFS management service to /etc/beegfs/conn.auth (otherwise set auth-disable = true). If needed update the client TLS configuration used to connect to the management service.

      • Unless you opted to disable TLS, generally you can just copy the same TLS certificate used by the management service to /etc/beegfs/cert.pem. This is what the default configuration file expects and is the easiest way to get TLS configured uniformly.

    • The [[metadata]] section controls how Watch listens to the metadata service on this node. The event-log-target must be set to the same path as the sysFileEventLogTarget in your beegfs-meta.conf file (note that Watch does not expect the unix: prefix).

    • Define one or more subscribers. The configuration for each subscriber should be in its own [[subscriber]] section. At minimum for each subscriber you need to set a unique id, name and grpc-address. Depending on the subscriber TLS configuration you may need to adjust the gRPC TLS configuration.

      • Note that Watch connects to subscribers, so it is considered the TLS client. The subscriber gRPC server could be started using the same TLS certificate/key as the management service, or you could use unique key/cert pairs for each subscriber depending on your security requirements.

  4. Start and enable the service to ensure it automatically restarts after a reboot: systemctl enable --now beegfs-watch.

  5. Verify the service finished startup and was able to connect to all subscribers by running: journalctl -u beegfs-watch.

Reconfiguring Watch

Subscribers and the log level can be updated after starting Watch without requiring a restart. The intent is to allow subscribers to be added, removed, and updated without impacting other subscribers. Make the necessary updates to your configuration file then run systemctl reload beegfs-watch. Check the logs to confirm the new configuration was applied successfully.

Note that only configuration set using the configuration file can be updated without a restart. For example, if you set the log level using an environment variable or flag, it cannot be updated, because those configuration sources have the highest precedence and are immutable.

beegfs-event-listener

The beegfs-event-listener program is included in the beegfs-utils package. It opens a UNIX socket at the specified path and listens for incoming messages. For example:

$ /opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog

Every message is printed as one line of JSON formatted output.

Example:

$ mv /mnt/beegfs0/a /mnt/beegfs0/b

This will result in output of the following form (the sample shown is a Create event):

{ "FormatVersion": 2, "EventFlags": 0, "NumLinks": 1, "Event": { "Type": "Create", "Path": "\/mydir\/myfile", "EntryId": "D86-67CE19D0-1", "ParentEntryId": "0-67570C20-2", "TargetPath": "", "TargetParentId": "", "UserID": 1000, "Timestamp": 1741560310395710449 } }

The output can easily be parsed by scripts. For example, this simple Ruby program prints the event type and the file path for each event:

read-event-log.rb
#!/usr/bin/env ruby

require "json"

# Print the event type and path of a single event.
def print_event(event)
  puts "Event: #{event['Type']} #{event['Path']}" if event
end

# Each line read from stdin is one JSON formatted message.
while (line = gets)
  json_data = JSON.parse(line)
  print_event(json_data["Event"])
end

Use it like this:

$ /opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog | ./read-event-log.rb

Event Details

Every event message consists of the following fields:

  • Format Version - Always “2” for events generated by BeeGFS 8 metadata nodes (uint16).

    • This field is not directly included with messages sent by Watch using gRPC. Instead the protocol buffer defined event messages handle multiple versions of the event data using a oneof field that will contain event data for a particular format version.

  • Sequence ID - A monotonically increasing integer uniquely identifying an event generated by a metadata node. The generated sequence of IDs will reset if the event queue is recreated (uint64).

    • Subscribers can use the sequence ID to confirm they received all events and potentially take action if they detect events were dropped or if duplicate events are received. Duplicate events can be avoided if subscribers acknowledge the sequence ID of the last event received when reconnecting to Watch.

  • Metadata ID - The ID of the metadata node that generated this event. The Sequence ID + Metadata ID uniquely identify a particular event in a BeeGFS instance.

    • This field is included with every event message sent by Watch using gRPC. For the beegfs-event-listener this field is only included as part of the initial handshake.

  • Metadata Mirror - The ID of the metadata buddy group for this metadata node if applicable. This will either be “0” or nil (for events sent by Watch) if the node is not part of a buddy group. This field will always be set when the node is a member of a buddy group, even if the entry itself is not mirrored (see the event flags to determine if the event is for a mirrored entry).

    • This field is included with every event message sent by Watch using gRPC. For the beegfs-event-listener this field is only included as part of the initial handshake.

    • Important: If the metadata node is added to a buddy group after enabling modification events, this field will not be populated until after the metadata node is restarted.

  • The actual event:

    • Event Flags - A set of bitwise flags stored as an integer (uint32). Use bitwise operations (&) to determine which flag(s) are currently set, e.g., if event.EventFlags&EVENTFLAG_MIRRORED != 0 { fmt.Println("the entry is mirrored") }.

      • EVENTFLAG_NONE (0x00000000): No flags are set.

      • EVENTFLAG_MIRRORED (0x00000001): The event is for a mirrored entry.

      • EVENTFLAG_SECONDARY (0x00000002): The event was generated by the secondary node in a mirror (not currently used).

    • Link Count - Number of links to this entry (uint64).

    • Event Type - What type of event happened (uint32).

    • Path - Full path relative to the mount point of the file/directory (string).

    • Entry ID - The unique ID of the file/directory in BeeGFS, similar to an inode number in other file systems (string).

    • Parent Entry ID - The entry ID of the parent directory (string).

    • Target Path - Only used by select event types, refer to the documentation on each type for details (string).

    • Target Parent ID - Only used by select event types, refer to the documentation on each type for details (string).

    • Message User ID - The ID of the user that triggered the event (uint32).

    • Timestamp - Unix timestamp with nanosecond precision when the event was triggered. Note this may differ slightly from the atime/mtime (int64).

For most event types the target path and target parent ID fields are empty. The path, entry ID, parent entry ID, user ID, and timestamp fields always contain information about the file/directory being worked on.
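
As an illustration of working with these fields, the following sketch decodes the Event Flags and the nanosecond Timestamp from the JSON sample shown in the beegfs-event-listener section. The helper names decode_event_flags and event_time are hypothetical. The flag values are the ones listed above, and the JSON key names are taken from that sample rather than from the gRPC messages sent by Watch.

```ruby
require "json"

# Flag values from the Event Flags list above.
EVENT_FLAGS = { mirrored: 0x00000001, secondary: 0x00000002 }.freeze

# Names of the flags set in an EventFlags integer.
def decode_event_flags(value)
  EVENT_FLAGS.select { |_name, bit| (value & bit) != 0 }.keys
end

# Convert a nanosecond Unix timestamp to a UTC Time without losing precision.
def event_time(nanoseconds)
  seconds, nsec = nanoseconds.divmod(1_000_000_000)
  Time.at(seconds, nsec, :nsec).utc
end

# Sample line copied from the beegfs-event-listener output shown earlier.
SAMPLE_LINE = '{ "FormatVersion": 2, "EventFlags": 0, "NumLinks": 1, "Event": { "Type": "Create", "Path": "\/mydir\/myfile", "EntryId": "D86-67CE19D0-1", "ParentEntryId": "0-67570C20-2", "TargetPath": "", "TargetParentId": "", "UserID": 1000, "Timestamp": 1741560310395710449 } }'

MESSAGE = JSON.parse(SAMPLE_LINE)
puts "flags: #{decode_event_flags(MESSAGE['EventFlags']).inspect}"
puts "time:  #{event_time(MESSAGE['Event']['Timestamp']).strftime('%Y-%m-%d %H:%M:%S.%N')}"
```

Splitting the timestamp with divmod keeps the full nanosecond precision, which would be lost if the value were first converted to a floating point number of seconds.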

Event Types

The following event types exist:

  • Flush - File contents were flushed. File size might have changed.

  • Truncate - File was truncated. File size might have changed.

  • SetAttr - File attributes changed.

  • CloseAfterWrite - File was closed and possibly modified.

  • Create - New file was created.

  • MKdir - New directory was created.

  • MKnod - A block or character special file was created.

  • RMdir - Directory was removed.

  • Unlink - File was removed.

    • Note: Multiple paths can reference the same EntryID (file content). Disk space is only freed once the last link has been removed.

  • Symlink - A symbolic link was created.

    • Path, EntryId, and ParentEntryId: Of the newly created link.

    • TargetPath and TargetParentId: Of the referenced file/directory. Note: since this is a symbolic link, the target may contain relative and/or non-existing paths.

  • Hardlink - A hardlink was created.

    • Path: Path of the new link, relative to the mount point.

    • EntryId: The EntryID of the referenced file.

    • ParentEntryId: The EntryID of the parent directory containing the new link. Since hardlinks are only supported within the same directory, this is identical to the parent directory of the source.

    • TargetPath, TargetEntryId: Of the link target.

  • Rename - A file or directory was renamed or moved.

    • Path: The entry being moved.

    • EntryId: Its EntryID.

    • ParentEntryId: Its parent EntryID.

    • TargetPath, TargetParentId: The path/name moved to and the EntryID of the new parent directory.

  • OpenRead - A file was opened with the O_RDONLY flag.

  • OpenWrite - A file was opened with the O_WRONLY flag.

  • OpenReadWrite - A file was opened with the O_RDWR flag.

  • LastWriterClosed - Triggered when no more clients have the entry open for writing.
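
A consumer typically dispatches on the event type. The following sketch keeps a set of known paths up to date from a stream of parsed events. It is illustrative only: the helper name apply_event is hypothetical, the type strings other than "Create" are assumed to match the names listed above, and renaming a directory would also require rewriting the paths below it, which this sketch ignores.

```ruby
require "set"

# Apply one event to a set of known paths and return the set.
# Event types that do not add or remove a path are ignored.
def apply_event(paths, event)
  case event["Type"]
  when "Create", "MKdir", "MKnod", "Symlink", "Hardlink"
    paths.add(event["Path"])
  when "Unlink", "RMdir"
    paths.delete(event["Path"])
  when "Rename"
    # Path is the entry being moved; TargetPath is where it ends up.
    paths.delete(event["Path"])
    paths.add(event["TargetPath"])
  end
  paths
end

# Made-up events containing only the fields this sketch reads.
SAMPLE_EVENTS = [
  { "Type" => "Create", "Path" => "/a" },
  { "Type" => "Rename", "Path" => "/a", "TargetPath" => "/b" },
  { "Type" => "Create", "Path" => "/c" },
  { "Type" => "Unlink", "Path" => "/c" }
].freeze

known = SAMPLE_EVENTS.each_with_object(Set.new) { |event, paths| apply_event(paths, event) }
puts known.to_a.inspect
```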

Entry IDs

BeeGFS uses an Entry ID to identify files and directories, similar to inodes on other UNIX/Linux file systems. An Entry ID is a string of the following form:

root|disposal|mdisposal|[0-9A-F]{1,8}-[0-9A-F]{1,8}-[0-9A-F]{1,8}

The three hex numbers can be represented as positive, non-zero integers. The special cases root, disposal, mdisposal do not appear for normal files and are for internal bookkeeping only. They can be represented by the integer triple by including zeros, for example.
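
The pattern above can be used directly to validate Entry IDs and to split regular IDs into their three integers. This is a minimal sketch with hypothetical helper names. It deliberately returns nil for the special names instead of guessing their integer representation.

```ruby
# Pattern from the text above, anchored so the whole string must match.
ENTRY_ID_PATTERN = /\A(?:root|disposal|mdisposal|[0-9A-F]{1,8}-[0-9A-F]{1,8}-[0-9A-F]{1,8})\z/
SPECIAL_ENTRY_IDS = %w[root disposal mdisposal].freeze

# True if the string has the form of an Entry ID.
def valid_entry_id?(id)
  ENTRY_ID_PATTERN.match?(id)
end

# Split a regular Entry ID into its three integers.
# Returns nil for invalid IDs and for the special names, whose integer
# representation is not modelled in this sketch.
def entry_id_to_integers(id)
  return nil unless valid_entry_id?(id)
  return nil if SPECIAL_ENTRY_IDS.include?(id)

  id.split("-").map { |part| part.to_i(16) }
end

p entry_id_to_integers("D86-67CE19D0-1") # [3462, 1741560272, 1]
```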