Filesystem Modification Events¶
BeeGFS modification events provide a way for external applications to be informed about changes happening inside the file system (sometimes referred to as a file system changelog). Similar functionality is provided by tools such as inotify, but since BeeGFS is designed for parallel access by many clients, inotify would need to be deployed across all clients to have a complete picture of everything happening inside the file system. Modification event logging provides an efficient way to record changes at the source and simple mechanisms for external applications to subscribe to events.
Overview¶
Before setting it up, there are a few key parts of the event logging system to be aware of:
To keep the event logging system from interfering with file system performance, when a change happens in BeeGFS that triggers an event, the metadata server logs the event to an on-disk “event queue” and immediately returns a response to the client.
The event queue is a fixed-size “ring buffer”: as new events are added to the back of the queue, the oldest events are dropped automatically once the queue is full. The size of the queue is configurable, allowing each metadata server to retain a different number of events.
Metadata servers allow an external application known as a “listener” to interact with their event queue using a Unix socket and a low-level “File Event” protocol.
This protocol allows listeners to start streaming from the most recent event or from any event still in the queue, identified by its sequence number.
Generally users are not expected to implement their own listeners; instead BeeGFS provides two options, Watch and the legacy beegfs-event-listener, to forward events to external applications referred to as “subscribers”.
Watch is the recommended listener starting in BeeGFS 8. Watch handles streaming events over the network to one or more subscribers using gRPC. Each subscriber can be reading from a different point inside the event queue and Watch gracefully handles when subscribers disconnect, avoiding sending duplicate events or dropping events as long as the subscriber reconnects while the event is still available. Watch and/or the Metadata server it is listening to can restart and will automatically resume sending events to subscribers from the point they left off.
The legacy beegfs-event-listener provides the same functionality as it did in BeeGFS 7. It prints JSON formatted events to stdout for a single subscriber. It does not provide a way for subscribers to resume from a particular event after a restart and will always resume from the most recent event in the queue. It is provided mainly to maintain compatibility with subscribers that rely on the pre-BeeGFS 8 functionality. It is recommended all new applications use Watch instead. Existing applications should consider migrating from beegfs-event-listener to Watch.
Configuration¶
Filesystem modification events must be enabled on all client and metadata nodes to work correctly. If you are using metadata mirroring, events are only emitted by the primary node, but logging must still be enabled on all secondary nodes because they will become the primary in case of a failover.
Clients¶
For some operations the metadata server relies on the client to forward additional information needed to complete the modification event messages. The events of interest are selected in the client configuration file:
sysFileEventLogMask = flush,close,trunc,setattr,link-op,open-read,open-write,open-readwrite
For complete coverage of all possible events, switch on everything as shown above. If you only need a subset of event types, the others can be removed from the list to reduce the performance overhead, but this is usually not worthwhile since the overhead is very small.
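For instance, a hypothetical configuration that only records close and link-related events (a subset of the flags shown above) would look like:
sysFileEventLogMask = close,link-op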
Metadata Servers¶
To enable the event stream, specify the path for the UNIX socket to use in the configuration file of the metadata server. For example:
sysFileEventLogTarget = unix:/run/beegfs/eventlog
If sysFileEventLogTarget is set, the server will try to write to this socket every time a filesystem event occurs that is related to this metadata server. The receiving application has to open the socket at that path, and it is recommended to start the receiving application before the metadata server to ensure timely processing of all events.
Warning
To capture all events of the file system, sysFileEventLogMask must be configured on all clients and sysFileEventLogTarget must be configured on all metadata servers, each with their own local UNIX socket and event listener. Merging the multiple streams is left as a task for the listening application.
If the receiving application is NOT listening when the metadata service tries to write an event to the socket, for example if it stops for some reason, the metadata service will only queue the event internally using the on-disk “eventq” and will resend any events from the point requested by the listener when it reconnects. By default the event queue is persisted in a sub-directory eventq/ of the metadata target (storeMetaDirectory); however, this can be customized if you are concerned about the modification events interfering with your metadata disk performance:
sysFileEventPersistDirectory = <path>
By default the metadata service will use a few gigabytes of space to store these events, which is typically sufficient if your listener is only ever expected to be offline briefly. If you are concerned about losing events you could also increase the amount of space used to buffer events when the listener is not available:
sysFileEventPersistSize = <size> // Size accepts the suffix M for megabytes or G for gigabytes.
The actual size of each event varies by the event type and the length of the paths. Events stored in the metadata event queue can range in size from ~72 bytes (e.g., creating a file in the root directory with a single letter name) up to ~8,287 bytes (e.g., renaming a file with a 4096 character name, the maximum in Linux). For example, a 1GiB (1,048,576 KiB or 1,073,741,824 bytes) on-disk metadata event queue could store between 129,569 and 14,913,080 events depending on their size. Set the size based on the average/expected number of requests per second this metadata node will handle, the average event size, and how long you need to tolerate the listener/subscribers being offline.
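To make that sizing guidance concrete, the following Go sketch estimates a value for sysFileEventPersistSize from an assumed event rate, average event size, and tolerated listener downtime. All three numbers below are illustrative placeholders, not recommendations.

package main

import "fmt"

func main() {
	// Illustrative assumptions -- replace with measurements from your own system.
	eventsPerSecond := 5000.0 // average metadata operations per second that generate events
	avgEventBytes := 512.0    // somewhere between the ~72 byte and ~8,287 byte extremes
	offlineSeconds := 3600.0  // how long the listener/subscribers may be offline (1 hour)

	requiredBytes := eventsPerSecond * avgEventBytes * offlineSeconds
	requiredGiB := requiredBytes / (1024 * 1024 * 1024)

	// With these numbers roughly 8.6 GiB is needed, so a setting such as
	// sysFileEventPersistSize = 9G would leave a little headroom.
	fmt.Printf("estimated queue size: %.1f GiB\n", requiredGiB)
}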
Warning
If sysFileEventPersistDirectory changes, then any previous events that have not been sent to the receiving application will be lost. To migrate the eventq to a new device you should copy the original eventq directory to the new location, then update sysFileEventPersistDirectory to point to the new path. Currently the sysFileEventPersistSize parameter cannot be changed after the metadata service first starts without deleting the eventq directory and allowing it to be recreated, which will lose all outstanding events. Recreating the eventq also causes the event sequence IDs generated by this metadata node to reset, which may cause problems for some subscribers depending on how they rely on the sequence IDs.
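As a rough outline of that migration (the metadata target path below is only an example, and stopping the metadata service while copying is an assumption made here so the queue does not change mid-copy), the steps might look like:
systemctl stop beegfs-meta
cp -a /data/beegfs/meta/eventq /new/device/eventq
# edit /etc/beegfs/beegfs-meta.conf and set:
#   sysFileEventPersistDirectory = /new/device/eventq
systemctl start beegfs-meta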
Listeners¶
Watch¶
Deploy Subscribers¶
Before deploying Watch you will typically want to deploy one or more subscribers where Watch should send events. While Watch is designed to support multiple subscriber types, currently the only supported subscriber type is “gRPC”, which uses Protocol Buffers as the Interface Definition Language (IDL). The provided protocol buffer definitions for Watch make it simple to integrate BeeGFS modification events with applications written in a variety of languages including C++, Go, Rust, Python, and more. To get started refer to the README.md file included with the Watch source code and the provided subscriber example written in Go.
In the future Watch may be extended to support more subscriber types and provide more pre-built subscribers to allow for out-of-the-box integration with popular applications based on user demand.
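As a rough illustration of the receiving side, the Go sketch below only sets up a TLS-enabled gRPC server for a subscriber. The certificate/key paths and port are placeholders, and the actual service registration and message handling come from the protocol buffer definitions shipped with Watch; the commented-out registration call uses a purely hypothetical package and function name.

package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// TLS certificate/key presented by this subscriber; Watch connects as the TLS client.
	creds, err := credentials.NewServerTLSFromFile("/etc/beegfs/cert.pem", "/etc/beegfs/key.pem")
	if err != nil {
		log.Fatalf("loading TLS credentials: %v", err)
	}

	// The address must match the grpc-address configured for this subscriber in beegfs-watch.toml.
	lis, err := net.Listen("tcp", ":50052")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}

	srv := grpc.NewServer(grpc.Creds(creds))

	// Register the subscriber service generated from the Watch protobuf definitions here,
	// for example (hypothetical name): watchpb.RegisterSubscriberServer(srv, &handler{})

	log.Println("subscriber listening on :50052")
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}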
Install/Configure the Watch Service¶
Note
Watch is an enterprise feature and when it first starts up it will check in with the management service to verify its licensing status. This check only happens at startup, so if the management service is briefly unavailable later the event stream is not disrupted. Before you begin, run beegfs license and ensure the io.beegfs.watch feature is licensed.
Use the following steps to install/configure Watch on all of your metadata servers. Currently Watch can only listen to events from a single metadata service; if you have a multi-mode deployment with multiple metadata services on the same server, please consult with support on how to adapt this procedure for your environment.
1. If needed, add the BeeGFS package repositories to your package manager. This has most likely already been done if the metadata service is already installed.
2. Install the beegfs-watch package using your distribution’s package manager.
3. Using your preferred text editor, edit the /etc/beegfs/beegfs-watch.toml file. Available settings are documented in the file; here is an overview of the minimum configuration required to run the Watch service (see the example configuration after this list):
   - The [management] section controls how Watch connects to the management service. Update the address to the IP address/hostname and port where the management service is listening for gRPC traffic. If you are using connection based authentication, download the same shared secret as used for your BeeGFS management service to /etc/beegfs/conn.auth (otherwise set auth-disable = true). If needed, update the client TLS configuration used to connect to the management service. Unless you opted to disable TLS, generally you can just copy the same TLS certificate used by the management service to /etc/beegfs/cert.pem. This is what the default configuration file expects and is the easiest way to get TLS configured uniformly.
   - The [[metadata]] section controls how Watch listens to the metadata service on this node. The event-log-target must be set to the same path as the sysFileEventLogTarget in your beegfs-meta.conf file (note Watch does not expect the unix:// prefix).
   - Define one or more subscribers. The configuration for each subscriber should be in its own [[subscriber]] section. At minimum, each subscriber needs a unique id, name, and grpc-address. Depending on the subscriber TLS configuration you may need to adjust the gRPC TLS configuration. Note Watch connects to subscribers, so it is considered the TLS client. The subscriber gRPC server could be started using the same TLS certificate/key as the management service, or you could use unique key/cert pairs for each subscriber depending on your security requirements.
4. Start and enable the service to ensure it automatically restarts after a reboot: systemctl enable --now beegfs-watch
5. Verify the service finished startup and was able to connect to all subscribers by running: journalctl -u beegfs-watch
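For orientation, below is a minimal sketch of what /etc/beegfs/beegfs-watch.toml could look like after step 3. The addresses, paths, and subscriber details are placeholders, and the configuration file shipped with the beegfs-watch package remains the authoritative reference for setting names and defaults.

[management]
# gRPC address of the management service (placeholder host and port).
address = "mgmtd.example.com:8010"
# Set auth-disable = true only if connection authentication is disabled;
# otherwise download the shared secret to /etc/beegfs/conn.auth.
# auth-disable = true

[[metadata]]
# Must match sysFileEventLogTarget in beegfs-meta.conf, without the unix: prefix.
event-log-target = "/run/beegfs/eventlog"

[[subscriber]]
# Each subscriber needs a unique id, name, and grpc-address (values are examples).
id = 1
name = "example-subscriber"
grpc-address = "subscriber.example.com:50052"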
Reconfiguring Watch¶
Subscribers and the log level can be updated after starting Watch without requiring a restart. The intent is to allow subscribers to be added, removed, and updated without impacting other subscribers. Make the necessary updates to your configuration file, then run systemctl reload beegfs-watch. Check the logs to confirm the new configuration was applied successfully.
Note that only configuration set using the configuration file can be updated without a restart. For example, if you set the log level using an environment variable or flag, it cannot be updated this way because those configuration sources have the highest precedence and are immutable.
beegfs-event-listener¶
The beegfs-event-listener program is included in the beegfs-utils package. It opens a UNIX socket at the specified path and listens for incoming messages. For example:
$ /opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog
Every message is printed as one line of JSON formatted output.
Example:
$ mv /mnt/beegfs0/a /mnt/beegfs0/b
This will result in output like:
{ "FormatVersion": 2, "EventFlags": 0, "NumLinks": 1, "Event": { "Type": "Create", "Path": "\/mydir\/myfile", "EntryId": "D86-67CE19D0-1", "ParentEntryId": "0-67570C20-2", "TargetPath": "", "TargetParentId": "", "UserID": 1000, "Timestamp": 1741560310395710449 } }
The output can easily be parsed by scripts. For example, this simple Ruby program will print the event type and the file path for each event:
#!/usr/bin/env ruby

require "json"

def printEvent(event)
  if event
    print "Event: #{event['Type']} #{event['Path']}\n"
  end
end

while a = gets
  json_data = JSON.parse(a)
  printEvent(json_data['Event'])
end
Use it like this:
$ /opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog | ./read-event-log.rb
Event Details¶
Every event message consists of the following fields:
Format Version - Always “2” for events generated by BeeGFS 8 metadata nodes (uint16).
This field is not directly included with messages sent by Watch using gRPC. Instead the protocol buffer defined event messages handle multiple versions of the event data using a oneof field that contains the event data for a particular format version.
Sequence ID - A monotonically increasing integer uniquely identifying an event generated by a metadata node. The generated sequence of IDs will reset if the event queue is recreated (uint64).
Subscribers can use the sequence ID to confirm they received all events and potentially take action if they detect events were dropped or if duplicate events are received. Duplicate events can be avoided if subscribers acknowledge the sequence ID of the last event received when reconnecting to Watch (a minimal sketch of this bookkeeping appears after the field descriptions below).
Metadata ID - The ID of the metadata node that generated this event. The Sequence ID + Metadata ID uniquely identify a particular event in a BeeGFS instance.
This field is included with every event message sent by Watch using gRPC. For the beegfs-event-listener this field is only included as part of the initial handshake.
Metadata Mirror - The ID of the metadata buddy group for this metadata node if applicable. This will either be “0” or nil (for events sent by Watch) if the node is not part of a buddy group. This field will always be set when the node is a member of a buddy group, even if the entry itself is not mirrored (see the event flags to determine if the event is for a mirrored entry).
This field is included with every event message sent by Watch using gRPC. For the beegfs-event-listener this field is only included as part of the initial handshake.
Important: If the metadata node is added to a buddy group after enabling modification events, this field will not be populated until after the metadata node is restarted.
The actual event:
Event Flags - A set of bitwise flags stored as an integer (uint32). Use bitwise operations (&) to determine which flag(s) are currently set, e.g., if event.EventFlags&EVENTFLAG_MIRRORED != 0 { fmt.Println("the entry is mirrored") }.
EVENTFLAG_NONE (0x00000000): No flags are set.
EVENTFLAG_MIRRORED (0x00000001): The event is for a mirrored entry.
EVENTFLAG_SECONDARY (0x00000002): The event was generated by the secondary node in a mirror (not currently used).
Link Count - Number of links to this entry (uint64).
Event Type - What type of event happened (uint32).
Path - Full path relative to the mount point of the file/directory (string).
Entry ID - The unique ID of the file/directory in BeeGFS, similar to an inode number in other file systems (string).
Parent Entry ID - The entry ID of the parent directory (string).
Target Path - Only used by select event types, refer to the documentation on each type for details (string).
Target Parent ID - Only used by select event types, refer to the documentation on each type for details (string).
Message User ID - The ID of the user that triggered the event (uint32).
Timestamp - Unix timestamp with nanosecond precision when the event was triggered. Note this may differ slightly from the atime/mtime (int64).
For most Event Types the target path and target parent ID fields are empty. The path, entry ID, parent entry ID, user ID, and time fields always contain information about the file/directory being worked on.
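To illustrate how the Sequence ID and Metadata ID fields can be used together, here is a hedged Go sketch that tracks the last sequence ID seen from each metadata node to spot gaps or duplicates. The trimmed-down Event struct and its field names are assumptions made for this illustration, not the actual Watch message types.

package main

import "fmt"

// Event is a hypothetical, trimmed-down event used only for this illustration.
type Event struct {
	SeqID  uint64 // sequence ID generated by the metadata node
	MetaID uint32 // ID of the metadata node that generated the event
}

// Tracker remembers the last sequence ID received from each metadata node.
type Tracker struct {
	lastSeq map[uint32]uint64
}

func NewTracker() *Tracker {
	return &Tracker{lastSeq: make(map[uint32]uint64)}
}

// Check reports whether ev was already seen and how many events appear to
// have been missed since the previous event from the same metadata node.
func (t *Tracker) Check(ev Event) (duplicate bool, missed uint64) {
	if last, seen := t.lastSeq[ev.MetaID]; seen {
		if ev.SeqID <= last {
			return true, 0 // duplicate delivery, or the event queue was recreated
		}
		missed = ev.SeqID - last - 1
	}
	t.lastSeq[ev.MetaID] = ev.SeqID
	return false, missed
}

func main() {
	t := NewTracker()
	for _, ev := range []Event{{SeqID: 10, MetaID: 1}, {SeqID: 11, MetaID: 1}, {SeqID: 14, MetaID: 1}} {
		dup, missed := t.Check(ev)
		fmt.Printf("meta=%d seq=%d duplicate=%v missed=%d\n", ev.MetaID, ev.SeqID, dup, missed)
	}
}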
Link Count¶
Events also include the linkCount for the entry, which is the number of links to the file or directory. This link count is equivalent to what is found in the output of commands like ls. For regular files this is the number of hard links that point to the file’s inode, and will be one unless additional hard links were created with a command like ln, in which case it will be equal to the number of links to the inode. For directories the link count works differently: the count includes the directory itself, a link from each of its subdirectories (i.e., their .. entries), and the . entry for the current directory.
Knowing the number of links to an entry is particularly important for files with multiple hard links, because only a single event is generated for whichever path (i.e., link) is used to update a particular entry. This could cause problems depending on what an application is doing with these events, and decisions on when a full scan of the file system is needed can be made based on the number of links to an entry (for example if a file has two or more links).
Applications that require the link counts to be 100% accurate should check if the link count is zero and call stat directly if so. The reason is that if an inode has multiple links and one of the links is renamed, and that link (i.e., dentry) is not on the same metadata node that owns the inode, the node triggering the rename event needs to retrieve the link count from the node that owns the inode. While the link count should normally be set correctly, there is a chance the node triggering the event cannot retrieve the link count due to a network or other error that happens after the successful rename. Because a rename does not require communication with the node that owns the inode, the operation is still considered to have completed successfully, and the link count in the event will be set to zero. While the rename could be rolled back in this situation, as a general principle the event logging functionality is designed not to interfere with the normal functionality of the file system, and rolling back would violate that principle.
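A hedged Go sketch of that fallback is shown below; the mount point, the trimmed-down Event struct, and its field names (taken loosely from the JSON example earlier on this page) are assumptions made for illustration only.

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// Event is a hypothetical, trimmed-down event used only for this illustration.
type Event struct {
	Path     string // path relative to the BeeGFS mount point
	NumLinks uint64 // link count reported by the metadata node (0 if unavailable)
}

// linkCount returns the link count from the event, falling back to stat()
// on the mounted file system when the event reports zero.
func linkCount(mountPoint string, ev Event) (uint64, error) {
	if ev.NumLinks != 0 {
		return ev.NumLinks, nil
	}
	info, err := os.Stat(filepath.Join(mountPoint, ev.Path))
	if err != nil {
		return 0, err
	}
	if st, ok := info.Sys().(*syscall.Stat_t); ok {
		return uint64(st.Nlink), nil
	}
	return 0, fmt.Errorf("link count unavailable for %s", ev.Path)
}

func main() {
	// Example usage with placeholder values.
	n, err := linkCount("/mnt/beegfs", Event{Path: "/mydir/myfile", NumLinks: 0})
	fmt.Println(n, err)
}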
Event Types¶
The following event types exist:
| Event | Description |
|---|---|
| Flush | File contents were flushed. File size might have changed. |
| Truncate | File was truncated. File size might have changed. |
| SetAttr | File attributes changed. |
| CloseAfterWrite | File was closed and possibly modified. |
| Create | New file was created. |
| MKdir | New directory was created. |
| MKnod | A block or character special file was created. |
| RMdir | Directory was removed. |
| Unlink | File was removed. Note: multiple paths can reference the same file. |
| Symlink | A symbolic link was created. |
| Hardlink | A hardlink was created. |
| Rename | A file or directory was renamed or moved. |
| OpenRead | File was opened with the O_RDONLY flag. |
| OpenWrite | File was opened with the O_WRONLY flag. |
| OpenReadWrite | File was opened with the O_RDWR flag. |
| LastWriterClosed | Triggered when no more clients have the entry open for writing. |
Entry IDs¶
BeeGFS uses an Entry ID to identify files and directories, similar to inodes on other UNIX/Linux file systems. An Entry ID is a string of the following form:
root|disposal|mdisposal|[0-9A-F]{1,8}-[0-9A-F]{1,8}-[0-9A-F]{1,8}
The three hex numbers can be represented as positive, non-zero integers. The special cases root, disposal, and mdisposal do not appear for normal files and are for internal bookkeeping only. They can be represented by the integer triple by including zeros.
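For convenience, here is a small Go sketch that applies the pattern above verbatim to check whether a string looks like a valid entry ID; the sample IDs are taken from the event example earlier on this page plus one obviously invalid value.

package main

import (
	"fmt"
	"regexp"
)

// entryIDPattern anchors the entry ID pattern described above: one of the
// special names, or three hexadecimal numbers separated by dashes.
var entryIDPattern = regexp.MustCompile(`^(root|disposal|mdisposal|[0-9A-F]{1,8}-[0-9A-F]{1,8}-[0-9A-F]{1,8})$`)

func main() {
	for _, id := range []string{"root", "D86-67CE19D0-1", "0-67570C20-2", "not-an-entry-id"} {
		fmt.Printf("%-16s valid=%v\n", id, entryIDPattern.MatchString(id))
	}
}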