Filesystem Modification Events

The modification event logging facility of BeeGFS uses the metadata servers to collect information about modified files and directories in the file system. This information is forwarded to external applications as event messages over a UNIX socket. As an example of such an application, we provide the beegfs-event-listener as part of the beegfs-utils package. It receives the event messages from a metadata server and prints them as JSON-formatted text to STDOUT.

When modification event logging is configured, each metadata server checks for a socket at the log target path specified in its configuration file and tries to deliver modification event packets there. Each metadata server only collects information about the files it manages itself, so one beegfs-event-listener instance is needed per metadata server. With metadata mirroring, events are only emitted by the primary server. The secondary should nevertheless also be equipped with an event listener, since it becomes the primary in case of a failover.

The provided tool is just an example; many other tools are possible, such as adapters to backup systems. When developing your own software based on BeeGFS modification event logging, you can use the header beegfs_file_event_log.hpp, which is provided as part of the beegfs-utils-devel package. It allows you to read the file modification event stream sent by the metadata server. As a reference, the source code of the beegfs-event-listener is also included in that package.

Configuration

Filesystem Modification Events need to be enabled on all clients and on all metadata servers to work properly.

Client

For some operations, the metadata server relies on the client to forward additional information; without it, the server cannot complete the modification event messages. The events of interest are selected in the client configuration file:

sysFileEventLogMask = flush,trunc,setattr,close,link-op,read

For complete coverage of all possible events, enable all event types, as shown above. If you only need a subset of event types, you can remove the others from the list to reduce the performance overhead. Usually this is not worthwhile, however, since the overhead is very small.
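Since the mask has to be set on every client, it can help to check the client configuration files for missing event types. The following Ruby helper is a minimal sketch: it treats the six flags from the example above as the complete set, and the helper name and the assumption that `#` starts a comment are our own illustration, not part of the BeeGFS tools.

```ruby
# Event types from the sysFileEventLogMask example above (treated as complete).
ALL_MASK_FLAGS = %w[flush trunc setattr close link-op read].freeze

# Return the flags from ALL_MASK_FLAGS that are missing in the
# sysFileEventLogMask line of the given client configuration text.
# If the key is absent, every flag is reported as missing.
def missing_mask_flags(config_text)
  line = config_text.each_line.find { |l| l =~ /^\s*sysFileEventLogMask\s*=/ }
  return ALL_MASK_FLAGS.dup unless line

  value = line.split("=", 2).last.sub(/#.*/, "") # assumed: '#' starts a comment
  enabled = value.split(",").map(&:strip)
  ALL_MASK_FLAGS - enabled
end
```

For example, `missing_mask_flags(File.read(path))` could be run against the configuration file of each client node.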

Metadata server

To enable the event stream, specify the path for the UNIX socket to use in the configuration file of the metadata server. For example:

sysFileEventLogTarget = unix:/tmp/beegfslog

If this variable is set, the server tries to write to this socket every time a file system event related to this metadata server occurs.

The receiving application has to open the socket at that path. It is recommended to start the receiving application before the metadata server, since event messages that cannot be delivered are discarded. In that case, the dropped messages counter included in each message is incremented to inform the receiver.
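The receiving side can be sketched in Ruby as follows. This is a minimal sketch under stated assumptions, not the implementation of beegfs-event-listener: it assumes that the metadata server connects with a `SOCK_SEQPACKET` UNIX socket, so that each `recv` returns exactly one event packet, and the method names are hypothetical. Check beegfs_file_event_log.hpp for the actual socket type.

```ruby
require "socket"

# Turn a sysFileEventLogTarget value such as "unix:/tmp/beegfslog" into the
# socket path. Only the "unix:" form shown above is handled.
def socket_path_from_target(target)
  raise ArgumentError, "unsupported target: #{target}" unless target.start_with?("unix:")

  target.delete_prefix("unix:")
end

# Create the listening socket at path.
# ASSUMPTION: the metadata server uses SOCK_SEQPACKET (one packet per recv).
def open_event_socket(path)
  File.unlink(path) if File.socket?(path) # remove a stale socket from an earlier run
  server = Socket.new(:UNIX, :SEQPACKET)
  server.bind(Socket.sockaddr_un(path))
  server.listen(1)
  server
end

# Accept connections one after another and yield every raw packet.
# max_size must exceed the largest packet, since SEQPACKET truncates longer ones.
def each_event_packet(server, max_size = 65_536)
  loop do
    conn, = server.accept
    begin
      loop do
        packet = conn.recv(max_size)
        break if packet.nil? || packet.empty? # peer closed (nil on Ruby >= 3.3)

        yield packet
      end
    ensure
      conn.close
    end
  end
end
```

Usage: `each_event_packet(open_event_socket(socket_path_from_target("unix:/tmp/beegfslog"))) { |pkt| ... }`. Because the listener creates the socket file itself, it has to be started before the metadata server can deliver anything, as recommended above.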

To capture all events of the file system and get the full picture, event output has to be enabled on all metadata servers, each with its own local UNIX socket and receiving application instance. Merging the multiple streams is left to the receiving application.
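One way to merge the streams, sketched here under our own assumptions, is to bring the JSON output of each beegfs-event-listener instance to one host (for example over SSH) and multiplex the resulting IO objects with `IO.select`. The message format described in this document contains no common ordering key across servers, so this sketch simply interleaves events in arrival order and tags each one with a label for its source; the method name and labels are hypothetical.

```ruby
require "json"

# Merge several line-oriented JSON event streams into one, in arrival order.
# sources maps a label (e.g. a hypothetical server name) to a readable IO.
# Yields [label, parsed_message] for every line until all streams are closed.
def merge_event_streams(sources)
  pending = sources.dup
  until pending.empty?
    ready, = IO.select(pending.values) # IO.select also honours Ruby's read buffers
    ready.each do |io|
      label = pending.key(io)
      line = io.gets
      if line.nil? # this stream has ended
        pending.delete(label)
        next
      end
      yield label, JSON.parse(line)
    end
  end
end
```

The IO objects could, for instance, come from `IO.popen(["ssh", "meta01", "/opt/beegfs/sbin/beegfs-event-listener", "/tmp/beegfslog"])`, where the host name meta01 is hypothetical.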

beegfs-event-listener

The beegfs-event-listener program is included in the beegfs-utils package. It opens a UNIX socket at the specified path and listens for incoming messages. For example:

$ /opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog

Every message is printed as one line of JSON formatted output.

Example:

$ mv /mnt/beegfs0/a /mnt/beegfs0/b

This produces output such as:

{ "VersionMajor": 1, "VersionMinor": 0, "PacketSize": 77, "Dropped": 0, "Missed": 0, "Event": { "Type": "Rename", "Path": "\/a", "EntryId": "0-5A9EB0A7-1", "ParentEntryId": "root", "TargetPath": "\/b", "TargetParentId": "root" } }

The output is easy to parse in scripts. For example, this simple Ruby program prints the event type and the file path of each event:

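read-event-log.rb
#!/usr/bin/env ruby
# Print the type and path of every BeeGFS event read from standard input
# (or from files given as arguments). Each line is one JSON message as
# printed by beegfs-event-listener.

require "json"

def print_event(event)
  # Skip messages that carry no "Event" object.
  puts "Event: #{event['Type']} #{event['Path']}" if event
end

while (line = gets)
  message = JSON.parse(line)
  print_event(message["Event"])
end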

Use it like this:

$ /opt/beegfs/sbin/beegfs-event-listener /tmp/beegfslog | ./read-event-log.rb

Messages

Every event message consists of the following fields:

  • Major Version (uint 16)

  • Minor Version (uint 16)

  • Size of the whole message (uint 32)

  • Dropped Messages counter (uint 64)

  • Missed Events counter (uint 64)

  • The actual event:

    • Event Type (uint 32) (see below)

    • Path of the file/directory (string)

    • EntryID (string) (see below)

    • Parent EntryID (string)

    • Target Path (string)

    • Parent EntryID of the target path (string)

For details, see /usr/include/beegfs/beegfs_file_event_log.hpp and the example code in /usr/share/doc/beegfs-utils-devel/examples/beegfs-event-listener/, both included in the beegfs-utils-devel package.
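To illustrate the field list above, the following Ruby sketch decodes one raw version 1.0 packet into the same structure that beegfs-event-listener prints as JSON. The byte layout is an assumption, not taken from beegfs_file_event_log.hpp: integers are assumed to be little-endian, and each string is assumed to be a uint32 length followed by the bytes and a terminating NUL. These assumptions are at least consistent with the Rename example above: a 24-byte header, a 4-byte event type, and five strings with 24 characters in total plus 5 bytes of overhead each add up to the reported PacketSize of 77. The event type is returned as the raw number, since its mapping to names is not listed here.

```ruby
# ASSUMED layout: little-endian integers; strings = uint32 length, bytes, NUL.
HEADER_FORMAT = "S<S<L<Q<Q<L<" # major, minor, size, dropped, missed, event type
HEADER_SIZE = 2 + 2 + 4 + 8 + 8 + 4
STRING_FIELDS = %w[Path EntryId ParentEntryId TargetPath TargetParentId].freeze

# Decode one binary event packet into a Hash shaped like the listener's JSON.
def decode_event_packet(data)
  raise ArgumentError, "packet too short" if data.bytesize < HEADER_SIZE

  major, minor, size, dropped, missed, type = data.unpack(HEADER_FORMAT)
  raise ArgumentError, "unsupported version #{major}.#{minor}" unless major == 1
  raise ArgumentError, "size mismatch" unless size == data.bytesize

  event = { "Type" => type } # numeric type, names are defined in the header file
  offset = HEADER_SIZE
  STRING_FIELDS.each do |name|
    raise ArgumentError, "truncated packet" if offset + 4 > data.bytesize

    len = data.byteslice(offset, 4).unpack1("L<")
    stop = offset + 4 + len
    raise ArgumentError, "truncated packet" if stop + 1 > data.bytesize

    event[name] = data.byteslice(offset + 4, len)
    offset = stop + 1 # skip the assumed trailing NUL byte
  end

  { "VersionMajor" => major, "VersionMinor" => minor, "PacketSize" => size,
    "Dropped" => dropped, "Missed" => missed, "Event" => event }
end
```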

Event Types

For most events, the TargetPath and TargetParentId fields are empty. Path, EntryId, and ParentEntryId always describe the file or directory being operated on:

  • Path: full path of the file/directory, relative to the mount point (e.g. /a for /mnt/beegfs0/a)

  • EntryId: the EntryID of the file/directory (similar to an inode number on other file systems)

  • ParentEntryId: the EntryID of the parent directory

The following event types exist:

  • Flush: File contents were flushed. The file size might have changed.

  • Truncate: The file was truncated. The file size might have changed.

  • SetAttr: File attributes changed.

  • Close: The file was closed and possibly modified.

  • Create: A new file was created.

  • MKdir: A new directory was created.

  • MKnod: A block or character special file was created.

  • RMdir: A directory was removed.

  • Unlink: A file was removed. Note: multiple paths can reference the same EntryID (file content), so disk space is only freed once the last link has been removed.

  • Symlink: A symbolic link was created. Path, EntryId, and ParentEntryId describe the newly created link; TargetPath and TargetParentId describe the referenced file/directory. Since this is a symbolic link, the target may be a relative path and/or may not exist.

  • Hardlink: A hard link was created. Path is the path of the new link, relative to the mount point; EntryId is the EntryID of the referenced file; ParentEntryId is the EntryID of the directory containing the new link. Since hard links are only supported within the same directory, this is identical to the parent directory of the source. TargetPath and TargetParentId describe the link target.

  • Rename: A file or directory was renamed or moved. Path, EntryId, and ParentEntryId describe the entry being moved; TargetPath and TargetParentId give the new path and the EntryID of the new parent directory.

  • Read: A file was opened with the O_RDONLY flag. These entries are intended for file access auditing.
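As an example of how a tool such as a backup adapter might react to these event types, the following Ruby sketch maps a parsed JSON message to an action and the paths to revisit. The grouping follows the descriptions above, but the action names are our own choice, and the sketch assumes that the JSON Type strings match the event names in this list (only "Rename" is confirmed by the example output).

```ruby
# Hypothetical grouping of event types, based on the descriptions above.
CONTENT_CHANGES = %w[Flush Truncate Close].freeze # file data may have changed
METADATA_CHANGES = %w[SetAttr].freeze
CREATIONS = %w[Create MKdir MKnod Symlink Hardlink].freeze
REMOVALS = %w[RMdir Unlink].freeze

# Return [action, paths] for one parsed JSON message.
def backup_action(message)
  ev = message["Event"]
  return [:ignore, []] unless ev

  case ev["Type"]
  when *CONTENT_CHANGES, *METADATA_CHANGES then [:update, [ev["Path"]]]
  when *CREATIONS then [:add, [ev["Path"]]]
  when *REMOVALS then [:delete, [ev["Path"]]]
  when "Rename" then [:move, [ev["Path"], ev["TargetPath"]]]
  else [:ignore, []] # e.g. Read, which does not modify anything
  end
end
```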

Each message contains a dropped counter and a missed counter. The dropped counter is incremented for each message that could not be delivered. The missed counter counts events that can refer to multiple paths at the same time, for example hardlinks. Based on the values of these counters, the receiving application can decide when a full scan of the file system is needed.
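A receiver could, for example, track both counters per metadata server and flag a full scan whenever they change. The Ruby sketch below is one such policy under our own assumptions: the class name is hypothetical, non-zero counters on the first message are treated as lost events (for instance because the listener started after the metadata server), and a decrease, such as a counter reset, is treated as suspicious as well.

```ruby
# Tracks the Dropped and Missed counters of consecutive messages from one
# metadata server and reports when a full file system scan may be needed.
class CounterWatch
  def initialize
    @last = nil
  end

  # True if either counter differs from the previous message, or is non-zero
  # on the first message seen.
  def rescan_needed?(message)
    current = [message["Dropped"], message["Missed"]]
    previous = @last || [0, 0]
    @last = current
    current != previous
  end
end
```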

EntryIDs

BeeGFS uses EntryIDs to identify files and directories, similar to inodes on normal UNIX file systems. An EntryID is a string of the following form:

root|disposal|mdisposal|[0-9A-F]{1,8}-[0-9A-F]{1,8}-[0-9A-F]{1,8}

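Each of the three hex numbers has at most eight digits and therefore fits into an unsigned 32-bit integer. The special cases root, disposal, and mdisposal do not appear for normal files and are used for internal bookkeeping only. If a purely numeric representation is needed, they can be mapped onto reserved integer triples, for example triples containing zeros, as long as these cannot collide with real EntryIDs.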
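The pattern above can be applied directly in a receiving application. The following Ruby sketch anchors it so that only complete EntryIDs match, returns special entries as symbols, and splits normal EntryIDs into three integers. The method name and return convention are our own illustration; no numeric mapping for the special cases is assumed.

```ruby
# Anchored version of the EntryID pattern above.
ENTRY_ID_PATTERN = /\A(?:root|disposal|mdisposal|[0-9A-F]{1,8}-[0-9A-F]{1,8}-[0-9A-F]{1,8})\z/
SPECIAL_ENTRY_IDS = %w[root disposal mdisposal].freeze

# Returns a Symbol for the special entries, an Array of three Integers for
# normal EntryIDs, or nil if the string is not a valid EntryID.
def parse_entry_id(id)
  return nil unless ENTRY_ID_PATTERN.match?(id)
  return id.to_sym if SPECIAL_ENTRY_IDS.include?(id)

  id.split("-").map { |part| part.to_i(16) }
end
```

For the EntryId from the example output, `parse_entry_id("0-5A9EB0A7-1")` returns `[0, 0x5A9EB0A7, 1]`.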