Data Management API

Overview

The BeeGFS Data Management API provides integration with external data management solutions that need to monitor and react to file system activity and/or control the state of file system entries. A common use case is hierarchical storage management (HSM) solutions that automatically move data between high-cost and low-cost storage media, for example, between BeeGFS on SSDs and external tape storage.

There are three main components:

  • A system for applications to subscribe, using gRPC, to a stream of File System Modification Events from each BeeGFS metadata service.

  • File access flags that block different types of file access. These access checks can only be bypassed by BeeGFS clients configured with the sysBypassFileAccessCheckOnMeta parameter.

    • These access flags and the locking behavior they impose are unique to BeeGFS and distinct from Linux permissions or advisory file locks taken with flock(2). They are intended to provide stricter, more consistent locking guarantees than are otherwise possible in POSIX.

  • File data states, each represented as a single numeric value (0-7) per file. The state has no effect on BeeGFS behavior but allows HSM applications to record metadata about the state of a file (e.g., present or offloaded). The value is displayed in BeeGFS tools such as beegfs entry info and can be used to drive HSM behavior, for example, by marking files that should not be restored automatically.

Access flags and data states are part of a file’s persistent metadata. Once set, both are visible on all clients, and access flags are enforced on all clients. Like other file metadata such as user or group IDs, they are preserved across metadata operations like rename. Currently, file access flags and data states can be set, modified, and inspected using the BeeGFS CTL command-line tool and Go library. See below for more details.

Example Use Case

An HSM solution can subscribe to BeeGFS File System Modification Events to receive updates about which files are being created, accessed, modified, renamed, and removed. It can use these events to keep an out-of-tree index up to date, then apply policies to move file contents between storage tiers based on user criteria such as last access or modification timestamps.
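A policy of this kind can be sketched in C. This is a minimal illustration, not part of any BeeGFS API: the `index_entry` structure, its fields, and the `should_offload` and `select_offload_candidates` functions are hypothetical, and the index is assumed to be kept up to date elsewhere from the event stream.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical out-of-tree index entry, built from file system
 * modification events. Field names are illustrative only. */
struct index_entry {
    const char *path;
    time_t last_access;       /* last access event seen for the file */
    time_t last_modification; /* last modification event seen for the file */
    bool offloaded;           /* contents already moved to the slow tier */
};

/* Hypothetical policy: offload a file that has been neither accessed
 * nor modified for at least idle_secs seconds. */
bool should_offload(const struct index_entry *e, time_t now, time_t idle_secs)
{
    if (e->offloaded)
        return false;
    time_t last_use = e->last_access > e->last_modification
                          ? e->last_access
                          : e->last_modification;
    return now - last_use >= idle_secs;
}

/* Collect up to max candidates into out; returns how many were stored. */
size_t select_offload_candidates(const struct index_entry *entries, size_t n,
                                 time_t now, time_t idle_secs,
                                 const struct index_entry **out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < max; i++) {
        if (should_offload(&entries[i], now, idle_secs))
            out[count++] = &entries[i];
    }
    return count;
}
```

Using the newer of the two timestamps means a file that is still being read stays on the fast tier even if it has not been modified recently.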

Internal BeeGFS file access flags can be set to take a read and/or write lock on a file’s contents, preventing regular client access while a sync is in progress. If the HSM chooses to leave behind an empty “stub file”, the file can be left locked to block client access indefinitely. An OPEN_BLOCKED event is triggered whenever a client attempts to open a locked file, and HSM software can use this event to trigger a restore of the file’s contents.
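The lock handling described above can be viewed as a small state machine. The sketch below assumes that an HSM holds only a write lock while copying, so readers can continue, and a read-write lock once only a stub remains. The phase and event names, the functions, and the bit values in `access_flags` are hypothetical and are not BeeGFS's internal encoding. Only OPEN_BLOCKED corresponds to an actual BeeGFS event.

```c
/* Illustrative access flag bits (not BeeGFS's internal encoding). */
enum access_flags {
    ACCESS_NONE  = 0,
    ACCESS_READ  = 1 << 0, /* read lock: blocks O_RDONLY and O_RDWR opens */
    ACCESS_WRITE = 1 << 1, /* write lock: blocks O_WRONLY and O_RDWR opens */
};

/* Hypothetical lifecycle of a file managed by an HSM. */
enum hsm_phase { PHASE_RESIDENT, PHASE_SYNCING, PHASE_STUB, PHASE_RESTORING };

/* Hypothetical events. EV_OPEN_BLOCKED mirrors the BeeGFS OPEN_BLOCKED
 * event; the others are internal to the HSM. */
enum hsm_event { EV_SYNC_START, EV_SYNC_DONE_STUB, EV_OPEN_BLOCKED, EV_RESTORE_DONE };

/* Access flags the HSM chooses to hold in each phase. */
int flags_for_phase(enum hsm_phase p)
{
    switch (p) {
    case PHASE_SYNCING:
        return ACCESS_WRITE;               /* block writers while copying */
    case PHASE_STUB:
    case PHASE_RESTORING:
        return ACCESS_READ | ACCESS_WRITE; /* contents absent or incomplete */
    case PHASE_RESIDENT:
    default:
        return ACCESS_NONE;
    }
}

/* Advance the phase; events that do not apply are ignored. */
enum hsm_phase next_phase(enum hsm_phase p, enum hsm_event ev)
{
    if (p == PHASE_RESIDENT && ev == EV_SYNC_START)
        return PHASE_SYNCING;
    if (p == PHASE_SYNCING && ev == EV_SYNC_DONE_STUB)
        return PHASE_STUB;
    if (p == PHASE_STUB && ev == EV_OPEN_BLOCKED)
        return PHASE_RESTORING; /* a client wants the data back */
    if (p == PHASE_RESTORING && ev == EV_RESTORE_DONE)
        return PHASE_RESIDENT;
    return p;
}
```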

File Access Flags and Data States

Setting Access Flags

One or both of the following access flags can be set on a file to block access from regular clients:

  • Read Lock: Block open(2) with O_RDONLY or O_RDWR.

  • Write Lock: Block open(2) with O_WRONLY or O_RDWR (with or without O_TRUNC).

The O_APPEND flag can be combined with the other flags. On its own, however, it does not affect whether read or write access is allowed through the resulting file descriptor, so it is ignored by the access checks.
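The per-lock rules above can be expressed as a single check on the open(2) access mode. This is a minimal model of the documented behavior, not the metadata service's implementation. `open_is_blocked` and the `access_flags` bit values are hypothetical. The sketch inspects only the `O_ACCMODE` bits, so `O_APPEND` never changes the result. It also treats an unusual combination such as `O_RDONLY | O_TRUNC` as read-only, which is an assumption the text above does not confirm.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Illustrative access flag bits (not BeeGFS's internal encoding). */
enum access_flags {
    ACCESS_NONE  = 0,
    ACCESS_READ  = 1 << 0,
    ACCESS_WRITE = 1 << 1,
};

/* Returns true if an open(2) with open_flags would be blocked for a
 * regular (non-bypass) client, given the file's access flags. */
bool open_is_blocked(int lock_flags, int open_flags)
{
    int mode = open_flags & O_ACCMODE; /* modifier flags are ignored */
    bool wants_read  = (mode == O_RDONLY || mode == O_RDWR);
    bool wants_write = (mode == O_WRONLY || mode == O_RDWR);

    return (wants_read && (lock_flags & ACCESS_READ)) ||
           (wants_write && (lock_flags & ACCESS_WRITE));
}
```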

When open(2) is blocked, it fails with EWOULDBLOCK (i.e., resource temporarily unavailable). If file system modification events are enabled, an OPEN_BLOCKED event is also triggered for the file.

Note

This deviates slightly from POSIX: technically, open(2) without O_NONBLOCK should block.
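Because a blocked open fails immediately instead of waiting, an application that expects an HSM to restore files on demand may want to retry. The following is a minimal sketch that assumes a fixed delay between attempts is acceptable. `open_retry_if_locked` is a hypothetical helper, not part of BeeGFS. It retries only on `EWOULDBLOCK` (equal to `EAGAIN` on Linux) and returns any other error immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>

/* Hypothetical helper: open path, retrying while the open fails with
 * EWOULDBLOCK (file locked). flags must not include O_CREAT, because no
 * mode argument is passed. Returns the descriptor, or -1 with errno set
 * by the last open(2) call. */
int open_retry_if_locked(const char *path, int flags, int max_attempts,
                         long delay_ms)
{
    for (int attempt = 1;; attempt++) {
        int fd = open(path, flags);
        if (fd >= 0)
            return fd;
        if (errno != EWOULDBLOCK && errno != EAGAIN)
            return -1; /* a real error, not a lock: do not retry */
        if (attempt >= max_attempts)
            return -1; /* still locked; errno remains EWOULDBLOCK */

        /* Give the HSM time to restore the file before trying again. */
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
}
```

After a -1 return, the caller can check errno to distinguish a file that stayed locked from other failures such as ENOENT.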

Other metadata operations such as unlink, rename, hardlink, setattr, getxattr, and setxattr are not affected by the read/write locks. Additional locks may be added in the future for these operations.

The metadata service enforces a few rules when transitioning between access flags:

  • Acquiring stricter locks (adding flags): not allowed if conflicting read/write sessions exist.

    • For example, if any client has a file open with the O_RDWR flag, attempting to take a write lock would fail with the error IN_USE.

  • Relaxing locks (removing flags): not allowed if dependent read/write sessions exist.

    • Relaxing a lock is only allowed when no dependent open file descriptors exist. This protects against races where an open file descriptor relies on the lock for protected access. Only file descriptors opened with access modes matching those protected by the lock are considered. For example, a write lock may be removed if the file is only open read-only, but not if any open descriptor has write access, because that descriptor may be relying on the lock.
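Read together, the two rules reduce to one check per lock bit. Changing the read lock in either direction requires that no descriptors with read access be open, and the same applies to the write lock and write access. The sketch below models that reading of the rules. `check_flag_transition`, `struct open_sessions`, and the result codes are hypothetical (only the name IN_USE comes from the text), and the metadata service may track sessions differently.

```c
/* Illustrative access flag bits (not BeeGFS's internal encoding). */
enum access_flags {
    ACCESS_NONE  = 0,
    ACCESS_READ  = 1 << 0,
    ACCESS_WRITE = 1 << 1,
};

/* Hypothetical result codes; TRANSITION_IN_USE mirrors the IN_USE error. */
enum transition_result { TRANSITION_OK, TRANSITION_IN_USE };

/* Open sessions on the file, counted by access mode. An O_RDWR
 * descriptor counts as both a reader and a writer. */
struct open_sessions {
    int readers; /* descriptors opened O_RDONLY or O_RDWR */
    int writers; /* descriptors opened O_WRONLY or O_RDWR */
};

enum transition_result check_flag_transition(int current, int requested,
                                             struct open_sessions s)
{
    /* Bits being added (stricter) or removed (relaxed). */
    int changed = current ^ requested;

    /* Adding a lock conflicts with sessions of the mode it blocks, and
     * removing a lock is refused while sessions of that mode may depend
     * on it, so both directions use the same per-bit check. */
    if ((changed & ACCESS_READ) && s.readers > 0)
        return TRANSITION_IN_USE;
    if ((changed & ACCESS_WRITE) && s.writers > 0)
        return TRANSITION_IN_USE;
    return TRANSITION_OK;
}
```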

Access flags can be updated and inspected with a hidden BeeGFS CTL CLI command or with the Go library:

$ beegfs entry set --access-flags=<unlocked | read-lock | write-lock | read-write-lock | none> /mnt/beegfs/myfile

For example, to set the write access flag and verify that it is set:

$ beegfs entry set --access-flags=write /mnt/beegfs/myfile

$ beegfs entry info --columns=path,type,access /mnt/beegfs/myfile
PATH     TYPE  ACCESS
/myfile  file  Locked (write)

All access flags can be cleared by specifying none:

$ beegfs entry set --access-flags=none /mnt/beegfs/myfile

$ beegfs entry info --columns=path,type,access /mnt/beegfs/myfile
PATH     TYPE  ACCESS
/myfile  file  Unlocked

Bypassing Access Flags

A BeeGFS client used by HSM software can set sysBypassFileAccessCheckOnMeta=true in its beegfs-client.conf file to bypass all access checks and manage the contents of both locked and unlocked files. The HSM is responsible for taking and releasing the appropriate file access locks whenever it requires exclusive access to a file.

Warning

Take care that these special clients are not inadvertently used by other users or applications.

Setting Data States

Data states can be updated and inspected with a hidden BeeGFS CTL CLI command or with the Go library:

$ beegfs entry set --data-state=1 /mnt/beegfs/myfile

$ beegfs entry info --columns=path,type,state /mnt/beegfs/myfile
PATH     TYPE  STATE
/myfile  file  1
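Because BeeGFS attaches no meaning to data states, an HSM has to define its own. The sketch below shows one hypothetical assignment that includes a state for files that should not be restored automatically. The enum names and both functions are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical meanings an HSM might assign to data states.
 * BeeGFS only stores the number (0-7); the meanings belong to the HSM. */
enum hsm_data_state {
    STATE_RESIDENT       = 0, /* contents present in BeeGFS */
    STATE_OFFLOADED      = 1, /* contents moved to the archive tier, stub left */
    STATE_NO_AUTORESTORE = 2, /* offloaded; do not restore on OPEN_BLOCKED */
    STATE_MAX            = 7, /* largest value a data state can hold */
};

/* A data state is a single value in the range 0-7. */
bool data_state_is_valid(int state)
{
    return state >= 0 && state <= STATE_MAX;
}

/* Decide whether an OPEN_BLOCKED event for this file should start a
 * restore. Files marked STATE_NO_AUTORESTORE are left for manual handling. */
bool should_auto_restore(int state)
{
    return state == STATE_OFFLOADED;
}
```

Under this scheme, an HSM would record an offloaded file with `beegfs entry set --data-state=1`, as in the command above.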