Mirroring

Warning

Mirroring is not a replacement for backups. If files are accidentally deleted or overwritten by a user or process, mirroring won’t help you bring the old file back. You are still responsible for making regular backups of your important data.

This article describes how to use mirroring. For a general explanation, see Mirroring.

By default, mirroring is disabled for a new file system instance. Both metadata and storage mirroring can be enabled with the beegfs command line tool. The beegfs tool is provided by the beegfs-tools package and is usually run from a client node.

Before metadata or storage mirroring can be enabled, buddy groups need to be defined, as these are the basis for mirroring.

Management of Mirror Buddy Groups

When defining buddy groups, administrators should consider the following:

  • Are the targets equal (or nearly equal) in size?

  • Are the targets on different servers or in different racks/failure domains for redundancy?
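The two considerations above can be expressed as a simple pre-flight check before creating a group. This is an illustrative sketch only. The `Target` record, its fields, and the 10% size tolerance are assumptions made for this example and are not part of BeeGFS. In practice, capacities and locations would come from your own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical inventory record for a metadata or storage target."""
    target_id: str      # e.g. "storage:1"
    capacity_gib: int
    server: str
    rack: str           # rack or other failure domain


def pairing_problems(primary: Target, secondary: Target,
                     max_size_ratio: float = 1.1) -> list:
    """Return reasons why two targets make a poor buddy group (empty if none)."""
    problems = []
    small, large = sorted((primary.capacity_gib, secondary.capacity_gib))
    # Mirrored data is written to both targets, so the smaller one limits the group.
    if small == 0 or large / small > max_size_ratio:
        problems.append("targets differ too much in size")
    if primary.server == secondary.server:
        problems.append("targets are on the same server")
    if primary.rack == secondary.rack:
        problems.append("targets are in the same rack")
    return problems


a = Target("storage:1", 10_000, "stor01", "rack-a")
b = Target("storage:2", 10_000, "stor02", "rack-b")
c = Target("storage:3", 4_000, "stor01", "rack-a")
print(pairing_problems(a, b))  # []
print(pairing_problems(a, c))  # all three problems are reported
```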

Defining Buddy Groups

Warning

When creating a buddy group that contains the metadata node that owns the root directory, you must assign that target as the primary so that mirroring can be enabled on the root directory. It is also recommended to immediately follow the steps to activate mirroring for the root inode. Otherwise, if a switchover happens before the root inode is mirrored, you may run into problems enabling mirroring.

Buddy groups are created using the beegfs mirror create command. To define a buddy group, first decide which targets will be the primary and the secondary in the group, and which alias you want to use for the group. You can optionally specify a numerical ID or let one be assigned automatically. These numerical IDs only need to be unique among the other buddy groups of the same type (e.g., you could have a metadata target 1 and a metadata buddy group 1). However, keeping them unique is recommended to simplify troubleshooting.

For example, to create a metadata buddy group with ID 1, where the primary target is meta:1, the secondary target is meta:2, and the alias is “m1m2”, you would run:

beegfs mirror create --node-type=meta --num-id=1 --primary=meta:1 --secondary=meta:2 m1m2
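The ID rules described above can be captured in a small validation helper, for example in a provisioning script that runs before beegfs mirror create. This is a hypothetical sketch. The in-memory `groups` and `targets` sets and the `check_group_id` function are invented for illustration and do not query BeeGFS. A clash with an existing buddy group of the same type is treated as an error. Reuse of the ID anywhere else only produces a warning, which matches the recommendation above.

```python
# Hypothetical bookkeeping of existing IDs as (node type, numeric ID) pairs.
groups = {("meta", 1)}
targets = {("meta", 1), ("meta", 2), ("storage", 1), ("storage", 2)}


def check_group_id(node_type: str, num_id: int) -> tuple:
    """Return (errors, warnings) for a proposed buddy group ID."""
    errors, warnings = [], []
    # Hard rule: unique among buddy groups of the same type.
    if (node_type, num_id) in groups:
        errors.append(f"{node_type} buddy group {num_id} already exists")
    # Soft rule: reuse elsewhere is allowed but complicates troubleshooting.
    clashes = [f"{t} buddy group {i}" for t, i in sorted(groups)
               if i == num_id and t != node_type]
    clashes += [f"{t} target {i}" for t, i in sorted(targets) if i == num_id]
    if clashes:
        warnings.append("ID also used by: " + ", ".join(clashes))
    return errors, warnings


print(check_group_id("meta", 1))     # an error plus a warning
print(check_group_id("storage", 3))  # ([], [])
```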

Note

In previous versions of BeeGFS, it was possible to define buddy groups automatically. In practice, this often led to mishaps where buddy groups were not defined optimally, because the system couldn’t take topology constraints into consideration. This happened, for example, when two BeeGFS services ran on the same physical server in multi-mode.

List defined Mirror Buddy Groups

Mirror buddy groups can be listed by running:

$ beegfs mirror list
$ beegfs mirror list --node-type=meta
$ beegfs mirror list --node-type=storage

Define Stripe Pattern

After defining storage mirror buddy groups in your system, you have to define a data stripe pattern that uses them. See Striping.

Caveats of Storage Mirroring

Storage buddy mirroring protects against many failure modes of a distributed system, such as failing drives, failing servers, and unstable or failing networks. It does not provide full protection while the system is degraded, although the gap is mostly limited to the degraded part of the system. If any storage buddy group is in a degraded state, another failure may cause data loss. Administrative actions can also cause data loss or corruption if the system is in an unstable or degraded state. Such actions should be avoided if at all possible. If they cannot be avoided, make sure that no access to the system is possible while they are performed.

Setting states of active storage targets

When the state of a storage target is manually changed from GOOD to NEEDS_RESYNC, clients that access files during the propagation period may “see” different versions of the global state. This affects both data and file locks. It happens because the state is not propagated to all clients synchronously. By default, propagation happens every 30 seconds, so this period will not last longer than a minute. As a result, the following sequence of events is possible:

  1. An administrator sets the state of an active storage target which is the secondary of a buddy group to NEEDS_RESYNC with beegfs mirror resync start <buddy-group>.

  2. The state is propagated to the primary of the buddy group. The primary will no longer forward written data to the secondary.

  3. A client writes data to a file residing on the buddy group. The data is not forwarded to the secondary.

  4. A different client reads data from the file. If the client attempts to read from the primary, no data loss occurs. If the client attempts to read from the secondary, which is possible without problems in a stable system, the client will receive stale data.

Suppose the two clients in this example used the file system to communicate, e.g., by calling flock on the file they share. In that case, the second client would not see the expected data. Clients stop using the secondary as a read source only once all of them have received the updated state information, which may take up to 30 seconds.
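To make the sequence above concrete, the following toy model replays steps 1 to 4. It is a sketch under simplifying assumptions. The `BuddyGroup` and `Client` classes, the per-client `sees_secondary_good` flag, and the `prefer_secondary` read choice are hypothetical stand-ins for BeeGFS's internal state handling, which is not modeled here.

```python
from typing import Dict, Optional


class BuddyGroup:
    """Toy model of one storage buddy group; not BeeGFS code."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.secondary: Dict[str, str] = {}
        # The primary's own view of the secondary's state.
        self.primary_sees_secondary_good = True

    def write(self, path: str, data: str) -> None:
        # Writes land on the primary, which forwards them to the secondary
        # only while it believes the secondary is GOOD.
        self.primary[path] = data
        if self.primary_sees_secondary_good:
            self.secondary[path] = data


class Client:
    """A client that caches the last target state it received."""

    def __init__(self, group: BuddyGroup) -> None:
        self.group = group
        self.sees_secondary_good = True

    def read(self, path: str, prefer_secondary: bool) -> Optional[str]:
        # Reading from a GOOD secondary is fine in a stable system; a client
        # with outdated state keeps doing so after the secondary falls behind.
        if prefer_secondary and self.sees_secondary_good:
            return self.group.secondary.get(path)
        return self.group.primary.get(path)


group = BuddyGroup()
reader = Client(group)
group.write("shared.dat", "v1")

# Steps 1-2: NEEDS_RESYNC has reached the primary but not yet the reader.
group.primary_sees_secondary_good = False
# Step 3: another client's write stays on the primary only.
group.write("shared.dat", "v2")
# Step 4: the reader still reads from the secondary and gets stale data.
print(reader.read("shared.dat", prefer_secondary=True))  # v1

# Once the new state has propagated, the secondary is no longer used.
reader.sees_secondary_good = False
print(reader.read("shared.dat", prefer_secondary=True))  # v2
```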

Setting the state of a primary storage target may have the same effects. Changing the state of a target that is currently GOOD, and thereby triggering a switchover, must be avoided while clients are still able to access data on that target. Propagating the switchover takes some time, during which clients may still try to access data on the target that was set to a non-GOOD state. If such an access is a write, that write may be lost.

Fsync may fail without setting targets to NEEDS_RESYNC

When fsync is configured to propagate to the storage servers and trigger an fsync there, an error during fsync may leave the system in an unpredictable state if the error occurred on the secondary of a buddy group. If the fsync operation failed on the secondary due to a disk error, the error may not be detected until the secondary's next operation. If a failover happens before the error is detected, the automatic resync from the new primary (the old secondary, which has failed) to the new secondary (the old primary) may cause data loss.
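The failure sequence can be traced with a toy model. The sketch makes three assumptions for illustration only. First, the failed fsync left the older version `v1` on the secondary's disk. Second, an undetected error keeps the secondary in the GOOD state. Third, only a GOOD secondary may take over. The function name and the full-copy resync are hypothetical simplifications, not BeeGFS internals.

```python
from typing import Dict, Tuple


def failover_and_resync(primary: Dict[str, str], secondary: Dict[str, str],
                        secondary_state: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Toy model: promote the secondary, then resync new primary -> new secondary."""
    if secondary_state != "GOOD":
        # Model assumption: only a GOOD secondary may take over.
        raise RuntimeError("secondary is not eligible for takeover")
    new_primary, new_secondary = dict(secondary), dict(primary)
    # The resync overwrites the new secondary (old primary) with the
    # contents of the new primary (old secondary).
    new_secondary.clear()
    new_secondary.update(new_primary)
    return new_primary, new_secondary


primary = {"results.dat": "v2"}    # fsync succeeded here
secondary = {"results.dat": "v1"}  # fsync failed; the error went unnoticed
_, after = failover_and_resync(primary, secondary, secondary_state="GOOD")
print(after["results.dat"])  # v1: the durable v2 on the old primary is gone
```

In this model, if the error had set the secondary to NEEDS_RESYNC, the takeover would have been refused instead of silently propagating the stale copy.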

Activating Metadata Mirroring

After defining metadata mirror buddy groups, you have to activate metadata mirroring. See Metadata Mirroring.

Enable Mirroring

Storage mirroring can be enabled on a per-directory basis, so that some data in the file system can be mirrored while other data might not be mirrored. On the metadata side, it is also possible to activate or deactivate mirroring per directory, but certain logical restrictions apply. For example, for a directory to be mirrored effectively, the whole path to it must also be mirrored.

Mirroring settings of a directory will be applied to new file entries and will be inherited by new subdirectories. For instance, if metadata mirroring is enabled for directory /mnt/beegfs/mydir1, then a new subdirectory /mnt/beegfs/mydir1/mydir2 will also automatically have metadata mirroring enabled.

After metadata mirroring is enabled for a file system using the beegfs mirror init command, the metadata of the root directory will be mirrored by default. Therefore, newly created directories under the root will also have metadata mirroring enabled. You can exclude new directories from metadata mirroring by creating them with beegfs entry create directory --no-mirror <name>. For more information about metadata mirroring, please see Metadata Mirroring.
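The inheritance and path rules above can be illustrated with a small model. The `mirrored` dictionary and the `mkdir` and `effectively_mirrored` helpers are hypothetical and only mimic the documented behavior:

- New subdirectories inherit the parent's setting.
- The --no-mirror option excludes a new directory.
- A directory is effectively mirrored only if the whole path to it is mirrored.

How BeeGFS actually stores and checks these settings is not modeled.

```python
from pathlib import PurePosixPath

MOUNT = PurePosixPath("/mnt/beegfs")
# Hypothetical per-directory metadata mirroring flags. The root starts out
# mirrored, as it does after `beegfs mirror init`.
mirrored = {MOUNT: True}


def mkdir(path: str, no_mirror: bool = False) -> None:
    """Create a directory that inherits its parent's setting unless excluded."""
    p = PurePosixPath(path)
    mirrored[p] = mirrored[p.parent] and not no_mirror


def effectively_mirrored(path: str) -> bool:
    """True only if every directory from the root down to `path` is mirrored."""
    p = PurePosixPath(path)
    chain = [d for d in (p, *p.parents) if d.is_relative_to(MOUNT)]
    return all(mirrored[d] for d in chain)


mkdir("/mnt/beegfs/mydir1")
mkdir("/mnt/beegfs/mydir1/mydir2")             # inherits mirroring
mkdir("/mnt/beegfs/scratch", no_mirror=True)   # like --no-mirror
mkdir("/mnt/beegfs/scratch/tmp")               # inherits "not mirrored"
# Suppose a directory below the unmirrored one were flagged as mirrored:
mirrored[PurePosixPath("/mnt/beegfs/scratch/keep")] = True

print(effectively_mirrored("/mnt/beegfs/mydir1/mydir2"))  # True
print(effectively_mirrored("/mnt/beegfs/scratch/keep"))   # False
```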

To enable file contents mirroring for a certain existing directory, see the built-in help of the beegfs tool (remember to define buddy groups first):

$ beegfs entry set --pattern=mirrored --help

File contents mirroring can subsequently be disabled using beegfs entry set --pattern=raid0. Files that were created in the directory while mirroring was enabled will remain mirrored.
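The effect of changing a directory's pattern can be pictured with the following toy model. In it, each file records the pattern its directory had when the file was created. The `Directory` class is hypothetical. The comments only indicate which beegfs entry set invocation each step corresponds to, and the remaining arguments are deliberately elided. Changing the pattern affects only files created afterwards, which is consistent with the description above.

```python
class Directory:
    """Toy model: new files take the directory's current stripe pattern."""

    def __init__(self, pattern: str = "raid0") -> None:
        self.pattern = pattern
        self.files = {}  # file name -> pattern recorded at creation time

    def create(self, name: str) -> None:
        self.files[name] = self.pattern


d = Directory()
d.create("old.dat")      # created before mirroring was enabled
d.pattern = "mirrored"   # cf. beegfs entry set --pattern=mirrored ...
d.create("a.dat")
d.pattern = "raid0"      # cf. beegfs entry set --pattern=raid0 ...
d.create("b.dat")
print(d.files)  # {'old.dat': 'raid0', 'a.dat': 'mirrored', 'b.dat': 'raid0'}
```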

To check the metadata and file contents mirroring settings run:

$ beegfs entry info /mnt/beegfs/mydir/myfile

To check the state of all targets (metadata and storage) run:

$ beegfs target list --state

Restoring Metadata and Storage Target Data after Failures

If a storage target or metadata server is not reachable, it will be marked as offline and won’t receive data updates. Usually, when the target or server re-registers, it is automatically synchronized from the remaining mirror in the buddy group (self-healing). In some cases, however, you might need to start a synchronization process manually. For more information on how to do that and how to monitor synchronization, please see Resynchronization of mirrored targets.
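The self-healing behavior described above can be sketched as follows. This is a simplified model with hypothetical names (`MirroredPair`, `reregister`, and the `needs_resync` flag). A member that is offline misses updates, and when it comes back it is brought up to date from its buddy. The model ignores primary/secondary roles, switchovers, and the cases in which a manual resync is required. See Resynchronization of mirrored targets for the actual procedure.

```python
class MirroredPair:
    """Toy model of self-healing in a buddy group; not BeeGFS code."""

    def __init__(self) -> None:
        self.data = {"t1": {}, "t2": {}}
        self.online = {"t1": True, "t2": True}
        self.needs_resync = {"t1": False, "t2": False}

    def write(self, key: str, value: str) -> None:
        if not any(self.online.values()):
            raise RuntimeError("no member of the buddy group is reachable")
        for t in self.data:
            if self.online[t]:
                self.data[t][key] = value
            else:
                # An offline member misses this update and falls behind.
                self.needs_resync[t] = True

    def reregister(self, t: str) -> None:
        """Bring a member back and sync it from its buddy if it fell behind."""
        self.online[t] = True
        if self.needs_resync[t]:
            buddy = "t2" if t == "t1" else "t1"
            # Full copy for simplicity; a real resync may be more selective.
            self.data[t] = dict(self.data[buddy])
            self.needs_resync[t] = False


pair = MirroredPair()
pair.write("a", "1")
pair.online["t2"] = False   # t2 becomes unreachable
pair.write("b", "2")        # only t1 receives this update
pair.reregister("t2")       # self-healing on re-registration
print(pair.data["t2"])      # {'a': '1', 'b': '2'}
```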