Mirroring

Warning

Mirroring is not a replacement for backups. If files are accidentally deleted or overwritten by a user or process, mirroring won’t help you recover the old file. So you are still responsible for making regular backups of your important data.

This article is about using mirroring; for a general explanation, see Mirroring.

By default, mirroring is disabled for a new file system instance. Both types of mirroring can be enabled with the beegfs-ctl command line tool. (The beegfs-ctl tool is contained in the beegfs-utils package and is usually run from a client node.)

Before metadata or storage mirroring can be enabled, buddy groups need to be defined, as these are the basis for mirroring.

Management of Mirror Buddy Groups

Mirror buddy groups are identified by numeric IDs, just like storage targets. Please note that buddy group IDs don’t conflict with target IDs, i.e., they don’t need to be distinct from storage target IDs.

There are two ways to define buddy groups: you can define them manually, or you can tell BeeGFS to create them automatically.

Defining groups manually gives you greater control and allows a more detailed configuration. The automatic mode, for example, won’t consider targets that are not equally sized, and it doesn’t know about the topology of your system. So if you want to make sure that the members of a buddy group are placed in different physical locations, you have to define the groups manually.

Define Buddy Groups automatically

Automatic creation of buddy groups can be done with beegfs-ctl, separately for metadata and for storage servers:

$ beegfs-ctl --addmirrorgroup --automatic --nodetype=meta
$ beegfs-ctl --addmirrorgroup --automatic --nodetype=storage

Please see the built-in help of beegfs-ctl for more information on available parameters:

$ beegfs-ctl --addmirrorgroup --help

Define Buddy Groups manually

Manual definition of mirror buddy groups can be useful if you want to set custom group IDs or if you want to make sure that the buddies are in different failure domains (e.g., different racks). Buddy groups are defined manually with the beegfs-ctl tool. The following command creates a buddy group with the ID 100, consisting of targets 1 and 2:

$ beegfs-ctl --addmirrorgroup --nodetype=storage --primary=1 --secondary=2 --groupid=100

Please see the built-in help of beegfs-ctl for more information on available parameters:

$ beegfs-ctl --addmirrorgroup --help

When creating mirror buddy groups for metadata manually, and one of the nodes holds the root directory, that node must be set as the primary of its group.
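For illustration, a metadata buddy group is created with the same syntax, but with metadata node IDs instead of target IDs. The IDs below are example values; node 1 is assumed to be the metadata node that owns the root directory, so it is given as the primary:

$ beegfs-ctl --addmirrorgroup --nodetype=meta --primary=1 --secondary=2 --groupid=1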

List defined Mirror Buddy Groups

Configured mirror buddy groups can be listed with beegfs-ctl (don’t forget to specify the node type):

$ beegfs-ctl --listmirrorgroups --nodetype=storage
$ beegfs-ctl --listmirrorgroups --nodetype=meta

It’s also possible to list mirror buddy groups alongside other target information:

$ beegfs-ctl --listtargets --mirrorgroups

Please see the built-in help of beegfs-ctl for more information on available parameters:

$ beegfs-ctl --listtargets --help

Define Stripe Pattern

After defining storage buddy mirror groups in your system, you have to define a data stripe pattern that uses them: see Striping.

Caveats of Storage Mirroring

Storage buddy mirroring provides protection against many failure modes of a distributed system, such as failing drives, failing servers, unstable or failing networks, and a number of other failure modes. It does not provide complete protection while the system is degraded; protection is reduced mainly for the degraded part of the system. If any storage buddy group is in a degraded state, a further failure may cause data loss. Administrative actions can also cause data loss or corruption if the system is in an unstable or degraded state. Such actions should be avoided if at all possible, for example by ensuring that no access to the system is possible while the actions are performed.

Setting states of active storage targets

When the state of an active storage target is manually changed from GOOD to NEEDS_RESYNC, the new state is not propagated to all clients synchronously. Clients accessing files during the propagation period may therefore “see” different versions of the global state, which affects both file data and file locks. By default, propagation happens every 30 seconds, so the period will not take longer than a minute. This makes the following sequence of events possible:

  1. An administrator sets the state of an active storage target which is the secondary of a buddy group to NEEDS_RESYNC with beegfs-ctl --startresync.

  2. The state is propagated to the primary of the buddy group. The primary will no longer forward written data to the secondary.

  3. A client writes data to a file residing on the buddy group. The data is not forwarded to the secondary.

  4. A different client reads data from the file. If the client attempts to read from the primary, no data loss occurs. If the client attempts to read from the secondary, which is possible without problems in a stable system, the client will receive stale data.

If the two clients in this example used the file system to communicate, e.g., by calling flock for the file they share, the second client would not see the expected data. Clients only stop considering the secondary as a source for the file once all of them have received the updated state information, which may take up to 30 seconds.

Setting the state of a primary storage target may exhibit the same effects. Setting states for targets that are currently GOOD, and thereby triggering a switchover, must be avoided while clients are still able to access data on the target. Propagation of the switchover takes some time, during which clients may attempt to access data on the target that was set to non-GOOD. If such an access is a write, that write may be lost.
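As a sketch of the scenario described above, the following commands first check the current target and buddy group states and then set a secondary storage target to NEEDS_RESYNC. The target ID 2 is an example value; only run the second command when you are sure that no client is still accessing data on the affected buddy group:

$ beegfs-ctl --listtargets --nodetype=storage --state --mirrorgroups
$ beegfs-ctl --startresync --nodetype=storage --targetid=2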

Fsync may fail without setting targets to NEEDS_RESYNC

When fsync is configured to propagate to the storage servers and trigger an fsync there, an error during fsync may leave the system in an unpredictable state if the error occurs on the secondary of a buddy group. If the fsync operation fails on the secondary due to a disk error, the error may only be detected during the secondary’s next operation. If a failover happens before the error is detected, the automatic resync from the new primary (the old secondary, which has failed) to the new secondary (the old primary) may cause data loss.
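Whether fsync calls are forwarded to the storage servers is a client-side setting. As a hedged example, in a typical beegfs-client.conf this behavior is controlled by the tuneRemoteFSync option; verify the exact option name and default against the comments in your installed configuration file:

# beegfs-client.conf (excerpt, assumed option name)
# If enabled, fsync() calls from applications are propagated to the storage servers.
tuneRemoteFSync = true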

Activating Metadata Mirroring

After defining metadata mirror buddy groups, you have to activate metadata mirroring: see Metadata Mirroring.
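Activation itself is done with beegfs-ctl. The command below is the one referenced later in this article; the linked Metadata Mirroring section describes the prerequisites and follow-up steps (for example, regarding client access and metadata service restarts), so consult it before running the command:

$ beegfs-ctl --mirrormd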

Enable Mirroring

Storage mirroring can be enabled on a per-directory basis, so that some data in the file system can be mirrored while other data might not be mirrored. On the metadata side, it is also possible to activate or deactivate mirroring per directory, but certain logical restrictions apply. For example, for a directory to be mirrored effectively, the whole path to it must also be mirrored.

Mirroring settings of a directory will be applied to new file entries and will be inherited by new subdirectories. For instance, if metadata mirroring is enabled for directory /mnt/beegfs/mydir1, then a new subdirectory /mnt/beegfs/mydir1/mydir2 will also automatically have metadata mirroring enabled.

After metadata mirroring is enabled for a file system using the beegfs-ctl --mirrormd command, the metadata of the root directory will be mirrored by default. Therefore, newly created directories under the root will also have metadata mirroring enabled. It is possible to exclude new folders from metadata mirroring by creating them using beegfs-ctl --createdir --nomirror. For more information about metadata mirroring, please see Metadata Mirroring.
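For example, the following command creates a directory that is excluded from metadata mirroring. The path is an example value; beegfs-ctl --createdir --help lists further options, such as selecting the metadata node for the new directory:

$ beegfs-ctl --createdir --nomirror /mnt/beegfs/unmirrored_dir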

To enable file contents mirroring for a certain directory, see the built-in help of the beegfs-ctl tool. (Remember to define buddy groups first.)

$ beegfs-ctl --setpattern --buddymirror --help
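A typical invocation looks like the following. The chunk size, number of targets, and path are example values that you should adapt to your setup; note that with --buddymirror, the stripe targets are buddy groups rather than individual storage targets:

$ beegfs-ctl --setpattern --buddymirror --chunksize=512k --numtargets=4 /mnt/beegfs/mydir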

File contents mirroring can be disabled afterwards by using beegfs-ctl --setpattern without the --buddymirror option. However, files that were already created while mirroring was enabled will remain mirrored.

To check the metadata and file contents mirroring settings of a certain directory or file, use:

$ beegfs-ctl --getentryinfo /mnt/beegfs/mydir/myfile

To check target states of storage targets, use:

$ beegfs-ctl --listtargets --nodetype=storage --state

Restoring Metadata and Storage Target Data after Failures

If a storage target or metadata server is not reachable, it will be marked as offline and won’t receive data updates. Usually, when the target or server re-registers, it will automatically be synchronized from the remaining mirror in the buddy group (self-healing). However, in some cases it might be necessary to start a synchronization process manually. For more information on how to do that and on how to monitor synchronization, please see Resynchronization of mirrored targets.
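As a sketch, a resync of a storage target can be started and monitored with the following commands. The target ID 2 is an example value; the corresponding commands for metadata mirrors are described in the linked section:

$ beegfs-ctl --startresync --nodetype=storage --targetid=2
$ beegfs-ctl --resyncstats --nodetype=storage --targetid=2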