Resynchronization of Mirrored Targets

Automatic Resynchronization

In general, if a secondary storage target or metadata server is considered to be out of sync, the management daemon automatically sets it to the consistency state needs resync (see Target States for more information on states). The primary target coordinates the storage target resynchronization process for self-healing. To avoid transferring files unnecessarily, the primary target records the time of its last successful communication with the secondary target, and by default only files that were modified after this timestamp are resynchronized. To avoid losing data that may still have been cached at that time, the resynchronized timespan is extended by a short safety threshold (defined by sysResyncSafetyThresholdMins in beegfs-storage.conf). Because metadata is much smaller than file contents, metadata servers have no timestamp-based mechanism. Instead, the primary metadata server sends all of its mirrored metadata to its buddy during the resynchronization process.
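The timestamp-based selection for storage targets can be sketched with shell arithmetic. This is a hypothetical illustration, not BeeGFS code: the function names and arguments are assumptions, and all times are taken to be Unix epoch seconds.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the storage resync selection rule; not BeeGFS code.
# All times are Unix epoch seconds (an assumption for this illustration).

# resync_cutoff LAST_COMM THRESHOLD_MINS
# Prints the oldest modification time that is still resynchronized:
# the last successful communication minus the safety threshold.
resync_cutoff() {
  local last_comm=$1 threshold_mins=$2
  echo $(( last_comm - threshold_mins * 60 ))
}

# file_needs_resync MTIME LAST_COMM THRESHOLD_MINS
# Succeeds (exit status 0) if a file with this mtime is at or after the cutoff.
file_needs_resync() {
  local mtime=$1 cutoff
  cutoff=$(resync_cutoff "$2" "$3")
  (( mtime >= cutoff ))
}
```

For example, with a last successful communication at 1700000300 and a 10-minute threshold, the cutoff is 1699999700. A file modified at 1700000000 is selected, while one modified at 1699999000 is not, even though both predate the last contact.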

Manual Resynchronization

In some cases it may be useful or even necessary to manually trigger the resynchronization of a storage target or metadata server. One example is a secondary target whose underlying storage system is damaged beyond repair. In this scenario, all data of that target may be lost, and a new target needs to be brought up with the old target ID. The automatic resync is not sufficient here, because it would only consider files modified after the last successful communication between the targets. Another case for a manual resync is when a check of the underlying local file system (e.g. xfs_repair) has removed old files.

The beegfs-ctl tool can be used to manually set a storage target or metadata server to the needs resync state. Note that this does not start a resync immediately; it only informs the management daemon about the new state. The primary of the buddy group then starts the resync process a few moments later.

As described above, the primary target records the time of its last successful communication with the secondary target. Without additional parameters, this timestamp is used to keep resynchronization as short as possible. It can, however, be overridden to resynchronize a longer timespan, or to resynchronize everything in the cases described previously.

Please use beegfs-ctl --startresync --help for more information on available parameters.
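The following hypothetical helper assembles a --startresync command for a storage target from the options used in this section (--nodetype, --targetid, --timestamp and --restart). It only prints the command so it can be reviewed before running it. The function names are illustrative, and it assumes that --timestamp expects Unix epoch seconds, which the --timestamp=1 full-resync example suggests; confirm this with --help.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of BeeGFS: prints a beegfs-ctl --startresync
# command for a storage target using only the options shown in this section.
# Assumption: --timestamp expects Unix epoch seconds.

# resync_cmd TARGET_ID [TIMESTAMP] [restart]
resync_cmd() {
  local target_id=$1 timestamp=${2:-} restart=${3:-}
  local -a cmd=(beegfs-ctl --startresync --nodetype=storage "--targetid=${target_id}")
  # No --timestamp: the primary uses its stored last-communication time.
  if [[ -n $timestamp ]]; then
    cmd+=("--timestamp=${timestamp}")
  fi
  # --restart aborts a running storage resync instead of ignoring this request.
  if [[ $restart == restart ]]; then
    cmd+=(--restart)
  fi
  echo "${cmd[*]}"
}

# hours_ago N: epoch seconds N hours before now (a hypothetical override policy).
hours_ago() {
  echo $(( $(date +%s) - $1 * 3600 ))
}
```

With a hypothetical target ID of 7, resync_cmd 7 "$(hours_ago 48)" prints a command that covers everything modified in the last two days, and resync_cmd 7 1 restart prints the command for a forced full resynchronization.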

If a resynchronization is already running and you want to abort it and start over, pass the --restart parameter to beegfs-ctl. Otherwise, the current process keeps running and your request is ignored. This is particularly useful if the system started an automatic resynchronization after a secondary target became reachable again, but you know that the timestamp-based approach is not sufficient. For example, this may be the case if the entire underlying file system failed before the secondary target was started, i.e. the target is completely empty and needs a full synchronization. Note that restarting a running resync is only possible for storage targets, because metadata servers never perform a partial resynchronization.

The following command stops the automatic resynchronization and starts a full resynchronization instead. A timestamp of 1 (one second after the Unix epoch) selects every file:

$ beegfs-ctl --startresync --nodetype=storage --targetid=X --timestamp=1 --restart

Display Resynchronization Information

The beegfs-ctl command line tool can display information on an ongoing resynchronization process through its --resyncstats mode.

Please use beegfs-ctl --resyncstats --help for more information on available parameters.