File System Check
This section describes the beegfs-fsck command and its options. The command serves two independent
purposes: it checks the file system for consistency and repairs it, and it enables quota support.
The default options for the file system check vary between BeeGFS versions. The command also
provides the option --automatic, which tries to bring everything back into a consistent state
directly. In a system with data loss on the metadata targets, this would result in the deletion of
storage data. Therefore, we recommend not enabling --automatic unless you are sure about it.
A suggested procedure for checking a BeeGFS file system efficiently is explained on this page.
The primary use of beegfs-fsck is to check a file system for consistency and to run repair
tools. Furthermore, it can enable quota support. For its main purpose, it creates a database of the
current file system contents, which is stored on the local machine.
A BeeGFS file system check gathers information from all available servers in parallel and stores the information in a database to check for errors and to validate consistency across the servers. A checkup of a file system of moderate size (with tens of millions of entries) usually takes less than one hour to complete. However, a checkup of a large file system (with hundreds of millions of entries) can take significantly longer to complete. Therefore, it is important to plan this procedure carefully.
Often, the user faces a trade-off between speed and safety when performing a large file system
checkup. The fastest way would be to take the system down and perform a single execution of
beegfs-fsck with the option --automatic to repair all errors automatically.
However, unexpected errors caused by hardware failures or by unsuccessful maintenance procedures
executed in the past could lead the command to fix inconsistencies in the wrong way.
beegfs-fsck --enablequota sets quota information. Further options are listed by
beegfs-fsck --enablequota --help. For more information about quota, see
Enabling quota for an existing BeeGFS installation.
BeeGFS File System Check
beegfs-fsck --checkfs runs a consistency check of the file system and repairs it
using default actions. For this purpose, it creates a database of the current file system contents
on the local machine. This command should only be run with all targets in the “Good” state, and
never if there are targets in a “Bad” or “Needs-resync” state or while a resync is running. This
must be verified before executing the file system check by running the following commands:
$ beegfs-ctl --listtargets --nodetype=meta --state
$ beegfs-ctl --listtargets --nodetype=storage --state
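The state check above can be scripted so that a file system check is only started when no target is outside the “Good” state. The following is a minimal sketch, not an official tool: the helper name all_targets_good is hypothetical, and it assumes that each data row of the target list starts with a numeric target ID and carries the consistency state in the third column. Verify this layout against the output of your BeeGFS release before relying on it.

```shell
#!/bin/sh
# Sketch: read a target list on stdin and succeed only if every target row
# reports the consistency state "Good".
# ASSUMPTION: data rows look like "<TargetID> <Reachability> <Consistency> <NodeID>";
# header and separator rows are skipped because they do not start with a number.
all_targets_good() {
    awk '
        $1 ~ /^[0-9]+$/ {
            rows++
            if ($3 != "Good") bad++
        }
        # Fail if no target rows were seen at all or if any row was not "Good".
        END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
    '
}

# Usage (hypothetical wrapper around the two commands above):
#   beegfs-ctl --listtargets --nodetype=meta --state | all_targets_good &&
#   beegfs-ctl --listtargets --nodetype=storage --state | all_targets_good &&
#   echo "all targets are in the Good state"
```

Treating an empty list as a failure is a deliberate choice in this sketch: if the command produced no target rows, the state could not be verified.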
Suggested Procedure for a Large File System
The following steps help you to find a balanced approach for checking the file system that is more suitable for your case.
First of all, it is always a good idea to update to the latest available version to make sure that you are using the latest available fixes and performance improvements (e.g. the beegfs-fsck tool was redesigned to be much faster as of BeeGFS release 2015.03.r13). (This does not apply to releases which require an offline upgrade. Before the offline upgrade is performed, any file system inconsistencies have to be repaired.)
For the execution of beegfs-fsck, choose a client machine with multiple CPU cores, a fast network interface, and a fast storage device for the database, such as SSDs or even a RAM disk.
It is a good practice to clear disposal files (see Deletion of files in use) in order to avoid false positives related to the disposal entries.
Decide whether the file system will remain online or offline during the execution of beegfs-fsck. Checkups in offline mode are faster than checkups in online mode, where users may cause many concurrent accesses. However, in some situations the system must not become unavailable for users, and it is preferable that the checkup takes longer to complete so that the system can remain online. If the system is going to be online (i.e., users could possibly access the system while beegfs-fsck is running), it is very important to add the option --runOnline to the commands below for releases prior to 2015.03-r18. As of release 2015.03-r18, beegfs-fsck runs in online mode by default, and there is no reason to disable online mode, even if the system is not being accessed by users.
Execute a read-only checkup first, as seen in the example below. Provide a path to a directory located at a fast storage device to store the metadata database files. If the system is online and being accessed by users, a small overall performance decrease is expected, but this execution may reveal that the filesystem has no errors and make further steps unnecessary.
client# beegfs-fsck --checkfs --readOnly --databasePath=/mnt/ssd/fsck
If errors were reported in the read-only execution, run the file system check a second time and fix the errors based on the stored database. The option --nofetch makes sure that the existing database is used instead of gathering all information again from the servers.
This is only safe to do if the file system is not modified between the read-only run and the --nofetch run. Never reuse a database that has already been used in a read-write run, as that can lead to data corruption.
If you don’t mind pressing some keys to confirm fixing operations, do not use the option --automatic, so that you have the chance to say “no” to individual repairs.
client# beegfs-fsck --checkfs --noFetch --databasePath=/mnt/ssd/fsck
If you have multiple BeeGFS instances, specify which one is to be checked by adding the option --cfgFile=<path> to the commands above, where <path> is the path to the client configuration file of the target BeeGFS instance.
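The two-phase procedure above (a read-only check, then a repair run that reuses the stored database) can be captured in a small helper that prints the command line for each phase. This is a sketch under stated assumptions: fsck_cmd and its argument order are hypothetical names introduced for illustration, and only options mentioned in this section are used. The helper prints the command instead of running it, so the result can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the beegfs-fsck command line for one phase of the procedure.
#   $1  phase: "check" (read-only run) or "repair" (reuse the stored database)
#   $2  database directory on fast storage
#   $3  optional client configuration file of the BeeGFS instance to check
# NOTE: plain sh has no local variables, so these names are global.
fsck_cmd() {
    phase=$1
    db_path=$2
    cfg_file=${3:-}

    case $phase in
        check)  mode="--readOnly" ;;
        repair) mode="--noFetch" ;;
        *)      echo "unknown phase: $phase" >&2; return 1 ;;
    esac

    cmd="beegfs-fsck --checkfs $mode --databasePath=$db_path"
    if [ -n "$cfg_file" ]; then
        cmd="$cmd --cfgFile=$cfg_file"
    fi
    printf '%s\n' "$cmd"
}

# Usage:
#   fsck_cmd check  /mnt/ssd/fsck
#   fsck_cmd repair /mnt/ssd/fsck
```

The repair command is only safe to run if the file system was not modified since the read-only run and the database has not been used in a read-write run before.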
beegfs-fsck does not require the BeeGFS client to be mounted during the checkup.
When the file system is online and being accessed by users during the checkup, some uncritical false positives can occur, and beegfs-fsck may report more file attribute errors than actually exist. Don’t worry. Simply let beegfs-fsck fix them all.
In rare cases, it is possible that BeeGFS clients need to refresh their caches to recognize the updated information of repaired file system entries. In order to do that, a simple restart of the client services will be enough.
client# systemctl restart beegfs-client
Avoid running beegfs-fsck on machines where server daemons (such as beegfs-mon) are running. The command may end up consuming a large amount of memory (by default, 50% of the physical RAM), which could disturb server daemons in crucial moments. If you can’t avoid that (e.g., in systems where multiple BeeGFS services share the same physical machine), specify a maximum amount of RAM in bytes for the file system check to use by adding the option --tuneDbFragmentSize, as seen in the example below.
client# beegfs-fsck --checkfs --tuneDbFragmentSize=1073741824
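Because the limit is given in bytes, it can help to compute the value rather than type it by hand. The helper below is a small sketch; gib_to_bytes is a hypothetical name introduced here, and it only performs the unit conversion.

```shell
#!/bin/sh
# Sketch: convert a whole number of GiB to bytes (1 GiB = 1024 * 1024 * 1024 bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Usage: limit the file system check to 1 GiB, which yields the value
# 1073741824 used in the example above:
#   beegfs-fsck --checkfs --tuneDbFragmentSize="$(gib_to_bytes 1)"
```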