File System Check
This section describes the beegfs-fsck command and its options. The command serves two independent
purposes: it checks the file system for consistency and repairs it, and it enables quota support.
The default options for the file system check vary between BeeGFS versions. The command also
provides the option --automatic, which directly tries to bring everything back into a consistent
state. On a system with data loss on the metadata targets, this would result in the deletion of
storage data. We therefore recommend not enabling --automatic unless you are certain that this is
what you want. This page also explains a suggested procedure for checking a BeeGFS file system
efficiently.
General Information
The primary use of beegfs-fsck is to check a file system for consistency and to run repair
actions. It can also enable quota support. For its main purpose, it creates a database of the
current file system contents, which is stored on the local machine.
A BeeGFS file system check gathers information from all available servers in parallel and stores the information in a database to check for errors and to validate consistency across the servers. A checkup of a file system of moderate size (with tens of millions of entries) usually takes less than one hour to complete. However, a checkup of a large file system (with hundreds of millions of entries) can take significantly longer to complete. Therefore, it is important to plan this procedure carefully.
Often, the user faces a trade-off between speed and safety when performing a large file system
checkup. The fastest way is to take the system down and run beegfs-fsck once with the option
--automatic to repair all errors automatically. However, unexpected errors caused by hardware
failures or by unsuccessful maintenance procedures in the past could lead the command to fix
inconsistencies in the wrong way.
Enabling Quota
The command beegfs-fsck --enablequota sets quota information. Further options can be displayed
with beegfs-fsck --enablequota --help. For information about quota, see
Enabling quota for an existing BeeGFS installation.
BeeGFS File System Check
The command beegfs-fsck --checkfs runs a consistency check of the file system and repairs it
using default actions. For this purpose, it creates a database of the current file system contents
on the local machine. Run this command only when all targets are in the “Good” state, never when
targets are in a “Bad” or “Needs-resync” state or while a resync is running. Verify this before
executing the file system check by running the following commands:
$ beegfs-ctl --listtargets --nodetype=meta --state
$ beegfs-ctl --listtargets --nodetype=storage --state
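The pre-check above can be scripted so that a file system check is not started by accident while a target is unhealthy. The following is a minimal sketch and not part of BeeGFS: the function names all_targets_good and preflight_check are hypothetical, and the sketch assumes that the state listing prints the state name “Bad” or “Needs-resync” on the line of each affected target.

```shell
#!/bin/sh
# Sketch only: refuse to start beegfs-fsck unless every target is "Good".
# Assumption: the listing shows "Bad" or "Needs-resync" as a separate word
# on the line of each affected target.

# Reads a target listing on stdin. Succeeds only if no line mentions
# a "Bad" or "Needs-resync" state (matched case-insensitively).
all_targets_good() {
    ! grep -Eiq '(^|[^[:alnum:]])(bad|needs-resync)([^[:alnum:]]|$)'
}

# Hypothetical helper: checks metadata targets, then storage targets.
# Fails if a listing cannot be obtained or any target is not "Good".
preflight_check() {
    for nodetype in meta storage; do
        listing=$(beegfs-ctl --listtargets --nodetype="$nodetype" --state) || {
            echo "Could not list $nodetype targets." >&2
            return 1
        }
        if ! printf '%s\n' "$listing" | all_targets_good; then
            echo "Some $nodetype targets are not Good; do not run beegfs-fsck." >&2
            return 1
        fi
    done
}
```

A possible use is `preflight_check && beegfs-fsck --checkfs --readOnly`. The check does not detect a running resync by itself, so that still has to be verified separately.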
Suggested Procedure for a Large File System
The following steps help you find a balanced approach to checking the file system that suits your case.
First of all, it is always a good idea to update to the latest available version so that you benefit from the latest fixes and performance improvements (e.g. the beegfs-fsck tool was redesigned to be much faster as of BeeGFS release 2015.03.r13). This does not apply to releases that require an offline upgrade: any file system inconsistencies have to be repaired before the offline upgrade is performed.
For the execution of beegfs-fsck, choose a client machine with multiple CPU cores, a fast network interface, and a fast storage device for the database, such as an SSD or even a RAM disk.
It is good practice to clear disposal files (see Deletion of files in use) in order to avoid false positives related to the disposal entries.
Decide whether the file system will remain online or offline during the execution of beegfs-fsck. Checkups in offline mode are faster than checkups in online mode, where users may cause many concurrent accesses. However, in some situations the system must not become unavailable to users, and it is preferable to let the checkup take longer so that the system can remain online. If the system is going to be online (i.e., users could access the system while beegfs-fsck is running), it is very important to add the option --runOnline to the commands below for releases prior to 2015.03-r18. As of release 2015.03-r18, beegfs-fsck runs in online mode by default, and there is no reason to disable online mode, even if the system is not being accessed by users.
Execute a read-only checkup first, as seen in the example below. Provide a path to a directory on a fast storage device to store the metadata database files. If the system is online and being accessed by users, expect a small overall performance decrease, but this run may reveal that the file system has no errors, making further steps unnecessary.
client# beegfs-fsck --checkfs --readOnly --databasePath=/mnt/ssd/fsck
If errors were reported in the read-only execution, run the file system check a second time and fix the errors based on the stored database. The option --noFetch makes sure that the existing database is used instead of gathering all information again from the servers.
Warning: This is only safe to do if the file system is not modified between the read-only and the --noFetch run. Never reuse a database that has already been used in a read-write run, as that can lead to data corruption.
If you don’t mind pressing some keys to confirm fixing operations, do not use the option --automatic, so that you have the chance to say “no” to individual repairs.
client# beegfs-fsck --checkfs --noFetch --databasePath=/mnt/ssd/fsck
If you have multiple BeeGFS instances, specify which one is to be checked by adding the option --cfgFile=<path> to the commands above, where <path> is the client configuration file of the target BeeGFS instance.
The command beegfs-fsck does not require the BeeGFS client to be mounted during the checkup.
When the file system is online and being accessed by users during the checkup, some uncritical false positives may occur, and beegfs-fsck may report more file attribute errors than actually exist. This is harmless: simply let beegfs-fsck fix them all.
In rare cases, BeeGFS clients need to refresh their caches to recognize the updated information of repaired file system entries. A restart of the client services is enough to do that.
client# systemctl restart beegfs-client
Avoid running beegfs-fsck on machines where server daemons (beegfs-meta, beegfs-storage, beegfs-mgmtd, beegfs-mon) are running. The command may end up consuming a large amount of memory (by default, 50% of the physical RAM), which could disturb the server daemons at crucial moments. If you can’t avoid that (e.g. on systems where multiple BeeGFS services share the same physical machine), specify the maximum amount of RAM in bytes for the file system check to use by adding the option --tuneDbFragmentSize, as seen in the example below.
client# beegfs-fsck --checkfs --tuneDbFragmentSize=1073741824
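The two passes of the suggested procedure differ only in a few options, so the command lines can be generated rather than typed. The following is a rough sketch under stated assumptions: the function names gib_to_bytes and fsck_command are hypothetical, and it assumes that the memory cap can be combined with the read-only and database reuse options in a single call. It only prints the command, so the result can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch only: print the beegfs-fsck command line for each pass of the
# suggested procedure. Function names are illustrative, not part of BeeGFS.

# Convert whole GiB to bytes, the unit expected for the memory cap.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the command line for one pass.
#   $1  pass: "readonly" (first pass) or "repair" (reuses the stored database)
#   $2  database directory on fast local storage
#   $3  optional memory cap in GiB (omit to keep the default)
fsck_command() {
    pass=$1
    db_path=$2
    mem_gib=${3:-}
    case $pass in
        readonly) mode_opt=--readOnly ;;
        repair)   mode_opt=--noFetch ;;
        *) echo "unknown pass: $pass" >&2; return 1 ;;
    esac
    cmd="beegfs-fsck --checkfs $mode_opt --databasePath=$db_path"
    # Assumption: the memory cap may be added to either pass.
    if [ -n "$mem_gib" ]; then
        cmd="$cmd --tuneDbFragmentSize=$(gib_to_bytes "$mem_gib")"
    fi
    echo "$cmd"
}
```

For example, `fsck_command readonly /mnt/ssd/fsck` prints `beegfs-fsck --checkfs --readOnly --databasePath=/mnt/ssd/fsck`, and `fsck_command repair /mnt/ssd/fsck 1` prints the repair command with `--tuneDbFragmentSize=1073741824` appended, since 1 GiB is 1024 × 1024 × 1024 bytes.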