File System Check

This section describes the beegfs-fsck command and its options. The command serves two independent purposes: it checks the file system for consistency and repairs it, and it enables quota support. The default options for the file system check vary between BeeGFS versions. The command also provides the option --automatic, which tries to bring everything back into a consistent state without asking for confirmation. In a system that has lost data on the metadata targets, this would result in the deletion of storage data. Therefore, we recommend not enabling --automatic unless you are sure about the consequences. A suggested procedure for checking a BeeGFS file system efficiently is explained on this page.

General Information

The primary use of beegfs-fsck is to check a file system for consistency and to run repair actions. It can also enable quota support. For its main purpose, it creates a database of the current file system contents, which is stored on the local machine.

A BeeGFS file system check gathers information from all available servers in parallel and stores the information in a database to check for errors and to validate consistency across the servers. A checkup of a file system of moderate size (with tens of millions of entries) usually takes less than one hour to complete. However, a checkup of a large file system (with hundreds of millions of entries) can take significantly longer to complete. Therefore, it is important to plan this procedure carefully.

Often, the user faces a trade-off between speed and safety when performing a large file system checkup. The fastest way would be to take the system down and run beegfs-fsck once with the option --automatic, so that all errors are repaired automatically. However, unexpected errors caused by hardware failures or by maintenance procedures that went wrong in the past could lead the command to fix inconsistencies in the wrong way.

Enabling Quota

The command beegfs-fsck --enablequota sets quota information. Further options can be displayed with beegfs-fsck --enablequota --help. For more information about quota, see Enabling quota for an existing BeeGFS installation.

BeeGFS File System Check

The command beegfs-fsck --checkfs runs a consistency check of the file system and repairs it using default actions. For this purpose, it creates a database of the current file system contents on the local machine. This command should only be run with all targets in the “Good” state, and never if there are targets in a “Bad” or “Needs-resync” state or while a resync is running. This must be verified before executing the file system check by running the following commands:

$ beegfs-ctl --listtargets --nodetype=meta --state
$ beegfs-ctl --listtargets --nodetype=storage --state
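
The state check above can also be scripted. The following is a minimal sketch, not an official tool: all_targets_good is a hypothetical helper name, and the parsing assumes that the listing prints the consistency state (Good, Bad, Needs-resync) as a separate whitespace-delimited word on each target line. Verify that assumption against the output of your BeeGFS version before relying on it.

```shell
#!/bin/sh
# Sketch: decide from a target listing on stdin whether a check may start.
# all_targets_good is a hypothetical helper, not part of BeeGFS.
# Assumption: each target line carries its consistency state as its own
# whitespace-separated field.

all_targets_good() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "Good") good++
                else if ($i == "Bad" || $i == "Needs-resync") bad = 1
            }
        }
        # Fail on any bad target, and also when no Good target was seen
        # (for example, empty input because the listing command failed).
        END { if (bad || good == 0) exit 1 }
    '
}

# Example use; skipped on machines without beegfs-ctl.
if command -v beegfs-ctl >/dev/null 2>&1; then
    for type in meta storage; do
        if ! beegfs-ctl --listtargets --nodetype="$type" --state | all_targets_good; then
            echo "Not all $type targets are Good; do not run beegfs-fsck." >&2
        fi
    done
fi
```

The helper does not detect a running resync, so that still has to be ruled out separately.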

Suggested Procedure for a Large File System

The following steps help you find a balanced approach to checking the file system that suits your case.

  • First of all, it is always a good idea to update to the latest available version to make sure that you are using the latest available fixes and performance improvements (e.g. the beegfs-fsck tool was redesigned to be much faster as of BeeGFS release 2015.03.r13). (This does not apply to releases which require an offline-upgrade. Before the offline upgrade is performed, any file system inconsistencies have to be repaired.)

  • For the execution of beegfs-fsck choose a client machine with multiple CPU cores, a fast network interface, and a fast storage device for the database, such as SSDs or even a RAM disk.

  • It is a good practice to clear disposal files (see Deletion of files in use) in order to avoid false positives related to the disposal entries.

  • Decide whether the file system will remain online or offline during the execution of beegfs-fsck. Checkups in offline mode are faster compared to online mode with potentially lots of concurrent accesses by users. However, in some situations, the system should not become unavailable for users, and it is preferable that the checkup takes longer to complete, so that the system can remain online. If the system is going to be online (i.e., users could possibly access the system while beegfs-fsck is running), it is very important to add the option --runOnline to the commands below for releases prior to 2015.03-r18. As of release 2015.03-r18, beegfs-fsck runs in online mode by default, and there is no reason to disable online mode, even if the system is not being accessed by users.

  • Execute a read-only checkup first, as seen in the example below. Provide a path to a directory on a fast storage device to store the metadata database files. If the system is online and being accessed by users, a small overall performance decrease is expected, but this run may reveal that the file system has no errors and make further steps unnecessary.

    client# beegfs-fsck --checkfs --readOnly --databasePath=/mnt/ssd/fsck
    
  • If errors were reported in the read-only execution, run the file system check a second time, and fix the errors based on the stored database. The option --noFetch makes sure that the existing database is used instead of gathering all information again from the servers.

    Warning

    This is only safe to do if the file system is not modified between the read-only run and the --noFetch run. Never reuse a database that has already been used in a read-write run, as that can lead to data corruption.

    If you prefer to confirm each repair action by hand, omit the option --automatic; you can then answer “no” to any proposed fix.

    client# beegfs-fsck --checkfs --noFetch --databasePath=/mnt/ssd/fsck
    
  • If you have multiple BeeGFS instances, specify which one is to be checked by adding the option --cfgFile=<path> to the commands above, specifying the path to the client configuration file of the target BeeGFS instance.

  • The command beegfs-fsck does not require the BeeGFS client to be mounted during the checkup.

  • When the file system is online and being accessed by users during the checkup, it is possible that some uncritical false positives occur and beegfs-fsck reports more file attribute errors than actually exist. This is harmless; simply let beegfs-fsck fix them all.

  • In rare cases, BeeGFS clients need to refresh their caches to recognize the updated information of repaired file system entries. Restarting the client service is enough for that.

    client# systemctl restart beegfs-client
    
  • Avoid running beegfs-fsck on machines where server daemons (beegfs-meta, beegfs-storage, beegfs-mgmtd, beegfs-mon) are running. The command may end up consuming a large amount of memory (by default, 50% of the physical RAM), which could disturb server daemons at crucial moments. If you can’t avoid that (e.g., in systems where multiple BeeGFS services share the same physical machine), specify the maximum amount of RAM in bytes that the file system check may use by adding the option --tuneDbFragmentSize, as seen in the example below.

    client# beegfs-fsck --checkfs --tuneDbFragmentSize=1073741824
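
The value 1073741824 in the last example is 1 GiB expressed in bytes. Because the limit has to be given in bytes, a small helper can make it less error-prone to write. The sketch below is illustrative only; gib_to_bytes is a hypothetical name and not part of BeeGFS.

```shell
#!/bin/sh
# Sketch: convert a whole number of GiB to bytes, the unit that
# --tuneDbFragmentSize expects according to the text above.
# gib_to_bytes is a hypothetical helper, not a BeeGFS command.

gib_to_bytes() {
    # 1 GiB = 1024 * 1024 * 1024 bytes
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example use, equivalent to the command shown above:
#   beegfs-fsck --checkfs --tuneDbFragmentSize="$(gib_to_bytes 1)"
```

Choose the limit so that enough memory remains for the server daemons running on the same machine.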