General Questions

‘Access denied’ error on the client, even with correct permissions

Please check if you have SELinux enabled on the client machine. If it is enabled, disabling it should solve your problem. SELinux can be disabled by setting SELINUX=disabled in the configuration file /etc/selinux/config. Afterwards, you might need to reboot your client for the new setting to become effective.
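
For example, a quick way to check and adjust the SELinux state on the client might look like this (getenforce shows the current mode, setenforce 0 switches to permissive mode until the next reboot, and editing /etc/selinux/config makes the change permanent):

$ getenforce
# setenforce 0
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config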

Client refuses to mount because of an ‘unknown storage target’

Scenario

While testing BeeGFS, you removed the storage directory of a storage server, but kept the storage directory of the management server. Now the BeeGFS client refuses to mount and prints an error about an unknown storage target to the log file.

What happened to your file system

When you start a new beegfs-storage daemon with a given storage directory, the daemon initializes this directory by assigning an ID to this storage target path and registering this target ID at the management server. When you delete this directory, the storage server creates a new directory on the next startup with a new ID and also registers this new ID at the management server. (The storage server cannot know what happened to the old directory, e.g. whether you might have just moved the data to another machine, so it has to assign a new ID here.)

When the client starts, it performs a sanity check by querying all registered target IDs from the management server and checks whether all of them are accessible. If you removed a storage directory, this check fails and thus the client refuses to mount. (Note: This sanity check can be disabled, but it is definitely a good thing in this case and saves you from more trouble.)

Now you have two alternative options:

Solution A

Simply remove the storage directories of all BeeGFS services to start with a clean new file system:

  1. Stop all the BeeGFS server daemons, i.e. beegfs-mgmtd, beegfs-meta, beegfs-storage:

    # systemctl stop beegfs\*
    
  2. Delete (rm -rf) all their storage directories. The paths to the server storage directories can be looked up in the server configuration files (see also the grep tip below the list):

    • storeMgmtdDirectory in configuration file /etc/beegfs/beegfs-mgmtd.conf

    • storeMetaDirectory in configuration file /etc/beegfs/beegfs-meta.conf

    • storeStorageDirectory in configuration file /etc/beegfs/beegfs-storage.conf

  3. Restart the daemons.

    # systemctl start beegfs\*
    

Now you have a fresh new file system without any of the previously registered target IDs.
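
Tip for step 2 above: if the services happen to run on the same host and the config files are at their default locations, the configured directories can be listed with a single grep (otherwise run it on each server host):

$ grep -E '^store(Mgmtd|Meta|Storage)Directory' /etc/beegfs/beegfs-*.conf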

Solution B

Unregister the invalid target ID from the management server. To do this, first use the beegfs-ctl tool (part of the beegfs-utils package on a client) to list the registered target IDs:

$ beegfs-ctl --listtargets --longnodes

Then check the contents of the file targetNumID in your storage directory on the storage server to find out which target ID is the current one that you want to keep. For all other target IDs from the list, which are assigned to this storage server but are no longer valid, use this command to unregister them from the management daemon:

$ beegfs-ctl --unmaptarget <targetID>
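
For illustration, assuming the storage directory is /data/beegfs/storage and the listing showed a stale target ID 5 next to the current one (the path and the ID here are just example values), the sequence might look like this:

$ cat /data/beegfs/storage/targetNumID
$ beegfs-ctl --unmaptarget 5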

Afterwards, your client will no longer complain about the missing storage targets.

Note

There are options in the server config files to disallow initialization of new storage directories and registration of new servers or targets. These restrictions are not enabled by default, but they should be enabled for production environments. See storeAllowFirstRunInit and sysAllowNewServers.
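
A minimal sketch of the corresponding settings (which config file each option belongs to can vary between versions, so please check the comments in your installed config files):

# e.g. in /etc/beegfs/beegfs-meta.conf and /etc/beegfs/beegfs-storage.conf
storeAllowFirstRunInit = false

# e.g. in /etc/beegfs/beegfs-mgmtd.conf
sysAllowNewServers = false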

Too many open files on beegfs-storage server

This usually happens when a user application leaks open files, e.g. when it opens a lot of files and, due to a bug, never closes them. (Note that open files are automatically closed by the kernel when an application exits, so this problem is usually temporary.)

There are per-process limits and system-wide limits (accounting for all processes on a machine together) to control how many files can be kept open at the same time on a host.
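
For example, both kinds of limits can be inspected quickly like this (the first line prints the soft and hard per-process limits of the current shell, the second the system-wide limits):

$ ulimit -Sn; ulimit -Hn
$ cat /proc/sys/fs/file-max /proc/sys/fs/nr_open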

To prevent applications from opening too many files at once and to make sure that such application problems do not affect the servers, it makes sense to reduce the per-process limit for normal user applications to a reasonably low value, e.g. 1024, via the nofile setting in /etc/security/limits.conf (see the example below).
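
As a sketch, such a restrictive setting for ordinary users could be a line like the following in /etc/security/limits.conf (the value 1024 is just the example from above):

*                hard      nofile          1024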

If your applications actually need to open a lot of files at the same time and you need to raise the limit in the beegfs-storage service, here are the steps to do this:

  1. You can check the current limit for the maximum number of open files through the /proc file system, e.g. for running beegfs-storage processes on a machine:

    $ for i in `pidof beegfs-storage`; do cat /proc/$i/limits | grep open; done
    Max open files            50000             50000             files
    
  2. The beegfs-storage and the beegfs-meta processes can try to increase their own limits through the configuration option tuneProcessFDLimit, but this will be subject to the hard limits that were defined for the system. If the beegfs-storage service fails to increase its own limit, it will print a message line to its log file (/var/log/beegfs-storage.log). Set the following in /etc/beegfs/beegfs-storage.conf to let the beegfs-storage service try to increase its own limit to 10 million files:

    tuneProcessFDLimit=10000000
    
  3. You can increase the system-wide limits (the limits that account for all processes together) to 20 million at runtime by using the following commands:

    # sysctl -w fs.file-max=20000000
    fs.file-max = 20000000
    # sysctl -w fs.nr_open=20000000
    fs.nr_open = 20000000
    
  4. Make the changes from the previous step persistent across reboots by adding the following lines to /etc/sysctl.conf (or a corresponding file in the subdir /etc/sysctl.d):

    fs.file-max = 20000000
    fs.nr_open = 20000000
    
  5. Add the following line to /etc/security/limits.conf (or a corresponding file in the subdir /etc/security/limits.d) to increase the per-process limit to 10 million. If this server is not only used for BeeGFS, but also for other applications, you might want to set this only for processes owned by root.

    *                -      nofile          10000000
    
  6. Now you need to close your current shell and open a new shell on the system to make the new settings effective. You can then restart the beegfs-storage process from the new shell and check its limits:

    $ for i in `pidof beegfs-storage`; do cat /proc/$i/limits | grep open; done
    Max open files            10000000             10000000             files
    

What needs to be done when a server hostname has changed

Scenario: hostname or $HOSTNAME reports a different name than it did during the BeeGFS installation, and the BeeGFS servers refuse to start up. The logs say that the nodeID has changed and that startup was therefore refused.

Note that by default, node IDs are generated based on the hostname of a server. As IDs are not allowed to change, see Setting node or target IDs for information on how to manually set your ID back to the previous value.
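
The authoritative procedure is described on that page; as a rough sketch (the storage directory path, the old hostname and the file name nodeID are assumptions here, so please verify them against the linked page), restoring the previous string ID of a metadata server could look like this:

# systemctl stop beegfs-meta
# echo oldhostname > /data/beegfs/meta/nodeID
# systemctl start beegfs-meta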

Change log level during runtime

You can use beegfs-ctl to change the log level of any service. As in the config files, the levels range from 1 to 5 with 5 being the most verbose.

$ beegfs-ctl --genericdebug --nodetype=meta --nodeid=1 "setloglevel 5"

Client won’t unmount due to filesystem being busy

  1. Run the command below to identify which processes are keeping the mount point busy.

    $ lsof /mnt/beegfs
    
  2. If any processes are still using the mount point, you will see a list like the following:

    COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
    bash    21354  john  cwd    DIR   0,18        1    2 /mnt/beegfs
    bash    21355  mary  cwd    DIR   0,18        1    2 /mnt/beegfs
    
  3. Terminate the processes you found. If that doesn’t work, try the following command to kill them:

    $ fuser -k /mnt/beegfs
    
  4. Wait 5 seconds.

  5. Retry the unmount or the client stop.

  6. If you still get the same error message, you might still be able to identify the processes by running lsof (without a path argument) and checking the referenced paths. If a process is keeping a handle to a path like /mnt/beegfs/mydir1, then that path would appear in the new list as /mydir1, i.e. with the former mount point stripped from the path, as in the sketch below.
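
    A possible sketch, using the example directory name mydir1 from above (the -n and -P flags merely skip name resolution and 2>/dev/null hides permission warnings):

    $ lsof -nP 2>/dev/null | grep '/mydir1'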

What happens with BeeGFS in a split brain scenario?

In a split brain scenario, only the nodes of one system partition remain online, namely the partition in which the management service is running. The nodes of the separated partition will deny access to files until they are reconnected and can resume their communication with the management service. In practice, their services will stall and only start producing I/O errors if the system remains split for too long.

This behavior is intentional, because in typical BeeGFS use-cases, users cannot be allowed to access data that might be out of date, especially if write access were allowed in both partitions. In that case, the same file could end up being modified in both partitions while they were split, making it impossible to reconcile the changes later when the split is over.