General Questions¶
‘Access denied’ error on the client, even with correct permissions¶
Please check if you have SELinux enabled on the client machine. If it is enabled, disabling it should solve your problem. SELinux can be disabled by setting SELINUX=disabled in the configuration file /etc/selinux/config. Afterwards, you might need to reboot your client for the new setting to become effective.
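One way to make that change from the shell is with sed. The snippet below operates on a throwaway copy of the file purely for illustration; on a real client you would edit /etc/selinux/config itself (as root) and then reboot:

```shell
# Demonstration on a throwaway copy; on a real client, edit
# /etc/selinux/config directly (as root) and then reboot.
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"   # sample content

# Switch whatever mode is currently set (enforcing/permissive) to disabled
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"

grep '^SELINUX=' "$cfg"   # -> SELINUX=disabled
```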
Client refuses to mount because of an ‘unknown storage target’¶
Scenario¶
While testing BeeGFS, you removed the storage directory of a storage server, but kept the storage directory of the management server. Now the BeeGFS client refuses to mount and prints an error about an unknown storage target to the log file.
What happened to your file system¶
When you start a new beegfs-storage daemon with a given storage directory, the daemon initializes this directory by assigning an ID to the storage target path and registering this target ID at the management server. When you delete this directory, the storage server creates a new directory on the next startup with a new ID and registers that ID at the management server as well. (The storage server cannot know what happened to the old directory, e.g. whether you just moved the data to another machine, so it has to assign a new ID.)
When the client starts, it performs a sanity check by querying all registered target IDs from the management server and checks whether all of them are accessible. If you removed a storage directory, this check fails and thus the client refuses to mount. (Note: This sanity check can be disabled, but it is definitely a good thing in this case and saves you from more trouble.)
Now you have two alternative options:
Solution A¶
Simply remove the storage directories of all BeeGFS services to start with a clean new file system:
Stop all the BeeGFS server daemons, i.e. beegfs-mgmtd, beegfs-meta, beegfs-storage:
# systemctl stop beegfs\*
Delete (rm -rf) all their storage directories. The paths to the server storage directories can be looked up in the server configuration files:
storeMgmtdDirectory in configuration file /etc/beegfs/beegfs-mgmtd.conf
storeMetaDirectory in configuration file /etc/beegfs/beegfs-meta.conf
storeStorageDirectory in configuration file /etc/beegfs/beegfs-storage.conf
Restart the daemons.
# systemctl start beegfs\*
Now you have a fresh new file system without any of the previously registered target IDs.
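The steps of Solution A can be sketched as a dry-run script. It only prints the commands it would run, so nothing is deleted until you remove the echo; the config keys and file paths are the ones listed above, and the "key = value" parsing is an assumption about the usual BeeGFS config file format:

```shell
# Dry-run sketch of Solution A: prints the commands instead of running them.
# get_dir extracts the value of a "key = value" line from a config file.
get_dir() {  # usage: get_dir <key> <conffile>
    awk -F'=' -v k="$1" '$1 ~ k { gsub(/[ \t]/, "", $2); print $2 }' "$2"
}

wipe() {  # print the removal command for the directory named by <key> in <file>
    dir=$(get_dir "$1" "$2")
    [ -n "$dir" ] && echo "rm -rf $dir"
}

echo "systemctl stop beegfs\\*"
wipe storeMgmtdDirectory   /etc/beegfs/beegfs-mgmtd.conf
wipe storeMetaDirectory    /etc/beegfs/beegfs-meta.conf
wipe storeStorageDirectory /etc/beegfs/beegfs-storage.conf
echo "systemctl start beegfs\\*"
```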
Solution B¶
Unregister the invalid target ID from the management server.
For this, you would first use the beegfs-ctl tool (part of the beegfs-utils package on a client) to list the registered target IDs:
$ beegfs-ctl --listtargets --longnodes
Then check the contents of the file targetNumID in your storage directory on the storage server to find out which target ID is the current one that you want to keep.
For all other target IDs in the list that are assigned to this storage server but no longer valid, use this command to unregister them from the management daemon:
$ beegfs-ctl --unmaptarget <targetID>
Afterwards, your client will no longer complain about the missing storage targets.
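Selecting the stale IDs can be scripted. The listing below is an assumed example of what the beegfs-ctl output might look like (the real column layout can differ, so adjust the awk filter accordingly), and the ID to keep is the one read from the targetNumID file:

```shell
# Sketch: collect the target IDs that should be unmapped.
listing='TargetID   NodeID
========   ======
     101        1
     102        1'   # assumed sample output of --listtargets --longnodes

keep=102             # the ID found in the targetNumID file

# every numeric TargetID except the one to keep
stale=$(printf '%s\n' "$listing" | awk -v keep="$keep" \
    '$1 ~ /^[0-9]+$/ && $1 != keep { print $1 }')

for t in $stale; do
    echo "beegfs-ctl --unmaptarget $t"   # drop "echo" to actually unmap
done
```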
Note
There are options in the server config files to disallow initialization of new storage directories and registration of new servers or targets. They are not set by default, but should be set for production environments. See storeAllowFirstRunInit and sysAllowNewServers.
Too many open files on beegfs-storage server¶
This usually happens when a user application leaks open files, e.g. it creates a lot of files and forgets to close them due to a bug in the application. (Note that open files will automatically be closed by the kernel when an application ends, so this problem is usually temporary.)
There are per-process limits and system-wide limits (accounting for all processes on a machine together) to control how many files can be kept open at the same time on a host.
To prevent applications from opening too many files at once and to make sure that such application problems do not affect the servers, it makes sense to reduce the per-process limit for normal applications of normal users to a reasonably low value, e.g. 1024 via the nofile setting in /etc/security/limits.conf.
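With the syntax of limits.conf, such a cap could look like the line below (the value 1024 is the example from above; note that the * wildcard does not apply to the root user):

```
# /etc/security/limits.conf: lower per-process open-file cap for normal users
*    hard    nofile    1024
```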
If your applications actually need to open a lot of files at the same time and you need to raise the limit for the beegfs-storage service, here are the steps to do this:
You can check the current limit for the maximum number of open files through the /proc file system, e.g. for running beegfs-storage processes on a machine:
$ for i in `pidof beegfs-storage`; do cat /proc/$i/limits | grep open; done
Max open files            50000                50000                files
The beegfs-storage and beegfs-meta processes can try to increase their own limits through the configuration option tuneProcessFDLimit, but this will be subject to the hard limits that were defined for the system. If the beegfs-storage service fails to increase its own limit, it will print a message line to its log file (/var/log/beegfs-storage.log). Set the following in /etc/beegfs/beegfs-storage.conf to let the beegfs-storage service try to increase its own limit to 10 million files:
tuneProcessFDLimit=10000000
You can increase the system-wide limits (the limits that account for all processes together) to 20 million at runtime by using the following commands:
# sysctl -w fs.file-max=20000000
fs.file-max = 20000000
# sysctl -w fs.nr_open=20000000
fs.nr_open = 20000000
Make the changes from the previous step persistent across reboots by adding the following lines to /etc/sysctl.conf (or a corresponding file in the subdirectory /etc/sysctl.d):
fs.file-max = 20000000
fs.nr_open = 20000000
Add the following line to /etc/security/limits.conf (or a corresponding file in the subdirectory /etc/security/limits.d) to increase the per-process limit to 10 million. If this server is not only used for BeeGFS, but also for other applications, you might want to set this only for processes owned by root.
* - nofile 10000000
Now you need to close your current shell and open a new shell on the system to make the new settings effective. You can then restart the beegfs-storage process from the new shell and look at its limits:
$ for i in `pidof beegfs-storage`; do cat /proc/$i/limits | grep open; done
Max open files            10000000             10000000             files
What needs to be done when a server hostname has changed¶
Scenario: hostname or $HOSTNAME reports a different name than during the BeeGFS installation, and the BeeGFS servers refuse to start up. The logs state that the nodeID has changed and that startup was therefore refused.
Note that by default, node IDs are generated based on the hostname of a server. As IDs are not allowed to change, see here for information on how to manually set your ID back to the previous value: Setting node or target IDs.
Change log level during runtime¶
You can use beegfs-ctl to change the log level of any service. As in the config files, the levels range from 1 to 5, with 5 being the most verbose.
$ beegfs-ctl --genericdebug --nodetype=meta --nodeid=1 "setloglevel 5"
Client won’t unmount due to filesystem being busy¶
Run the command below to identify which processes are keeping the mount point busy.
$ lsof /mnt/beegfs
If any processes are using the mount point, you will see a list like the following:
COMMAND  PID    USER  FD   TYPE  DEVICE  SIZE/OFF  NODE  NAME
bash     21354  john  cwd  DIR   0,18    1         2     /mnt/beegfs
bash     21355  mary  cwd  DIR   0,18    1         2     /mnt/beegfs
Terminate the processes you found. If that doesn't work, use the following command to kill them:
$ fuser -k /mnt/beegfs
Wait 5 seconds.
Retry the unmount / client stop.
If you still get the same error message, you might still be able to identify the processes by running lsof (without a path argument) and checking the referenced paths. If a process is holding a handle on a path like /mnt/beegfs/mydir1, that path will appear in the new list as /mydir1 (the former mount point removed from the path).
What happens with BeeGFS in a split brain scenario?¶
In a split brain scenario, only the nodes of one system partition remain online: the partition where the management service is running. The nodes of the separated partition deny access to files until they are reconnected and can resume their communication with the management service. In practice, their services stall and only start producing I/O errors if the system remains split for too long.
This behavior is intentional, because in typical BeeGFS use cases, users cannot be allowed to access data that might be out of date, especially if write access were allowed in both partitions. In that case, the same file could be modified in both partitions while they were split, making it impossible to synchronize them later once the split was over.
Why do BeeGFS clients using RDMA log “beegfs: enabling unsafe global rkey” in the kernel?¶
Should I be concerned about this message?
No, this message is expected behavior on BeeGFS clients utilizing Remote Direct Memory Access (RDMA) when the kernel does not support “DMA keys” and only supports global rkeys.
What does this message mean?
Remote Direct Memory Access (RDMA) allows data to be transferred directly between the memory of two computers on a network without involving the processor, cache, or operating system of either computer. In RDMA, a ‘remote key’ or ‘rkey’ is used to permit peer RDMA READ/WRITE access to a specific region of memory that is registered with a protection domain using the shared rkey. This message indicates that the Linux kernel has logged the creation of a protection domain with the global rkey option enabled, which is a common practice in Linux RDMA software modules to optimize performance for small I/O requests. The ‘unsafe’ in the message merely highlights that the usage is not the default secure option but doesn’t necessarily imply any vulnerability or risk, especially since BeeGFS employs stringent security practices to mitigate any potential risks associated with using global rkey.
In BeeGFS, each connection has a separate protection domain. This means that when BeeGFS uses a global rkey, it still only provides access to memory registered for a particular connection, which greatly reduces the exposure of the global rkey while optimizing RDMA performance. It is also strongly recommended to enable Connection Based Authentication to prevent unauthorized clients or servers from interacting with the file system.
How can I get rid of this message?
Because this warning is logged by the kernel, it is not possible for BeeGFS to suppress the message when global rkeys are used. One alternative to global rkeys is a “DMA key”, though this is not necessarily safer, because both provide access to all RDMA-registered memory of a particular protection domain. However, opting for a “DMA key” will prevent the logging of “enabling unsafe global rkey”.
Historically, BeeGFS would automatically use the DMA key option if it was supported by the kernel, but support for DMA keys was removed in kernel 4.9 and later. DMA keys are now only supported when using the NVIDIA (Mellanox) OFED drivers. To allow users to continue using the “DMA key” option, BeeGFS 7.4.3 and 7.2.13 introduced a new client-side connRDMAKeyType option that can be set to “dma” to enable the use of DMA keys when that OFED is installed.
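Assuming the standard client config location, enabling the option looks like this (only effective with the NVIDIA OFED drivers installed, as described above):

```
# /etc/beegfs/beegfs-client.conf (BeeGFS 7.4.3 / 7.2.13 or later)
connRDMAKeyType = dma
```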
If you wish to use the inbox RDMA drivers with kernel 4.9 or later, the only way to suppress the message is to adjust the kernel log level to only log kernel errors and above. Please note that adjusting the kernel log level may result in other useful log messages being omitted, which can make future system troubleshooting difficult.
Loading the BeeGFS client module fails for “could not insert ‘beegfs’: Invalid argument”?¶
When attempting to load the BeeGFS client module, you may encounter an error that reads: “could not insert ‘beegfs’: Invalid argument”. This FAQ provides troubleshooting steps to resolve this issue. Begin by examining the output of dmesg leading up to the error and match it to one of the sections below.
beegfs: disagrees about version of symbol¶
The BeeGFS client module must be compiled to work with the exact combination of kernel and RDMA driver versions currently in use by each Linux installation it is used with. While various symbols might be flagged by the error, in nearly all cases this class of errors indicates the client module was built for a different combination of versions than is actively loaded on the system. A frequent scenario is when the NVIDIA (formerly Mellanox) OFED drivers are loaded, but the module was built for the inbox RDMA drivers provided by the kernel.
To correct:
Run ofed_info to check for the installation of the Mellanox OFED drivers.
If you are using the “beegfs-client” package, modify /etc/beegfs/beegfs-client-autobuild.conf to append OFED_INCLUDE_PATH=<PATH> to the buildArgs=-j8 line, modifying <PATH> to point to the appropriate headers (for detailed information, refer to the help comments in the file).
If you are using the “beegfs-dkms-client” package, refer to the documentation for the BeeGFS DKMS Client and Handling Third-party OFED Installations.
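As a sketch, the modified line in the autobuild config could look like the following; the include path shown is only a placeholder and must point to your actual OFED headers:

```
# /etc/beegfs/beegfs-client-autobuild.conf
buildArgs=-j8 OFED_INCLUDE_PATH=/usr/src/openib/include
```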