Quota

Introduction

BeeGFS allows the definition of system-wide quotas for disk space allocation and number of chunk files, on a per-user or per-group basis. This can be used to organize users into access tiers with different levels of restriction and to prevent any single user from consuming all of the file system’s resources.

The BeeGFS quota management mechanism is composed of two features: quota tracking and quota enforcement. Quota tracking allows the query of the amount of data and the number of chunk files that users and groups are using in the system, without imposing any restriction.

Quota enforcement allows the definition and application of quota limits in the whole system. When this feature is enabled, the BeeGFS management daemon collects quota reports from all storage targets at regular intervals, checks for exceeded quota limits, and informs the rest of the system about which users are no longer allowed to consume more resources.

BeeGFS quota management relies on quota data provided by the underlying file systems of storage server targets. Therefore, the capabilities of such file systems determine which types of quota BeeGFS is able to manage. For example, if the storage targets use a version of ZFS prior to 0.7.4, BeeGFS will allow the definition of quotas only for used space, not for the number of files, as the latter is not supported by old releases of ZFS. If you use ZFS 0.7.4 or later, the latest version of BeeGFS will allow you to define both types of quota.
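
A version check along these lines can tell in advance which quota types will be available on a ZFS-backed target. This is only a minimal sketch: the function name zfs_supports_file_quota is hypothetical, and reading the version from /sys/module/zfs/version is an assumption about the local ZFS installation.

```shell
# Hypothetical helper: succeeds if the given ZFS version is 0.7.4 or later,
# the threshold named above for file-count quota support.
zfs_supports_file_quota() {
    # $1: version string such as "0.7.4" or "2.1.5-1"
    printf '%s\n' "$1" | awk -F'[.-]' '{
        major = $1 + 0; minor = $2 + 0; patch = $3 + 0
        ok = (major > 0) || (minor > 7) || (minor == 7 && patch >= 4)
        exit (ok ? 0 : 1)
    }'
}

# Usage (assumption: the loaded ZFS module exposes its version in sysfs):
#   zfs_supports_file_quota "$(cat /sys/module/zfs/version)" \
#       && echo "space and file quota available" \
#       || echo "space quota only"
```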

Quota limits can be configured globally, or separately for each storage pool. The creation of new files will be prohibited when either the global or the per-pool limit is reached.

The following sections explain in more detail how these features work and how they can be configured.

Quota tracking

This section provides information on how to enable tracking of used disk space and number of chunk files on the storage targets.

Requirements and general notes

Quota tracking is designed to generally work with any underlying local file system on the storage servers that supports user and group quota (reported through the system call quotactl()), but has only been fully tested with ext4, XFS, and ZFS.

Make sure that the local systems of all nodes are correctly configured to query passwd and group databases, by running the commands below. The first command should print the complete list of user IDs. The second one should print the complete list of group IDs.

$ getent passwd
$ getent group

If the commands above do not list all users and groups, you will not be able to use the command beegfs-ctl --getquota --all to query used space for all users at once, and you will not be able to use quotaQueryType = system in file beegfs-mgmtd.conf for quota enforcement. However, there are alternatives to both, which you will find in further sections.
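
One way to spot an incomplete listing is to compare it against a few accounts you know should exist. The sketch below assumes nothing about BeeGFS itself; the helper name missing_from_listing and the account names in the usage comment are hypothetical.

```shell
# Hypothetical helper: reads passwd- or group-format lines on stdin and
# prints every expected name (given as arguments) that is NOT in the listing.
missing_from_listing() {
    listing=$(cut -d: -f1)          # first field is the user or group name
    for name in "$@"; do
        printf '%s\n' "$listing" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Usage (account and group names are placeholders):
#   getent passwd | missing_from_listing alice bob
#   getent group  | missing_from_listing project01
# No output means every expected name was listed.
```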

If you are also creating files on the storage targets outside of the BeeGFS storage directory, note that the blocks and inodes occupied by those files will also count as used resources for the corresponding owner. The reports would also be distorted if multiple storage targets were located within the same local file system instance.

Files stored in the disposal directory (which do not appear under the BeeGFS client mountpoint) also count towards the amount of space used by users. Therefore, try to clear the disposal directory if you think that the reported used space differs from the actually used disk space.

Quota tracking has no requirement concerning metadata targets.

It is important to note that quota limits of the number of files concern data chunk files created on storage targets and not files created by end-users under the BeeGFS mount point. It is also important to understand that such quota limits do not concern the number of directories created in the system.

Enabling quota during a new BeeGFS installation

Walk through these steps if you are about to set up a new BeeGFS instance that should support quota.

In this example, we assume that /dev/sdb is the underlying disk or RAID array of a storage target, which is mounted to the directory /data.

  1. Start by enabling quota support for the underlying file system on the storage targets, as described below for ext4, XFS, and ZFS.

    ext4: Enable quota support for ext4:

    # Mount device with quota support for users and groups
    $ mount /dev/sdb /data -t ext4 -orw,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv1,...
    
    # Create quota database files
    $ quotacheck -cug /data
    
    # Calculate current quota values
    $ quotacheck -vug /data
    
    # Enable quota counting
    $ quotaon -vug /data
    

    XFS: Enable quota support for XFS:

    # Mount device with quota support for users and groups
    $ mount /dev/sdb /data -t xfs -orw,uqnoenforce,gqnoenforce,...
    

    ZFS: Enable quota support for ZFS:

    Make sure that the package libzfs2-devel is installed on your system. On Debian/Ubuntu systems install libzfslinux-dev. Nothing else needs to be done, because quota tracking is supported automatically based on libzfs.

  2. Perform the BeeGFS installation as usual. Before you start the client services, apply the setting below in the configuration file /etc/beegfs/beegfs-client.conf of all client nodes.

    quotaEnabled = true
    

    This setting will cause the client to transfer extra user data to the servers, namely the uid and gid of the user making every IO syscall. This extra data allows BeeGFS to correctly compute disk space use and the number of files created by each user. If this setting is not applied on a client node, all syscalls performed on that node will affect the quota consumption of the root user, instead of the actual caller.
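
After step 1, it can be useful to confirm that a storage target is really mounted with user and group quota options. The following sketch checks an option string only for the options used in the examples above; the function name has_quota_mount_opts is hypothetical, and the kernel may display options in /proc/mounts differently from how they were passed to mount.

```shell
# Hypothetical helper: succeeds if a comma-separated mount option string
# contains both a user and a group quota option from the examples above.
has_quota_mount_opts() {
    case ",$1," in
        *,usrjquota=*|*,uqnoenforce,*) user=1 ;;
        *) user=0 ;;
    esac
    case ",$1," in
        *,grpjquota=*|*,gqnoenforce,*) group=1 ;;
        *) group=0 ;;
    esac
    [ "$user" -eq 1 ] && [ "$group" -eq 1 ]
}

# Usage, assuming the target is mounted at /data as in the example:
#   opts=$(awk '$2 == "/data" { print $4 }' /proc/mounts)
#   has_quota_mount_opts "$opts" || echo "quota options missing on /data"
```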

Enabling quota for an existing BeeGFS installation

Take these steps if you want to enable quota support for an existing BeeGFS instance that was previously used without quota support.

In this example, we assume that /dev/sdb is the underlying disk or RAID array of a storage target, which is mounted to the directory /data.

  1. Stop all BeeGFS server and client services.

  2. Enable quota support for the underlying file system on the storage targets, as described below for ext4, XFS and ZFS.

    ext4: Enable quota support for ext4:

    # Mount device with quota support for users and groups
    $ mount /dev/sdb /data -t ext4 -orw,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv1,...
    
    # Create quota database files
    $ quotacheck -cug /data
    
    # Calculate current quota values
    $ quotacheck -vug /data
    
    # Enable quota counting
    $ quotaon -vug /data
    

    XFS: Enable quota support for XFS:

    # Mount device with quota support for users and groups
    $ mount /dev/sdb /data -t xfs -orw,uqnoenforce,gqnoenforce,...
    

    ZFS: Enable quota support for ZFS:

    Make sure that the package libzfs2-devel is installed on your system. On Debian/Ubuntu systems install libzfslinux-dev. Nothing else needs to be done, because quota tracking is supported automatically based on libzfs.

  3. Apply the setting below in the configuration file /etc/beegfs/beegfs-client.conf of all client nodes.

    quotaEnabled = true
    

    This setting will cause the client to transfer extra user data to the servers, namely the uid and gid of the user making every IO syscall. This extra data allows BeeGFS to correctly compute disk space use and the number of files created by each user. If this setting is not applied on a client node, all syscalls performed on that node will affect the quota consumption of the root user, instead of the actual caller.

  4. Start all BeeGFS services.

  5. Run the following command on one of the client nodes to update the ownership information of the existing data chunk files on the storage servers for quota tracking. This command can take a while to complete, but it is executed only once, and the system can be online while the chunk files are being updated.

    $ beegfs-fsck --enablequota
    

    This command could be re-executed if you discover later that some clients didn’t have option quotaEnabled set to true, and you want to update the ownership information of the data chunk files created in the meantime.
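
To find clients that were started without the option, the client configuration file can be inspected on each node. This is a sketch under the assumption that the file uses plain "key = value" lines with "#" comments; the helper name conf_get is hypothetical.

```shell
# Hypothetical helper: prints the last value assigned to an option in a
# "key = value" style configuration file (empty output if it is not set).
conf_get() {
    # $1: option name, $2: configuration file
    awk -F= -v k="$1" '
        /^[ \t]*#/ { next }                       # skip comment lines
        index($0, "=") > 0 {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim the option name
            if (name == k) {
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim the value
                found = val
            }
        }
        END { print found }
    ' "$2"
}

# Usage on a client node:
#   [ "$(conf_get quotaEnabled /etc/beegfs/beegfs-client.conf)" = "true" ] \
#       || echo "quotaEnabled is not set to true on $(hostname)"
```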

Querying quota information

Quota information can be queried with beegfs-ctl --getquota. The command directly collects quota reports from all storage servers and quota limits from the management service (if defined) and aggregates all the quota information. A table will be printed for each storage pool. Here are some usage examples.

  • Show quota information for all normal users:

    $ beegfs-ctl --getquota --uid --all
    
  • Show quota information for the user ID 1000:

    $ beegfs-ctl --getquota --uid 1000
    
  • Show quota information for group IDs range 1000 to 1500:

    $ beegfs-ctl --getquota --gid --range 1000 1500
    
  • Show the default quota limits:

    $ beegfs-ctl --getquota --defaultlimits
    
  • To get quota information for a specific storage pool, include the --storagepoolid=X option in the command. For example:

    $ beegfs-ctl --getquota --uid 1000 --storagepoolid=2
    
  • Show more examples and general help:

    $ beegfs-ctl --getquota --help
    

If the underlying file system of the storage targets is a ZFS release that does not support quota for the number of files (see the Introduction), the values in the column for used files/inodes will be shown as a dash (“-”).
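
When the same query has to be repeated for several users in one storage pool, the commands can be generated first and reviewed before they are run. The sketch below only prints command lines built from the options shown above; the helper name print_getquota_cmds is hypothetical.

```shell
# Hypothetical helper: prints one "beegfs-ctl --getquota" command per user ID
# for a given storage pool, without executing anything (dry run).
print_getquota_cmds() {
    # $1: storage pool ID; remaining arguments: user IDs
    pool=$1
    shift
    for uid in "$@"; do
        printf 'beegfs-ctl --getquota --uid %s --storagepoolid=%s\n' "$uid" "$pool"
    done
}

# Usage: review the output, then pipe it to a shell to run the queries.
#   print_getquota_cmds 2 1000 1001
#   print_getquota_cmds 2 1000 1001 | sh
```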

Quota enforcement

This section provides information on how to activate quota enforcement in a BeeGFS system.

Requirements

Quota enforcement requires quota tracking to be enabled, as described above.

Enable quota enforcement

Take the steps below on each service to enable quota enforcement in the whole system.

Storage Service Setting

  1. Set the option below to true in the storage configuration file /etc/beegfs/beegfs-storage.conf:

    quotaEnableEnforcement = true
    
  2. Restart the storage service daemon.
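
On installations with many storage servers, the option can be applied with a small script instead of editing each file by hand. This is a sketch that assumes plain "key = value" lines in the configuration file; the helper name set_conf_option is hypothetical, and it is worth keeping a backup of the file before using anything like it.

```shell
# Hypothetical helper: sets "key = value" in a configuration file, replacing
# an existing assignment of that key or appending one if none exists.
set_conf_option() {
    # $1: option name, $2: new value, $3: configuration file
    key=$1; value=$2; file=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        $0 ~ ("^[ \t]*" k "[ \t]*=") { print k " = " v; done = 1; next }
        { print }
        END { if (!done) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode of $file
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage on a storage server (restart the storage service afterwards):
#   set_conf_option quotaEnableEnforcement true /etc/beegfs/beegfs-storage.conf
```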

Management Service Settings

Take the following steps to enable quota enforcement on the management service. All options presented in this section are found in file /etc/beegfs/beegfs-mgmtd.conf.

  1. Quota reports are collected from the storage targets and quota limits are checked by the management service at regular intervals. This interval is set by the option quotaUpdateIntervalMin, in minutes (default: 10 minutes). A shorter interval reduces the time until an exceeded limit is noticed and the quota is enforced. Thus, to reduce the chance of a user temporarily exceeding their limits, keep this interval as short as practical. On the other hand, frequent queries add load to the system and may reduce performance, so change this option with caution. If you reduce this interval, consider also changing the type of quota query, as discussed below. The specified interval applies to all quota query types (system, range, and file).

    quotaUpdateIntervalMin = 10
    
  2. Configure the type of query performed by the management daemon to get the user and group IDs. The default type of query is system, in which user and group IDs are retrieved from the same source used by commands getent passwd and getent group. This source could be a central LDAP database or another user management system. When the user database system is slow, “system” might not be the best query type.

    quotaQueryType = system
    

    The second valid value for quotaQueryType is range, which allows you to specify intervals of uids and gids in options quotaQueryUIDRange and quotaQueryGIDRange. In this case, all IDs of the user ID range and the group ID range will be queried. Do not define unnecessarily large ranges, as this could decrease query performance. This query type may help increase performance in cases where only a small range of IDs should be queried, instead of all IDs available in the system.

    quotaQueryType = range
    quotaQueryUIDRange = 1200,2000
    quotaQueryGIDRange = 15000,20000
    

    The third valid value for quotaQueryType is file, which allows you to specify the uids and gids in two text files (one ID per line). The path to the file with the uids is provided in the option quotaQueryUIDFile and the path to the file with the gids is provided in the option quotaQueryGIDFile. In this case, all uids and gids from the files will be queried. This query type is suitable for cases where the IDs are not sequential.

    quotaQueryType = file
    quotaQueryGIDFile = /etc/beegfs/groupIDs
    quotaQueryUIDFile = /etc/beegfs/userIDs
    
  3. Set the following option to true to activate quota enforcement on the system.

    quotaEnableEnforcement = true
    
  4. Restart the management service daemon.

  5. These changes won’t be noticed by the other server services until they are restarted. Therefore, restart the storage service daemons and the metadata service daemons.
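
For the file query type in step 2, the ID files can be generated from the user and group databases rather than written by hand. The sketch below is one possible approach; the helper name ids_from is hypothetical, and the threshold of 1000 in the usage comment is an assumption about where regular accounts start on your systems.

```shell
# Hypothetical helper: reads passwd- or group-format lines on stdin and prints
# the numeric IDs (third field) that are >= the given minimum, one per line,
# sorted and without duplicates.
ids_from() {
    # $1: minimum ID (inclusive)
    awk -F: -v min="$1" '$3 ~ /^[0-9]+$/ && $3 + 0 >= min + 0 { print $3 }' | sort -n -u
}

# Usage, writing to the file paths from the example configuration above:
#   getent passwd | ids_from 1000 > /etc/beegfs/userIDs
#   getent group  | ids_from 1000 > /etc/beegfs/groupIDs
```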

Setting quota limits

Quota limits can be set with the command beegfs-ctl --setquota. Here are some usage examples.

  • Set quota limit for user ID 1000 to 1 gigabyte and 500 chunk files:

    $ beegfs-ctl --setquota --uid 1000 --sizelimit=1G --inodelimit=500
    
  • Set quota limit for group ID 1289 to 10 gigabyte and 50 chunk files:

    $ beegfs-ctl --setquota --gid 1289 --sizelimit=10G --inodelimit=50
    
  • Set quota limit for user ID 1000 to unlimited size and 500 chunk files:

    $ beegfs-ctl --setquota --uid 1000 --sizelimit=unlimited --inodelimit=500
    
  • Set quota limit for user ID 1000 to unlimited size and reset the chunk files to use the default quota limit:

    $ beegfs-ctl --setquota --uid 1000 --sizelimit=unlimited --inodelimit=reset
    
  • Set quota limit for group ID 1289 to 10 gigabyte and unlimited chunk files:

    $ beegfs-ctl --setquota --gid 1289 --sizelimit=10G --inodelimit=unlimited
    
  • Set default quota limits for the users to 10 gigabyte and unlimited chunk files:

    $ beegfs-ctl --setquota --uid --default --sizelimit=10G --inodelimit=unlimited
    

Similar to the --getquota mode, it is possible to set the quota limits via --all, --range and --list parameters. The --setquota mode also allows the import of quota limits from a file. Each line defines the limit for a user or group. Only one type of ID (either user or group) can be given in a quota file. The quota file line format is: <ID or name>,<size limit>,<inode limit>.

Example file contents for user quota limits (e.g., located at /tmp/user_quota_limits.txt):

2345,1T,500
8999,5G,20
dbadmin,20G,5000

  • To load the example user quota limits file and apply the user quota limits:

    $ beegfs-ctl --setquota --uid --file=/tmp/user_quota_limits.txt
    
  • Quota can be configured per storage pool by specifying a storage pool id when running the setquota command. For example:

    $ beegfs-ctl --setquota --uid 1000 --sizelimit=1G --inodelimit=500 --storagepoolid=2
    
  • To show general help:

    $ beegfs-ctl --setquota --help
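
Before importing a quota file, a quick structural check can catch lines that do not follow the <ID or name>,<size limit>,<inode limit> format. This sketch verifies only that each line has three non-empty comma-separated fields; it does not validate the values themselves, and the helper name check_quota_file is hypothetical.

```shell
# Hypothetical helper: reports every line of a quota file that does not have
# exactly three non-empty comma-separated fields (blank lines are reported too).
# Returns non-zero if any such line was found.
check_quota_file() {
    # $1: path to the quota file
    awk -F, '
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage with the example file from above:
#   check_quota_file /tmp/user_quota_limits.txt \
#       && beegfs-ctl --setquota --uid --file=/tmp/user_quota_limits.txt
```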
    

Project directory quota tracking

The BeeGFS quota management mechanism is based on user and group quota. Group quota can be used for project directories by using the setgid flag on a directory (chmod g+s /mnt/beegfs/project01). If this flag is set, all files created in the directory will automatically have the group of the directory instead of the primary group of the user who created the file.

With this approach, it is useful to also create a separate group for the project, e.g., a group project01, and apply it to the project directory (chown root:project01 /mnt/beegfs/project01). To avoid conflicts with per-user quota limits, the same approach can be used not only for shared project directories but also for user directories, in which case each user has their own group.
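
The two steps described above (assigning the project group and setting the setgid flag) can be combined in a small helper. This is a minimal sketch; the function name setup_project_dir is hypothetical, and it uses chgrp instead of chown so that it changes only the group of the directory.

```shell
# Hypothetical helper: creates a project directory, assigns it to a group and
# sets the setgid flag so that new files inherit the directory's group.
setup_project_dir() {
    # $1: directory path, $2: group name or numeric GID
    mkdir -p "$1" && chgrp "$2" "$1" && chmod g+s "$1"
}

# Usage, with the directory and group from the example above:
#   setup_project_dir /mnt/beegfs/project01 project01
#   [ -g /mnt/beegfs/project01 ] && echo "setgid flag is set"
```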

Alternatively, if you want to track used space or number of files based on subdirectory trees, you might want to look at the Robinhood Policy Engine.

Robinhood can run parallel scans of the file system at regular intervals and store the discovered file and directory information in a SQL database. This makes it possible to run various queries against the database with fast results. In addition, automatic actions for certain events can be defined in Robinhood, e.g., if the defined used space limit for a certain subdirectory tree is exceeded.

As BeeGFS keeps all the metadata for such scans readily available on the metadata servers (usually flash storage), crawling a file system in parallel is fast. To make sure that the SQL database of Robinhood does not reduce the scan speed, it is recommended to have the Robinhood database also on flash storage.