Quota¶
Introduction¶
BeeGFS allows the definition of system-wide quotas of disk space allocation and number of chunk files, on a per-user or per-group basis. This can be used to organize users in different access layers with different levels of restriction, and to prevent any single user from consuming all of the file system’s resources.
The BeeGFS quota management mechanism is composed of two features: quota tracking and quota enforcement. Quota tracking allows the query of the amount of data and the number of chunk files that users and groups are using in the system, without imposing any restriction.
Quota enforcement allows the definition and application of quota limits in the whole system. When this feature is enabled, the BeeGFS management daemon collects quota reports from all storage targets at regular intervals, checks for exceeded quota limits, and informs the rest of the system about which users are no longer allowed to consume more resources.
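Because limits are checked at intervals rather than on every write, a user can exceed a limit before enforcement takes effect. The sketch below gives a rough estimate of that overshoot. It assumes a constant write rate over one full check interval; the function name and the example numbers are illustrative and not part of BeeGFS.

```shell
#!/bin/sh
# Rough upper bound on how far a user can overshoot a size limit between
# two quota checks. Assumption: the user writes at a constant rate for the
# whole interval. This is an estimate, not a BeeGFS guarantee.
# Args: sustained write rate in MiB/s, check interval in minutes.
max_overshoot_gib() {
    rate_mib_s=$1
    interval_min=$2
    # MiB written during one interval, converted to GiB (integer division).
    echo $(( rate_mib_s * interval_min * 60 / 1024 ))
}

max_overshoot_gib 1024 10   # 1 GiB/s for 10 minutes, prints 600
```

Under these assumptions, a user writing at 1 GiB/s could exceed a size limit by roughly 600 GiB before a 10-minute check notices it.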
BeeGFS quota management relies on quota data provided by the underlying file systems of storage server targets. Therefore, the capabilities of such file systems determine which types of quota BeeGFS is able to manage. For example, if the storage targets use a version of ZFS prior to 0.7.4, BeeGFS will allow the definition of quotas only for used space, not for the number of files, as the latter is not supported by old releases of ZFS. If you use ZFS 0.7.4 or later, the latest version of BeeGFS will allow you to define both types of quota.
Quota limits can be configured globally, or separately for each storage pool. The creation of new files will be prohibited when either the global or the per-pool limit is reached.
The following sections explain in more detail how these features work and how they can be configured.
Quota tracking¶
This section provides information on how to enable tracking of used disk space and number of chunk files on the storage targets.
Requirements and general notes¶
Quota tracking is designed to generally work with any underlying local file system on the storage servers that supports user and group quota (reported through the system call quotactl()), but has only been fully tested with ext4, XFS, and ZFS.
Make sure that the local systems of all nodes are correctly configured to query passwd and group databases, by running the commands below. The first command should print the complete list of user IDs. The second one should print the complete list of group IDs.
$ getent passwd
$ getent group
If the commands above do not list all users and groups, you will not be able to use the command beegfs-ctl --getquota --all to query used space for all users at once, and you will not be able to use quotaQueryType = system in file beegfs-mgmtd.conf for quota enforcement. However, there are alternatives to both, which you will find in further sections.
If you are also creating files on the storage targets outside of the BeeGFS storage directory, note that the blocks and inodes occupied by those files will also be counted as resources used by the owning user. The reports would also be distorted if multiple storage targets were located within the same local file system instance.
Files stored in the disposal directory (which do not appear under the BeeGFS client mountpoint) also count toward the amount of space used by users. Therefore, try to clear the disposal directory if you think that the reported used space differs from the disk space actually used.
Quota tracking has no requirement concerning metadata targets.
It is important to note that quota limits on the number of files apply to the data chunk files created on storage targets, not to the files created by end-users under the BeeGFS mount point. It is also important to understand that such quota limits do not apply to the number of directories created in the system.
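To see why the chunk file count can differ from the number of user-visible files, the sketch below estimates how many chunk files a single file occupies. The striping model is an assumption that is not stated on this page: the file is split into fixed-size chunks distributed round-robin over a number of storage targets, and a chunk file exists only on targets that actually received data. The function and parameter names are hypothetical.

```shell
#!/bin/sh
# Estimate how many chunk files one user-visible file occupies.
# Assumptions (not from this page): the file is striped round-robin over
# num_targets storage targets in chunks of chunk_size bytes, and a chunk
# file only exists on a target that has received data.
chunk_files() {
    size=$1; chunk_size=$2; num_targets=$3
    # Number of chunks the file is split into (ceiling division).
    chunks=$(( (size + chunk_size - 1) / chunk_size ))
    # At most one chunk file per target.
    if [ "$chunks" -lt "$num_targets" ]; then
        echo "$chunks"
    else
        echo "$num_targets"
    fi
}

chunk_files 3145728 524288 4   # 3 MiB file, 512 KiB chunks, 4 targets: prints 4
chunk_files 100000 524288 4    # small file fits in a single chunk: prints 1
```

Under this model, a limit of 500 chunk files could be reached by far fewer than 500 large files.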
Enabling quota during a new BeeGFS installation¶
Walk through these steps if you are about to set up a new BeeGFS instance that should support quota.
In this example, we assume that /dev/sdb is the underlying disk or RAID array of a storage target, which is mounted to the directory /data.
Start by enabling quota support for the underlying file system on the storage targets, as described below for ext4, XFS, and ZFS.
ext4: Enable quota support for ext4:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t ext4 -orw,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv1,...
# Create quota database files
$ quotacheck -cug /data
# Calculate current quota values
$ quotacheck -vug /data
# Enable quota counting
$ quotaon -vug /data
XFS: Enable quota support for XFS:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t xfs -orw,uqnoenforce,gqnoenforce,...
ZFS: Enable quota support for ZFS:
Make sure that the package libzfs2-devel is installed on your system. On Debian/Ubuntu systems install libzfslinux-dev. Nothing else needs to be done, because quota tracking is supported automatically based on libzfs.
Perform the BeeGFS installation as usual. Before you start the client services, apply the setting below in the configuration file /etc/beegfs/beegfs-client.conf of all client nodes.
quotaEnabled = true
This setting will cause the client to transfer extra user data to the servers, namely the uid and gid of the user making each I/O syscall. This extra data allows BeeGFS to correctly compute disk space use and the number of files created by each user. If this setting is not applied on a client node, all syscalls performed on that node will count toward the quota consumption of the root user, instead of the actual caller.
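After mounting an ext4 or XFS storage target, one way to confirm that the quota options took effect is to inspect the active mount options. The sketch below is a minimal check, assuming that the kernel reports the options under the same names that were passed to mount; the helper name has_quota_opts is hypothetical, and ZFS targets are not covered.

```shell
#!/bin/sh
# Return success if a mount-option string contains both a user and a group
# quota option from the ext4 or XFS examples in this section.
has_quota_opts() {
    opts=",$1,"
    case $opts in
        *,usrjquota=*|*,uqnoenforce,*) ;;
        *) return 1 ;;
    esac
    case $opts in
        *,grpjquota=*|*,gqnoenforce,*) ;;
        *) return 1 ;;
    esac
}

# /data is the example mount point used in this section.
if has_quota_opts "$(findmnt -n -o OPTIONS /data 2>/dev/null)"; then
    echo "quota mount options present on /data"
else
    echo "quota mount options missing on /data"
fi
```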
Enabling quota for an existing BeeGFS installation¶
Take these steps if you want to enable quota support for an existing BeeGFS instance that was previously used without quota support.
In this example, we assume that /dev/sdb is the underlying disk or RAID array of a storage target, which is mounted to the directory /data.
Stop all BeeGFS server and client services.
Enable quota support for the underlying file system on the storage targets, as described below for ext4, XFS and ZFS.
ext4: Enable quota support for ext4:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t ext4 -orw,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv1,...
# Create quota database files
$ quotacheck -cug /data
# Calculate current quota values
$ quotacheck -vug /data
# Enable quota counting
$ quotaon -vug /data
XFS: Enable quota support for XFS:
# Mount device with quota support for users and groups
$ mount /dev/sdb /data -t xfs -orw,uqnoenforce,gqnoenforce,...
ZFS: Enable quota support for ZFS:
Make sure that the package libzfs2-devel is installed on your system. On Debian/Ubuntu systems install libzfslinux-dev. Nothing else needs to be done, because quota tracking is supported automatically based on libzfs.
Apply the setting below in the configuration file /etc/beegfs/beegfs-client.conf of all client nodes.
quotaEnabled = true
This setting will cause the client to transfer extra user data to the servers, namely the uid and gid of the user making each I/O syscall. This extra data allows BeeGFS to correctly compute disk space use and the number of files created by each user. If this setting is not applied on a client node, all syscalls performed on that node will count toward the quota consumption of the root user, instead of the actual caller.
Start all BeeGFS services.
Run the following command on one of the client nodes to update the ownership information of the existing data chunk files on the storage servers for quota tracking. This command can take a while to complete, but it is executed only once, and the system can be online while the chunk files are being updated.
$ beegfs-fsck --enablequota
This command could be re-executed if you discover later that some clients didn’t have option quotaEnabled set to true, and you want to update the ownership information of the data chunk files created in the meantime.
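To find clients that were started without the option, a check like the following could be run on each client node. It is a minimal sketch that only inspects the configuration file on disk, so it assumes that the running client was started with that file; the helper name client_quota_enabled is hypothetical.

```shell
#!/bin/sh
# Return success if the given client configuration file contains an
# uncommented "quotaEnabled = true" line.
client_quota_enabled() {
    grep -Eq '^[[:space:]]*quotaEnabled[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}

conf=/etc/beegfs/beegfs-client.conf
if [ -r "$conf" ] && client_quota_enabled "$conf"; then
    echo "quotaEnabled is set to true in $conf"
else
    echo "quotaEnabled is not set to true in $conf (or the file is unreadable)"
fi
```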
Querying quota information¶
Quota information can be queried with beegfs-ctl --getquota. The command directly collects quota reports from all storage servers and quota limits from the management service (if defined) and aggregates all the quota information. A table will be printed for each storage pool. Here are some usage examples.
Show quota information for all normal users:
$ beegfs-ctl --getquota --uid --all
Show quota information for the user ID 1000:
$ beegfs-ctl --getquota --uid 1000
Show quota information for the group ID range 1000 to 1500:
$ beegfs-ctl --getquota --gid --range 1000 1500
Show the default quota limits:
$ beegfs-ctl --getquota --defaultlimits
To get quota information for a specific storage pool, include the --storagepoolid=X option in the command. For example:
$ beegfs-ctl --getquota --uid 1000 --storagepoolid=2
Show more examples and general help:
$ beegfs-ctl --getquota --help
If the underlying file system of the storage targets does not support quota for the number of files (for example, ZFS prior to version 0.7.4), the values in the column for used files/inodes will be marked with a dash (“-”).
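When only a few storage pools are of interest, the per-pool query can be wrapped in a small loop. The sketch below uses only the --getquota, --uid and --storagepoolid options shown above; the function name, the BEEGFS_CTL override for dry runs, and the pool IDs 1 and 2 are assumptions made for illustration.

```shell
#!/bin/sh
# Print quota information for one user ID in each of the given storage pools.
# BEEGFS_CTL can be overridden (e.g. BEEGFS_CTL=echo) for a dry run.
getquota_per_pool() {
    uid=$1; shift
    for pool in "$@"; do
        "${BEEGFS_CTL:-beegfs-ctl}" --getquota --uid "$uid" --storagepoolid="$pool"
    done
}

# Dry run: show the arguments that would be passed for user ID 1000 in the
# hypothetical pools 1 and 2.
BEEGFS_CTL=echo getquota_per_pool 1000 1 2
```

Dropping the BEEGFS_CTL=echo prefix runs the real queries.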
Quota enforcement¶
This section provides information on how to activate quota enforcement in a BeeGFS system.
Requirements¶
Quota enforcement requires quota tracking to be enabled, as described above.
Enable quota enforcement¶
Take the steps below on each service to enable quota enforcement in the whole system.
Storage Service Setting¶
Set the option below to true in the storage configuration file /etc/beegfs/beegfs-storage.conf:
quotaEnableEnforcement = true
Restart the storage service daemon.
Meta Service Setting¶
Set the option below to true in the meta configuration file /etc/beegfs/beegfs-meta.conf:
quotaEnableEnforcement = true
Restart the meta service daemon.
Management Service Settings¶
Take the following steps to enable quota enforcement in the system. All options presented in this section are found in the file /etc/beegfs/beegfs-mgmtd.conf.
The management service collects quota reports from the storage targets and checks quota limits at regular intervals. This interval is set by the option quotaUpdateIntervalMin, in minutes (default: 10 minutes). A shorter interval reduces the time until an exceeded limit is noticed and the quota is enforced. Thus, to reduce the possibility of a user temporarily exceeding their limits, this interval should be kept as short as practical. On the other hand, frequent queries cause some workload overhead on the system, possibly reducing performance. So, change this option with caution. If you reduce this interval, please also consider changing the type of quota query, as discussed below. All quota query types (system, range and file) are updated at the specified interval.
quotaUpdateIntervalMin = 10
Configure the type of query performed by the management daemon to get the user and group IDs. The default type of query is system, in which user and group IDs are retrieved from the same source used by the commands getent passwd and getent group. This source could be a central LDAP database or another user management system. When the user database system is slow, “system” might not be the best query type.
quotaQueryType = system
The second valid value for quotaQueryType is range, which allows you to specify intervals of uids and gids in options quotaQueryUIDRange and quotaQueryGIDRange. In this case, all IDs of the user ID range and the group ID range will be queried. Do not define unnecessarily large ranges, as this could decrease query performance. This query type may help increase performance in cases where only a small range of IDs should be queried, instead of all IDs available in the system.
quotaQueryType = range
quotaQueryUIDRange = 1200,2000
quotaQueryGIDRange = 15000,20000
The third valid value for quotaQueryType is file, which allows you to specify the uids and gids in two text files (one ID per line). The path to the file with the uids is provided in option quotaQueryUIDFile and the path to the file with the gids is provided in the option quotaQueryGIDFile. In this case, all uids and gids from the files will be queried. This query type is suitable for cases where the IDs are not sequential.
quotaQueryType = file
quotaQueryGIDFile = /etc/beegfs/groupIDs
quotaQueryUIDFile = /etc/beegfs/userIDs
Set the following option to true to activate quota enforcement on the system.
quotaEnableEnforcement = true
Restart the management service daemon.
These changes won’t be noticed by the other server services until they are restarted. Therefore, restart the storage service daemons and the metadata service daemons.
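For the file query type, the ID files (one ID per line) can be written by hand or generated. The sketch below shows one possible way to derive them from passwd- or group-style input. The helper name ids_from_db and the minimum ID of 1000 are assumptions made for illustration; any other source that yields one numeric ID per line would work equally well.

```shell
#!/bin/sh
# Read passwd- or group-style lines (name:x:ID:...) on stdin and print the
# numeric IDs at or above a minimum, one per line, sorted and de-duplicated.
ids_from_db() {
    min_id=$1
    awk -F: -v min="$min_id" '$3 ~ /^[0-9]+$/ && $3 + 0 >= min { print $3 }' | sort -n -u
}

# Hypothetical usage; the threshold 1000 is an assumption, the paths are the
# ones from the quotaQueryType = file example:
#   getent passwd | ids_from_db 1000 > /etc/beegfs/userIDs
#   getent group  | ids_from_db 1000 > /etc/beegfs/groupIDs
```

Generating the files once avoids querying a slow user database at every update interval, at the cost of having to regenerate them when users or groups are added.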
Setting quota limits¶
Quota limits can be set with the command beegfs-ctl --setquota. Here are some usage examples.
Set quota limit for user ID 1000 to 1 gigabyte and 500 chunk files:
$ beegfs-ctl --setquota --uid 1000 --sizelimit=1G --inodelimit=500
Set quota limit for group ID 1289 to 10 gigabytes and 50 chunk files:
$ beegfs-ctl --setquota --gid 1289 --sizelimit=10G --inodelimit=50
Set quota limit for user ID 1000 to unlimited size and 500 chunk files:
$ beegfs-ctl --setquota --uid 1000 --sizelimit=unlimited --inodelimit=500
Set quota limit for user ID 1000 to unlimited size and reset the chunk file limit to the default quota limit:
$ beegfs-ctl --setquota --uid 1000 --sizelimit=unlimited --inodelimit=reset
Set quota limit for group ID 1289 to 10 gigabytes and unlimited chunk files:
$ beegfs-ctl --setquota --gid 1289 --sizelimit=10G --inodelimit=unlimited
Set default quota limits for users to 10 gigabytes and unlimited chunk files:
$ beegfs-ctl --setquota --uid --default --sizelimit=10G --inodelimit=unlimited
Similar to the --getquota mode, it is possible to set the quota limits via the --all, --range and --list parameters. The --setquota mode also allows the import of quota limits from a file. Each line defines the limit for a user or group. Only one type of ID (either user or group) can be given in a quota file. The quota file line format is: <ID or name>,<size limit>,<inode limit>.
Example file contents for user quota limits (e.g., located at /tmp/user_quota_limits.txt):
2345,1T,500
8999,5G,20
dbadmin,20G,5000
To load the example user quota limits file and apply the user quota limits:
$ beegfs-ctl --setquota --uid --file=/tmp/user_quota_limits.txt
Quota can be configured per storage pool by specifying a storage pool ID when running the setquota command. For example:
$ beegfs-ctl --setquota --uid 1000 --sizelimit=1G --inodelimit=500 --storagepoolid=2
To show general help:
$ beegfs-ctl --setquota --help
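Before importing a quota limits file, it can help to check that every line follows the <ID or name>,<size limit>,<inode limit> format. The sketch below is a minimal structural check only: it does not validate the values themselves, and skipping empty lines is an assumption about what the import tolerates. The helper name check_quota_file is hypothetical.

```shell
#!/bin/sh
# Check that every non-empty line of a quota limits file has exactly three
# non-empty comma-separated fields: <ID or name>,<size limit>,<inode limit>.
# Prints the offending lines and returns non-zero if any line is malformed.
check_quota_file() {
    awk -F, '
        NF == 0 { next }    # assumption: empty lines are ignored
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Hypothetical usage with the example file from this section:
#   check_quota_file /tmp/user_quota_limits.txt &&
#       beegfs-ctl --setquota --uid --file=/tmp/user_quota_limits.txt
```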
Project directory quota tracking¶
The BeeGFS quota management mechanism is based on user and group quota. Group quota can be used for project directories by using the setgid flag on a directory (chmod g+s /mnt/beegfs/project01). If this flag is set, all files created in the directory will automatically have the group of the directory instead of the primary group of the user who created the file.
A slightly different approach that achieves a very similar result is to mount BeeGFS with the grpid mount option. Doing so acts like a global setgid bit and makes new files and directories inherit group IDs from their parent directories across the entire filesystem.
With this approach, it is useful to also create a separate group for the project, e.g., a group project01, and apply it to the project directory (chown root:project01 /mnt/beegfs/project01). To avoid conflicts with per-user quota limits, the same approach can be used not only for shared project directories but also for user directories, in which case each user has their own group.
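The setgid approach can be sketched as a small helper that creates the directory, assigns the project group, and sets the setgid flag. The helper name make_project_dir is hypothetical, the group is assumed to exist already, and the limits in the usage comment are placeholders taken from the examples in the previous section.

```shell
#!/bin/sh
# Create a project directory whose new files inherit the directory's group.
# Args: directory path, group name or numeric GID (the group must exist).
make_project_dir() {
    dir=$1; group=$2
    mkdir -p "$dir" &&
    chgrp "$group" "$dir" &&
    chmod g+s "$dir"
}

# Hypothetical usage (run as root), with the names from this section:
#   make_project_dir /mnt/beegfs/project01 project01
#   gid=$(getent group project01 | cut -d: -f3)
#   beegfs-ctl --setquota --gid "$gid" --sizelimit=10G --inodelimit=unlimited
```

Looking up the numeric GID with getent keeps the setquota call in the same form as the examples in the previous section.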
Alternatively, if you want to track used space or number of files based on subdirectory trees, you might want to look at the Robinhood Policy Engine.
Robinhood can run parallel scans of the file system at regular intervals and store the discovered file and directory information in an SQL database. This has the advantage of enabling various queries of the database with fast results. In addition, automatic actions for certain events can be defined in Robinhood, e.g., if the defined used space limit for a certain subdirectory tree is exceeded.
As BeeGFS keeps all the metadata for such scans readily available on the metadata servers (usually flash storage), crawling a file system in parallel is fast. To make sure that the SQL database of Robinhood does not reduce the scan speed, it is recommended to have the Robinhood database also on flash storage.