Benchmarking a BeeGFS System¶
Built-in Benchmarking Tools¶
BeeGFS includes a built-in storage targets benchmark (StorageBench) and a built-in network benchmark (NetBench).
StorageBench¶
The storage targets benchmark is intended to determine the maximum theoretical performance of BeeGFS on the storage targets or to detect defective or misconfigured storage targets.
This benchmark measures the streaming throughput of the underlying file system and devices independent of the network performance. To simulate client IO, this benchmark generates read/write work packages locally on the servers without any client communication.
Note that without any network communication, file striping cannot be simulated, so the benchmark results are rather comparable to client IO with disabled striping (i.e., one target per file).
It is possible to benchmark only specific targets or all targets together.
The storage benchmark is started and monitored with the beegfs tool.
The following example starts a write benchmark on all targets of all BeeGFS storage servers with an IO block size of 512 KiB, using 10 threads (i.e., simulated client streams) per target, each of which will write 200 GiB of data to its own file.
$ beegfs benchmark start --block-size=512KiB --size=200GiB --num-tasks=10
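Because every task writes its own file, the free space needed on each target grows with both options. The following is a minimal sketch of that arithmetic; the function name storagebench_gib_per_target is hypothetical and not part of the beegfs tool.

```shell
# Estimate how much data StorageBench writes to each target.
# Assumption: every task writes the full --size to its own file,
# as described for the example command above.

# storagebench_gib_per_target SIZE_GIB NUM_TASKS  (hypothetical helper)
storagebench_gib_per_target() {
    local size_gib="$1" num_tasks="$2"
    echo $(( size_gib * num_tasks ))
}

# Values from the example command: --size=200GiB --num-tasks=10
storagebench_gib_per_target 200 10
```

With the example values, each target has to hold 2000 GiB of benchmark files until they are cleaned up.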
To query the benchmark status/result of all targets, execute the command below.
$ beegfs benchmark status
You can use the watch flag to repeat the query at a given interval (here, once per second), as shown below:
$ beegfs benchmark status --watch=1s
The generated files will not be automatically deleted when a benchmark is complete. You can delete them by using the following command.
$ beegfs benchmark cleanup
More details about the storage benchmark and its options are available in the help of the beegfs
tool, as follows.
$ beegfs benchmark --help
$ beegfs benchmark start --help
$ beegfs benchmark status --help
NetBench¶
The netbench mode is intended for network streaming throughput benchmarking. In this mode, write and read requests are transmitted over the network from the client to the storage servers, just as BeeGFS does during normal operation (i.e., with netbench mode disabled). The difference is that with netbench mode enabled, the servers discard received write requests instead of submitting the received data to the underlying file system (and vice versa for read requests, in which case only memory buffers are sent to the clients instead of data read from the underlying file system on the servers). Thus, this mode helps to detect slow network connections and can be used to test the maximum network throughput between the clients and the storage servers, as throughput in this mode is independent of the underlying disks.
To test streaming throughput, you can use any tool that writes data to the BeeGFS mount point, e.g.
dd or IOR.  (Note that due to write operations being discarded on the servers, written files will
continue to have length 0 after writing, so it is normal that some benchmark tools might print a
warning about the unexpected file size.)
All other operations, such as file creation and unlink, work normally with netbench mode enabled; only write and read operations are affected.
Netbench mode is enabled via the client runtime configuration in /proc/fs/beegfs.  The following
command will enable netbench mode for the particular client on which it is executed (other clients
are not affected).  A remount of the client is not required; remounting the client disables netbench mode.
$ echo 1 > /proc/fs/beegfs/<clientID>/netbench_mode
Obviously, it is important to disable netbench mode after the benchmarking is done to re-enable normal reads and writes to the file system. This can be done at runtime via the following command.
$ echo 0 > /proc/fs/beegfs/<clientID>/netbench_mode
Note that this command will only affect the client on which it is executed. If you enabled netbench mode on multiple clients, you also have to run this command on all of those clients.
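If a client host has more than one BeeGFS mount, there will be one <clientID> directory per mount. The following is a minimal sketch of a helper that switches all of them at once; the function name set_netbench_mode and its optional base-directory argument are hypothetical, and writing to /proc typically requires root privileges.

```shell
# set_netbench_mode VALUE [BASE_DIR]  (hypothetical helper)
# Writes VALUE (0 or 1) to every netbench_mode file below BASE_DIR.
# BASE_DIR defaults to /proc/fs/beegfs; the argument exists only so the
# function can be tried against a scratch directory first.
set_netbench_mode() {
    local value="$1" base="${2:-/proc/fs/beegfs}"
    local f status=1
    case "$value" in
        0|1) ;;
        *) echo "usage: set_netbench_mode 0|1 [base_dir]" >&2; return 2 ;;
    esac
    for f in "$base"/*/netbench_mode; do
        [ -e "$f" ] || continue      # glob did not match: no client mounted
        echo "$value" > "$f" && status=0
    done
    return "$status"                 # 1 if no netbench_mode file was written
}

# Typical use on one client host:
#   set_netbench_mode 1
#   ... run dd or IOR against the BeeGFS mount point ...
#   set_netbench_mode 0
```

Like the echo commands above, this only affects the host on which it runs, so it still has to be repeated on every client that takes part in the benchmark.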
External Benchmarking Tools¶
This section shows some of the commonly used benchmarks for file IO and metadata performance.
IOR¶
IOR is a benchmark tool to measure the performance of a single or multiple clients with one or more processes per client. IOR is based on MPI for distributed execution. It can be used to measure streaming throughput or small random IO performance (IOPS).
Optionally install the beegfs-client-devel package before building to enable BeeGFS support if
you want to automatically configure the BeeGFS tuning parameters listed below on your test directory
using IOR. If you don’t want to enable BeeGFS support in IOR you can also manually apply the desired
striping pattern on the test directory before initially running IOR.
The value for the number of processes ${NUM_PROCS} depends on the number of clients to test and the
number of processes per client.  The block size ${BLOCK_SIZE} can be calculated with ((3 *
RAM_SIZE_PER_STORAGE_SERVER * NUM_STORAGE_SERVERS) / ${NUM_PROCS}).
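The block size formula can be evaluated with shell arithmetic before calling mpirun. The following is a minimal sketch; the function name ior_block_size_gib and the cluster numbers (4 storage servers with 64 GiB of RAM each, 32 IOR processes) are assumptions for illustration only.

```shell
# ior_block_size_gib RAM_GIB_PER_SERVER NUM_SERVERS NUM_PROCS  (hypothetical helper)
# Evaluates (3 * RAM_SIZE_PER_STORAGE_SERVER * NUM_STORAGE_SERVERS) / NUM_PROCS
# in whole GiB, rounding up so the total never falls below the target.
ior_block_size_gib() {
    local ram_gib="$1" servers="$2" procs="$3"
    echo $(( (3 * ram_gib * servers + procs - 1) / procs ))
}

# Hypothetical cluster: 4 storage servers, 64 GiB RAM each, 32 IOR processes.
NUM_PROCS=32
BLOCK_SIZE="$(ior_block_size_gib 64 4 "${NUM_PROCS}")g"
echo "${BLOCK_SIZE}"
```

For these numbers the result is 24g, which can be passed to IOR as -b ${BLOCK_SIZE}. A whole number of GiB is also a multiple of the 2m and 4k transfer sizes used in the commands below.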
Multi-stream Throughput Benchmark¶
$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
         /usr/bin/IOR -wr -i5 -t2m -b ${BLOCK_SIZE} -g -F -e -o /mnt/beegfs/test.ior
IOPS Benchmark¶
$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
         /usr/bin/IOR -w -i5 -t4k -b ${BLOCK_SIZE} -F -z -g -o /mnt/beegfs/test.ior
BeeGFS Tuning Parameters¶
The following BeeGFS specific IOR parameters are also available if you have installed the
beegfs-client-devel package and compiled IOR with BeeGFS support:
-O beegfsNumTargets=<n> Number of storage targets to use for striping.
-O beegfsChunkSize=<b> Striping chunk size, in bytes. Accepts k=kilo, M=mega, G=giga, etc.
mpirun Parameters¶
-hostfile $PATH (file with the hostnames of the clients/servers to benchmark)
-np $N (number of processes)
IOR Parameters¶
-w (write benchmark)
-r (read benchmark)
-i $N (repetitions)
-t $N (transfer size, for dd it is the block size)
-b $N (block size, amount of data for a process)
-g (use barriers between open, write/read, and close)
-e (perform fsync upon POSIX write close, to make sure reads are only started after all writes are done)
-o $PATH (path to file for the test)
-F (one file per process)
-z (random access to the file)
References¶
IOR project git repository: https://github.com/hpc/ior
IOR project homepage: https://sourceforge.net/projects/ior-sio/
IOR at “Read the Docs”: https://ior.readthedocs.io/en/latest/
mdtest¶
mdtest is a metadata benchmark tool, which needs MPI for distributed execution. It can be used to measure values like file creations per second or stat operations per second of a single process or of multiple processes.
The value for the number of processes ${NUM_PROCS} depends on the number of clients to test and the
number of processes per client to test.  The number of directories can be calculated as
${NUM_DIRS} = (parameter -b ^ parameter -z).  The total number of files should always be higher
than 1 000 000, so ${FILES_PER_DIR} is calculated as ${FILES_PER_DIR} = (1000000 / ${NUM_DIRS} /
${NUM_PROCS}).
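The two formulas above can be evaluated in the shell as follows. This is a minimal sketch: the function names are hypothetical, and the 16 processes are an assumed example value. The division rounds up so that the total file count does not drop below 1 000 000.

```shell
# mdtest_num_dirs BRANCH DEPTH  (hypothetical helper)
# Computes BRANCH ^ DEPTH, i.e. (parameter -b ^ parameter -z).
mdtest_num_dirs() {
    local branch="$1" depth="$2" dirs=1 level=0
    while [ "$level" -lt "$depth" ]; do
        dirs=$(( dirs * branch ))
        level=$(( level + 1 ))
    done
    echo "$dirs"
}

# mdtest_files_per_dir TOTAL_FILES NUM_DIRS NUM_PROCS  (hypothetical helper)
# Computes TOTAL_FILES / NUM_DIRS / NUM_PROCS, rounded up.
mdtest_files_per_dir() {
    local total="$1" dirs="$2" procs="$3"
    local divisor=$(( dirs * procs ))
    echo $(( (total + divisor - 1) / divisor ))
}

NUM_PROCS=16                                   # assumed example value
NUM_DIRS="$(mdtest_num_dirs 8 2)"              # -b 8 -z 2
FILES_PER_DIR="$(mdtest_files_per_dir 1000000 "${NUM_DIRS}" "${NUM_PROCS}")"
echo "NUM_DIRS=${NUM_DIRS} FILES_PER_DIR=${FILES_PER_DIR}"
```

With -b 8 -z 2 and 16 processes this gives 64 directories and 977 files per directory, for a total of 1 000 448 files.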
File Create/Stat/Remove Benchmark¶
$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
          mdtest -C -T -r -F -d /mnt/beegfs/mdtest -i 3 -I ${FILES_PER_DIR} -z 2 -b 8 -L -u
mpirun Parameters¶
-hostfile $PATH (file with the hostnames of the clients/servers to benchmark)
-np $N (number of processes)
mdtest Parameters¶
-C (perform create tests)
-T (perform stat tests)
-r (perform remove tests)
-F (perform only file tests)
-d $PATH (path to test directory)
-i $N (iterations)
-I $N (number of files per directory)
-z $N (depth of the directory structure)
-b $N (how many subdirectories to be created per directory of a higher “-z” level)
-L (use leaf level of the tree for file tests)
-u (each task gets its own working directory)
On October 23, 2017, mdtest was merged into IOR. See https://github.com/hpc/ior
Recommendations¶
Regardless of which tool you use, it is important to take some points into consideration when benchmarking a BeeGFS file system.
- Start with your system configured as advised in our tuning recommendations (see Metadata Node Tuning, Storage Node Tuning, Client Node Tuning). Then, perform adjustments on the tuning values and measure their impact on the benchmark results. Be aware that some of the tuning values might be interdependent. Trying to understand what a tuning value is good for, how it might be related to other tuning values and how it might influence the result will save you a lot of time. 
- The amount of data used in the benchmark execution should always be around 2.5 times the amount of RAM on the storage server machines, in order to prevent their cache from distorting the results, and make sure that you are really measuring sustained throughput. 
- Change the algorithm used for choosing the storage targets when files are created, set by option tuneTargetChooser in file /etc/beegfs/beegfs-meta.conf. The default value of that option is randomized, which means that the metadata service picks random targets when a new file is created. In a production environment, this is usually the best option, because multiple users create files of different types and sizes. However, in an artificial test such as a benchmark, some storage targets may end up with data from more benchmark files than others. So, to make sure that files are distributed evenly across the available targets in this test, it makes sense to set option tuneTargetChooser to roundrobin or randomrobin. Check their documentation at the bottom of the configuration file.
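Before a benchmark run, it can help to confirm which target chooser the metadata server is configured with. The following is a minimal sketch that reads the option from the configuration file; the function name get_target_chooser is hypothetical, and the sketch assumes the usual "key = value" layout of the file.

```shell
# get_target_chooser [CONF_FILE]  (hypothetical helper)
# Prints the value of tuneTargetChooser from the given configuration file
# (default: /etc/beegfs/beegfs-meta.conf). Commented-out lines are ignored;
# if the option appears more than once, the last occurrence is printed.
get_target_chooser() {
    local conf="${1:-/etc/beegfs/beegfs-meta.conf}"
    sed -n 's/^[[:space:]]*tuneTargetChooser[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" \
        | tail -n 1
}

# Typical use on a metadata server:
#   get_target_chooser
```

On a metadata server that still uses the default described above, this would print randomized.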
