Benchmarking a BeeGFS System

Built-in Benchmarking Tools

BeeGFS includes a built-in storage targets benchmark (StorageBench) and a built-in network benchmark (NetBench).

StorageBench

The storage targets benchmark is intended to determine the maximum theoretical performance of BeeGFS on the storage targets or to detect defective or misconfigured storage targets.

This benchmark measures the streaming throughput of the underlying file system and devices independent of the network performance. To simulate client IO, this benchmark generates read/write work packages locally on the servers without any client communication.

Note that without any network communication, file striping cannot be simulated, so the benchmark results are rather comparable to client IO with disabled striping (i.e., one target per file).
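If you want to compare StorageBench results with real client IO, you can disable striping for a test directory so that each file is stored on a single target. A minimal sketch, assuming a test directory /mnt/beegfs/stripetest (path and chunk size are just example values; see beegfs-ctl --setpattern --help for the exact options):

$ beegfs-ctl --setpattern --numtargets=1 --chunksize=512k /mnt/beegfs/stripetest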

It is possible to benchmark only specific targets or all targets together.

The storage benchmark is started and monitored with the beegfs-ctl tool.

The following example starts a write benchmark on all targets of all BeeGFS storage servers with an IO blocksize of 512 KB, using 10 threads (i.e., simulated client streams) per target, each of which will write 200 GB of data to its own file.

$ beegfs-ctl --storagebench --alltargets --write --blocksize=512K --size=200G --threads=10
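After the write benchmark has finished, a corresponding read benchmark can be run on the files created by the write phase; the sketch below assumes the same parameters as the write run above:

$ beegfs-ctl --storagebench --alltargets --read --blocksize=512K --size=200G --threads=10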

To query the benchmark status/result of all targets, execute the command below.

$ beegfs-ctl --storagebench --alltargets --status

You can use the watch command to repeat the query at a given interval (in seconds), as shown below.

$ watch -n 5 beegfs-ctl --storagebench --alltargets --status

The generated files will not be automatically deleted when a benchmark is complete. You can delete them by using the following command.

$ beegfs-ctl --storagebench --alltargets --cleanup

More details about the storage benchmark and its options are available in the help of the beegfs-ctl tool, as follows.

$ beegfs-ctl --storagebench --help

NetBench

The netbench mode is intended for network streaming throughput benchmarking. In this mode, write and read requests are transmitted over the network from the client to the storage servers, just as during normal operation (i.e., with netbench mode disabled). The difference is that with netbench mode enabled, the servers discard received write requests instead of submitting the received data to the underlying file system (and, vice versa, for read requests only memory buffers are sent to the clients instead of actually reading from the underlying file system on the servers). Thus, this mode helps to detect slow network connections and can be used to test the maximum network throughput between the clients and the storage servers, as throughput in this mode is independent of the underlying disks.

To test streaming throughput, you can use any tool that writes data to the BeeGFS mountpoint, e.g. dd or IOR. (Note that due to write operations being discarded on the servers, written files will continue to have length 0 after writing, so it is normal that some benchmark tools might print a warning about the unexpected file size.)
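As a simple illustration, a streaming write test with dd could look like the following; the file path and data volume are just placeholder values:

$ dd if=/dev/zero of=/mnt/beegfs/netbench_testfile bs=1M count=20000 conv=fsync

Since the servers discard the data in netbench mode, the throughput reported by dd reflects the client and network path rather than the disks.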

All other operations, such as file creation and unlink, work normally with netbench mode enabled; only write and read operations are affected.

Netbench mode is enabled via the client runtime configuration in /proc/fs/beegfs. The following command enables netbench mode for the particular client on which it is executed (other clients are not affected). A remount of the client is not required; note, however, that remounting the client disables netbench mode again.

$ echo 1 > /proc/fs/beegfs/<clientID>/netbench_mode
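The <clientID> placeholder refers to a subdirectory of /proc/fs/beegfs, one per mounted BeeGFS instance on this client; it can be looked up, for example, with:

$ ls /proc/fs/beegfs/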

Obviously, it is important to disable netbench mode after the benchmarking is done to re-enable normal reads and writes to the file system. This can be done at runtime via the following command.

$ echo 0 > /proc/fs/beegfs/<clientID>/netbench_mode

Note that this command will only affect the client on which it is executed. If you enabled netbench mode on multiple clients, you also have to run this command on all of those clients.

External Benchmarking Tools

This section shows some of the commonly used benchmarks for file IO and metadata performance.

IOR

IOR is a benchmark tool that measures the performance of a single client or of multiple clients, with one or more processes per client. IOR is based on MPI for distributed execution. It can be used to measure streaming throughput or small random IO performance (IOPS).

Please install the beegfs-client-devel package before building to enable BeeGFS support.
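As a rough sketch, building IOR from source could look like the following; the --with-beegfs configure switch is an assumption and should be verified with ./configure --help:

$ git clone https://github.com/hpc/ior.git
$ cd ior
$ ./bootstrap
$ ./configure --with-beegfs
$ make && sudo make install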

The value for the number of processes ${NUM_PROCS} depends on the number of clients to test and the number of processes per client. The block size ${BLOCK_SIZE} can be calculated with ((3 * RAM_SIZE_PER_STORAGE_SERVER * NUM_STORAGE_SERVERS) / ${NUM_PROCS}).
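For example, assuming 4 storage servers with 64 GB of RAM each and 16 processes in total (purely hypothetical numbers), the per-process block size works out as follows:

$ NUM_STORAGE_SERVERS=4
$ RAM_SIZE_PER_STORAGE_SERVER=64   # GB per storage server
$ NUM_PROCS=16
$ BLOCK_SIZE=$(( (3 * RAM_SIZE_PER_STORAGE_SERVER * NUM_STORAGE_SERVERS) / NUM_PROCS ))g
$ echo ${BLOCK_SIZE}               # 48g, i.e., 48 GB written per process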

Multi-stream Throughput Benchmark

$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
         /usr/bin/IOR -wr -i5 -t2m -b ${BLOCK_SIZE} -g -F -e -o /mnt/beegfs/test.ior

Shared File Throughput Benchmark

$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
         /usr/bin/IOR -wr -i5 -t1200k -b ${BLOCK_SIZE} -g -e -o /mnt/beegfs/test.ior

Note

We’ve picked 1200k just as an example of a transfer size that is not aligned to the BeeGFS chunk size.

IOPS Benchmark

$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
         /usr/bin/IOR -w -i5 -t4k -b ${BLOCK_SIZE} -F -z -g -o /mnt/beegfs/test.ior

BeeGFS Tuning Parameters

-O beegfsNumTargets=<n> Number of storage targets to use for striping.

-O beegfsChunkSize=<b> Striping chunk size, in bytes. Accepts k=kilo, M=mega, G=giga, etc.
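For instance, the multi-stream throughput benchmark above could be run with explicit striping settings by passing these options to IOR; the values of 4 targets and a 1M chunk size are just examples:

$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
         /usr/bin/IOR -wr -i5 -t2m -b ${BLOCK_SIZE} -g -F -e \
         -O beegfsNumTargets=4 -O beegfsChunkSize=1M \
         -o /mnt/beegfs/test.ior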

mpirun Parameters

-hostfile $PATH (file with the hostnames of the clients/servers to benchmark)

-np $N (number of processes)

IOR Parameters

-w (write benchmark)

-r (read benchmark)

-i $N (repetitions)

-t $N (transfer size, for dd it is the block size)

-b $N (block size, amount of data for a process)

-g (use barriers between open, write/read, and close)

-e (perform fsync upon POSIX write close; makes sure reads only start after all writes are done)

-o $PATH (path to file for the test)

-F (one file per process)

-z (random access to the file)

References

IOR project git repository: https://github.com/hpc/ior

IOR project homepage: https://sourceforge.net/projects/ior-sio/

IOR at “Read the Docs”: https://ior.readthedocs.io/en/latest/

mdtest

mdtest is a metadata benchmark tool, which needs MPI for distributed execution. It can be used to measure rates such as file creations per second or stat operations per second for a single process or for multiple processes.

The value for the number of processes ${NUM_PROCS} depends on the number of clients to test and the number of processes per client to test. The number of directories can be calculated as ${NUM_DIRS} = (parameter -b ^ parameter -z). The total number of files should always be higher than 1,000,000, so ${FILES_PER_DIR} is calculated as ${FILES_PER_DIR} = (1000000 / ${NUM_DIRS} / ${NUM_PROCS}). A worked example is sketched below.
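For example, with the parameters used in the command below (-b 8 and -z 2) and, say, 16 processes (a hypothetical number), the values work out as follows:

$ NUM_PROCS=16
$ NUM_DIRS=$(( 8 ** 2 ))                              # (parameter -b) ^ (parameter -z)
$ FILES_PER_DIR=$(( 1000000 / NUM_DIRS / NUM_PROCS ))
$ echo ${NUM_DIRS} ${FILES_PER_DIR}                   # 64 directories, 976 files per directory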

File Create/Stat/Remove Benchmark

$ mpirun -hostfile /tmp/nodefile --map-by node -np ${NUM_PROCS} \
          mdtest -C -T -r -F -d /mnt/beegfs/mdtest -i 3 -I ${FILES_PER_DIR} -z 2 -b 8 -L -u

mpirun Parameters

-hostfile $PATH (file with the hostnames of the clients/servers to benchmark)

-np $N (number of processes)

mdtest Parameters

-C (perform create tests)

-T (perform stat tests)

-r (perform remove tests)

-F (perform only file tests)

-d $PATH (path to test directory)

-i $N (iterations)

-I $N (number of files per directory)

-z $N (depth of the directory structure)

-b $N (branching factor of the directory tree, i.e., how many subdirectories are created per directory at each level of the “-z” hierarchy)

-L (use leaf level of the tree for file tests)

-u (each task gets its own working directory)

On October 23, 2017, mdtest was merged into IOR. See https://github.com/hpc/ior

Recommendations

Regardless of which tool you use, it is important to take some points into consideration when benchmarking a BeeGFS file system.

  • Start with your system configured as advised in our tuning recommendations (see Metadata Node Tuning, Storage Node Tuning, Client Node Tuning). Then, perform adjustments on the tuning values and measure their impact on the benchmark results. Be aware that some of the tuning values might be interdependent. Trying to understand what a tuning value is good for, how it might be related to other tuning values and how it might influence the result will save you a lot of time.

  • The amount of data used in the benchmark execution should always be around 2.5 times the amount of RAM on the storage server machines, in order to prevent their cache from distorting the results and to make sure that you are really measuring sustained throughput.

  • Change the algorithm used for choosing the storage targets when files are created, set by option tuneTargetChooser in file /etc/beegfs/beegfs-meta.conf. The default value of that option is randomized, which means that the metadata service picks random targets when a new file is created. In a production environment, this is usually the best option, because multiple users create files of different types and sizes. However, in an artificial test like this, some storage targets may end up holding the data of more benchmark files than others. So, in order to make sure that files are distributed evenly across the available targets during the test, it makes sense to set option tuneTargetChooser to roundrobin or randomrobin, as sketched below. Check their documentation at the bottom of the configuration file.
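A minimal sketch of that change, assuming the metadata service is managed via systemd: set the option in /etc/beegfs/beegfs-meta.conf,

tuneTargetChooser = roundrobin

and then restart the metadata service so the new value takes effect.

$ systemctl restart beegfs-meta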