BeeGFS Hive Index

Supercomputing datacenters generate enormous amounts of data every day, and at that scale querying and managing the metadata about that data becomes very difficult. For data management tasks like archival, scanning the file system to get the list of old files can impact ongoing file system operations. Another challenge is obtaining the internal metadata of the BeeGFS parallel file system. BeeGFS Hive Index addresses these problems by giving users faster and more efficient access to information about their data through an easy-to-use, high-performance interface, without impacting general file system operations.

BeeGFS Hive Index is a hierarchical metadata index created by extracting file system metadata and BeeGFS specific metadata and storing it efficiently in a local database. It provides a simple interface for querying the index to get detailed metadata such as file names, access/modification/creation times, permissions, file size and extended attributes, as well as BeeGFS specific information such as the stripe pattern, the storage targets where a file’s data is stored and the ID of the owner metadata node on which the file’s metadata is stored.

../_images/hive_index_overview.png

Description

BeeGFS Hive Index is a solution which allows users to quickly search through billions of files/directories and get the required information without impacting the file system performance. Its multithreaded design allows it to scan through multiple directories in parallel and extract the metadata efficiently. BeeGFS Hive Index supports POSIX user permissions, directory tree attributes and hierarchy representation.

By default it also extracts BeeGFS specific metadata for regular files using BeeGFS ioctls. The additional ioctls on each regular file can put extra load on the file system during index creation and update, so there is an option to disable extraction of BeeGFS specific metadata. If extraction is disabled during index creation, no BeeGFS specific metadata is extracted initially, but the user can enable extraction during index update to collect BeeGFS specific metadata for newly modified or created files.

Only the root user is allowed to create the index, because the process has to scan the complete directory hierarchy, which may contain directories and files from different users. It is therefore recommended that the administrator creates the index. Having a single index directory per BeeGFS file system makes it easier to keep the index consistent with the changing file system.

Database schemas might change between BeeGFS Hive Index versions. The BeeGFS Hive Index database schema can be upgraded or downgraded using the utility command “bee db-upgrade”. This utility uses SQL scripts to upgrade/downgrade index databases. Database versions are tracked with the “PRAGMA user_version” SQL statement in the SQLite3 database. The utility also allows the user to take backups and restore the original database files so that, in case of errors, the user can roll back to the previous database version. Once the upgrade/downgrade is successful, the stale database files can be deleted.
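The version bookkeeping described above can be inspected with Python's standard sqlite3 module. The following is a minimal sketch on a throwaway in-memory database, not the db-upgrade utility itself; the helper names get_schema_version and set_schema_version are hypothetical.

```python
import sqlite3


def get_schema_version(conn: sqlite3.Connection) -> int:
    """Return the integer stored by PRAGMA user_version (0 if never set)."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def set_schema_version(conn: sqlite3.Connection, version: int) -> None:
    """Store a new schema version; PRAGMA does not accept bound parameters."""
    conn.execute(f"PRAGMA user_version = {int(version)}")


# Throwaway database standing in for a .bdm.db file (assumption for the demo).
conn = sqlite3.connect(":memory:")
initial = get_schema_version(conn)
set_schema_version(conn, 2)
upgraded = get_schema_version(conn)
print(initial, upgraded)
```

Because the version is a single integer stored in the database header, it can be checked cheaply before deciding whether any upgrade script needs to run.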

As the index maintains the user permissions, the index can be shared by all users and administrators. Even with a shared index, users are allowed to query only their own data.

There are two methods to create the BeeGFS Hive Index:

  1. In-Tree index within the BeeGFS file system

  2. Out-Of-Tree index at a different location (on another local file system)

In-Tree index

The index is created inside the BeeGFS file system itself. Each subdirectory has a .bdm.db database file containing metadata specific to that subdirectory. Its tables contain detailed metadata about the files inside that directory (for example filename, timestamps, size, extended attributes, etc.).

The disadvantage of creating an index inside BeeGFS is that it may interfere with ongoing BeeGFS file system operations. If you want to keep the index data within BeeGFS, metadata scanning can be scheduled for times when there is little traffic on the file system.

../_images/hive_index_in-tree-index.png

Out-Of-Tree Index

An Out-Of-Tree index allows users to create a file system index at a different location (for example a local file system or an NFS share).

The index creation process creates a directory hierarchy at the chosen index location that mirrors the source directory. Each directory stores metadata about its entries (files and subdirectories). The index directory hierarchy preserves the user permissions of the source directory, and the hierarchical structure makes querying metadata faster.

BeeGFS metadata can be spread across multiple metadata nodes, and querying the file system for administrative tasks can generate a lot of network messages that interfere with users’ important operations.

Because the indexed data is stored on a local file system, querying the indexed metadata is fast and does not interfere with important file system operations.
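To make the mirrored layout concrete, the sketch below maps a source directory to the per-directory database file in an out-of-tree index. It is only an illustration of the path relationship; the function name index_db_for is hypothetical, and the example paths are the ones used elsewhere in this guide.

```python
from pathlib import PurePosixPath

DB_NAME = ".bdm.db"  # per-directory database file name used by Hive Index


def index_db_for(source_dir: str, source_root: str, index_root: str) -> str:
    """Map a source directory to the database file in the mirrored index tree."""
    relative = PurePosixPath(source_dir).relative_to(source_root)
    return str(PurePosixPath(index_root) / relative / DB_NAME)


nested = index_db_for("/mnt/beegfs/arch/alpha", "/mnt/beegfs", "/work/index")
top = index_db_for("/mnt/beegfs", "/mnt/beegfs", "/work/index")
print(nested)
print(top)
```

A query about one directory therefore only needs to open the database file of the matching index directory, without contacting the BeeGFS metadata nodes.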

../_images/hive_index_out-of-tree-index.png

In-Tree vs Out-Of-Tree Index

In-Tree Index:

Pros: No separate storage needed as index database files are stored inside the BeeGFS file system itself.

Cons: File system performance can degrade due to Hive Index operations.

Out-Of-Tree Index:

Pros: As index databases are stored outside the file system, running queries on the indexed data does not affect ongoing file system operations.

Cons: Dedicated space is needed to store the indexed metadata of the file system.

BeeGFS Hive Index Database

BeeGFS Hive Index database (.bdm.db stored inside each subdirectory) holds three types of records as listed below:

  1. Entries Table: Stores detailed metadata information about files and links (filename, size, blks, mtime, ctime, atime, extended attributes, also BeeGFS specific metadata like stripe pattern, storage targets, metadata owner ID, the file’s entryID, the parent’s entryID, etc.).

  2. Directory Summary Table: Stores information like total files inside the directory, total subdirectories inside the directory, total size of the directory (sum of all file sizes inside the directory), etc.

  3. Tree Summary Table (optional): Stores a summary of the complete directory hierarchy. This information is present only at the top level directory. The table contains information like total files in the directory tree, total subdirectories, maxfilesize, minfilesize.
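The relationship between the entries table and the directory summary can be sketched with a toy database. The schema below is a deliberately reduced assumption (three columns only); the real tables have many more columns, and this is not how Hive Index itself populates the summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy subset of the entries columns; the real table has many more.
conn.execute("CREATE TABLE entries (name TEXT, type TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [("core.c", "f", 2250), ("report.c", "f", 89444), ("Makefile", "f", 600)],
)

# Derive directory-summary style values from the per-file rows.
totfiles, totsize, minsize, maxsize = conn.execute(
    "SELECT COUNT(*), SUM(size), MIN(size), MAX(size) "
    "FROM entries WHERE type = 'f'"
).fetchone()
print(totfiles, totsize, minsize, maxsize)
```

Keeping such aggregates per directory means that questions like "how much data is in this directory" can be answered from a single row instead of scanning every entry.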

Step-By-Step Guide for BeeGFS Hive Index

Prerequisites

The following packages need to be installed on the node that runs BeeGFS Hive Index:

  • libpcre3

  • libpcre3-dev

  • SQLite

  • python3

  • python3-distutils

Installation

RHEL based systems

BeeGFS Hive Index repositories and packages are digitally signed. Add the public key to your package manager. This can be done as follows:

rpm --import https://www.beegfs.io/release/hive/gpg/GPG-KEY-hive

Download the BeeGFS Hive Index package repository file for your distribution from the BeeGFS Hive Index package repository. For example, for RHEL 8 this can be done as follows:

wget https://www.beegfs.io/release/hive/dists/beegfs-hive-rhel8.repo -O /etc/yum.repos.d/beegfs_hive.repo

Install BeeGFS Hive Index package.

yum install beegfs-hive-index

Debian based systems

BeeGFS Hive Index repositories and packages are digitally signed. Add the public key to your package manager. This can be done as follows:

wget -q -O - https://www.beegfs.io/release/hive/gpg/GPG-KEY-hive | apt-key add -

Download the BeeGFS Hive Index package repository file for your distribution from the BeeGFS Hive Index package repository. For example, for Debian 10 this can be done as follows:

wget https://www.beegfs.io/release/hive/dists/beegfs-hive-buster.list -O /etc/apt/sources.list.d/beegfs_hive.list
apt update

Install BeeGFS Hive Index package.

apt install beegfs-hive-index

Initial configuration

BeeGFS Hive Index commands use a configuration file which must be present at

/etc/beegfs/index/config

A sample configuration file is installed with the package at /etc/beegfs/index/config.example. It can be modified and renamed to /etc/beegfs/index/config.

The following are the important parameters from the config file:

# Number of threads for running index operations
Threads=10

Number of threads to run the Hive Index commands. You should set this count as per the number of cores available on the system where the Hive Index will be created.

# absolute path to bee executables
# single path string
Executable=/opt/beegfs

The absolute path where the BeeGFS Hive Index binaries are installed. Default path is /opt/beegfs.

# Source directory path:Index directory path
IndexPaths=/mnt/beegfs:/mnt/index

A colon-separated pair of paths: the BeeGFS file system directory for which the index will be created, followed by the absolute path of the index directory (Filesystem source directory path:Index directory path), for example IndexPaths=/mnt/beegfs:/work/index.

These two paths allow Hive Index commands to run without explicitly specifying absolute paths for source directory or Index directory.

If you want to create the index inside the file system directory itself, only specify the source directory path in IndexPaths, for example IndexPaths=/mnt/beegfs.
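The two forms of the IndexPaths value can be told apart mechanically. The sketch below is a hypothetical helper, not part of the bee tooling, that splits the value and reports whether it describes an in-tree index.

```python
from typing import NamedTuple, Optional


class IndexPaths(NamedTuple):
    source: str
    index: Optional[str]  # None means "no separate index directory"


def parse_index_paths(value: str) -> IndexPaths:
    """Split 'source[:index]' as written in the IndexPaths config entry."""
    source, sep, index = value.strip().partition(":")
    if not source.startswith("/"):
        raise ValueError("source directory must be an absolute path")
    return IndexPaths(source, index if sep and index else None)


def is_in_tree(paths: IndexPaths) -> bool:
    """In-tree: the index lives inside the source directory itself."""
    return paths.index is None or paths.index == paths.source


out_of_tree = parse_index_paths("/mnt/beegfs:/work/index")
in_tree = parse_index_paths("/mnt/beegfs")
print(out_of_tree, is_in_tree(out_of_tree))
print(in_tree, is_in_tree(in_tree))
```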

# Filesystem mount point path
MountPath=/mnt/beegfs

BeeGFS mount point.

# size of per-thread print buffers
OutputBuffer=4096

Size of the per-thread buffers used to print output (the default value is 4096).

BeeGFS Hive Index Utilities

All BeeGFS Hive Index operations can be performed through a single command called bee which is generally installed to /opt/beegfs/python/index but will be added to the standard PATH via symlink in /usr/bin.

Operations like creating an index, updating an index, listing the files from the index and finding files from the index are the subcommands to bee.

BeeGFS Hive Index Creation

You can create an index using the In-Tree (Inside Filesystem) or Out-Of-Tree (Outside Filesystem) option, depending on your requirements.

To create the index, use the subcommand create-index.

The following are some of the important options of create-index.

-F : Filesystem directory path for which index will be created.

-I : Index directory path.
    If -I is not provided and the config file does not have an index directory path specified,
    the index will be created inside the filesystem directory itself.

-X : Maximum memory create-index operation should use e.g 8M, 12GB.
    The index creation process scans through the entire directory hierarchy.
    For really large filesystems, there can be a large number of files and directories
    and large directory depth.
    Users should set a limit on the memory that can be used during index creation to avoid memory
    pressure on other applications running on the same machine.

-n : Number of threads which will be started for scanning directory hierarchy
    and extracting and storing the metadata information. Specify this value looking
    at the number of cores available for use during index creation.

-B : Create index without extracting BeeGFS specific metadata. By default BeeGFS metadata will be
     extracted and stored in the entries table.
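The -X option accepts sizes such as 8M, 1G, 8GB or 16gb. The sketch below shows one way such strings could be converted to a byte count. It assumes binary multiples (1K = 1024 bytes), which the source does not state, and parse_memory_limit is a hypothetical name rather than a function of the bee tooling.

```python
import re

# Assumption: suffixes are binary multiples (1K = 1024 bytes).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_PATTERN = re.compile(r"^(\d+)([KMGT])B?$", re.IGNORECASE)


def parse_memory_limit(text: str) -> int:
    """Convert a size such as '8M' or '16gb' to a number of bytes."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised memory size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]


for example in ("8M", "1G", "8GB", "16gb"):
    print(example, parse_memory_limit(example))
```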

Note

Both path options (-F and -I) are optional; if they are omitted, the default values are taken from the configuration file (/etc/beegfs/index/config).

See also

man bee-create-index

[root@admin-compute1~]# bee create-index --help

usage: bee [--help] [-X MAX_MEMORY_USAGE] [-F FS_PATH] [-I INDEX_PATH]
[-n NUM_THREADS] [-s] [-x] [-z MAX_LEVEL] [-S] [--version] [-C]

BeeGFS Hive Index version of create

optional arguments:

--help show this help message and exit

-X <MAX_MEMORY_USAGE> Max memory usage e.g 8M, 1G, 8GB, 16gb

-F <FS_PATH> File system path for which index will be created

-I <INDEX_PATH> Index directory path

-n <NUM_THREADS> Number of threads to create index

-s Create tree summary table along with other tables

-x Pull xattars from source

-z <MAX_LEVEL> Max level to go down

-S Create only tree summary table

--version, -v show program's version number and exit

-C print the number of scanned directories

-B Create index without extracting BeeGFS specific metadata

Creating an In-Tree index

To create the index inside the file system itself, don’t specify index path in the /etc/beegfs/index/config IndexPaths, for example:

IndexPaths=/mnt/beegfs

#Create index inside the filesystem

[root@admin-compute1~]# bee create-index

#If user wants to specifically mention the file system path

[root@admin-compute1~]# bee create-index -F /mnt/beegfs

#Create in-tree index with tree summary table, a memory limit of 10GB and 4 threads

[root@admin-compute1~]# bee create-index -s -X 10GB -n 4

Creating an Out-Of-Tree Index

To create the index outside the file system, specify index path in the /etc/beegfs/index/config IndexPaths, for example

IndexPaths=/mnt/beegfs:/work/index

/work/index could be a local file system or NFS shared directory.

If the config file has both file system source directory path and index path mentioned, bee create-index command will pick up the paths from config file.

#Create index outside the filesystem

[root@admin-compute1~]# bee create-index

#If user wants to specifically mention the filesystem and index paths

[root@admin-compute1~]# bee create-index -F /mnt/beegfs -I /work/index

#Create out-of-tree index with tree summary table, a memory limit of 10G and 4 threads

[root@admin-compute1~]# bee create-index -s -X 10G -n 4

bee update

Note

The changes to beegfs-event-listener needed to interface with bee update will be available in BeeGFS releases starting with versions 7.3.3 and 7.2.9.

The bee update subcommand allows users to keep the index directory up to date with ongoing operations. BeeGFS Hive Index uses modification events generated by BeeGFS metadata nodes to update the index directory. The BeeGFS event listener captures the events generated on the metadata nodes and sends them over the network to the update service.

The modification logs from BeeGFS contain the path of the file on which the operation was performed. The bee-update service uses these events to modify the index directory only for the affected files or directories. This mechanism avoids scanning the entire file system to update the index directory, so the index directory stays up to date with file system modifications.
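The idea of updating only the affected entries can be sketched as follows. This is a simplified illustration with an in-memory dictionary as the index; the event shape (operation, path) and the operation name "unlink" are assumptions for the demo and do not reflect the real BeeGFS event format or the bee-update implementation.

```python
import os
import tempfile


# Hypothetical event shape: (operation, path). Real BeeGFS events differ.
def apply_event(index: dict, operation: str, path: str) -> None:
    """Refresh or drop one index entry instead of rescanning the tree."""
    if operation == "unlink" or not os.path.lexists(path):
        index.pop(path, None)
        return
    info = os.lstat(path)  # the event carries no metadata, so stat again
    index[path] = {"size": info.st_size, "mtime": info.st_mtime}


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "data.log")
    with open(target, "w") as handle:
        handle.write("hello")
    index = {}
    apply_event(index, "close", target)
    size_after_close = index[target]["size"]

    with open(target, "w") as handle:
        handle.write("hi")
    apply_event(index, "trunc", target)
    size_after_trunc = index[target]["size"]

    os.remove(target)
    apply_event(index, "unlink", target)
    remaining = len(index)

print(size_after_close, size_after_trunc, remaining)
```

Each event touches exactly one entry, so the cost of keeping the index current scales with the number of changes rather than with the size of the file system.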

The event listener should be started on all metadata nodes so that all file system operations can be captured and replayed on the index directory by bee update.

For bee update to work, an index must first be created using bee create-index.

bee update should run as a service to capture all ongoing file system changes.

Users can run bee update from the command line or start it using systemctl.

To run it using systemctl, update the configuration file /etc/beegfs/index/updateEnv.conf:

# updateEnv.conf

CMD_NAME=update

SRC_PATH=-F /mnt/beegfs

IDX_PATH=-I /work/index

MNT_POINT=-M /mnt/beegfs

PORT=-p 9000

DEBUG_MODE=-V 1

NO_BEEGFS_METADATA=-B

The update service should be started first, followed by the event listeners on all metadata nodes.

# systemctl start bee-update

The following are the parameters for starting bee update from the command line.

[root@admin-compute1~]# bee update --help

usage: bee [--help] -F FS_PATH [-I INDEX_PATH] [-M MNT_PATH] [-p PORT_NUM] [-V DEBUG_MODE]

BeeGFS Hive Index version of update

optional arguments:

--help show this help message and exit

-F <FS_PATH> File system path for which index will be created

-I <INDEX_PATH> Index directory path

-M <MNT_PATH> File system mount point path

-p <PORT_NUM>, --port <PORT_NUM>  port number to connect with client

-V <DEBUG_MODE>, --verbose <DEBUG_MODE>   enable/disable bee update debugging by giving 1/0

-B <BEEGFS_MODE>  Update index without extracting BeeGFS specific metadata

BeeGFS Hive Index Create and Update

You can create and update an index using the In-Tree (Inside Filesystem) or Out-Of-Tree (Outside Filesystem) option, depending on your requirements. BeeGFS specific metadata is extracted by default and stored with the other file system metadata. Users can opt out of extracting BeeGFS specific metadata during index creation and update.

To create and update the index, you can use the systemctl service named bee-index.

The bee-index service allows users to create the index directory and keep it up to date with ongoing operations. BeeGFS Hive Index uses modification events generated by BeeGFS metadata nodes to update the index directory. The BeeGFS event listener captures the events generated on the metadata nodes and sends them over the network to the bee-index service.

The modification logs from BeeGFS contain the path of the file on which the operation was performed. The bee-index service uses these events to modify the index directory only for the affected files or directories. This mechanism avoids scanning the entire file system to update the index directory, so the index directory stays up to date with file system modifications.

The event listener should be started on all metadata nodes so that all file system operations can be captured and replayed on the index directory by bee-index.

bee-index should run as a service to capture all ongoing file system changes.

To run it using systemctl, update the configuration file /etc/beegfs/index/indexEnv.conf:

# indexEnv.conf

CMD_NAME = index

SRC_PATH = -F "/mnt/beegfs"

IDX_PATH = -I "/work/index"

MNT_POINT = -M "/mnt/beegfs"

PORT = -p "9000"

DEBUG_MODE = -V "1"

NUM_THREADS = -n "8"

MAX_LEVEL = -z "10"

MEMORY_SIZE = -X "5G"

PULL_XATTERS = -x

PRINT_SCANNED_DIR_COUNT = -C

CREATE_TREE_SUM_WITH_OTHER_TABLES = -s

RUN_CREATE_INDEX = -k

RUN_UPDATE_INDEX = -U

NO_BEEGFS_METADATA = -B

The following options are essential and must be provided in indexEnv.conf for bee-index:

-F : Filesystem directory path for which index will be created.

-I : Index directory path.
    If -I is not provided and the config file does not have an index directory path specified,
    the index will be created inside the filesystem directory itself.

-M  : Mount point path.
      File system mount point path.

-p/--port : port number to connect with client

-X : Maximum memory create-index operation should use e.g 8M, 12GB.
    The index creation process scans through the entire directory hierarchy.
    For really large filesystems, there can be a large number of files and directories
    and large directory depth.
    Users should set a limit on the memory that can be used during index creation to avoid memory
    pressure on other applications running on the same machine.

-n : Number of threads which will be started for scanning directory hierarchy
    and extracting and storing the metadata information. Specify this value looking
    at the number of cores available for use during index creation.

See also

man bee-index

The bee-index service should be started first, followed by the event listeners on all metadata nodes.

# systemctl start bee-index

The following are the parameters for starting bee index from the command line.

[root@admin-compute1~]# bee index --help

usage: bee [--help] [-X MAX_MEMORY_USAGE] [-F FS_PATH] [-I INDEX_PATH]
         [-n NUM_THREADS] [-s] [-x] [-z MAX_LEVEL] [-S] [--version] [-C]
         [-M MNT_PATH] [-p PORT_NUM] [-V DEBUG_MODE] [-k] [-U]

BeeGFS Hive Index version of create


optional arguments:

  --help, -h                show this help message and exit

  -X <MAX_MEMORY_USAGE>     Max memory usage e.g 8M, 1G, 8GB, 16gb

  -F <FS_PATH>              File system path for which index will be created

  -I <INDEX_PATH>           Index directory path

  -n <NUM_THREADS>          Number of threads to create index

  -z <MAX_LEVEL>            Max level to go down

  -M <MNT_PATH>             File system mount point path

  -p <PORT_NUM>, --port <PORT_NUM>
                            port number to connect with client

  -V <DEBUG_MODE>, --verbose <DEBUG_MODE>
                            enable/disable update debugging by giving 1/0

  -C                        print the number of scanned directories

  -s                        Create tree summary table along with other tables

  -x                        Pull xattars from source

  -k                        Run the create-index

  -U                        Run the update-index

  -B                        Create/Update index without BeeGFS specific metadata

  --version, -v             show program's version number and exit

beegfs-event-listener

Note

The changes to beegfs-event-listener needed to interface with bee-update/bee-index will be available in BeeGFS releases starting with versions 7.3.3 and 7.2.9.

Metadata services must be configured to send file system modification events to the event listener. The event listeners on the metadata nodes can be started from the command line or using systemctl.

Update the configuration file for the event listener:

/etc/beegfs/beegfs-eventlistener.conf

# beegfs-eventlistener.conf

clientAddr=xxx.xxx.xxx.xxx

fileEventLogTarget = /tmp/beegfslog

updatePort=9000

BEE_UPDATE_DEBUG=1

The administrator can configure which operations the event listener captures by updating the BeeGFS client config file:

sysFileEventLogMask = flush,trunc,setattr,close,link-op,read

The administrator can also configure the metadata server to enable the event stream by specifying a path for the UNIX socket.

For example:

sysFileEventLogTarget = unix:/tmp/beegfslog

For more information, please check the BeeGFS event listener documentation.

Once the bee-update/bee-index service has started successfully, start the event listener on all metadata nodes. Starting the event listener on all metadata nodes is important; otherwise the index will not be updated correctly with ongoing operations. clientAddr is the IP address of the admin node on which the index was created and the bee-index service is running. updatePort is the port on which bee-index listens for file system updates.

# systemctl start beegfs-eventlistener

BeeGFS Hive Index Database Upgrade Utilities

The BeeGFS Hive Index database upgrade utility updates the database schema to a specific BeeGFS Hive Index version. Only the root user can upgrade the BeeGFS Hive Index database schema, which is done through SQL scripts. The default directory for the SQL scripts is “/opt/beegfs/db”. If multiple SQL scripts are present in the script directory, the utility sorts them in ascending order and executes them one by one on each database file.
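The ordering rule matters because later scripts usually build on earlier ones. The sketch below illustrates the sort-then-apply pattern with Python's sqlite3 module on a temporary database. The script names and their SQL contents are invented for the demo, and apply_scripts is a hypothetical helper, not the db-upgrade implementation.

```python
import sqlite3
import tempfile
from pathlib import Path


def apply_scripts(db_file: Path, script_dir: Path) -> int:
    """Run every *.sql script in ascending name order; return user_version."""
    conn = sqlite3.connect(str(db_file))
    try:
        for script in sorted(script_dir.glob("*.sql")):
            conn.executescript(script.read_text())
        return conn.execute("PRAGMA user_version").fetchone()[0]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp, "db")
    scripts.mkdir()
    # Hypothetical script names and contents, written out of order on purpose.
    (scripts / "002_add_note.sql").write_text(
        "ALTER TABLE entries ADD COLUMN note TEXT; PRAGMA user_version = 2;"
    )
    (scripts / "001_init.sql").write_text(
        "CREATE TABLE entries (name TEXT); PRAGMA user_version = 1;"
    )
    final_version = apply_scripts(Path(tmp, ".bdm.db"), scripts)

print(final_version)
```

If the scripts ran in the order they were written to disk, the ALTER TABLE statement would fail because the table would not exist yet; sorting by name avoids that.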

Steps to upgrade or downgrade database schema:

  1. Stop the following services

    # systemctl stop bee-index.service
    # systemctl stop beegfs-eventlistener.service
    
  2. Install the latest released version of BeeGFS Hive Index

  3. Update the configuration files in the /etc/beegfs/index directory

    • Files to update: config, indexEnv.conf, updateEnv.conf

  4. Check the current database version

    # sqlite3 /mnt/index/.bdm.db
    sqlite> pragma user_version;
    1
    sqlite>
    
  5. This utility will upgrade or downgrade an SQLite3 database version incrementally.

  6. Examples of BeeGFS Hive Index upgrade utility

    • Upgrade the Hive Index database with backup database files:

      # bee db-upgrade -T "2" -I /mnt/index -b
      // Check the database version after upgrade
      # sqlite3 /mnt/index/.bdm.db
      sqlite> pragma user_version;
      2
      sqlite>
      
    • Downgrade the Hive Index database with backup database files:

      # bee db-upgrade -T "1" -I /mnt/index -b
      // Check the database version after downgrade
      # sqlite3 /mnt/index/.bdm.db
      sqlite> pragma user_version;
      1
      sqlite>
      
    • Delete all backed up database files from index directory path recursively.

      # bee db-upgrade -I /mnt/index -d
      
    • Restore all backed up database files from index directory path recursively.

      # bee db-upgrade -I /mnt/index -r
      
    • Perform BeeGFS Hive Index Database Upgrade/Downgrade operations in parallel using “-n max_threads”

      # bee db-upgrade -T "2" -I /mnt/index -b -n 8
      
    • For other options, please check the db-upgrade man page

      # man bee db-upgrade
      
  7. Start the services

    # systemctl start beegfs-eventlistener.service
    # systemctl start bee-index.service
    

bee stats

The stats subcommand allows users to obtain file system statistics, such as the total number of files, directories or links in the directory hierarchy, the number of files, directories or links per level, and the maximum and minimum file sizes.

Note

bee stats should be run from the index directory or the file system directory. Alternatively, the absolute path of a directory can be provided.

e.g. bee stats total-filecount /work/index/arch/alpha or bee stats total-filecount /mnt/beegfs/arch/alpha

Some of the important options are:

1. total-filecount: Get the total number of files under a directory. By default it reports the count per uid; with --cumulative it shows cumulative numbers instead.

2. total-linkcount: Similar to total-filecount, gets the total number of links under a directory.

[root@admin-compute1~]# bee stats total-filecount
6980 1301

[root@admin-compute1~]# bee stats total-linkcount
2301 29

3. dirs-per-level/files-per-level/links-per-level: Get the count of <type>s in each directory level per uid.

[root@admin-compute1~]# bee stats files-per-level
root 0 14
root 1 801
root 2 11182
root 3 17738

[root@admin-compute1~]# bee stats dirs-per-level
root 0 1
root 1 19
root 2 468
root 3 1104
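To clarify what a per-level count means, the sketch below computes it for an ordinary directory tree using os.walk. It only illustrates the statistic and does not query a Hive Index; treating the starting directory as level 0 is an assumption based on the sample output above, and files_per_level is a hypothetical name.

```python
import os
import tempfile
from collections import Counter


def files_per_level(root: str) -> Counter:
    """Count non-directory entries at each depth (the root itself is level 0)."""
    counts = Counter()
    root = os.path.abspath(root)
    for current, _dirs, files in os.walk(root):
        relative = os.path.relpath(current, root)
        level = 0 if relative == "." else relative.count(os.sep) + 1
        counts[level] += len(files)
    return counts


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel in ("top.txt", "a/one.txt", "a/two.txt", "a/b/deep.txt"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    levels = dict(files_per_level(tmp))

print(levels)
```

Unlike this walk, the index can answer the same question from its stored per-directory metadata, without traversing the live file system.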

For other options, please check the bee-stats man page.

See also

man bee-stats

bee find

The find subcommand can be used to find files in an index directory hierarchy. Its options are very similar to those of GNU find, and it allows users to get results by running queries over the index directory. If BeeGFS specific metadata was extracted during index creation or update, that metadata can also be queried using the find command. Hive’s find is much faster than running an actual find command over the file system.

The following are some examples of bee find:

# Get the list of files which are greater than 1GB in size

[root@admin-compute1~]# bee find -size=+1G
/mnt/beegfs/dataset0/user1/test1.txt
/mnt/beegfs/dataset0/application.txt
/mnt/beegfs/dataset0/user2/test.txt

# Get the list of files which were created within the last 24 hours.

[root@admin-compute1~]# bee find -ctime=-1
/mnt/beegfs/dataset0/file.txt

# Get the list of files which were accessed within the last 10 days.

[root@admin-compute1~]# bee find -atime=-10
/mnt/beegfs/dataset2/file1.txt

Some important options to query BeeGFS specific metadata

# Reverse map the entryID to filename

[root@admin-compute1~]# bee find -entryID "0-660D2950-1"
/mnt/beegfs/linux.tar.gz
[root@admin-compute1~]#

# List files whose data is stored on a given storage target

[root@admin-compute1~]# bee find -targetID "101"
/mnt/beegfs/newfile
/mnt/beegfs/file3.txt
/mnt/beegfs/linux.tar.gz
[root@admin-compute1~]#

# List files whose metadata is stored on a given metadata node

[root@admin-compute1 index]# bee find -ownerID "1"

/mnt/beegfs/dataset0/stg1/test1.txt
/mnt/beegfs/dataset0/stg1/test2.txt
/mnt/beegfs/dataset0/nmv/data1
[root@admin-compute1 index]#

See also

man bee-find

bee ls

bee ls lists the contents of an index directory. Its options are similar to those of the standard ls command. bee ls works with both absolute and relative paths; when using relative paths, run it from the index directory or the file system directory. By adding the --beegfs flag, users can print BeeGFS specific metadata for a file.

The following are some examples of bee ls options.

# Print the inode number of the files in a directory

[root@admin-compute1~]# bee ls -i user/data/
3224807882485797791 core.c
9553679804207926350 kfence.h
10232643267511233810 kfence_test.c
16487850910921490669 Makefile
348824457185005735 report.c

# Sort the files in a directory by size in descending order and print the sizes.

[root@admin-compute1~]# bee ls arch/alpha/boot/ -Ss
27 bootpz.c
13 stdio.c
12 bootp.c
9 misc.c

# List the files in a directory with sizes in human readable format.

[root@admin-compute1~]# bee ls -hls
1 -rw-rw-r-- 1 root root 496.0 Feb 11 08:26 COPYING
198 -rw-rw-r-- 1 root root 98.6K Feb 11 08:26 CREDITS
1 drwxrwxr-x 4 root root 144.0 Feb 11 08:26 data
1 drwxrwxr-x 81 root root 97.0 Feb 11 08:26 Documentation

List the files with BeeGFS specific metadata

-b or --beegfs: List the BeeGFS specific metadata for regular files. It prints the EntryID, ParentID, Metadata OwnerID, Chunk Size and Number of targets, in that order.

[root@admin-compute1]# bee ls -b

4-66854319-2 0-668542DD-2 1 524288 5 analytics
1-66854319-2 0-668542DD-2 1 524288 2 edge_solution.txt
2-66854319-2 0-668542DD-2 1 524288 4 data.log

See also

man bee-ls

bee stat

bee stat displays file and directory metadata information. Its options are similar to those of GNU stat. With the special --beegfs option, users can print BeeGFS specific metadata for a file.

bee stat can be run with the absolute path of the file system or index directory from anywhere. To use relative paths, bee stat commands should be run from inside a file system source directory or index directory.

[root@admin-compute1~]# bee stat README

File: '/dataset/index/README'
Size: 727 Blocks: 2 IO Block: 524288 regular file
Device: h/ d Inode: 8523084679231612220 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 0/ root) Gid: ( 0/ root)
Context:
Access: 2023-01-03 09:09:48 +0100
Modify: 2022-02-11 09:26:32 +0100
Change: 2023-01-03 09:09:48 +0100
Birth:

To include BeeGFS specific metadata, add the --beegfs flag.

[root@admin-compute1 scratch-work]# bee stat --beegfs system.log
path: system.log
Entry type: file
EntryID: 0-66854319-2
ParentID: 0-668542DD-2
Metadata Owner ID: 2
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 524288
+ Number of storage targets: 4
 + Target ID 501
 + Target ID 601
 + Target ID 701
 + Target ID 801
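The stripe pattern fields can be read as a recipe for locating data. The sketch below computes which storage target would hold a given byte offset, assuming plain round-robin striping that starts at the first listed target. That ordering is an assumption made for illustration, and target_for_offset is a hypothetical helper rather than a BeeGFS API.

```python
def target_for_offset(offset: int, chunk_size: int, targets: list) -> int:
    """Return the target holding the byte at offset under round-robin striping."""
    if offset < 0 or chunk_size <= 0 or not targets:
        raise ValueError("invalid striping parameters")
    chunk_index = offset // chunk_size
    return targets[chunk_index % len(targets)]


targets = [501, 601, 701, 801]   # target IDs from the bee stat output
chunk = 524288                   # chunk size in bytes from the same output

for offset in (0, chunk, 3 * chunk + 1, 4 * chunk):
    print(offset, target_for_offset(offset, chunk, targets))
```

Under this assumption, the first 512 KiB of the file would sit on target 501, the next chunk on 601, and the fifth chunk would wrap around to 501 again.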

See also

man bee-stat

bee query

The bee query-index subcommand allows users to execute SQL queries directly on the index database. This command is useful when users need specific metadata that is not accessible through the other Hive Index commands.

[root@admin-compute1~]# bee query-index --help

usage: bee [--help] [-I DB_PATH] [-s SQL_QUERY]

BeeGFS Hive Index version of query-index

optional arguments:

--help show this help message and exit

-I <DB_PATH> Index directory path

-s <SQL_QUERY> Provide sql query

List the database entries for files inside the net directory

[root@admin-compute1~]# bee query-index -I . -s "select * from entries"
|id|name|type|inode|mode|nlink|uid|gid|size|blksize|blocks|atime|mtime|ctime|linkname|xattrs|crtime|
ossint1|ossint2|ossint3|ossint4|osstext1|osstext2|pinode|ownerID|entryID|parentID|entryType|featureFlag|
stripe_pattern_type|chunk_size|num_targets|target_info|

1|devres|f|11002537685141370162|33188|1|0|0|25560551|524288|49923|1720521953|1720521953|1720521953|||0|0|0|0|0|||
140174793085336|1|7D8-668D14D1-2|0-668D1457-1|2|3|1|524288|2|1:2:|
2|README|f|16374403434744466135|33188|1|0|0|21258827|524288|41522|1720521953|1720521953|1720521953|||0|0|0|0|0|||
140174793085336|1|7D9-668D14D1-2|0-668D1457-1|2|3|1|524288|2|1:2:|
3|spec|f|17175560844328665268|33188|1|0|0|23669|524288|47|1720521954|1720521954|1720521954|||0|0|0|0|0|||
140174793085336|1|90E-668D14D1-2|0-668D1457-1|2|3|1|524288|2|2:1:|

Get the summary information for the net directory

[root@admin-compute1~]# bee query-index -I net/ -s "select * from summary"
|id|name|type|inode|mode|nlink|uid|gid|size|blksize|blocks|atime|mtime|ctime|linkname|xattrs|totfiles|totlinks|minuid|maxuid|
|mingid|maxgid|minsize|maxsize|totltk|totmtk|totltm|totmtm|totmtg|totmtt|totsize|minctime|maxctime|minmtime|maxmtime|minatime|
|maxatime|minblocks|maxblocks|totxattr|depth|mincrtime|maxcrtime|minossint1|maxossint1|totossint1|minossint2|maxossint2|
|totossint2|minossint3|maxossint3|totossint3|minossint4|maxossint4|totossint4|rectype|pinode|

|1|net|d|5920216113766933286|16893|72|0|0|76|524288|1|1672733464|1644567992|1672733467|||6|0|0|0|0|0|2250|89444|0|6|6|0|0|0|126954|1672733464|1672733467|
|1644567992|1644567992|1672733464|1672733467|5|175|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|10872060183036577154|

See also

man bee-query-index

Limitations and Known Issues

  • The index is updated using modification events captured on the metadata nodes by the event listener service. These events are sent over the network from the metadata nodes to the bee-update service. Due to potential network disconnects, the bee-update process might miss some file system modification events, which can cause the index to go out of sync.

  • The modification events from the event listener provide the file/directory path and the operation type but not the changed metadata. For example, a truncate event does not carry the new file size. bee-update therefore has to fetch the modified metadata from the file system again, i.e. via a stat() operation, to update the index.

Potential Future Enhancements

  • Modification events generated by a metadata node could also include the modified metadata (e.g. file size, timestamps) in addition to the path name. With this information, bee-update would not have to send a request to the file system to update the index.

  • The event listener could buffer modification events when it is not able to send them to bee-update immediately. This would address the potential loss of events under high load on the metadata nodes or during network disconnects.
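The buffering idea in the last point could look roughly like the sketch below. It is purely illustrative of a possible design and does not describe any existing BeeGFS component; the BufferedSender class, the event strings and the bounded queue size are all assumptions.

```python
from collections import deque
from typing import Callable


class BufferedSender:
    """Queue events while the receiver is unreachable, then flush in order."""

    def __init__(self, send: Callable[[str], None], max_events: int = 10000):
        self._send = send
        # Bounded queue: once full, the oldest events are dropped.
        self._pending = deque(maxlen=max_events)

    def submit(self, event: str) -> None:
        self._pending.append(event)
        self.flush()

    def flush(self) -> None:
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return  # keep the event and retry on the next flush
            self._pending.popleft()

    def __len__(self) -> int:
        return len(self._pending)


delivered = []
state = {"online": False}


def send(event: str) -> None:
    """Stand-in for the network send; fails while the receiver is offline."""
    if not state["online"]:
        raise ConnectionError("receiver unreachable")
    delivered.append(event)


sender = BufferedSender(send)
sender.submit("close /mnt/beegfs/a")
sender.submit("trunc /mnt/beegfs/b")
buffered = len(sender)

state["online"] = True
sender.flush()
print(buffered, delivered)
```

A bounded buffer trades completeness for memory safety: if the receiver stays unreachable for too long, the oldest events are still lost, so a periodic rescan would remain a useful fallback.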