Remote Storage Targets

Introduction

By default, all files in BeeGFS are striped across one or more storage targets using a proprietary chunk file format optimized for high-performance parallel access. BeeGFS also supports synchronizing files with one or more Remote Storage Targets. Any S3-compatible storage provider, whether an on-premises solution or a cloud provider, can be used as a remote target. In addition to the core file system services (Management, Metadata, and Storage), using Remote Storage Targets requires deploying an additional Remote service, which coordinates file synchronization through one or more Sync services.

The Remote and Sync services can be deployed on the same physical servers as the core file system services or on dedicated servers, depending on the available hardware and the requirements of a particular environment. This allows remote targets to be added to existing BeeGFS deployments whose servers may not have been sized with the remote targets feature in mind. It also makes it possible to avoid exposing the servers running core BeeGFS services directly to the internet, and instead treat the Remote/Sync servers as “gateway” nodes.

                                              ┌────────────────┐
                                              │ Remote Service ◄───┐
                           ┌──────────────┐   └────────▲───────┘   │
                           │              │            │           │   ┌─────────────┐
                           │    BeeGFS    ◄────────────┤           ├──►│ S3 Provider │
                           │              │            │           │   └─────────────┘
                           └──────────────┘   ┌────────▼───────┐   │
                                              │  Sync Service  ◄───┘
                                              └────────────────┘

Files and directories in BeeGFS can be associated with one or more Remote Storage Targets. The BeeGFS metadata service stores the remote target configuration for each entry alongside the rest of that entry’s configuration. The beegfs command-line tool synchronizes entries with a remote target, either by pushing (uploading) or pulling (downloading) them.

Limitations and Known Issues

  • When the underlying file system of the storage targets is ZFS, run the beegfs entry refresh command against new or modified files before using beegfs remote push. BeeGFS Remote uses the file size to decide how to split files into multiple work requests. ZFS does not immediately update file sizes when files are closed, so the size recorded by the BeeGFS metadata service can differ from the actual file size.

Requirements

Capacity

When a path in BeeGFS needs to be synchronized, a job is created in the Remote service. Depending on the size of the file, one or more work requests are generated and assigned to the Sync service(s), which actually move the data. The system is designed so that both the Remote and Sync services can be restarted and resume synchronizing data where they left off, minimizing the amount of data that must be retransferred.

The Remote service stores information about pending, active, and historical jobs on a per-path basis, but a path has no entry in the Remote database until it is first synchronized. Capacity requirements vary by job, but jobs generally require only a few KiB each, and by default the number of historical jobs retained for each path is capped at four. For example, a file system containing 1 billion files, retaining at most four jobs per file with jobs averaging 3 KB, would require roughly 11 TiB. This is a rough worst-case estimate; actual requirements are likely lower due to compression.
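The worst-case estimate above is a simple product of paths, retained jobs per path, and average job size. The following minimal Python sketch lets you plug in your own numbers. The function name is illustrative and not part of BeeGFS, and the result does not account for compression.

```python
# Rough worst-case size of the Remote job database (before compression).
TIB = 2**40  # bytes per TiB


def remote_db_estimate_bytes(num_paths: int,
                             jobs_per_path: int = 4,
                             bytes_per_job: int = 3 * 1000) -> int:
    """Paths x retained jobs per path x average job size, in bytes."""
    return num_paths * jobs_per_path * bytes_per_job


# 1 billion paths, at most 4 retained jobs each, ~3 KB per job
est = remote_db_estimate_bytes(1_000_000_000)
print(f"{est / TIB:.1f} TiB")  # 10.9 TiB

# The same estimate if "3KB" is read as 3 KiB
est_kib = remote_db_estimate_bytes(1_000_000_000, bytes_per_job=3 * 1024)
print(f"{est_kib / TIB:.1f} TiB")  # 11.2 TiB
```

Either reading of "3KB" rounds to the ~11 TiB figure quoted above.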

The Sync service only needs to store information about the work requests it is currently assigned, and therefore requires significantly less space. However, this also makes its capacity requirements harder to estimate, because they depend on the number of jobs active at any one time. For example, a system synchronizing tens of thousands of files concurrently generally requires only 200-300 MiB. To allow for bursts of activity in which millions of files may be synchronized, it is better to allocate 100-200 GiB per Sync service. Again, this is a rough worst-case estimate, and actual requirements are likely lower.

Getting Started

Prerequisites

On all servers that will run Remote or Sync services:

  1. Add the BeeGFS package repositories to your package manager.

  2. Configure TLS. The exact steps depend on how you configured TLS for the BeeGFS management service.

    • Unless you disabled TLS or want to use different TLS certificates for the Remote/Sync services, you can simply copy the certificate and key files used by the management service to /etc/beegfs on all Remote/Sync servers. This is what the default configuration files expect, and it is the easiest way to configure TLS uniformly.

  3. Follow the steps in the quick start or manual installation guide to install and configure the client. Before starting/enabling the client service or mounting the client (via fstab/mount), make sure the following required client configuration is in place on all Remote and Sync nodes:

    • Starting in BeeGFS 8.1.0, the Remote and Sync services use the BeeGFS Data Management API to establish exclusive read-write access to files being synced and to manage the contents of offloaded (stub) files. This requires setting sysBypassFileAccessCheckOnMeta=true in /etc/beegfs/beegfs-client.conf before running systemctl enable --now beegfs-client.
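Before enabling the client on a Remote or Sync node, you may want to check the setting programmatically, for example from a provisioning script. The following is a minimal Python sketch. It assumes beegfs-client.conf uses the usual `key = value` layout with `#` comments. If a key appears more than once, the sketch treats the last occurrence as effective, which is an assumption rather than documented client behavior.

```python
# Sketch: check that beegfs-client.conf text sets sysBypassFileAccessCheckOnMeta=true.
# Assumes "key = value" lines and "#" comments; the last assignment of a key wins
# in this sketch (an assumption, not documented client behavior).

REQUIRED_KEY = "sysBypassFileAccessCheckOnMeta"


def conf_value(text: str, key: str):
    """Return the last value assigned to `key`, or None if it is not set."""
    value = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        name, _, val = line.partition("=")
        if name.strip() == key:
            value = val.strip()
    return value


def bypass_enabled(text: str) -> bool:
    """True if the Data Management API bypass is explicitly enabled."""
    return (conf_value(text, REQUIRED_KEY) or "").lower() == "true"
```

For example, `bypass_enabled(open("/etc/beegfs/beegfs-client.conf").read())` should return True on a correctly prepared Remote/Sync node.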

Important

The Remote and Sync services use a client mount that allows access to files locked by the BeeGFS Data Management API. Do not use these client mounts for other applications; otherwise those applications may unintentionally access locked files.

Install/Configure the Sync Service(s)

On all servers that will run the Sync service:

  1. Install the beegfs-sync package using your package manager.

    • For example: dnf install beegfs-sync.

  2. Most of the service’s configuration is centrally managed by the Remote service and automatically pushed to all Sync servers. This means the default /etc/beegfs/beegfs-sync.toml file may not need to be modified; however, there are some settings to be aware of (see the default file for more details):

    • The [server] section controls how the Remote service connects to the Sync service. By default the service listens on all available addresses using port 9011, but you can optionally restrict the address it listens on and adjust the TLS configuration.

    • The [manager] section is used to configure where the Sync service keeps track of active and pending work requests. It uses the journal-db to track and resume ongoing work requests after a restart, and the job-db to optimize looking up jobs and work requests.

    • The [remote] section controls how the Sync service connects to the Remote service. Because the address is synchronized automatically, only the TLS configuration may need customization.

  3. Start and enable the service to ensure it automatically restarts after a reboot: systemctl enable --now beegfs-sync.

  4. Verify that the service finished starting up and is serving gRPC requests: journalctl -u beegfs-sync.
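Besides checking the journal, it can help to confirm from the server that will run the Remote service that each Sync node's port is reachable through any firewalls. The following is a minimal Python sketch, assuming the default ports mentioned in this guide (9011 for Sync, 9010 for Remote). It only proves that something accepts TCP connections on that port. It does not validate TLS certificates or gRPC.

```python
import socket

# Default ports described in this guide; adjust if you changed the [server] sections.
SYNC_PORT = 9011
REMOTE_PORT = 9010


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening; it does not check TLS or gRPC.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_reachable("sync01.example.com", SYNC_PORT)`, where the hostname is hypothetical. A False result points to a firewall, listen-address, or startup problem worth checking in the journal.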

Install/Configure the Remote Service

On a single server that will run the Remote service:

  1. Install the beegfs-remote package using your package manager.

    • For example: dnf install beegfs-remote.

  2. Using your preferred text editor, edit the /etc/beegfs/beegfs-remote.toml file. Available settings are documented in the file; here is an overview of the minimum configuration required to run the Remote service:

    • The [management] section controls how Remote connects to the management service. Set the address to the IP address/hostname and port where the management service is listening for gRPC traffic. If you are using connection-based authentication, copy the shared secret used by your BeeGFS management service to /etc/beegfs/conn.auth (otherwise set auth-disable = true). If needed, adjust the TLS configuration used to connect to the management service.

    • The [server] section controls how the beegfs tool and Sync services connect to the Remote service. By default the service listens on all available addresses using port 9010, but you can optionally restrict the address it listens on and adjust the TLS configuration.

    • The [job] section controls where the Remote service keeps track of historical, active, and pending data synchronization jobs for each path. This information is stored using BadgerDB, and the path where the database is stored can be customized with path-db.

    • Add a [[worker]] section for each Sync node. Each [[worker]] section should contain the configuration for only a single Sync node.

    • Add a [[remote-storage-target]] section for each remote target. Example configuration for common S3 providers is documented in the default file. The id is how entries in BeeGFS are associated with remote targets, so it must be greater than zero and unique across all remote targets.

  3. Start and enable the service to ensure it automatically restarts after a reboot: systemctl enable --now beegfs-remote.

  4. Verify that the service finished starting up and was able to connect to all Sync nodes by running: journalctl -u beegfs-remote.

    • If you now check the Sync node logs, you should see “successfully applied new configuration”.
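A quick pre-flight check of the edited file can catch the most common mistakes before you restart the service, such as a missing [[worker]] section or a duplicate or zero remote target id. The following is a minimal Python sketch that works on the dict produced by parsing beegfs-remote.toml with the standard library's tomllib (Python 3.11+). It is not an official validator, and it only mirrors the rules described in step 2. The default file is authoritative about which sections are strictly required.

```python
# Sketch: sanity-check a parsed beegfs-remote.toml (the dict returned by tomllib.load).
# Mirrors the minimum configuration described above; not an official validator.


def check_remote_config(cfg: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for section in ("management", "server", "job"):
        if section not in cfg:
            problems.append(f"missing [{section}] section")

    # TOML [[worker]] and [[remote-storage-target]] tables parse as lists of dicts.
    if not cfg.get("worker"):
        problems.append("no [[worker]] sections (add one per Sync node)")

    seen = set()
    for rst in cfg.get("remote-storage-target", []):
        rst_id = rst.get("id")
        if isinstance(rst_id, bool) or not isinstance(rst_id, int) or rst_id <= 0:
            problems.append(f"remote target id must be an integer > 0, got {rst_id!r}")
        elif rst_id in seen:
            problems.append(f"duplicate remote target id {rst_id}")
        else:
            seen.add(rst_id)
    return problems
```

Usage on Python 3.11+: `with open("/etc/beegfs/beegfs-remote.toml", "rb") as f: print(check_remote_config(tomllib.load(f)))`.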

Using Remote Storage Targets

Warning

Uploading and downloading data using remote push/pull commands may incur charges depending on the storage provider used for remote targets. These charges typically include ingress/egress fees based on the amount of data transferred and the number of API requests made to the storage provider. While the Remote Storage Target feature is designed to minimize unnecessary requests, it is highly recommended that you first perform a test run with a small dataset that accurately represents your typical data. This will help you evaluate the potential cost of syncing data with a particular storage provider before transferring large amounts of data. In particular, pulling from a remote target involves some overhead, because the contents of the storage provider must be walked (listed) to find matching remote files.

To interact with Remote Storage Targets, use the beegfs tool provided by the beegfs-tools package. The beegfs tool needs to know the Remote gRPC server address configured above, which can be specified using --remote-addr=address:port. Alternatively, set the Remote address with the environment variable BEEGFS_REMOTE_ADDR (export BEEGFS_REMOTE_ADDR=address:port), either for the current shell session only or persistently in .bashrc or the equivalent file for your shell.
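When driving the beegfs tool from scripts, it helps to fail early if neither way of supplying the Remote address is in place. The following minimal Python sketch builds the argument list for a `beegfs remote ...` command. The helper name is hypothetical. Appending `--remote-addr` after the other arguments is an assumption based on the examples in this guide, which place flags after positional arguments.

```python
import os


def beegfs_remote_argv(subcommand, *args, remote_addr=None):
    """Build argv for `beegfs remote <subcommand> ...` (hypothetical helper).

    If remote_addr is given, it is passed as --remote-addr=address:port.
    Otherwise BEEGFS_REMOTE_ADDR must already be exported so that the
    beegfs tool can find the Remote service on its own.
    """
    argv = ["beegfs", "remote", subcommand, *args]
    if remote_addr is not None:
        host, sep, port = remote_addr.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected address:port, got {remote_addr!r}")
        argv.append(f"--remote-addr={remote_addr}")
    elif not os.environ.get("BEEGFS_REMOTE_ADDR"):
        raise RuntimeError("pass remote_addr or export BEEGFS_REMOTE_ADDR=address:port")
    return argv
```

For example, `subprocess.run(beegfs_remote_argv("list", remote_addr="remote01:9010"), check=True)`, where "remote01" is a hypothetical hostname.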

Commands for interacting with remote targets can be found under beegfs remote. For example, to get the list of available remote targets, run: beegfs remote list. Similar to storage pools, remote targets can be configured on a per-file or per-directory basis. When configured on a directory, they are automatically inherited by entries created under that directory. Unlike storage pools, remote targets can be updated on both files and directories at any time. Use the standard entry commands to check and apply remote target configuration:

  • Inspect remote target configuration: beegfs entry info <path>

  • Set remote target configuration: beegfs entry set --remote-targets=<id>[,<id>]... <path>

    • It is also possible to set a remote target when pushing or pulling files by using the --update and --remote-target flags together. The --update flag always replaces any existing remote targets on the entries with the specified one. At this time, only a single remote target can be specified when pushing or pulling a file.
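When remote target assignments are generated by a script, the `<id>[,<id>]...` syntax can be produced by a small helper. The following is a minimal Python sketch, and the function name is hypothetical. It applies the rule from the Remote configuration that ids must be greater than zero. It drops duplicates, since repeating an id adds nothing. It maps an empty list to `none`, which removes remote configuration from entries.

```python
def remote_targets_flag(ids):
    """Format the --remote-targets argument for `beegfs entry set` (hypothetical helper).

    An empty list maps to "none", which removes remote configuration from entries.
    """
    ids = list(ids)
    if not ids:
        return "--remote-targets=none"
    for i in ids:
        if isinstance(i, bool) or not isinstance(i, int) or i <= 0:
            raise ValueError(f"remote target ids must be integers > 0, got {i!r}")
    unique = list(dict.fromkeys(ids))  # drop duplicates, keep first-seen order
    return "--remote-targets=" + ",".join(str(i) for i in unique)
```

For example, `["beegfs", "entry", "set", remote_targets_flag([1, 2]), "project152"]` corresponds to `beegfs entry set --remote-targets=1,2 project152`.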

The following sections outline common tasks; help is also available within the beegfs tool by appending --help to any command.

Push (Upload)

Warning

Running push will always overwrite existing remote file(s) with local file(s). Consider enabling snapshots or object versioning if files in the remote target may be updated outside BeeGFS Remote.

Once files have remote target(s) set, they can be pushed (uploaded) to those target(s) by running: beegfs remote push <path>. If the specified <path> is a directory, all entries under that directory are recursively pushed to whichever target(s) they are associated with. It is also possible to perform a one-time push to a particular remote target by specifying the --remote-target flag when pushing files.

To push a single file to a specific remote target:

$ cd project152/ && touch newfile
$ beegfs remote push --remote-target=1 newfile
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
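Push, pull, and status commands end with a one-line summary like the one above. In scripts, it is often easier to act on the counts than on the emoji output. The following minimal Python sketch splits such a line into its leading status word and a `{label: count}` dict. The parsing rules are inferred from the example output in this guide, not from a documented output format, so re-check them against your version of the beegfs tool.

```python
import re

# One "<count> <label>" pair per "|"-separated field, e.g. "⏳ 1 scheduled".
_FIELD = re.compile(r"(\d+)\s+([a-z][a-z ()]*[a-z)])")


def parse_summary(line):
    """Split a push/pull/status summary line into (status, {label: count})."""
    status, _, rest = line.partition(":")
    counts = {}
    for field in rest.split("|"):
        m = _FIELD.search(field)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return status.strip(), counts
```

For example, after a push you might treat `status == "Success"` together with `counts["error starting sync"] == 0` as "nothing failed to start". The same function also reads the "Summary:" lines printed by beegfs remote status.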

Optionally, file contents can be offloaded from BeeGFS to a single target using the --stub-local flag when pushing file(s). This first ensures that the file(s) are synced with the remote target. It then truncates the contents and replaces them with an internally used URL that records the RST ID and remote path, which can later be used to pull the file contents back into BeeGFS. Files that are “offloaded” cannot be opened for reading or writing:

$ beegfs remote push newfile --remote-target=1 --stub-local
Success: ✅ 0 synced | ☁️ 1 offloaded | 🔄 0 syncing | ⏳ 0 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

$ cat newfile
cat: newfile: Resource temporarily unavailable

$ echo "test" >> newfile
bash: newfile: Resource temporarily unavailable

If a file is already synced, it immediately shows as offloaded. If a sync is needed, the status first shows scheduled and later offloaded.
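Applications that may encounter stub files can detect them by the EAGAIN/EWOULDBLOCK error shown above, rather than treating the failure as fatal. The following is a minimal Python sketch with a hypothetical function name. Files that are write-locked because a sync is in progress fail in the same way, so the result is only a hint. Use beegfs remote status to confirm whether a file is actually offloaded.

```python
import errno


def looks_offloaded(path):
    """Return True if opening `path` fails with EAGAIN/EWOULDBLOCK (hypothetical helper).

    Caveat: files locked because a sync is in progress fail the same way, so
    treat this as a hint and confirm with `beegfs remote status`.
    """
    try:
        with open(path, "rb"):
            return False
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return True
        raise  # other errors (missing file, permissions, ...) are real failures
```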

The pull command (see below) is used to restore files and unblock regular client access if needed.

Pull (Download)

To pull (download) files from a remote target, run: beegfs remote pull --remote-target=<id> --remote-path=<string> <local-path>. The --remote-target flag must be specified unless <local-path> already exists with a single remote target configured. If <local-path> does not already exist, --remote-path must always be specified. The --remote-path can be the name of a single remote file, a prefix, or a globbing pattern (e.g., project-01-*.txt). If the specified remote path exactly matches the name of a remote file, only that file is pulled. If there is no exact match, the remote path is treated as a prefix and all matching remote files are pulled. A globbing pattern pulls every remote file it matches. In all cases, if there are no matching remote files, the pull is automatically cancelled (see Investigating and Fixing Errors).
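Because pulls can incur listing and transfer costs, it can help to reason about which remote files a given --remote-path would select before running it. The following minimal Python sketch approximates the rules described above: exact match first, then glob, then prefix. It is not the Remote service's implementation. It substitutes Python's fnmatch for the real matcher, and in fnmatch `*` also matches `/` while `**` has no special meaning, so results can differ from the service for some patterns.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = "*?["


def select_remote_files(remote_path, remote_keys):
    """Approximate which remote files a pull would select (illustrative only).

    1. An exact match selects only that file.
    2. A pattern containing glob characters selects every key it matches.
    3. Otherwise the remote path is treated as a prefix.
    An empty result corresponds to a pull that would be cancelled.
    """
    keys = set(remote_keys)
    if remote_path in keys:
        return [remote_path]
    if any(ch in remote_path for ch in GLOB_CHARS):
        return sorted(k for k in keys if fnmatchcase(k, remote_path))
    return sorted(k for k in keys if k.startswith(remote_path))
```

For example, with remote files "helloworld" and "helloworld2", the remote path "helloworld" selects only the first file, while "hello" selects both.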

Note

The remote targets configured in each file’s BeeGFS metadata are always inherited from the parent directory. This means when new file(s) are pulled into BeeGFS they will not be automatically associated with the remote target they were pulled from unless that remote target is set on the directory where the files are being pulled.

To pull a single remote file “helloworld” as local file “world”:

$ beegfs remote pull --remote-target=1 --remote-path=helloworld world
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

$ beegfs remote status --remote-targets=1 world
Summary: found 1 entries | ✅ 1 synchronized | ☁️ 0 offloaded | ⚠️‌ 0 unsynchronized | ⭕ 0 not attempted | ⛔ 0 without remote targets | 🚫 0 not supported | 📂 0 directories

To verify that a single existing file is in sync, run pull:

$ beegfs remote pull --remote-target=1 world --verbose
OK  PATH               TARGET  JOB_CREATED           JOB_UPDATED           START_FILE_MTIME      END_FILE_MTIME        JOB_ID                                REQUEST_TYPE  STATE      STATUS_MESSAGE
✅  /project152/world  1       2025-06-02T18:49:30Z  2025-06-02T18:49:32Z  2025-05-30T16:28:30Z  2025-05-30T16:28:30Z  35d7b1b5-b2fb-404a-af4b-83eab060b146  DOWNLOAD      COMPLETED  successfully completed job

Success: ✅ 1 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 0 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

If the remote file had been updated and was no longer in sync with the local file, the pull would report an error, and the --overwrite flag could be used to replace the local file with the remote one:

$ beegfs remote pull --remote-target=1 world --verbose
OK  PATH               TARGET  JOB_CREATED           JOB_UPDATED           START_FILE_MTIME      END_FILE_MTIME        JOB_ID                                REQUEST_TYPE  STATE      STATUS_MESSAGE
🚫  /project152/world  1       2025-06-02T18:51:16Z  2025-06-02T18:51:16Z  1970-01-01T00:00:00Z  1970-01-01T00:00:00Z  8e2c8e40-baa2-4959-bb46-fdfcb40e108c  DOWNLOAD      CANCELLED  | job failed precondition: failed to  |
                                                                                                                                                                                      | prepare file state: download would  |
                                                                                                                                                                                      | overwrite existing path but the ove |
                                                                                                                                                                                      | rwrite flag was not set: file alrea |
                                                                                                                                                                                      | dy exists                           |

Error: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 0 scheduled | ❌ 0 previous sync failure | ⚠️‌ 1 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

$ beegfs remote pull --remote-target=1 world --verbose --overwrite
OK  PATH               TARGET  JOB_CREATED           JOB_UPDATED           START_FILE_MTIME      END_FILE_MTIME        JOB_ID                                REQUEST_TYPE  STATE      STATUS_MESSAGE
⏳  /project152/world  1       2025-06-02T18:52:51Z  2025-06-02T18:52:51Z  2025-06-02T18:51:10Z  1970-01-01T00:00:00Z  542adb8f-3fee-4b08-b714-11702c331118  DOWNLOAD      SCHEDULED  finished scheduling work requests

Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

Note

When pull is run without a --remote-path, the remote file is determined from the last remote file that was synchronized with the local file. Here the remote file “helloworld” would again be downloaded as the local file “/project152/world”.

If the local file is later pushed back to this same target, the push always uses the local path inside BeeGFS as the remote path, and the association with the original remote path is broken. This is intended to avoid accidentally overwriting unintended remote files. If the remote and local file paths are the same, a push always overwrites the existing remote file.

For example, you can push the local file to a second remote target as “/project152/world”. Later, if the “helloworld” remote file on remote target 1 changes, you can pull those changes into BeeGFS and push the updated file to the second remote target.

To pull all files with the prefix “models/” and suffix “.json” into the current directory:

$ beegfs remote pull --remote-target=1 --remote-path='models/**/*.json' .
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ tree
.
├── newfile
├── results
│   ├── accuracy_metrics.json
│   └── loss_metrics.json
├── results2
│   ├── accuracy_metrics.json
│   └── loss_metrics.json
└── world

2 directories, 6 files

Note

To verify that multiple files are still in sync, you can use wildcards and globbing patterns for the local path. These must be quoted (e.g., "**/*.json") so that the shell does not expand them and they are passed unchanged to Remote. Remote then schedules a background job that walks the local file system and verifies that each file is in sync. Once that job is complete, use the remote status command (see below) to confirm that the files are in sync. If files are out of sync, the local file is never overwritten by the remote file unless the --overwrite flag is set.

By default, if the name of a remote file contains slashes implying a directory hierarchy, that structure is preserved when remote files are pulled into the current directory, which avoids conflicting file names. Optionally, use the --flatten flag to replace directory slashes with underscores so all files are pulled directly into the current directory:

$ beegfs remote pull --remote-target=1 --remote-path='models/**/*.json' . --flatten
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ tree
.
├── newfile
├── results
│   ├── accuracy_metrics.json
│   └── loss_metrics.json
├── results2
│   ├── accuracy_metrics.json
│   └── loss_metrics.json
├── results2_accuracy_metrics.json
├── results2_loss_metrics.json
├── results_accuracy_metrics.json
├── results_loss_metrics.json
└── world

2 directories, 10 files
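The two tree listings suggest how local names are derived. The directory part of the literal (non-glob) prefix of --remote-path, here "models/", is dropped. With --flatten, the remaining slashes become underscores. The following minimal Python sketch reproduces that inference, which is useful for predicting name collisions before a large pull. The rule is inferred from the examples in this guide and is not the documented algorithm.

```python
import re

_GLOB = re.compile(r"[*?\[]")


def local_name(remote_key, remote_path, flatten=False):
    """Predict the local path (relative to the pull destination) for a remote key.

    Inferred from the examples in this guide, not from the Remote implementation.
    """
    m = _GLOB.search(remote_path)
    literal = remote_path[: m.start()] if m else remote_path
    base = literal[: literal.rfind("/") + 1]  # directory part of the literal prefix
    rel = remote_key[len(base):] if remote_key.startswith(base) else remote_key
    return rel.replace("/", "_") if flatten else rel
```

Applied to "models/results/accuracy_metrics.json" with the pattern 'models/**/*.json', this yields "results/accuracy_metrics.json", or "results_accuracy_metrics.json" with flatten=True. Both match the listings above.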

Warning

The flatten mode pulls all remote files into the same directory, which means they will all reside on the same BeeGFS metadata node. While BeeGFS does not impose a strict files-per-directory limit, large directories in Linux file systems can generally cause poor performance, depending especially on the hardware used by the BeeGFS metadata nodes. It is always recommended to test what your hardware can actually support by gradually increasing the number of files per directory.
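One way to run the recommended test is to time file creation in progressively larger directories and watch for the point where the rate drops off. The following minimal Python sketch measures only the create rate. Lookup and listing rates in large directories may also matter for your workload. Run it against a directory on a regular BeeGFS client mount, not the mount used by the Remote/Sync services. The path in the usage example is hypothetical.

```python
import os
import shutil
import tempfile
import time


def creates_per_second(parent, count):
    """Create `count` empty files in a fresh subdirectory of `parent`; return files/s."""
    directory = tempfile.mkdtemp(prefix=f"dirtest-{count}-", dir=parent)
    try:
        start = time.perf_counter()
        for i in range(count):
            with open(os.path.join(directory, f"f{i:09d}"), "w"):
                pass
        elapsed = time.perf_counter() - start
    finally:
        shutil.rmtree(directory)  # remove the test files again
    return count / elapsed if elapsed > 0 else float("inf")


def sweep(parent, counts=(1_000, 10_000, 100_000)):
    """Measure create rates for gradually larger directories."""
    return {n: creates_per_second(parent, n) for n in counts}
```

For example, `sweep("/mnt/beegfs/dirtest")`. Extend `counts` step by step rather than jumping straight to the size you expect a flattened pull to produce.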

It is also possible to use pull with the --stub-local flag to recreate some remote file structure locally without actually downloading any file contents. This makes it possible to then browse the file structure inside BeeGFS and only pull the contents of select files.

$ beegfs remote pull --remote-target=1 --remote-path=unmirrored/linux-6.9-rc2/samples --stub-local .
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

$ beegfs remote status samples/ --recurse --remote-targets=1
Summary: found 323 entries | ✅ 0 synchronized | ☁️‌ 280 offloaded | ⚠️‌ 0 unsynchronized | ⭕ 0 not attempted | ⛔ 0 without remote targets | 🚫 0 not supported | 📂 43 directories

$ tree samples/ | head -10
samples/
├── acrn
│   ├── guest.ld
│   ├── Makefile
│   ├── payload.ld
│   └── vm-sample.c
├── auxdisplay
│   ├── cfag12864b-example.c
│   └── Makefile
├── binderfs

$ cat samples/acrn/Makefile
cat: samples/acrn/Makefile: Resource temporarily unavailable

To pull a stub file’s contents into BeeGFS it is not necessary to specify the --remote-path:

$ beegfs remote pull samples/acrn/Makefile
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

$ cat samples/acrn/Makefile | head -1
# SPDX-License-Identifier: GPL-2.0

All stub files in a directory can be rehydrated by simply specifying the directory:

$ beegfs remote pull samples
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️‌ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)

Status

When a single entry is pushed or pulled the command will return immediately if the file is already in sync. If the file needs to be synced or multiple entries are pushed or pulled at once, the command will create and schedule a “builder job” then return immediately. This builder job is scheduled to a sync node and runs in the background to walk the local file system (for a push) or remote storage target (for a pull) spawning sync jobs as needed based on the matching local/remote file(s). These sync jobs will in turn be queued behind other outstanding jobs and run asynchronously in the background to verify synchronization status and sync each file if needed.

To check the synchronization status, use beegfs remote status <path>, where <path> is a single file, globbing pattern, or directory. When a single path is specified, the --recurse flag can also be used to recursively check the status of all files under that path. Example output:

OK  PATH                                        EXPLANATION
☁️  /project152/newfile                         Target 1: File contents are offloaded to this target.
✅  /project152/results2_accuracy_metrics.json  Target 1: File is in sync based on the most recent job.
✅  /project152/results2_loss_metrics.json      Target 1: File is in sync based on the most recent job.
✅  /project152/results_accuracy_metrics.json   Target 1: File is in sync based on the most recent job.
✅  /project152/results_loss_metrics.json       Target 1: File is in sync based on the most recent job.
✅  /project152/world                           Target 1: File is in sync based on the most recent job.

Summary: found 8 entries | ✅ 5 synchronized | ☁️ 1 offloaded | ⚠️‌ 0 unsynchronized | ⭕ 0 not attempted | ⛔ 0 without remote targets | 🚫 0 not supported | 📂 2 directories
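Because push and pull return as soon as the builder job is scheduled, scripts that need the data to be in sync before continuing have to poll. The following is a minimal Python sketch of such a loop. It takes a callable that returns the summary counts of beegfs remote status, for example by running `beegfs remote status --recurse <path>` and parsing its "Summary:" line. It assumes that zero "unsynchronized" and zero "not attempted" entries means the work is done. That is an interpretation of the summary fields, not a documented completion signal.

```python
import time


def wait_until_in_sync(get_counts, interval=30.0, timeout=3600.0, sleep=time.sleep):
    """Poll until no entries are reported as unsynchronized or not attempted.

    `get_counts` returns summary counts from `beegfs remote status`, e.g.
    {"synchronized": 5, "unsynchronized": 0, "not attempted": 0}.
    Raises TimeoutError if entries are still pending when `timeout` expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        counts = get_counts()
        pending = counts.get("unsynchronized", 0) + counts.get("not attempted", 0)
        if pending == 0:
            return counts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{pending} entries still not in sync")
        sleep(interval)
```

A generous interval keeps the load on the Remote service low. As the following warning explains, add --verify-remote to the status command only if remote files may have changed, because it issues requests to the storage provider.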

Warning

By default, the status command only checks whether files in BeeGFS have been modified since the most recently created job for each remote target in the Remote database. This means changes to a remote file are not reflected by simply running remote status. If the remote file may have changed, use the --verify-remote flag. If you are certain the remote file(s) have not changed (or you have versioning enabled), you can omit this flag. This avoids making requests to the remote target, which may have an associated cost, and checks the status based only on the Remote database.

Investigating and Fixing Errors

There are two main problems a job might encounter that prevent it from completing successfully.

Jobs are cancelled if they fail a precondition that is necessary for the job to run at all. Examples include a download that would overwrite a local file when the overwrite flag was not set, or an invalid local or remote path. Because jobs are spawned asynchronously, it is not always possible to warn the user interactively. In these scenarios a job is still created and associated with the relevant local path(s) to allow for later troubleshooting. Its state is set to cancelled, so corrected job requests can be submitted without first cancelling the job manually.

Jobs are failed if they begin to run and then encounter an unrecoverable error. Jobs in a failed state always require user intervention. They generally indicate a misconfiguration, such as incorrect or expired RST credentials, or a transient issue, such as the RST being temporarily unavailable. In these scenarios, the user must manually cancel the job after correcting the issue. This ensures that any lingering resources (e.g., multipart uploads) or artifacts (e.g., stub files that need to be reset) from the original job are cleaned up or reverted.

Note

In some cases, corrective measures may require updating the remote target configuration on the Remote service, for example if the provided credentials have expired. To update remote target configuration, you currently must first reconfigure and restart the Remote service, then restart each of the Sync services.

When a failure occurs, first use the beegfs remote status --debug <path> command to identify paths that have encountered sync issues. The status command may not return enough detail, or there may be no associated path in BeeGFS (for example, on a failed pull). In that case, inspect individual jobs for a path with beegfs remote job list <path>, adding the --verbose and potentially --debug flags to get full details about individual work requests. Refer to the built-in --help for additional ways to query jobs for one or more paths.

Once you have identified the problem and taken any necessary corrective measures, cancel jobs for all affected paths with beegfs remote job cancel then retry the original push/pull operation.
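The troubleshooting rules above reduce to a small decision table keyed on the job state shown in the STATE column. The following minimal Python sketch encodes it, which can be handy when triaging many paths from a script. COMPLETED, SCHEDULED, and CANCELLED appear in the example output in this guide. RUNNING and FAILED are assumed spellings of the "running" and "failed" states described in the text.

```python
# Suggested next step per job state, following the troubleshooting guidance above.
# COMPLETED, SCHEDULED and CANCELLED appear in example output; RUNNING and FAILED
# are assumed spellings of the "running" and "failed" states described in the text.
NEXT_STEP = {
    "COMPLETED": "nothing to do",
    "SCHEDULED": "wait: the job is queued",
    "RUNNING": "wait: the job is in progress",
    "CANCELLED": "fix the failed precondition (e.g. --overwrite, path), then retry the push/pull",
    "FAILED": "fix the cause, run beegfs remote job cancel for the path, then retry the push/pull",
}


def next_step(state):
    """Map a job state to the suggested next action."""
    return NEXT_STEP.get(state.strip().upper(),
                         "inspect with beegfs remote job list <path> --verbose --debug")
```

If a Sync database was lost, jobs can remain SCHEDULED or RUNNING indefinitely. That case needs a forced cancel instead of waiting (see the FAQ below).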

FAQs

Why does accessing files that are being synced show “resource temporarily unavailable” (EWOULDBLOCK)?

When a sync is started, Remote ensures no existing clients are accessing the file and write-locks it to prevent clients from changing its contents while the sync is underway. This ensures that a complete version of the file, as it was when the operation was triggered, is either pushed to the remote target or pulled into BeeGFS. If a race occurs between when the push/pull is initiated and when the file can be locked, the job is automatically cancelled.

When a sync is complete, files are unlocked unless beegfs remote push was invoked with the --stub-local flag. In that scenario, if the sync was successful, the file contents are truncated and the file remains write-locked even after the job is done, to indicate that the contents are not currently available in BeeGFS. The file size will not be zero bytes, because the file contains a special URL pointing to the remote target and path where the file was offloaded. Users are not able to open stub files for reading or writing.

Does enabling Remote Storage Targets have any impact on file system performance?

When Remote Storage Targets are assigned to entries in BeeGFS an additional hidden extended attribute is created on the associated inode (or dentry for inlined inodes) that is used to store the list of targets and other RST specific configuration for a particular entry. This minimizes the impact enabling Remote functionality has on the core file system when remote targets are not in use, at the cost of incurring a slight performance penalty when performing certain metadata operations on entries with remote targets configured. For example you may observe slightly lower performance creating entries in a directory assigned to remote target(s) as there is an additional extended attribute that must be read from the parent directory and created for each child entry.

Tip

If you are concerned assigning remote targets to entries is impacting performance you can entirely remove remote configuration from entries with beegfs entry set --remote-targets=none.

While the performance impact is minimal, it is also possible to use all Remote functionality without assigning remote targets to entries. Instead, specify the target you want to interact with using the --remote-target flag when running remote commands such as push, pull, and status. Even if you have remote targets assigned to entries, specifying the target you want to interact with optimizes command execution, because it allows the beegfs tool to skip reading remote targets from each entry’s metadata.

Assigning remote targets to entries is helpful if you want to associate certain directory trees or files with specific remote targets so you can simply use beegfs remote push to ensure everything is synced with their correct targets. If you are mostly syncing data to a single target or have a few directories synced with different targets, you may find configuring remote targets on individual entries is not needed. The feature is designed and intended to be used with or without configuring remote targets on individual entries depending on the requirements of a particular environment.

What happens if the Remote database is lost?

The status of historical and active synchronizations will be lost. Resources created by sync jobs that were active when the database was lost will need to be manually cleaned up, notably multi-part uploads or partially downloaded files. Files actively being synced will remain write locked and need to be manually unlocked (see Data Management API).

If the Sync databases are still intact, it is possible to determine which files were being synced when the Remote database was lost; please contact ThinkParQ support for assistance. Before placing the Remote node back in service, all Sync databases must also be deleted and manually recreated, after extracting any forensic data needed for manual cleanup.

The Remote database will be automatically recreated when the Remote service is restarted. Missing Remote database entries will also be automatically recreated based on the actual state of the local and remote files whenever a push/pull is executed to prevent unnecessary syncing, however detailed information about the original work requests/results will no longer be available.

What happens if a Sync database is lost?

Any active work requests assigned to the node will be lost. Jobs associated with these requests will forever show scheduled or running and need to be cancelled using the --force flag and retried. The Sync database will be automatically recreated when the Sync service is restarted.