Remote Storage Targets¶
Introduction¶
By default, all files in BeeGFS are striped across one or more storage targets using a proprietary chunk file format optimized for high-performance parallel access. BeeGFS also supports synchronizing files with one or more Remote Storage Targets. Any S3-compatible external storage provider can be used as a remote target, including on-premises solutions or cloud providers. In addition to the core file system services (Management, Metadata, and Storage), using Remote Storage Targets requires deploying an additional Remote service that coordinates synchronizing files using one or more Sync services.
The Remote and Sync services can be deployed on the same physical server as other core file system services, or on dedicated servers, depending on the available hardware and requirements of a particular environment. This allows remote targets to be added to existing BeeGFS deployments where the servers running core services may not have been sized with the remote targets feature in mind. This also makes it possible to avoid exposing the servers running core BeeGFS services directly to the internet, and treat the Remote/Sync servers as “gateway” nodes.
                    ┌────────────────┐
                    │ Remote Service ◄───┐
┌──────────────┐    └────────▲───────┘   │
│              │             │           │   ┌─────────────┐
│    BeeGFS    ◄─────────────┤           ├──►│ S3 Provider │
│              │             │           │   └─────────────┘
└──────────────┘    ┌────────▼───────┐   │
                    │ Sync Service   ◄───┘
                    └────────────────┘
Files and directories in BeeGFS can be associated with one or more Remote Storage Targets. The remote target configuration for each entry is stored by the BeeGFS metadata service alongside all the other configuration for the entry. The beegfs command-line tool is used to synchronize entries with a remote target and can be used to push (upload) entries or pull (download) entries.
Limitations and Known Issues¶
Because BeeGFS Remote uses the file size to determine how to split files into multiple work requests, when the underlying file system for storage targets is ZFS the beegfs entry refresh command must be run against new/modified files before using beegfs remote push. This is because ZFS does not immediately update file sizes when files are closed, which can lead to discrepancies between the size recorded by the BeeGFS metadata service and the actual file size.
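For example, on ZFS-backed storage targets the refresh should happen before each push of a new or modified file (file name hypothetical):

$ beegfs entry refresh results/model.bin
$ beegfs remote push results/model.bin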
Requirements¶
Capacity¶
When a path in BeeGFS needs to be synchronized, a job is created in the Remote service. Depending on the size of the file, one or more work requests will be generated and assigned to the Sync service(s) that handle actually moving the data. The system is designed so both the Remote and Sync services can be restarted and resume synchronizing data from where they left off while minimizing the amount of data that must be retransferred.
The Remote service needs to store information about pending, active, and historical jobs on a per-path basis, but until a path is synchronized it will not have an entry in the Remote database. While capacity requirements vary depending on the job, generally jobs only require a few KiB each, and the number of historical jobs retained for each path is capped by default at four. This means a file system containing 1 billion files, retaining at most four jobs per file with jobs averaging 3 KiB, would require ~11 TiB. Note this is a rough worst-case estimate and the actual requirements would likely be less due to compression.
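The estimate works out as follows:

1,000,000,000 files × 4 jobs/file × 3 KiB/job = 12 × 10⁹ KiB ≈ 11.2 TiB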
The Sync service only needs to store information about work requests it is currently assigned, and thus requires significantly less space. However, this also makes the capacity requirements harder to estimate since they depend on the number of active jobs that may be present at any one time. For example, a system synchronizing tens of thousands of files concurrently generally only requires 200-300 MiB, but to allow for bursts of activity where potentially millions of files are being synchronized, it would be better to allocate 100-200 GiB per Sync service. Note that again this is a rough worst-case estimate and the actual requirements would likely be less.
Getting Started¶
Prerequisites¶
On all servers that will run Remote or Sync services:

1. Add the BeeGFS package repositories to your package manager.

2. Configure TLS. The exact steps will depend on how you choose to configure TLS for the BeeGFS management service. Unless you opted to disable TLS or want to use different TLS certificates for the Remote/Sync services, you can simply copy the same certificate and key files used by the management service and install them under /etc/beegfs on all Remote/Sync servers. This is what the default configuration files expect and is the easiest way to get TLS configured uniformly.

3. Follow the steps in the quick start or manual installation guides to install and configure the client. Before starting/enabling the client service or mounting the client (via fstab/mount), ensure the following required client configuration is in place on all Remote/Sync nodes: Starting in BeeGFS 8.1.0 the Remote and Sync services use the BeeGFS Data Management API to establish exclusive read-write access to files being synced, and to manage the contents of offloaded (stub) files. This requires setting sysBypassFileAccessCheckOnMeta=true in /etc/beegfs/beegfs-client.conf before running systemctl enable --now beegfs-client, as shown below.
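A minimal sketch of the required client change (the option name comes from this guide; the rest of the file stays as shipped):

# /etc/beegfs/beegfs-client.conf
sysBypassFileAccessCheckOnMeta = true

$ systemctl enable --now beegfs-client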
Important
The Remote and Sync services use a client mount that allows access to files locked by the BeeGFS Data Management API. Do not use these client mounts with other applications; otherwise unintended data access may occur.
Install/Configure the Sync Service(s)¶
On all servers that will run the Sync service:

1. Install the beegfs-sync package using your package manager. For example: dnf install beegfs-sync.

2. Most of the service’s configuration is centrally managed by the Remote service and automatically pushed to all Sync servers. This means the default /etc/beegfs/beegfs-sync.toml file may not need to be modified; however, there are some settings to be aware of (see the default file for more details):

   - The [server] section controls how the Remote service connects to the Sync service. By default the service listens on all available addresses using port 9011, but you can optionally limit what address the service listens on, and adjust the TLS configuration.

   - The [manager] section is used to configure where the Sync service keeps track of active and pending work requests. It uses the journal-db to track and resume ongoing work requests after a restart, and the job-db to optimize looking up jobs and work requests.

   - The [remote] section controls how the Sync service connects to the Remote service. Because the address is synchronized automatically, only the TLS configuration may need customization.

3. Start and enable the service to ensure it automatically restarts after a reboot: systemctl enable --now beegfs-sync.

4. Verify the service finished startup and is serving gRPC requests: journalctl -u beegfs-sync.
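A minimal sketch of beegfs-sync.toml illustrating these sections (all values and commented-out keys are placeholders; the packaged default file documents the authoritative settings):

# /etc/beegfs/beegfs-sync.toml
[server]
# Listen address for gRPC requests from the Remote service (default port 9011).
address = "0.0.0.0:9011"

[manager]
# journal-db = "..."   # tracks and resumes ongoing work requests after a restart
# job-db = "..."       # optimizes looking up jobs and work requests

[remote]
# Only TLS settings typically need customization here; the Remote address is
# synchronized automatically.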
Install/Configure the Remote Service¶
On a single server that will run the Remote service:

1. Install the beegfs-remote package using your package manager. For example: dnf install beegfs-remote.

2. Using your preferred text editor, edit the /etc/beegfs/beegfs-remote.toml file. Available settings are documented in the file; here is an overview of the minimum configuration required to run the Remote service:

   - The [management] section controls how Remote connects to the management service. Update the address to the IP address/hostname and port where the management service is listening for gRPC traffic. If you are using connection-based authentication, download the same shared secret used by your BeeGFS management service to /etc/beegfs/conn.auth (otherwise set auth-disable = true). If needed, adjust the TLS configuration used to connect to the management service.

   - The [server] section controls how the beegfs tool and Sync services connect to the Remote service. By default the service listens on all available addresses using port 9010, but you can optionally limit what address the service listens on, and adjust the TLS configuration.

   - The [job] section controls where the Remote service keeps track of historical, active, and pending data synchronization jobs for each path. This information is stored using BadgerDB, and the path where the database is stored can be customized with path-db.

   - Add [[worker]] sections for each Sync node. Only the configuration for a single Sync node should be specified in each [[worker]] section.

   - Add [[remote-storage-target]] sections for each remote target. Example configuration for common S3 providers is documented in the default file. Note the id is how entries in BeeGFS are associated with one or more remote targets; it must be greater than zero and unique for each remote target.

3. Start and enable the service to ensure it automatically restarts after a reboot: systemctl enable --now beegfs-remote.

4. Verify the service finished startup and was able to connect to all Sync nodes by running: journalctl -u beegfs-remote. If you now check the Sync node logs you should see “successfully applied new configuration”.
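A minimal sketch of beegfs-remote.toml tying these sections together (all values, and any key not named above, are hypothetical; consult the packaged default file):

# /etc/beegfs/beegfs-remote.toml
[management]
address = "mgmt.example.com:8010"    # management gRPC address (hypothetical)
# auth-disable = true                # only if connection authentication is disabled

[server]
address = "0.0.0.0:9010"             # where the beegfs tool and Sync services connect

[job]
# path-db = "/var/lib/beegfs/remote" # BadgerDB location (hypothetical path)

[[worker]]                           # one section per Sync node
address = "sync01.example.com:9011"  # hypothetical key/value; Sync defaults to port 9011

[[remote-storage-target]]
id = 1                               # must be > 0 and unique per target
# S3 provider settings: see the examples in the default file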
Using Remote Storage Targets¶
Warning
Uploading and downloading data using remote push/pull commands may incur charges depending on the storage provider used for remote targets. These charges typically include ingress/egress fees based on the amount of data transferred and the number of API requests made to the storage provider. While the Remote Storage Target feature is designed to minimize unnecessary requests, it is highly recommended that you first perform a test run with a small dataset that accurately represents your typical data. This will help you evaluate the potential cost of syncing data with a particular storage provider before transferring large amounts of data. In particular, pulling from a remote target involves some overhead, as the storage provider must be walked to find matching remote files.
To interact with Remote Storage Targets use the beegfs tool provided with the beegfs-tools package. The beegfs tool needs to know the Remote gRPC server address configured above, which can be specified using --remote-addr=address:port. You can also configure the Remote address by setting the environment variable export BEEGFS_REMOTE_ADDR=address:port, either once for the current shell session or persistently using .bashrc or equivalent for your shell.
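For example (address hypothetical):

$ export BEEGFS_REMOTE_ADDR=remote01.example.com:9010
$ beegfs remote list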
Commands for interacting with remote targets can be found under beegfs remote. For example, to get the list of available remote targets run: beegfs remote list. Similar to storage pools, remote targets can be configured on a per-file or per-directory basis. When configured on a directory they will be automatically inherited by entries created under that directory. Unlike storage pools, remote targets can be updated on both files and directories at any time. Use the standard commands for interacting with entries to check and apply remote target configuration:
Inspect remote target configuration: beegfs entry info <path>

Set remote target configuration: beegfs entry set --remote-targets=<id>[,<id>]... <path>
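For example, to set remote targets 1 and 2 on a project directory and confirm the change (path hypothetical):

$ beegfs entry set --remote-targets=1,2 /mnt/beegfs/project152
$ beegfs entry info /mnt/beegfs/project152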
The following sections outline common tasks, but help is also available as part of the beegfs tool by appending --help to any command.
Push (Upload)¶
Warning
Running push will always overwrite existing remote file(s) with local file(s). Consider enabling snapshots or object versioning if files in the remote target may be updated outside BeeGFS Remote.
Once files have remote target(s) set they can be pushed (uploaded) to those target(s) by running: beegfs remote push <path>. If the specified <path> is a directory then all entries under that directory will be recursively pushed to whichever target(s) they are associated with. It is also possible to perform a one-time push to a particular remote target by specifying the --remote-target flag when pushing files.
To push a single file to a specific remote target:
$ cd project152/ && touch newfile
$ beegfs remote push --remote-target=1 newfile
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
Optionally, file contents can be offloaded from BeeGFS to a single target using the --stub-local flag when pushing file(s). This will first ensure the file(s) are synced with the remote target, then truncate the contents, replacing them with an internally used URL indicating the RST ID and remote path that can later be used to pull the file contents back into BeeGFS. Files that are “offloaded” cannot be opened for reading or writing:
$ beegfs remote push newfile --remote-target=1 --stub-local
Success: ✅ 0 synced | ☁️ 1 offloaded | 🔄 0 syncing | ⏳ 0 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ cat newfile
cat: newfile: Resource temporarily unavailable
$ echo "test" >> newfile
bash: newfile: Resource temporarily unavailable
If a file is already synced it will immediately show offloaded. If a sync is needed, the status will first show scheduled and later offloaded.
The pull command (see below) is used to restore files and unblock regular client access if needed.
Pull (Download)¶
To pull (download) files from a remote target run: beegfs remote pull --remote-target=<id> --remote-path=<string> <local-path>. Unless <local-path> already exists and is configured with a single remote target, --remote-target must be specified. If <local-path> does not already exist then --remote-path must always be specified. The --remote-path can be the name of a single remote file, a prefix, or a globbing pattern (e.g., project-01-*.txt). If the specified remote path exactly matches the name of a remote file only that file is pulled. If there is no exact match the remote path is treated as a prefix and all matching remote files are pulled. Globbing patterns will always pull multiple files if the pattern has multiple matches. In all cases if there are no matching remote files the pull is automatically cancelled (see Investigating and Fixing Errors).
Note
The remote targets configured in each file’s BeeGFS metadata are always inherited from the parent directory. This means when new file(s) are pulled into BeeGFS they will not be automatically associated with the remote target they were pulled from unless that remote target is set on the directory where the files are being pulled.
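For example, if pulled files should stay associated with the target they came from, set that target on the destination directory before pulling (directory and prefix hypothetical):

$ beegfs entry set --remote-targets=1 downloads/
$ cd downloads/ && beegfs remote pull --remote-target=1 --remote-path=models/ .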
To pull a single remote file “helloworld” as local file “world”:
$ beegfs remote pull --remote-target=1 --remote-path=helloworld world
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ beegfs remote status --remote-targets=1 world
Summary: found 1 entries | ✅ 1 synchronized | ☁️ 0 offloaded | ⚠️ 0 unsynchronized | ⭕ 0 not attempted | ⛔ 0 without remote targets | 🚫 0 not supported | 📂 0 directories
To verify a single existing file is in sync just run pull:
$ beegfs remote pull --remote-target=1 world --verbose
OK PATH TARGET JOB_CREATED JOB_UPDATED START_FILE_MTIME END_FILE_MTIME JOB_ID REQUEST_TYPE STATE STATUS_MESSAGE
✅ /project152/world 1 2025-06-02T18:49:30Z 2025-06-02T18:49:32Z 2025-05-30T16:28:30Z 2025-05-30T16:28:30Z 35d7b1b5-b2fb-404a-af4b-83eab060b146 DOWNLOAD COMPLETED successfully completed job
Success: ✅ 1 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 0 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
If the remote file was updated and is no longer in sync with the local file there would be an error, and the --overwrite flag could be used to replace the local file with the remote one:
$ beegfs remote pull --remote-target=1 world --verbose
OK PATH TARGET JOB_CREATED JOB_UPDATED START_FILE_MTIME END_FILE_MTIME JOB_ID REQUEST_TYPE STATE STATUS_MESSAGE
🚫 /project152/world 1 2025-06-02T18:51:16Z 2025-06-02T18:51:16Z 1970-01-01T00:00:00Z 1970-01-01T00:00:00Z 8e2c8e40-baa2-4959-bb46-fdfcb40e108c DOWNLOAD CANCELLED job failed precondition: failed to prepare file state: download would overwrite existing path but the overwrite flag was not set: file already exists
Error: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 0 scheduled | ❌ 0 previous sync failure | ⚠️ 1 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ beegfs remote pull --remote-target=1 world --verbose --overwrite
OK PATH TARGET JOB_CREATED JOB_UPDATED START_FILE_MTIME END_FILE_MTIME JOB_ID REQUEST_TYPE STATE STATUS_MESSAGE
⏳ /project152/world 1 2025-06-02T18:52:51Z 2025-06-02T18:52:51Z 2025-06-02T18:51:10Z 1970-01-01T00:00:00Z 542adb8f-3fee-4b08-b714-11702c331118 DOWNLOAD SCHEDULED finished scheduling work requests
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
Note
When pull is specified without naming a --remote-path then the remote file is determined based on the last remote file that was synchronized with the local file. Here the remote file “helloworld” would again be downloaded as the local file “/project152/world”.
If the local file is later pushed back to this same target, it will always use the local path inside BeeGFS as the remote path, and the association with the original remote path will be broken. This is intended to avoid accidentally overwriting unintended remote files. If the remote and local file paths are the same, existing remote files will always be overwritten by a push.
It is also possible to push the local file to a second remote target as “/project152/world”, then later, if the “helloworld” remote file on remote target 1 changes, pull those changes into BeeGFS and push the updated file to the second remote target.
To pull all files with the prefix “models/” and suffix “.json” into the current directory:
$ beegfs remote pull --remote-target=1 --remote-path=models/**/*.json .
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ tree
.
├── newfile
├── results
│ ├── accuracy_metrics.json
│ └── loss_metrics.json
├── results2
│ ├── accuracy_metrics.json
│ └── loss_metrics.json
└── world
2 directories, 6 files
Note
To verify multiple files are still in sync you can use wildcards and globbing patterns for the
local path, however they must be quoted (e.g., "**/*.json"
) to avoid expansion in the shell
ensuring they are passed to Remote which will schedule a background job that handles walking the
local file system and verifying each file is in sync. Once that job is complete you can use the
remote status
command (see below) to verify files are in sync. If files are out of sync the
local file will never be overwritten by the remote file unless the --overwrite
flag is set.
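A sketch based on the note above, re-verifying all local JSON files against remote target 1 and then checking the result (pattern quoted so the shell does not expand it):

$ beegfs remote pull --remote-target=1 "**/*.json"
$ beegfs remote status --remote-targets=1 "**/*.json"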
By default, if the name of a remote file contains a slash implying a directory hierarchy, that structure will be maintained when remote files are pulled into the current directory to avoid conflicting file names. Optionally use the --flatten flag to replace directory slashes with underscores so all files are pulled into the current directory:
$ beegfs remote pull --remote-target=1 --remote-path=models/**/*.json . --flatten
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ tree
.
├── newfile
├── results
│ ├── accuracy_metrics.json
│ └── loss_metrics.json
├── results2
│ ├── accuracy_metrics.json
│ └── loss_metrics.json
├── results2_accuracy_metrics.json
├── results2_loss_metrics.json
├── results_accuracy_metrics.json
├── results_loss_metrics.json
└── world
2 directories, 10 files
Warning
The flatten mode pulls all remote files into the same directory, which means in BeeGFS they will all reside on the same metadata node. While BeeGFS does not impose a strict file-per-directory limit, in general large directories in Linux file systems can cause poor performance, especially depending on what hardware is used by the BeeGFS metadata nodes. It is always recommended to test what your hardware can actually support with gradually increasing numbers of files per directory.
It is also possible to use pull with the --stub-local flag to recreate some remote file structure locally without actually downloading any file contents. This makes it possible to then browse the file structure inside BeeGFS and only pull the contents of select files.
$ beegfs remote pull --remote-target=1 --remote-path=unmirrored/linux-6.9-rc2/samples --stub-local .
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ beegfs remote status samples/ --recurse --remote-targets=1
Summary: found 323 entries | ✅ 0 synchronized | ☁️ 280 offloaded | ⚠️ 0 unsynchronized | ⭕ 0 not attempted | ⛔ 0 without remote targets | 🚫 0 not supported | 📂 43 directories
$ tree samples/ | head -10
samples/
├── acrn
│ ├── guest.ld
│ ├── Makefile
│ ├── payload.ld
│ └── vm-sample.c
├── auxdisplay
│ ├── cfag12864b-example.c
│ └── Makefile
├── binderfs
$ cat samples/acrn/Makefile
cat: samples/acrn/Makefile: Resource temporarily unavailable
To pull a stub file’s contents into BeeGFS it is not necessary to specify the --remote-path:
$ beegfs remote pull samples/acrn/Makefile
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
$ cat samples/acrn/Makefile | head -1
# SPDX-License-Identifier: GPL-2.0
All stub files in a directory can be rehydrated by simply specifying the directory:
$ beegfs remote pull samples
Success: ✅ 0 synced | ☁️ 0 offloaded | 🔄 0 syncing | ⏳ 1 scheduled | ❌ 0 previous sync failure | ⚠️ 0 error starting sync | ⛔ 0 no remote target (ignored) | 🚫 0 not supported (ignored)
Status¶
When a single entry is pushed or pulled the command will return immediately if the file is already in sync. If the file needs to be synced or multiple entries are pushed or pulled at once, the command will create and schedule a “builder job” then return immediately. This builder job is scheduled to a sync node and runs in the background to walk the local file system (for a push) or remote storage target (for a pull) spawning sync jobs as needed based on the matching local/remote file(s). These sync jobs will in turn be queued behind other outstanding jobs and run asynchronously in the background to verify synchronization status and sync each file if needed.
To check the synchronization status use beegfs remote status <path> where path is a single file, globbing pattern, or directory. When a single path is specified the --recurse flag can also be specified to recursively check the status of all files under that path.
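For example, a recursive status check of a project directory (invocation illustrative; it corresponds to output like the following):

$ beegfs remote status --recurse /project152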
OK PATH EXPLANATION
☁️ /project152/newfile Target 1: File contents are offloaded to this target.
✅ /project152/results2_accuracy_metrics.json Target 1: File is in sync based on the most recent job.
✅ /project152/results2_loss_metrics.json Target 1: File is in sync based on the most recent job.
✅ /project152/results_accuracy_metrics.json Target 1: File is in sync based on the most recent job.
✅ /project152/results_loss_metrics.json Target 1: File is in sync based on the most recent job.
✅ /project152/world Target 1: File is in sync based on the most recent job.
Summary: found 8 entries | ✅ 5 synchronized | ☁️ 1 offloaded | ⚠️ 0 unsynchronized | ⭕ 0 not attempted | ⛔ 0 without remote targets | 🚫 0 not supported | 📂 2 directories
Warning
The status command only checks whether files in BeeGFS have been modified since the most recently created job for each remote target in the Remote database. This means if the remote file was modified, this would not be reflected by simply running remote status. If the remote file may have changed, you should always rerun push/pull (depending on whether the local BeeGFS or the remote target is considered the “source of truth”), then recheck status. If you are certain the remote file(s) have not changed (or you have versioning enabled), you can avoid making requests to the remote target (which may have an associated cost) and simply check the status based on the Remote database.
Investigating and Fixing Errors¶
There are two main problems a job might encounter that prevent it from completing successfully.
Jobs are cancelled if they fail some precondition that is necessary for the job to run at all. For example if a download would overwrite a local file but the overwrite flag was not set, or if an invalid local or remote path was specified. Because jobs are spawned asynchronously it is not always possible to warn the user interactively. In these scenarios a job is still created and associated with the relevant local path(s) to allow for subsequent troubleshooting, but the state is cancelled allowing for corrected job requests to be made without requiring the job to be manually cancelled.
Jobs are failed if they begin to run then encounter an unrecoverable error. Jobs in a failed state always require user intervention and generally indicate misconfiguration, such as incorrect/expired RST credentials, or some transient issue, such as the RST being temporarily unavailable. In these scenarios the user must manually cancel the job after correcting the issue to ensure any lingering resources (e.g., multipart uploads) or artifacts (e.g., stub files that need to be reset) from the original job are cleaned up or reverted.
Note
In some cases corrective measures may require updating the remote target configuration on the Remote service, for example if the provided credentials have expired. To update remote target configuration it is currently required to first reconfigure/restart the Remote service then restart each of the Sync services.
When a failure occurs, first use the beegfs remote status --debug <path> command to identify paths that have encountered sync issues. If the status command does not return enough detail, or if there is no associated path in BeeGFS (for example on a failed pull), inspect individual jobs for a path with beegfs remote job list <path>, adding the --verbose and potentially --debug flags to get full details about individual work requests. Refer to the built-in --help for additional ways to query jobs for one or more paths.
Once you have identified the problem and taken any necessary corrective measures, cancel jobs for all affected paths with beegfs remote job cancel then retry the original push/pull operation.
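A sketch of the overall flow for a single affected path (path hypothetical; check the built-in --help for the exact arguments accepted by job cancel):

$ beegfs remote status --debug /project152/broken
$ beegfs remote job list /project152/broken --verbose --debug
$ beegfs remote job cancel /project152/broken
$ beegfs remote push /project152/broken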
FAQs¶
Does enabling Remote Storage Targets have any impact on file system performance?¶
When Remote Storage Targets are assigned to entries in BeeGFS an additional hidden extended attribute is created on the associated inode (or dentry for inlined inodes) that is used to store the list of targets and other RST specific configuration for a particular entry. This minimizes the impact enabling Remote functionality has on the core file system when remote targets are not in use, at the cost of incurring a slight performance penalty when performing certain metadata operations on entries with remote targets configured. For example you may observe slightly lower performance creating entries in a directory assigned to remote target(s) as there is an additional extended attribute that must be read from the parent directory and created for each child entry.
Tip
If you are concerned assigning remote targets to entries is impacting performance you can entirely
remove remote configuration from entries with beegfs entry set --remote-targets=none
.
While the performance impact is minimal, it is also possible to use all Remote functionality without assigning remote targets to entries by simply specifying the target you want to interact with using the --remote-target flag when running remote commands such as push, pull, and status. Even if you have remote targets assigned to entries, if you know the remote target you want to interact with, specifying the target will optimize command execution as it allows the beegfs tool to skip reading remote targets from each entry’s metadata.
Assigning remote targets to entries is helpful if you want to associate certain directory trees or files with specific remote targets so you can simply use beegfs remote push to ensure everything is synced with the correct targets. If you are mostly syncing data to a single target or have a few directories synced with different targets, you may find configuring remote targets on individual entries is not needed. The feature is designed to be used with or without configuring remote targets on individual entries, depending on the requirements of a particular environment.
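For example, pushing and checking a directory against an explicit target without any per-entry configuration (path hypothetical):

$ beegfs remote push --remote-target=2 results/
$ beegfs remote status --remote-targets=2 --recurse results/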
What happens if the Remote database is lost?¶
The status of historical and active synchronizations will be lost. Resources created by sync jobs that were active when the database was lost will need to be manually cleaned up, notably multi-part uploads or partially downloaded files. Files actively being synced will remain write locked and need to be manually unlocked (see Data Management API).
If the Sync databases are still intact it is possible to determine what files were being synced when the Remote database was lost; please contact ThinkParQ support for assistance. Before placing the Remote node back in service, all Sync databases will also need to be deleted and manually recreated after extracting any forensic data needed for manual cleanup.
The Remote database will be automatically recreated when the Remote service is restarted. Missing Remote database entries will also be automatically recreated based on the actual state of the local and remote files whenever a push/pull is executed to prevent unnecessary syncing, however detailed information about the original work requests/results will no longer be available.
What happens if a Sync database is lost?¶
Any active work requests assigned to the node will be lost. Jobs associated with these requests will forever show scheduled or running and need to be cancelled using the --force flag and retried.
The Sync database will be automatically recreated when the Sync service is restarted.