Cloud Storage Tools

October 15, 2025

Researchers have many options for transferring data into and out of their own or institutionally managed commercial cloud storage environments (e.g., AWS, Azure, GCP). These methods support the massive data transfer needs common across the national lab complex. This is not a comprehensive list of available tools; it covers a selection of options for different types of cloud data transfer needs.

| Category | Option | Primary Use Case | Recommended For |
|---|---|---|---|
| Managed & High-Throughput | 1. Globus Connect Cloud Endpoints | Large-scale, automated, secure, and reliable transfers. | All users handling transfers >100 GB or requiring fault tolerance. |
| Command Line (CLI) | 2. rclone | A single tool for file transfers across multiple cloud and storage services. | Users needing a unified, flexible CLI for diverse endpoints. |
| Command Line (CLI) | 3. Cloud Provider CLI Tools | Scripted, direct-to-cloud transfers and workflow integration. | Users comfortable with the command line and API-driven automation. |
| Graphical User Interface (GUI) | 4. Cyberduck | Simple, visual, and secure point-and-click file management. | Users who prefer a desktop application and simpler transfers. |
| Programmatic | 5. Cloud Provider SDKs | Deep integration of data transfer into custom applications and services. | Developers building custom scientific applications and web services. |

1. Globus Connect Cloud Endpoints

Globus is the recommended platform for high-throughput, fault-tolerant, and secure data movement. Globus endpoints allow you to link your storage (HPC storage, on-premises lab storage, or personal machines) directly to cloud storage buckets.

Key Features & Benefits (Pros):

  • Reliability & Fault Tolerance: Automatically restarts interrupted transfers and verifies data integrity, ensuring your job completes even over long durations and across massive datasets.

  • Institutional Support: Institutions with Globus Connect Cloud endpoints will normally offer support for creating and monitoring connectivity.

  • Unified Access: Provides a single, simple interface (web, command line, or API) to move data securely between any registered endpoint.

Considerations (Cons):

  • Setup Requirement: Requires an institutionally managed Globus endpoint or a connection to an existing endpoint on your local system.

  • Authentication Flow: Requires an initial, federated login through your institutional credentials.

Additional Resources:

| Resource Type | Title/Description | Link |
|---|---|---|
| Download | Download Globus Connect Personal (for desktops/laptops) | https://www.globus.org/globus-connect-personal |
| Informative | Globus: Designed for Research Data Management | https://www.globus.org/ |
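
As a rough illustration of the command-line route mentioned above, the following Python sketch assembles a `globus transfer` command without running it. It assumes the Globus CLI is installed and that `globus login` has already been completed. The collection UUIDs, paths, and label are hypothetical placeholders, not values from this page.

```python
import shlex


def build_globus_transfer(src_collection, src_path, dst_collection, dst_path,
                          label, recursive=True):
    """Return the argument list for a `globus transfer` CLI call.

    Nothing is executed here; the list can be inspected first and then
    passed to subprocess.run() once the values are confirmed.
    """
    cmd = ["globus", "transfer", "--label", label,
           # Only re-copy files whose checksums differ at the destination.
           "--sync-level", "checksum"]
    if recursive:
        cmd.append("--recursive")
    cmd += [f"{src_collection}:{src_path}", f"{dst_collection}:{dst_path}"]
    return cmd


# Hypothetical placeholders: substitute the UUIDs of your own collections.
SRC = "SOURCE-COLLECTION-UUID"
DST = "CLOUD-COLLECTION-UUID"

cmd = build_globus_transfer(SRC, "/project/run42/", DST, "/run42/",
                            label="run42 to cloud")
print(shlex.join(cmd))
```

Once submitted, the transfer runs as a task on the Globus service, so the submitting machine does not need to stay connected while the data moves.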

2. rclone

rclone is an open-source command-line program that synchronizes files and directories to and from a large array of cloud storage providers, including AWS S3, Google Cloud Storage, and Azure Blob Storage.

Key Features & Benefits (Pros):

  • Versatility: Supports more than 70 cloud storage products, providing a consistent interface regardless of the underlying cloud provider.

  • Flexibility: Offers sync, copy, and move commands, plus a crypt backend that encrypts data on the client before upload.

  • Simplicity: Uses a straightforward syntax that is often easier to learn than native cloud CLIs for basic copy operations.

Considerations (Cons):

  • Third-Party Tool: As a third-party tool, it may not immediately support the newest features or optimizations of a specific cloud provider.

  • Limited Support: Support for rclone is generally limited to configuration guidance.

Additional Resources:

| Resource Type | Title/Description | Link |
|---|---|---|
| Download | rclone Installation and Binary Downloads | https://rclone.org/downloads/ |
| Documentation | rclone Configuration Guide for Cloud Providers | https://rclone.org/docs/ |
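
A minimal sketch of driving rclone from a Python script follows. It assumes rclone is installed and that a remote has already been set up with `rclone config`. The remote name `mys3`, the bucket, and the local path are hypothetical. The command defaults to `--dry-run`, so a first pass only reports what would be copied.

```python
import shlex
import subprocess


def rclone_copy_cmd(source, dest, transfers=8, dry_run=True):
    """Build an `rclone copy` argument list.

    --transfers sets how many files are copied in parallel;
    --dry-run makes rclone report planned actions without copying.
    """
    cmd = ["rclone", "copy", source, dest,
           "--transfers", str(transfers), "--progress"]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_transfer(cmd):
    """Execute the command, raising CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


# Hypothetical remote and paths, for illustration only.
preview = rclone_copy_cmd("/data/results", "mys3:my-bucket/results")
print(shlex.join(preview))
```

After reviewing the dry-run output, the same call with `dry_run=False` passed to `run_transfer` would perform the actual copy.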

3. Cloud Provider CLI Tools (AWS/GCP/Azure)

For users who prefer to manage transfers from the command line or to integrate them into existing scripts and workflows, the native cloud provider CLIs (e.g., the AWS CLI's s3 commands, Google Cloud gsutil, or Azure AzCopy) offer powerful, direct-to-cloud capabilities.

Key Features & Benefits (Pros):

  • Native Integration: Seamlessly integrates with other cloud-native services using API calls.

  • Automation: Excellent for scheduled transfers or moving data as the final step in an automated data-processing pipeline.

  • Simplicity for Small Jobs: Very fast and simple to use for transfers under a few gigabytes.

Considerations (Cons):

  • Reliability for Very Large Jobs: Transfers of many terabytes may be more susceptible to network interruptions than Globus, potentially requiring manual retry logic in your scripts.

  • Authentication Complexity: Requires careful management of access keys, IAM roles, or service principals to maintain security compliance.

Additional Resources:

| Resource Type | Title/Description | Link |
|---|---|---|
| Download | AWS Command Line Interface (CLI) User Guide (Install) | https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html |
| Download | Azure AzCopy Documentation and Download | https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10 |
| Documentation | Google Cloud gsutil Documentation | https://cloud.google.com/sdk/docs |
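
The manual retry logic noted under Considerations could look roughly like the sketch below, which wraps a CLI call in exponential backoff. It uses `aws s3 sync` as the example because rerunning a sync only copies files that are missing or changed at the destination. The bucket name, local path, attempt count, and delays are illustrative assumptions.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=5, base_delay=2.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0, backing off between tries.

    Returns the 1-based attempt number that succeeded. The wait doubles
    after each failure: base_delay, 2*base_delay, 4*base_delay, ...
    `runner` and `sleep` are injectable so the logic can be tested
    without touching the network.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")


# Hypothetical local path and bucket, for illustration only.
sync_cmd = ["aws", "s3", "sync", "/data/results", "s3://my-bucket/results"]

# To run for real (requires the AWS CLI and valid credentials):
# run_with_retries(sync_cmd)
```

A wrapper like this only handles transient failures. Persistent errors such as expired credentials will exhaust the attempts and raise.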

4. Cyberduck (GUI Option)

Cyberduck is a popular cross-platform open-source file transfer client with an intuitive graphical user interface (GUI). It allows users to browse and transfer files using simple drag-and-drop actions.

Key Features & Benefits (Pros):

  • User-Friendly Interface: Excellent for non-developers or for smaller, ad-hoc transfers where a command line is not desirable.

  • Open Source & Free: A no-cost solution for simple file browsing and transfer needs.

  • Protocol Support: Supports a wide range of protocols, including connections to AWS S3, Google Cloud, and Azure.

Considerations (Cons):

  • Scale Limitation: Not suitable for massive, highly automated, or petabyte-scale transfers; performance is generally lower than optimized CLI tools or Globus.

  • Manual Process: Requires a user to be actively present to initiate and monitor the transfer.

Additional Resources:

| Resource Type | Title/Description | Link |
|---|---|---|
| Download | Download Cyberduck for Mac/Windows | https://cyberduck.io/download/ |
| Documentation | Cyberduck Documentation Index | https://docs.cyberduck.io/ |

5. Cloud Provider SDKs

For developers building custom applications, web portals, or integrated scientific pipelines, the native Software Development Kits (SDKs) for each cloud provider (e.g., boto3 for AWS, the Azure SDK for Python, or the Google Cloud client libraries for Python) provide the most powerful and flexible programmatic control.

Key Features & Benefits (Pros):

  • Deepest Integration: Allows the embedding of data transfer logic directly into complex applications or microservices.

  • Custom Logic: Provides granular control over retries, multi-part uploads, security, and metadata tagging, all within a coding environment.

  • Cross-Platform: SDKs are available for major languages (Python, Java, Go, C#, etc.) for use in any environment.

Considerations (Cons):

  • Requires Programming Skill: Only suitable for users comfortable with software development and API interaction.

  • Requires Custom Optimization: While powerful, achieving maximum transfer throughput often requires the developer to implement custom parallelization and optimization logic.

Additional Resources:

| Resource Type | Title/Description | Link |
|---|---|---|
| Documentation | AWS SDK for Python (Boto3) Documentation | https://aws.amazon.com/sdk-for-python/ |
| Documentation | Azure SDK for Python Documentation | https://learn.microsoft.com/en-us/azure/developer/python/ |
| Documentation | Google Cloud Python Client Library | https://cloud.google.com/sdk |
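
To make the multi-part upload and parallelization points more concrete, here is a hedged sketch using boto3. It assumes boto3 is installed and AWS credentials are configured. The 64 MiB preferred part size and the concurrency of 8 are arbitrary starting values, not tuned recommendations. The part-size helper spells out the arithmetic behind S3's limit of 10,000 parts per multipart upload, although the library may also adjust part sizes on its own.

```python
import math
import os

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last
MAX_PARTS = 10_000        # S3 limit on parts per multipart upload


def choose_part_size(file_size, preferred=64 * MIB):
    """Return a part size that keeps the upload within MAX_PARTS parts."""
    needed = math.ceil(file_size / MAX_PARTS)
    return max(MIN_PART_SIZE, preferred, needed)


def upload_large_file(path, bucket, key, max_concurrency=8):
    """Upload one file with parallel multipart transfer (sketch only)."""
    # Imported here so the helper above is usable without boto3 installed.
    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=64 * MIB,   # use multipart above this size
        multipart_chunksize=choose_part_size(os.path.getsize(path)),
        max_concurrency=max_concurrency,  # number of parallel upload threads
    )
    boto3.client("s3").upload_file(path, bucket, key, Config=config)


# Example arithmetic: a 5 TiB file needs parts larger than the preferred size.
five_tib = 5 * 1024 ** 4
print(choose_part_size(five_tib) > 64 * MIB)
```

A call such as `upload_large_file("/data/run42.h5", "my-bucket", "run42.h5")` would perform the upload; the path and bucket there are hypothetical.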