Cloud Storage Tools
Many options are available for researchers to transfer data into and out of their own or institutionally managed commercial cloud storage environments (e.g., AWS, Azure, GCP). These methods support the massive data transfer needs common across the national lab complex. This is not a comprehensive list of all available tools; instead, it describes some of the available options for different types of cloud data transfer needs.
Category | Option | Primary Use Case | Recommended For |
Managed & High-Throughput | 1. Globus Connect Cloud Endpoints | Large-scale, automated, secure, and reliable transfers. | All users handling transfers >100 GB or requiring fault tolerance. |
Command Line (CLI) | 2. rclone | A single tool for file transfers across multiple cloud and storage services. | Users needing a unified, flexible CLI for diverse endpoints. |
Command Line (CLI) | 3. Cloud Provider CLI Tools | Scripted, direct-to-cloud transfers and workflow integration. | Users comfortable with the command line and API-driven automation. |
Graphical User Interface (GUI) | 4. Cyberduck | Simple, visual, and secure point-and-click file management. | Users who prefer a desktop application and simpler transfers. |
Programmatic | 5. Cloud Provider SDKs | Deep integration of data transfer into custom applications and services. | Developers building custom scientific applications and web services. |
1. Globus Connect Cloud Endpoints
Globus is the recommended platform for high-throughput, fault-tolerant, and secure data movement. Globus endpoints allow you to link your storage (HPC storage, on-premises lab storage, or personal machines) directly to cloud storage buckets.
Key Features & Benefits (Pros):
- Reliability & Fault Tolerance: Automatically restarts interrupted transfers and verifies data integrity, ensuring your job completes even over long durations and across massive datasets.
- Institutional Support: Institutions with Globus Connect Cloud endpoints will normally offer support for creating and monitoring connectivity.
- Unified Access: Provides a single, simple interface (web, command line, or API) to move data securely between any registered endpoint.
Considerations (Cons):
- Setup Requirement: Requires an institutionally managed Globus endpoint or a connection to an existing endpoint on your local system.
- Authentication Flow: Requires an initial, federated login through your institutional credentials.
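As a rough illustration of the command-line interface mentioned above, the following Python sketch assembles a `globus transfer` invocation for the Globus CLI. It assumes the `globus-cli` package is installed and that you have already run `globus login`. The endpoint IDs, paths, and label are hypothetical placeholders, and the helper function `build_globus_transfer_cmd` is introduced here for illustration only.

```python
import shlex


def build_globus_transfer_cmd(src_endpoint, src_path, dst_endpoint, dst_path,
                              label, recursive=True):
    """Return the argument list for a `globus transfer` command.

    Endpoint IDs are the UUIDs shown for each collection in the Globus web app.
    """
    cmd = [
        "globus", "transfer",
        f"{src_endpoint}:{src_path}",   # source given as ENDPOINT_ID:PATH
        f"{dst_endpoint}:{dst_path}",   # destination given as ENDPOINT_ID:PATH
        "--label", label,               # human-readable name for the task
        "--sync-level", "checksum",     # only re-copy files whose checksums differ
    ]
    if recursive:
        cmd.append("--recursive")       # transfer a whole directory tree
    return cmd


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own endpoint UUIDs and paths.
    cmd = build_globus_transfer_cmd(
        "SOURCE-ENDPOINT-UUID", "/project/run1/",
        "CLOUD-ENDPOINT-UUID", "/my-bucket/run1/",
        label="run1 upload",
    )
    # Print the command for review; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Printing the command first makes it easy to check the endpoints and paths before submitting the task. Once submitted, the transfer runs on the Globus service, so it does not depend on your terminal session staying open.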
Additional Resources:
2. rclone
rclone is an open-source command-line program that synchronizes files and directories to and from a large array of cloud storage providers, including AWS S3, Google Cloud Storage, and Azure Blob Storage.
Key Features & Benefits (Pros):
- Versatility: Supports over 40 cloud storage platforms, providing a consistent interface regardless of the underlying cloud provider.
- Flexibility: Offers sync, copy, and move commands, plus a crypt backend for client-side encryption.
- Simplicity: Uses a straightforward syntax that is often easier to learn than native cloud CLIs for basic copy operations.
Considerations (Cons):
- Third-Party Tool: rclone is not maintained by the cloud providers, so it may not immediately support the newest features or optimizations of a specific cloud provider.
- Limited Support: Support for rclone is generally limited to configuration guidance.
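To make the sync, copy, and move commands described above more concrete, here is a minimal Python sketch that assembles an rclone command line. It assumes rclone is installed and that the remotes have already been defined with `rclone config`. The remote names `s3lab` and `gcsarchive`, the paths, and the helper `build_rclone_cmd` are hypothetical and used only for illustration.

```python
import shlex

# rclone subcommands covered by this sketch.
ALLOWED_ACTIONS = {"copy", "sync", "move"}


def build_rclone_cmd(action, source, dest, transfers=8, dry_run=True):
    """Return the argument list for an rclone copy/sync/move command."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported rclone action: {action!r}")
    cmd = [
        "rclone", action, source, dest,
        "--checksum",                    # compare by checksum, not only size and time
        "--transfers", str(transfers),   # number of files transferred in parallel
        "--progress",                    # show live progress while running
    ]
    if dry_run:
        cmd.append("--dry-run")          # report what would change without changing it
    return cmd


if __name__ == "__main__":
    # Hypothetical remotes; each must exist in your rclone configuration.
    cmd = build_rclone_cmd("copy", "s3lab:raw-data/run1", "gcsarchive:archive/run1")
    # Print the command for review; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Defaulting to `--dry-run` is a deliberate choice in this sketch. Because `sync` deletes destination files that are absent from the source and `move` removes source files after transfer, reviewing the planned changes before a real run is a sensible precaution.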