Background

Historically, wide-area bulk data transfer has been plagued by poor performance for a variety of reasons. These include improper configuration of the sending and receiving hosts, software design issues, firewalls, and other factors. In most cases, however, large data sets can be moved long distances using today's networks with minimal effort.

A common technique to speed up file transfers is to break the file into smaller pieces that are transferred in parallel, and a number of tools include an option to do this. If you have a large number of files to copy, you can also parallelize by copying several files at once (4-5 concurrent copies is typically a good number to try). In general, however, it is more efficient to copy a few large files than many small ones, so bundling multiple small files into a single larger file using tar or zip is also recommended.
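The two ideas above (copying several files at once, and bundling small files into one archive) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names bundle_files and copy_in_parallel are hypothetical, and a local shutil.copy2 stands in for whatever transfer tool you actually use. The default of 4 workers follows the 4-5 suggestion above.

```python
import shutil
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def bundle_files(paths, archive_path):
    """Pack many small files into one uncompressed tar archive."""
    with tarfile.open(str(archive_path), "w") as tar:
        for path in paths:
            # arcname stores only the file name, not the full source path
            tar.add(str(path), arcname=Path(path).name)
    return Path(archive_path)


def copy_in_parallel(paths, dest_dir, workers=4):
    """Copy several files at once.

    Assumption: shutil.copy2 is a local stand-in for a real transfer tool.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def copy_one(path):
        return shutil.copy2(str(path), str(dest / Path(path).name))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises any worker exception
        return list(pool.map(copy_one, paths))
```

With many small files, calling bundle_files first and then transferring the single archive is usually the better starting point. Use copy_in_parallel when the files are already reasonably large.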

Selecting a File Transfer Tool

When selecting a file transfer tool, one of the first things to decide is which security model you require. The basic options are:

  1. anonymous (e.g., FTP, HTTP): anyone can access the data
  2. password encrypted (e.g., bbcp, Globus, FDT): the control channel is encrypted, but the data is unencrypted
  3. everything encrypted (e.g., scp, sftp, rsync over ssh, Globus, HTTPS-based web server): both the control and data channels are encrypted

In general, most open science projects seem to prefer option #2. If you require option #3 over a WAN, we recommend Globus, or possibly HPN-patched ssh tools (e.g. HPN-patched scp/sftp or rsync over HPN-patched ssh).

The other issue is whether you need, and are able, to set up a server. HTTP and FTP based tools require a system administrator to install a web or FTP server. Other tools such as bbcp and FDT only require an sshd server to be installed by the administrator, and everything else can be installed as a normal user. Globus is the only tool that does not require the user to install any software, as long as the endpoints (source and destination locations) are known within the system.

At this time we recommend Globus as having the best combination of features and support.