When selecting a file transfer tool, one of the first things to decide is which security model you require.
The basic options are:
- anonymous: (e.g.: FTP, HTTP) anyone can access the data
- simple password: (e.g.: FTP, HTTP) most sites no longer allow this method since the password can be easily captured
- password encrypted: (e.g.: bbcp, bbftp, GridFTP) control channel is encrypted, but data is unencrypted
- everything encrypted: (e.g.: scp, sftp, GridFTP, HTTPS-based web server) both control and data channels are encrypted
In general, most open science projects seem to prefer option #3 (password encrypted).
If you require option #4 (everything encrypted), the only tool that performs well over a WAN is GridFTP with X509 keys.
scp and sftp perform very badly over a WAN due to internal buffer limits.
The other issue is whether you have the requirement and/or ability to set up a server. HTTP- and FTP-based
tools require a system administrator to install a web or FTP server. Other tools, such
as bbcp and GridFTP, only require the administrator to install an sshd server; everything else
can be installed as a normal user.
To obtain maximum throughput over a high-speed WAN, you need to use a file transfer
tool that includes the following features:
- Parallel data streams
- Ability to set the TCP buffer size
  - Luckily this is becoming less important, as Linux and Windows Vista
    now include "TCP buffer autotuning". Other OSes will likely follow.
    Note that it is still important to increase the maximum TCP buffer size
    even on a system that does TCP autotuning.
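As a rough guide to choosing a buffer size, a common rule of thumb is the bandwidth-delay product (BDP): link speed multiplied by round-trip time. The sketch below computes it in bytes. The function name and the example figures (1 Gbit/s, 50 ms) are illustrative assumptions, not values from any particular site.

```shell
#!/bin/sh
# Bandwidth-delay product: the number of bytes that must be "in flight"
# to keep a path full. Arguments (both illustrative):
#   $1 = link speed in Mbit/s
#   $2 = round-trip time in milliseconds
bdp_bytes() {
    mbps=$1
    rtt_ms=$2
    # Mbit/s * 1,000,000 / 8 gives bytes per second; multiplying by the
    # RTT in seconds (rtt_ms / 1000) simplifies to mbps * rtt_ms * 125.
    echo $(( mbps * rtt_ms * 125 ))
}

# Example: a 1 Gbit/s path with a 50 ms round-trip time.
bdp_bytes 1000 50    # prints 6250000 (about 6 MB)
```

A single stream on such a path would need a buffer of roughly this size to fill the link, which is why a small maximum buffer size caps throughput even when autotuning is enabled.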
There are a large number of file transfer programs available, but unfortunately
almost none of them provide both of these features.
The following are some commonly used tools that do provide these features.
GridFTP: part of the Globus Toolkit.
globus-url-copy -p 4 -tcp-bs 16M sshftp://data.lbl.gov/home/mydata/myfile file:///home/mydir/myfile
To install GridFTP with ssh support, see our Quick Start Guide.
There is also a Microsoft .NET version of the GridFTP client and server available from the University of Virginia.
bbftp: from the BaBar Project
bbftp -p 4 -e 'setrecvwinsize 1024; setsendwinsize 1024; put myfile' -E '/usr/local/bin/bbftpd -s' remotehost
bbcp: from SLAC
bbcp -P 4 -v -w 2M myfile remotehost:filename
More info on using bbcp is available from Caltech.
The remaining tools described on this page do not provide a way to set the TCP buffer size. However, if you
transfer enough files in parallel, or break a single file into enough parallel streams, you can still obtain
good total throughput using the tools described below. Also, most of the tools below assume the files
to be copied reside on an HTTP or FTP server.
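To illustrate what breaking a single file into parallel streams involves, here is a minimal sketch that divides a file of known size into contiguous byte ranges, one per stream. The function name is hypothetical, and the sketch assumes the size is at least as large as the number of streams.

```shell
#!/bin/sh
# Print N inclusive byte ranges ("start-end", one per line) that together
# cover a file of SIZE bytes. The last range absorbs any remainder.
#   $1 = file size in bytes, $2 = number of streams (assumes size >= streams)
byte_ranges() {
    size=$1
    n=$2
    chunk=$(( size / n ))
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(( i * chunk ))
        if [ "$i" -eq $(( n - 1 )) ]; then
            end=$(( size - 1 ))          # last stream takes the tail
        else
            end=$(( start + chunk - 1 ))
        fi
        echo "${start}-${end}"
        i=$(( i + 1 ))
    done
}

# Example: split a 1000-byte file across 4 streams.
byte_ranges 1000 4
```

The example prints 0-249, 250-499, 500-749, and 750-999. Each range could then be handed to a separate transfer, for instance curl's -r option on a server that supports range requests, and the pieces concatenated in order afterwards.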
Download Managers:
Download managers provide an easy-to-use GUI for downloading multiple files in parallel, for monitoring
download progress, and for pausing/restarting downloads. Most support uploads as well, and some, such
as 'Free Download Manager', even support BitTorrent downloads.
Some recommended download managers include:
Wikipedia maintains a comparison table of download managers.
Other MS Windows Tools:
- FileZilla: Supports FTP transfer of multiple files in parallel
Other Unix Tools:
- lftp: Supports parallel file transfer, socket tuning, HTTP transfers, and more.
Sample command:
lftp -e 'set net:socket-buffer 4000000; pget -n 4 http://site/path/file; quit'
lftp -e 'set net:socket-buffer 4000000; pget -n 4 ftp://site/path/file; quit'
- axel: simple parallel accelerator for HTTP and FTP.
Sample command:
axel -n 4 http://site/file
axel -n 4 ftp://site/file
Other useful command line tools for Unix/OSX include curl and wget.
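Tools such as curl fetch one file per invocation, so transferring many files in parallel means running several copies at once. The sketch below does this with xargs; the function name and the urls.txt file are hypothetical, and it assumes an xargs that supports -P (GNU and BSD versions do).

```shell
#!/bin/sh
# Run up to NJOBS copies of a fetch command at once, one URL per invocation.
# URLs are read from standard input, one per line.
#   $1 = maximum number of simultaneous transfers
#   remaining arguments = the command to run for each URL
parallel_fetch() {
    njobs=$1
    shift
    xargs -n 1 -P "$njobs" "$@"
}

# Usage (hypothetical URL list, four transfers at a time):
#   parallel_fetch 4 curl -s -S -O < urls.txt
```

Passing the fetch command as arguments keeps the sketch independent of any one tool, so wget could be substituted for curl.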
Special Purpose Tools:
hsi put local_file : hpss_file
hsi get hpss_file
The number of parallel streams you use is determined by which HPSS "class of service" you are using.
To adjust the TCP buffer size, look for SendSpace/RecvSpace in your hpss.conf file. For example:
Network Options = {
    Default = {
        NetMask = 255.255.0.0
        RFC1323 = 1
        SendSpace = 4MB
        RecvSpace = 4MB
        WriteSize = 128KB
        TcpNoDelay = 1
    }
}
Note that in general pftp is faster than hsi/htar. For best performance across a WAN, use the
GridFTP interface to HPSS at sites where it is available.
The following are some commonly used tools that should be avoided:
- scp / sftp: Do not use these tools on a WAN! They are built on top of OpenSSH, which has a built-in
64 KB flow-control buffer that severely limits performance on a WAN. A patch to fix this problem (HPN-SSH) is available from the
Pittsburgh Supercomputing Center.