ESnet Network Performance Knowledge Base

File Transfer Tools Summary

A note on using scp, sftp, and rsync

In a Unix environment, scp, sftp, and rsync are commonly used to copy data between hosts. While these tools work fine in a local environment, they perform poorly on a WAN. The OpenSSH versions of scp and sftp have a built-in 1 MB buffer (only 64 KB in OpenSSH versions older than 4.7) that severely limits performance on a WAN. Although rsync is not part of the OpenSSH distribution, it typically uses ssh as its transport and is therefore subject to the limitations of the underlying ssh implementation. DO NOT USE THESE TOOLS if you need to transfer large data sets across a network path with an RTT of more than around 25 ms.
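To see why a fixed buffer hurts on long paths, note that a single TCP stream cannot move more than one window of data per round trip. The following is a minimal sketch of that arithmetic, assuming the ssh buffer acts as the effective window; the function name max_throughput_mbps is illustrative and not part of any tool.

```shell
#!/bin/sh
# Upper bound on single-stream throughput imposed by a fixed window:
#   throughput <= window / RTT
# Usage: max_throughput_mbps WINDOW_BYTES RTT_MS   (prints megabits per second)
# Assumption: the ssh buffer is treated as the effective window size.
max_throughput_mbps() {
    awk -v w="$1" -v r="$2" \
        'BEGIN { printf "%.1f\n", w * 8 * 1000 / r / 1000000 }'
}

max_throughput_mbps 1048576 25   # 1 MB buffer, 25 ms RTT
max_throughput_mbps 65536 25     # 64 KB buffer, 25 ms RTT
```

Under these assumptions the 1 MB buffer caps a stream at roughly 335 Mbit/s on a 25 ms path, and the older 64 KB buffer at roughly 21 Mbit/s, regardless of how fast the underlying network is.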

A patch to fix this problem (the HPN patch) is available from the Pittsburgh Supercomputing Center (PSC). This patch makes it possible to optimize single-stream performance on a WAN. However, to fully optimize bulk data transfers over a WAN, we recommend using one of the parallel-stream tools described below. sftp tuning is also described below.


Selecting a File Transfer Tool

When selecting a file transfer tool, one of the first things to decide is which security model you require. The basic options are:

  1. anonymous: (e.g.: FTP, HTTP) anyone can access the data
  2. simple password: (e.g.: FTP, HTTP) most sites no longer allow this method since the password can be easily captured
  3. password encrypted: (e.g.: bbcp, bbftp, GridFTP, FDT) control channel is encrypted, but data is unencrypted
  4. everything encrypted: (e.g.: scp, sftp, rsync over ssh, GridFTP, HTTPS-based web server) both control and data channels are encrypted

In general, most open science projects seem to prefer option #3. If you require option #4 over a WAN, the choice of tools that perform well is limited to GridFTP with X.509 certificates, or possibly HPN-patched ssh tools (e.g. HPN-patched scp/sftp, or rsync over HPN-patched ssh).

The other issue is whether or not you have the requirement and/or ability to set up a server. HTTP- and FTP-based tools require a system administrator to install a web or FTP server. Other tools such as bbcp and GridFTP only require an sshd server to be installed by the administrator, and everything else can be installed by a normal user.

In order to obtain maximum throughput over a high-speed WAN, one needs to use a file transfer tool that includes the following features:

  • Parallel data streams
  • Ability to set the TCP buffer size
    • Luckily this is becoming less important, as Linux, FreeBSD, OS X, and Windows Vista now all include "TCP buffer autotuning". Other OSes will likely follow. Note that it is still important to increase the maximum TCP buffer size even on a system that does TCP autotuning.
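As a rough guide to choosing a TCP buffer size (or an autotuning maximum), the buffer a stream needs is approximately the bandwidth-delay product of the path, divided among the parallel streams. The sketch below assumes the streams share the path evenly; the function name buffer_bytes and the 1 Gbit/s, 50 ms figures are illustrative assumptions, not values from a particular site.

```shell
#!/bin/sh
# Usage: buffer_bytes TARGET_MBPS RTT_MS [STREAMS]
# Prints the bandwidth-delay product in bytes, split evenly across
# STREAMS parallel streams (default 1).
# Assumption: every stream carries an equal share of the target rate.
buffer_bytes() {
    awk -v b="$1" -v r="$2" -v n="${3:-1}" \
        'BEGIN { printf "%.0f\n", b * 1000000 * r / 1000 / 8 / n }'
}

buffer_bytes 1000 50      # one stream filling 1 Gbit/s at 50 ms RTT
buffer_bytes 1000 50 4    # the same path shared by 4 parallel streams

# On Linux, the current system-wide ceiling can be inspected with:
#   sysctl net.core.rmem_max net.core.wmem_max
```

For this example path a single stream would need about 6.25 MB of buffer, or about 1.56 MB per stream with four streams, which is why a low system maximum can still limit throughput even when autotuning is enabled.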

There are a large number of file transfer programs available, but unfortunately almost none of them provide both of these features.

The following are some commonly used tools that do provide these features.

GridFTP: part of the Globus Toolkit.

    Sample command:

    globus-url-copy -p 4 -tcp-bs 16M sshftp://data.lbl.gov/home/mydata/myfile file:///home/mydir/myfile
        

    To install GridFTP with ssh support, see our Quick Start Guide. There is also a Microsoft .NET version of the GridFTP client and server available from the University of Virginia, and a Firefox extension that uses GridFTP from the University of Delaware.

FDT: Java-based tool from Caltech

    Sample command:

    java -jar fdt.jar [ OPTIONS ] [[[user@][host1:]]file1 [[[user@][host2:]]file2
        

bbftp: from the BaBar Project

    Sample command:

    bbftp -p 4 -e 'setrecvwinsize 1024; setsendwinsize 1024; put myfile' -E '/usr/local/bin/bbftpd -s' remotehost
        

bbcp: from SLAC

    Sample command:

    bbcp -P 4 -v -w 2M myfile remotehost:filename
    

    More info on using bbcp is available from Caltech.

nuttscp: from NRL

This is a simple Perl script wrapper that uses ssh and the nuttcp tool to copy files, and it can achieve very high throughput.

    Sample command:

    nuttscp -v -N 4 -l 256K -f /mydir/myfile remotehost:/data1/mydir/myfile.out 
    

The remaining tools described on this page do not provide a way to set the TCP buffer size. However, if you use a recent OS with TCP autotuning, transfer enough files in parallel, or break a single file into enough parallel streams, you can still obtain very good total throughput using the tools described below. Also, most of the tools below assume the files to be copied reside on an HTTP or FTP server.
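One simple way to transfer several files in parallel with a tool that has no parallel mode of its own is to let xargs run multiple copies of it. This is only a sketch: the function name parallel_fetch, the FETCH_CMD override, and the urls.txt file are hypothetical, and the -P option assumes a GNU or BSD xargs.

```shell
#!/bin/sh
# Usage: parallel_fetch JOBS URL_LIST
# Runs one fetch command per line of URL_LIST, with up to JOBS running at once.
# FETCH_CMD is a hypothetical override for this sketch; it defaults to curl,
# which saves each URL under its remote file name (-O).
# Note: xargs -P is not in POSIX but is available in GNU and BSD xargs.
parallel_fetch() {
    # FETCH_CMD is deliberately unquoted so its options split into words.
    xargs -P "$1" -n 1 ${FETCH_CMD:-curl -sS -O} < "$2"
}

# Example (not run here): fetch the URLs listed in urls.txt, 4 at a time.
#   parallel_fetch 4 urls.txt
```

Each transfer is still limited by its own TCP window, but the aggregate throughput of several concurrent transfers can be much higher than that of a single one.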

Download Managers:

Other MS Windows Tools:

  • FileZilla: Supports FTP transfer of multiple files in parallel

Other Unix Tools:

  • sftp: Secure file transfer program.
    As described above, don't even consider using this program for WAN transfers unless you have installed the HPN patch from PSC. But even with the patch, sftp has the annoying characteristic of layering yet another flow control mechanism on top of everything else. By default, sftp limits the amount of outstanding data to 16 messages of 32 KB each. Since each datagram is a distinct message, you end up with a 512 KB limit on outstanding data. You can, however, increase both the number of outstanding messages ('-R') and the size of each message ('-B') from the command line.

    Sample command for a 128MB window:

    sftp -R 512 -B 262144 user@host:/path/to/file outfile
    
  • lftp: Supports parallel file transfer, socket tuning, HTTP transfers, and more.
    Sample command:

    lftp -e 'set net:socket-buffer 4000000; pget -n 4 http://site/path/file; quit'
    lftp -e 'set net:socket-buffer 4000000; pget -n 4 ftp://site/path/file; quit'
    
  • axel: Simple parallel accelerator for HTTP and FTP.
    Sample command:

    axel -n 4 http://site/file
    axel -n 4 ftp://site/file
    

    Other useful command line tools for Unix/OSX include curl and wget.
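The sftp window arithmetic described in the list above (outstanding requests times request size) can be checked with a short sketch. The function name sftp_window_bytes is illustrative only and is not part of sftp.

```shell
#!/bin/sh
# Usage: sftp_window_bytes REQUESTS BUFFER_BYTES
# Outstanding data limit = number of outstanding requests (sftp -R)
#                          multiplied by the size of each request (sftp -B).
sftp_window_bytes() {
    echo $(( $1 * $2 ))
}

sftp_window_bytes 16 32768      # defaults: 16 requests of 32 KB
sftp_window_bytes 512 262144    # settings from the sample sftp command
```

The first call gives 524288 bytes (512 KB) and the second gives 134217728 bytes (128 MB), matching the figures quoted for the default and tuned sftp settings.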

Commercial Tools:

  • Aspera sells a UDP-based solution that does a good job utilizing all available bandwidth on congested, high latency network paths up to 1 Gbps.
  • Data Expedition sells a set of tools based on their "Multipurpose Transaction Protocol" (MTP/IP), a UDP-based protocol that uses a data-pull model, and works well on congested links, satellite links, as well as high-speed networks up to and beyond 1 Gbps. They also provide an SDK that allows one to integrate MTP-based data transfers into custom applications.

Special Purpose Tools:

  • hsi/htar, pftp: HPSS client tools.
    Sample command:

    hsi put local_file : hpss_file
    hsi get hpss_file
    
    The number of parallel streams used is determined by the HPSS "class of service" you are using. To adjust the TCP buffer size, look for SendSpace/RecvSpace in your hpss.conf file. For example:
    Network Options = {
       Default = {
          NetMask = 255.255.0.0
          RFC1323 = 1
          SendSpace = 4MB
          RecvSpace = 4MB
          WriteSize = 128KB
          TcpNoDelay = 1
       }
    }
    

    Note that in general pftp is faster than hsi/htar. For best performance across a WAN, use the GridFTP interface to HPSS at sites where it is available.


© 2008-2010, ESnet
