Say No to scp/sftp
Why you should avoid scp/sftp over a WAN
In a Unix environment, scp, sftp, and rsync are commonly used to copy data between hosts. While these tools work fine on a local network, they perform poorly on a WAN. The OpenSSH versions of scp and sftp have a built-in 2 MB buffer that severely limits performance on a WAN. Although rsync is not part of the OpenSSH distribution, it typically uses ssh as its transport and is therefore subject to the limitations of the underlying ssh implementation. DO NOT USE THESE TOOLS if you need to transfer large data sets across a network path with an RTT of 10 ms or greater.
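A single stream can have at most one buffer's worth of unacknowledged data in flight per round trip, so its throughput is bounded by window / RTT. The sketch below computes that ceiling, and also the window needed to fill a path (the bandwidth-delay product). The function names are hypothetical, and the sketch assumes the shell's `$(( ))` arithmetic is 64-bit, which holds for bash and dash on typical 64-bit systems.

```shell
#!/bin/sh
# Sketch: single-stream throughput ceiling imposed by a fixed window.
# Assumes 64-bit shell arithmetic; function names are illustrative only.

# max_mbps WINDOW_BYTES RTT_MS
# Upper bound on throughput (Mbps) when at most WINDOW_BYTES can be
# unacknowledged per round trip: window * 8 bits / RTT.
max_mbps() {
    echo $(( $1 * 8 * 1000 / $2 / 1000000 ))
}

# bdp_bytes RATE_MBPS RTT_MS
# Bandwidth-delay product: bytes that must be in flight to sustain RATE_MBPS.
bdp_bytes() {
    echo $(( $1 * 1000000 / 8 * $2 / 1000 ))
}

# Example path from the sample results: RTT = 88 ms, 100 Gbps capacity.
echo "2 MB window at 88 ms: at most $(max_mbps 2097152 88) Mbps"
echo "Window needed for 100 Gbps at 88 ms: $(bdp_bytes 100000 88) bytes"
```

On this example path, the ceiling works out to about 190 Mbps, while filling 100 Gbps would need roughly 1.1 GB in flight. The measured 32 Mbps in the sample results is lower still, so the buffer is likely not the only limit in that measurement.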
Also note that in 2022 the OpenSSH team deprecated the legacy scp transfer protocol. Since OpenSSH 9.0, scp uses the SFTP protocol by default, so it is now essentially a front end to sftp. Most experts recommend using sftp rather than scp, but we recommend avoiding both.
The following results are typical for a cross-continent path: scp and sftp are more than 100x slower than single-stream HTTP, and parallel-stream tools like Globus are faster still.
Sample results: RTT = 88 ms, network capacity = 100 Gbps.
| Tool | Throughput |
| --- | --- |
| scp/sftp | 32 Mbps |
| hpnscp/hpnsftp | 4.5 Gbps |
| HTTP (e.g., curl, wget) | 5.2 Gbps |
| Globus, 4 streams | 9.5 Gbps (disk limited) |
Recommendations
One possible solution to this problem is HPN-SSH, which is available from the Pittsburgh Supercomputing Center. HPN-SSH is a drop-in replacement for scp and sftp that makes it possible to optimize single-stream performance on a WAN. However, to fully optimize bulk data transfers over a WAN, we recommend using a parallel-stream tool such as Globus.
Secure File Transfer (sftp) Tuning
By default, sftp allows at most 64 outstanding requests of 32 KB each (2 MB in flight). You can increase both the number of outstanding requests ('-R') and the request size ('-B') from the command line. However, we've seen at most a 20% improvement with these options.
Sample sftp command for a 32 MB window (128 requests × 256 KB):
sftp -R 128 -B 262144 user@host:/path/to/file outfile
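The data sftp keeps in flight is roughly the number of outstanding requests times the request size (-R × -B). The defaults give 64 × 32 KB = 2 MB, and the command above gives 128 × 256 KB = 32 MB. The sketch below picks an -R value for a chosen -B so that R × B covers the bandwidth-delay product of a path. The function name, target rate, and RTT are illustrative assumptions, and the script only prints the resulting command; it does not run sftp.

```shell
#!/bin/sh
# Sketch: choose sftp's -R (outstanding requests) for a given -B (request size)
# so that R * B covers the bandwidth-delay product of the path.
# Assumes 64-bit shell arithmetic; RATE_MBPS and RTT_MS are example values.

# sftp_requests RATE_MBPS RTT_MS BUF_BYTES
sftp_requests() {
    bdp=$(( $1 * 1000000 / 8 * $2 / 1000 ))   # bytes that must be in flight
    echo $(( (bdp + $3 - 1) / $3 ))            # round up to whole requests
}

RATE_MBPS=1000   # hypothetical target: 1 Gbps
RTT_MS=88        # RTT of the sample path
BUF=262144       # 256 KB per request, as in the sample command

REQS=$(sftp_requests "$RATE_MBPS" "$RTT_MS" "$BUF")
echo "sftp -R $REQS -B $BUF user@host:/path/to/file outfile"
```

Raising -R and -B does not remove the SSH-level buffer limit described at the top of this page, which may explain why the gains from these options have been modest.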
Contact fasterdata@es.net if you have updates or corrections for information on fasterdata.