Say No to scp/sftp/rsync
Why you should avoid scp/sftp/rsync over a WAN
In a Unix environment, SSH-based tools like scp, sftp, and rsync are commonly used to copy data between hosts. While these tools work fine in a local environment, they perform poorly on a WAN. The OpenSSH versions of scp and sftp have a built-in 2 MB buffer that severely limits performance on a WAN. Even though rsync is not part of the OpenSSH distribution, it typically uses ssh as its transport and is therefore subject to the limitations imposed by the underlying ssh implementation. DO NOT USE THESE TOOLS if you need to transfer large data sets across a network path with an RTT of 10 ms or greater.
The following results are typical for a cross-continent path: scp and sftp are more than 100x slower than single-stream HTTP, and parallel-stream tools like Globus are faster yet.
Sample results: RTT = 88 ms, network capacity = 100 Gbps.
| Tool | Throughput |
| --- | --- |
| scp/sftp/rsync | 32 Mbps |
| hpnscp/hpnsftp | 4.5 Gbps |
| HTTP (e.g., curl, wget) | 5.2 Gbps |
| Globus, 4 streams | 9.5 Gbps (disk limited) |
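To put the table in context, here is a rough sketch of the arithmetic behind a window-limited transfer: if a sender can have at most a fixed number of bytes unacknowledged, throughput cannot exceed that window divided by the round-trip time. The function names are illustrative, and the sketch assumes the 2 MB buffer means 2 MiB and acts as a hard cap on data in flight. Under those assumptions the ceiling at 88 ms is about 190 Mbps, while filling a 100 Gbps path at the same RTT would need roughly 1.1 GB in flight. The measured scp/sftp figure is lower than this ceiling, so the buffer is evidently not the only cost.

```python
def window_limited_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput (bits/s) when at most window_bytes
    can be in flight per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of link_bps full."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.088                      # 88 ms, from the sample results
    window = 2 * 1024 * 1024         # assumption: "2 MB" means 2 MiB
    link = 100e9                     # 100 Gbps path capacity

    ceiling = window_limited_bps(window, rtt)
    bdp = bandwidth_delay_product_bytes(link, rtt)
    print(f"Window-limited ceiling: {ceiling / 1e6:.1f} Mbps")
    print(f"Bytes in flight needed to fill the path: {bdp / 1e9:.2f} GB")
```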
Other Considerations
The SSH-based tools scp and sftp report success when the transport completes, but they do not perform file-level checksum verification to ensure that the contents at the source and destination match. End-to-end checksums act as a digital fingerprint of file content; comparing them before and after transfer detects silent corruption from disk I/O, staging, or network issues. In contrast, Globus and tools like FTS perform automatic checksum validation as part of their transfer workflow, verifying that source and destination file checksums match and retrying or failing the transfer if they do not. This built-in integrity check, along with robust fault recovery, makes Globus more suitable for large or irreplaceable datasets than scp, which provides neither; with scp, integrity can only be confirmed by comparing checksums manually after the transfer. See this paper for more information. While rsync can perform checksums (use -c or --checksum), this feature is not enabled by default and further reduces rsync's performance.
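As an illustration of the manual post-transfer comparison described above, the following sketch computes a SHA-256 digest of a file in fixed-size chunks and compares it with a digest obtained separately on the source host. The function names, the choice of SHA-256, and the 1 MiB chunk size are assumptions made for this example; they are not what Globus or FTS use internally.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_source_digest(local_path: str, source_digest: str) -> bool:
    """Compare the local copy against a digest computed on the source host.
    source_digest is assumed to be a hex string obtained out of band."""
    return file_sha256(local_path) == source_digest.strip().lower()
```

A typical use would be to run `file_sha256` on the source host before the transfer, carry the resulting string to the destination, and call `matches_source_digest` there once the copy completes.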
HPN-SSH
One possible solution to this problem is HPN-SSH, which is available from the Pittsburgh Supercomputing Center. HPN-SSH is a drop-in replacement for scp and sftp that makes it possible to optimize single-stream performance on a WAN. However, to fully optimize bulk data transfers over a WAN, we recommend using one of the parallel-stream tools such as Globus.
Secure File Transfer (sftp) Tuning
By default, sftp limits the number of outstanding messages to 64, each 32 KB in size. You can increase both the number of outstanding messages ('-R') and the message size ('-B') from the command line. However, we have seen at best a 20% improvement with these options.
Sample sftp command for a 32 MB window (128 outstanding messages of 256 KB each):
sftp -R 128 -B 262144 user@host:/path/to/file outfile
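The effect of the '-R' and '-B' options can be estimated with simple arithmetic. The sketch below assumes that the data sftp keeps in flight is approximately the number of outstanding messages multiplied by the message size; the function name is illustrative and not part of any sftp tooling.

```python
def sftp_in_flight_bytes(num_requests: int, buffer_size: int) -> int:
    """Approximate bytes sftp can have outstanding at once:
    outstanding messages (-R) times message size in bytes (-B)."""
    if num_requests <= 0 or buffer_size <= 0:
        raise ValueError("both arguments must be positive")
    return num_requests * buffer_size


if __name__ == "__main__":
    mib = 1024 * 1024
    default = sftp_in_flight_bytes(64, 32 * 1024)     # defaults: 64 x 32 KB
    tuned = sftp_in_flight_bytes(128, 262144)         # -R 128 -B 262144
    print(f"default: {default / mib:.0f} MiB, tuned: {tuned / mib:.0f} MiB")
```

Under this assumption the defaults correspond to 2 MiB outstanding, and the sample command above raises that to 32 MiB.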
Contact fasterdata@es.net if you have updates or corrections for information on fasterdata.

