Many people think that a data set of 1 terabyte (TB) is just too big
to move across the WAN, and resort to shipping DVDs or USB drives. This
is no longer true. Moving a terabyte between most large research
institutions in the US should take only around 8 hours. This assumes
an end-to-end path with a capacity of 1 Gbps or higher, and that only 1/3
of the capacity is used, leaving room for other users' traffic.
This chart
shows the bandwidth requirements for various data set sizes and times.
If your network throughput is less than this, chances are that your hosts need tuning and/or
you are using the wrong file transfer tools. The purpose of this site is to
help you maximize your wide-area network bulk data transfer performance by tuning
the TCP settings for your end hosts and by using file transfer tools that are designed
to maximize network throughput.
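As a rough check on the 8-hour figure, the arithmetic can be sketched in Python. This is only an illustration: the function names are made up for this example, a terabyte is assumed to be 10^12 bytes, and protocol overhead is ignored, which is why the result comes out somewhat under 8 hours.

```python
def transfer_time_hours(size_bytes, link_bps, utilization=1/3):
    """Hours needed to move size_bytes over a link of link_bps bits/s,
    using only the given fraction of the link capacity."""
    usable_bps = link_bps * utilization
    return size_bytes * 8 / usable_bps / 3600


def required_mbps(size_bytes, hours):
    """Sustained throughput (Mbps) needed to move size_bytes in the given time."""
    return size_bytes * 8 / (hours * 3600) / 1e6


TB = 10**12  # assumption: decimal terabyte

# 1 TB using 1/3 of a 1 Gbps path: about 6.7 hours before overhead
print(f"{transfer_time_hours(TB, 1e9):.1f} hours")
# Sustained rate needed to move 1 TB in 8 hours: about 278 Mbps
print(f"{required_mbps(TB, 8):.0f} Mbps")
```

If a measured transfer rate falls well below the figure from `required_mbps` for your target time, that is a sign the hosts or tools need attention.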
Historically, wide-area bulk data transfer has been plagued by poor performance for
a variety of reasons. These include improper configuration of the sending
and receiving hosts, software design issues,
firewalls, and other factors. In most cases, however, large data sets
can be moved long distances using today's networks with minimal effort.
Most file transfer programs use TCP, and performance problems are often due to
a TCP window that is too small. The maximum congestion window is limited by the amount of buffer space
that the kernel allocates for each socket, and by default most operating systems cap this buffer space at a value that is too small for today's high-speed networks.
To achieve maximum throughput, it is critical to use optimal TCP socket buffer sizes for the
link you are using. This means you must use a recent OS that supports TCP autotuning (or a file transfer tool that allows you to set the TCP buffer size), and your end systems must be tuned to allow for large TCP socket buffers.
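The buffer size a path needs is commonly estimated from its bandwidth-delay product (BDP), the bandwidth multiplied by the round-trip time. Below is a minimal Python sketch, assuming an illustrative 1 Gbps path with a 50 ms round-trip time. Those numbers and all function names are examples chosen here, not values from this page.

```python
import socket


def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(bandwidth_bps * rtt_seconds / 8)


def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput for a given window: one window per RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6


def socket_with_buffers(size_bytes):
    """Create a TCP socket and request send/receive buffers of size_bytes.
    The kernel may cap the request at its configured maximum, so read the
    value back with getsockopt() to see what was actually granted."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return s


# Assumed example path: 1 Gbps, 50 ms RTT
needed = bdp_bytes(1e9, 0.050)
print(f"buffer needed: {needed} bytes")  # 6250000 bytes, roughly 6 MB

# A 64 KB window on the same path tops out at roughly 10.5 Mbps
print(f"{max_throughput_mbps(65536, 0.050):.1f} Mbps")
```

On Linux, setting these socket options explicitly turns off autotuning for that socket, so `socket_with_buffers` is mainly relevant on hosts or tools that do not autotune. On autotuning systems the more important step is usually raising the system-wide maximum buffer limits.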
Another common technique to speed up file transfers
is to break the file into smaller pieces that are transferred in parallel. A number of tools
include the option to do parallel transfers. If you have a large number of files to
copy, you can do parallel transfers by copying several files at once (typically 4-5 is a good number to try). But in general it is more efficient to copy larger files than smaller files, so bundling multiple
small files into a single larger file using tar or zip is also recommended.
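Both ideas, bundling small files and copying several files at once, can be sketched with the Python standard library. This is an illustration only: the function names are hypothetical, and `shutil.copy2` is a local stand-in for whatever network transfer tool you actually use.

```python
import shutil
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def bundle(paths, archive_path):
    """Pack many small files into one tar archive so they move as a single
    large file. Files are stored under their base names, so this sketch
    assumes the base names are unique."""
    with tarfile.open(archive_path, "w") as tar:
        for p in paths:
            tar.add(p, arcname=Path(p).name)
    return archive_path


def copy_in_parallel(paths, dest_dir, workers=4):
    """Copy several files at once using a small pool of worker threads.
    shutil.copy2 is a local stand-in; a real deployment would call a
    network transfer tool here instead."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises any exception
        return list(pool.map(lambda p: shutil.copy2(p, dest), paths))
```

The default of 4 workers follows the 4-5 suggestion above. It is a starting point to experiment with, not a tuned value.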
Following these steps will help ensure you are getting the best throughput possible.
- Make sure that the client and server end hosts are properly tuned
- Determine your required and expected throughput
- Use the right file transfer tool
If you follow these steps and are still getting considerably less than your expected throughput, the problem may be related to packet loss or a firewall.
ESnet staff are experienced in solving TCP performance problems. Any
user or project having trouble moving data to or from a DOE site
(especially one that is shipping tapes or disks to move data) is welcome to contact
trouble@es.net for assistance.
These pages are under constant development. If you have suggestions for additions
or improvements, please email bltierney@es.net.