ESnet Data Transfer Nodes
Running tests to ESnet Data Transfer Nodes (DTNs)
ESnet has deployed a set of test hosts for high-speed disk-to-disk testing. Anyone on an R&E network anywhere in the world can use these hosts for anonymous GridFTP access. These hosts are capable of saturating a 25+ Gbps network when reading from disk. They are an example of a "Data Transfer Node", or DTN, for use in a Science DMZ.
Some features of the DTNs are:
- Over 8Gbps from a single filesystem namespace onto the network
- A range of test files and directory structures
- One data directory (currently /storage/data1/gridftp or /data1 - they point to the same place)
The 100G test DTN hosts are listed below, each hyperlinked directly to the Globus File Manager:
- star-dtn1.es.net : Chicago, IL, USA
- wash-dtn1.es.net : Washington, DC, USA
- sunn-dtn1.es.net : Sunnyvale, CA, USA
The future 100G DTN hosts are:
- denv-dtn1.es.net : Denver, CO, USA
- hous-dtn1.es.net : Houston, TX, USA
- cern-dtn1.es.net : Geneva, Switzerland
Globus Service Access
The test hosts are also available via the Globus data transfer service. They are configured for anonymous, read-only access.
100G Globus-capable ESnet DTNs
- star-dtn1.es.net is registered as the endpoint ESnet Read-Only Test DTN at Starlight, Chicago, IL
- wash-dtn1.es.net is registered as the endpoint ESnet Read-Only Test DTN at Washington, DC
- sunn-dtn1.es.net is registered as the endpoint ESnet Read-Only Test DTN at Sunnyvale, CA
Test data sets
Each host has a high-performance disk array, mounted as /data1. The following test files are available on each server, and are generated using "/dev/urandom" (the size is what you would expect from reading the filename):
/data1/1M.dat, /data1/10M.dat, /data1/50M.dat, /data1/100M.dat,
/data1/1G.dat, /data1/10G.dat, /data1/50G.dat, /data1/100G.dat, /data1/500G.dat
In addition, there are currently several data sets composed of multiple files in a directory structure. These data sets are for testing multi-file transfers. The data sets each contain directories a through y. Each of these directories contains directories a through y. Each leaf directory contains data files named for their place in the directory structure. So, a-a-1M.dat is a 1,000,000 byte data file in the data set with path 5GB-in-small-files/a/a/a-a-1M.dat.
The structure and composition of the test data sets are designed so that the relative impact of metadata operations (e.g. file and directory creation) and data transfer (moving the data file contents) can be measured. The directory structures are identical, but the file sizes vary. This means that one can transfer the 5MB-in-tiny-files directory structure to measure the overhead of file and directory creation, and then one of the larger structures to measure the additional transfer time that is due to the increased file size.
The test data sets are:
/data1/5MB-in-tiny-files - 1KB, 2KB, and 5KB files in each leaf directory
/data1/5GB-in-small-files - 1MB, 2MB, and 5MB files in each leaf directory
/data1/50GB-in-medium-files - 10MB, 20MB, and 50MB files in each leaf directory
/data1/500GB-in-large-files - 100MB, 200MB, and 500MB files in each leaf directory
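The data set names follow directly from the directory layout, and a short sketch can check the arithmetic. It assumes the layout described above (25 directories a through y, each holding 25 directories a through y, with three files per leaf directory) and decimal units, consistent with the 1,000,000-byte 1M file. The function names are illustrative only, and per_file_overhead is a rough estimate that assumes the tiny-file transfer time is dominated by file and directory creation.

```python
import string

# Assumption: directories "a" through "y" at two levels, as described above.
LETTERS = string.ascii_lowercase[:25]

# Per-leaf file sizes in bytes (decimal units, matching the 1,000,000-byte 1M file).
DATA_SETS = {
    "5MB-in-tiny-files": [1_000, 2_000, 5_000],
    "5GB-in-small-files": [1_000_000, 2_000_000, 5_000_000],
    "50GB-in-medium-files": [10_000_000, 20_000_000, 50_000_000],
    "500GB-in-large-files": [100_000_000, 200_000_000, 500_000_000],
}


def composition(file_sizes):
    """Count directories, files, and bytes for one data set."""
    leaf_dirs = len(LETTERS) ** 2
    return {
        "directories": len(LETTERS) + leaf_dirs,  # top-level plus leaf directories
        "files": leaf_dirs * len(file_sizes),
        "bytes": leaf_dirs * sum(file_sizes),
    }


def per_file_overhead(tiny_seconds, files):
    """Rough per-file metadata cost in seconds.

    Assumes the measured tiny-file transfer time is almost entirely
    file and directory creation, not data movement.
    """
    return tiny_seconds / files


if __name__ == "__main__":
    for name, sizes in DATA_SETS.items():
        c = composition(sizes)
        print(f"{name}: {c['directories']} dirs, {c['files']} files, "
              f"{c['bytes'] / 1e9:g} GB")
```

Every data set has the same 1,875 files in 625 leaf directories, so the difference in transfer time between two data sets can be attributed to file size rather than to file count.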
There are also four directories containing climate model data of different file sizes. Each data set is about 245GB in total size. These data sets are intended for use by the ICNWG member sites, though others are welcome to use them as well. The Climate-Small data set has an internal directory structure, whereas the other three do not (Climate-Small is a portion of the CORDEX data set, with its internal directory structure left intact and pruned to be about 245GB in size). The climate data set composition is as follows:
/data1/Climate-Huge - two files, each ~120GB
/data1/Climate-Large - 10 files, each 21.5GB plus one 28.8GB file
/data1/Climate-Medium - 117 files, ranging in size from 1.2GB to 6GB
/data1/Climate-Small - 1,496 files, ranging in size from 29MB to 425MB
It is important to make sure that your host is properly tuned for maximum TCP performance on the WAN. You should verify that htcp or cubic, and not reno, is the default TCP congestion control algorithm, and that the maximum TCP buffers are big enough for your paths of interest. For more information see the Host Tuning Section.
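As a rough way to check these two settings on a Linux host, the sketch below reads the congestion control algorithm and the TCP receive buffer ceiling from /proc/sys and compares the ceiling with the bandwidth-delay product of a path. The function names and the example bandwidth and RTT are assumptions chosen for illustration, and the /proc/sys paths apply to Linux only; treat the Host Tuning Section as the authoritative guidance.

```python
from pathlib import Path


def bdp_bytes(bandwidth_gbps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1e3)


def read_sysctl(name):
    """Read a Linux sysctl through /proc/sys; return None if unavailable."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def check_host(bandwidth_gbps, rtt_ms):
    """Return a list of human-readable warnings (empty if nothing was found)."""
    warnings = []
    algo = read_sysctl("net.ipv4.tcp_congestion_control")
    if algo is None:
        warnings.append("could not read net.ipv4.tcp_congestion_control")
    elif algo == "reno":
        warnings.append("default congestion control is reno; consider htcp or cubic")

    needed = bdp_bytes(bandwidth_gbps, rtt_ms)
    tcp_rmem = read_sysctl("net.ipv4.tcp_rmem")  # three fields: min, default, max
    if tcp_rmem is None:
        warnings.append("could not read net.ipv4.tcp_rmem")
    else:
        max_buffer = int(tcp_rmem.split()[2])
        if max_buffer < needed:
            warnings.append(
                f"tcp_rmem max is {max_buffer} bytes, below the {needed} bytes "
                f"needed for {bandwidth_gbps} Gbps at {rtt_ms} ms RTT"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical path of interest: 10 Gbps with a 50 ms round-trip time.
    for line in check_host(10, 50) or ["no issues found"]:
        print(line)
```

For the example path, the bandwidth-delay product is 62.5 MB, so a single stream needs a maximum buffer of at least that size to fill the path.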
Sample Throughput Results
Running memory to memory tests between the ANL host and the BNL host using jumbo frames (MTU=9000 bytes) gives these results:
- 1 stream, 1.0 GB/sec (8.0 Gbps)
- 4 streams, 1.1 GB/sec (8.8 Gbps)
- 1 stream UDT, 220 MB/sec (1.7 Gbps) (UDT is slower than TCP on a clean network, but will be faster than TCP on a network with packet loss)
Running disk to memory tests between the ANL host and the BNL host using jumbo frames (MTU=9000 bytes) gives only slightly lower results:
- 4 streams, 950 MB/sec (7.6 Gbps)
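The figures above mix bytes per second and bits per second, so a small conversion helper can be handy when comparing your own results. The sketch below uses decimal units (1 MB = 1,000,000 bytes) and also estimates how long a test file would take at a given sustained rate; the function names are illustrative, and the 100 GB size assumed for 100G.dat is taken from its filename.

```python
def mb_per_sec_to_gbps(mb_per_sec):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_sec * 8 / 1000


def transfer_seconds(size_bytes, mb_per_sec):
    """Time to move size_bytes at a sustained rate given in MB/sec."""
    return size_bytes / (mb_per_sec * 1e6)


if __name__ == "__main__":
    for rate in (1000, 1100, 220, 950):
        print(f"{rate} MB/sec = {mb_per_sec_to_gbps(rate):.2f} Gbps")

    # Assumption: 100G.dat is 100 * 10**9 bytes, as its name suggests.
    seconds = transfer_seconds(100 * 10**9, 950)
    print(f"100G.dat at 950 MB/sec: about {seconds:.0f} seconds")
```

At the 950 MB/sec disk-to-memory rate, a 100 GB file would take roughly 105 seconds, which gives a useful baseline when a transfer of the same file over your own path takes much longer.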
The hardware configuration for these hosts is available here.
Note: Sites with firewalls will need to open the ports used by GridFTP (443/2811 and 50000-51000). Depending on your firewall configuration you may need to set the environment variable GLOBUS_TCP_SOURCE_RANGE to 50000-51000 as well. For more information see the Globus firewall information guide.
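Before starting a transfer, it can help to confirm that outbound connections to the control ports are not blocked. The sketch below is a plain TCP connect test using the Python standard library. It is not a GridFTP client: it probes only ports 443 and 2811 and does not exercise the 50000-51000 data range, which is used during a transfer. The function name and the command-line usage are illustrative assumptions.

```python
import socket
import sys

# Control ports named in the note above. The 50000-51000 data port range
# is used during a transfer and is not probed here.
CONTROL_PORTS = (443, 2811)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusal, timeout, and name resolution failure
        return False


if __name__ == "__main__":
    # Hosts are taken from the command line; nothing is probed by default.
    for host in sys.argv[1:]:
        for port in CONTROL_PORTS:
            state = "reachable" if port_open(host, port) else "blocked or closed"
            print(f"{host}:{port} {state}")
```

Saved under a name of your choice and run with one or more of the DTN hostnames above as arguments, the script prints one line per host and port. A "blocked or closed" result does not by itself identify the firewall as the cause, since the service may simply not be listening on that port.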
These hosts also have IPv6 addresses that you can use for testing.
For details on the hardware and/or software configuration of ESnet I/O test systems, email [email protected]