Application Tuning to Optimize International Astronomy Workflow from NERSC to LFI-DPC at INAF-OATs
Researchers transferring data from the National Energy Research Scientific Computing Center (NERSC) to Italy’s National Institute for Astrophysics - Trieste Astronomical Observatory (INAF-OATs) for the Planck project reported that transfers between the two facilities were taking far too long. Performance was so poor that transfers were taking months when they should have taken days.
The researchers contacted ESnet for help evaluating their workflow. The network path was international and crossed five administrative domains (NERSC, ESnet, GEANT, GARR, and INAF). Files were copied between the facilities with the RSYNC application running over SSH. In practice the researchers were seeing only about 400KBps (~3Mbps), as in this excerpt of RSYNC's progress output:
699205728 100% 389.66kB/s 0:29:12 (xfer#201, to-check=302/603)
706353440 100% 387.96kB/s 0:29:38 (xfer#202, to-check=301/603)
706442048 100% 388.94kB/s 0:29:33 (xfer#204, to-check=299/603)
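As a quick check of the quoted figures, the rate column of these progress lines can be averaged and converted to megabits per second. This is a minimal Python sketch; it assumes RSYNC's "kB" means 1024 bytes, as in its classic progress output, so the last digit should be treated as approximate.

```python
import re

# The three RSYNC progress lines quoted above.
LOG = """\
699205728 100% 389.66kB/s 0:29:12 (xfer#201, to-check=302/603)
706353440 100% 387.96kB/s 0:29:38 (xfer#202, to-check=301/603)
706442048 100% 388.94kB/s 0:29:33 (xfer#204, to-check=299/603)
"""

# Captures the rate value and its unit prefix, e.g. "389.66" and "k" from "389.66kB/s".
LINE = re.compile(r"^\s*\d+\s+\d+%\s+([\d.]+)([kMG]?)B/s")

# Assumption: the k/M/G prefixes are binary (powers of 1024).
SCALE = {"": 1, "k": 1024, "M": 1024**2, "G": 1024**3}


def rates_bytes_per_sec(log: str) -> list[float]:
    """Return the per-file transfer rate, in bytes/s, for each progress line."""
    rates = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            rates.append(float(m.group(1)) * SCALE[m.group(2)])
    return rates


rates = rates_bytes_per_sec(LOG)
avg = sum(rates) / len(rates)
print(f"mean rate: {avg / 1024:.1f} kB/s = {avg * 8 / 1e6:.2f} Mbit/s")
# prints: mean rate: 388.9 kB/s = 3.19 Mbit/s
```

The factor of 8 converts bytes to bits, which is why a rate of roughly 400 kilobytes per second corresponds to only about 3 megabits per second.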
For the 3.3TB dataset, these speeds would mean a transfer time of more than three months. When investigating a slow workflow, several questions about the software and hardware involved are worth exploring:
- Is the application involved capable of operating efficiently at high speeds over great distances?
- Is the underlying storage and processing hardware up to date, and properly tuned?
- Are the network components also tuned, and free of transmission errors?
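Inspection of the first component, the RSYNC/SSH application, revealed that the data transfer node at the European end did not have the HPN-SSH performance enhancements installed. Standard OpenSSH uses a fixed-size internal flow-control window that can cap throughput on long, high-latency paths regardless of the bandwidth available; HPN-SSH lets that window grow to suit the path. Installing HPN-SSH was therefore the suggested first step, along with following the ESnet performance tuning guidelines for hosts and network devices.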
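Why a fixed window matters on a long path can be sketched with the standard bound throughput <= window / RTT: a sender can have at most one window of unacknowledged data in flight per round trip. The round-trip time and window sizes below are hypothetical values chosen for illustration; they are not measurements from the NERSC to INAF-OATs path, and the sketch does not model HPN-SSH itself.

```python
# Hypothetical round-trip time for a transatlantic path (assumption, not measured).
RTT_S = 0.180


def max_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput (Mbit/s) when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


def window_needed_bytes(target_mbps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain target_mbps."""
    return target_mbps * 1e6 / 8 * rtt_s


# Hypothetical window sizes, from small to large.
for label, window in [("64 KiB", 64 * 1024), ("2 MiB", 2 * 1024**2), ("16 MiB", 16 * 1024**2)]:
    print(f"{label:>7} window -> at most {max_throughput_mbps(window, RTT_S):7.1f} Mbit/s")

needed = window_needed_bytes(1000, RTT_S)
print(f"1 Gbit/s over {RTT_S * 1000:.0f} ms needs about {needed / 1e6:.1f} MB in flight")
```

With these assumed values, a 64 KiB window allows at most about 2.9 Mbit/s and a 2 MiB window about 93 Mbit/s. The point is the scaling: on a path with a fixed RTT, achievable throughput grows linearly with the window, so lifting a fixed window limit can yield large gains without any change to the network itself.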
Installing the HPN-SSH modification produced an immediate performance improvement: transfer rates of 16MBps (128Mbps), roughly forty times the original. The faster workflow cut the overall time required for the data transfers to a mere 3 days.
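The before-and-after figures can be checked with a constant-rate estimate. This Python sketch assumes decimal units (1 TB = 10^12 bytes, 1 kB = 1000 bytes) and a steady average rate, ignoring per-file overhead.

```python
DATASET_BYTES = 3.3e12  # 3.3 TB, treated as decimal terabytes (assumption)


def transfer_days(n_bytes: float, rate_bytes_per_s: float) -> float:
    """Idealized transfer time in days at a constant average rate."""
    return n_bytes / rate_bytes_per_s / 86_400


before = 390e3  # ~390 kB/s, from the RSYNC progress output
after = 16e6    # ~16 MB/s, observed after installing HPN-SSH

print(f"before:  {transfer_days(DATASET_BYTES, before):.0f} days")  # prints: before:  98 days
print(f"after:   {transfer_days(DATASET_BYTES, after):.1f} days")   # prints: after:   2.4 days
print(f"speedup: {after / before:.0f}x")                            # prints: speedup: 41x
```

About 98 days matches the "more than three months" estimate. The idealized 2.4 days sits a little below the roughly 3 days reported; the gap plausibly reflects per-file overhead and rate variation that a constant-rate model ignores.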