Sample Network Issues Discovered Using perfSONAR
ESnet actively promotes the deployment of perfSONAR at sites that run performance-sensitive data transfers, as well as in the networks along the end-to-end path, so that each segment of that path can be characterized.
Our experience is that on almost all of the paths where perfSONAR has been deployed, it has revealed significant, previously undetected bandwidth-limiting problems, many of which are relatively easy to resolve once identified. These all fall into the category of "soft failures", where the network is up, but throughput on the path is 3-10x lower than expected.
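As a rough illustration of the distinction, the sketch below labels a path from a single throughput measurement. The function name, the default 3x threshold (taken from the low end of the range above), and the sample figures are assumptions for illustration only; this is not part of perfSONAR.

```python
def classify_path(measured_mbps: float, expected_mbps: float,
                  slowdown_threshold: float = 3.0) -> str:
    """Label a path as 'hard failure', 'soft failure', or 'ok'.

    Hypothetical helper: a path that carries no traffic is a hard failure;
    a path that is up but at least `slowdown_threshold` times slower than
    expected is treated as a soft failure.
    """
    if expected_mbps <= 0:
        raise ValueError("expected_mbps must be positive")
    if measured_mbps <= 0:
        return "hard failure"  # nothing gets through: the path is down
    slowdown = expected_mbps / measured_mbps
    if slowdown >= slowdown_threshold:
        return "soft failure"  # path is up, but far slower than expected
    return "ok"


if __name__ == "__main__":
    # Illustrative numbers for a path expected to deliver about 9000 Mbps.
    for measured in (0, 900, 8500):
        print(measured, classify_path(measured, 9000))
```

A hard failure is usually caught by ordinary monitoring; the middle case is the one that tends to go unnoticed without regular throughput tests.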
In our experience the Internet is rife with such soft failures. The networking community is good at detecting hard failures, but not good at detecting soft failures. perfSONAR, specifically throughput testing, is very good at detecting soft failures. It is still difficult to pinpoint the exact cause of these failures, but the more measurement points that exist, the easier locating the problem becomes.
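To show why additional measurement points help, here is a minimal sketch that narrows a soft failure down to one segment. It assumes throughput tests have been run from a single source to each measurement host along the path, ordered nearest first. The function name, host names, and figures are hypothetical, and the sketch ignores the effect of increasing round-trip time on achievable throughput, so it is a simplification rather than a diagnostic procedure.

```python
from typing import List, Optional, Tuple


def first_degraded_segment(
    results: List[Tuple[str, float]],
    expected_mbps: float,
    slowdown_threshold: float = 3.0,
) -> Optional[Tuple[str, str]]:
    """Return the pair of adjacent measurement points that bound the problem.

    `results` holds (host_name, measured_mbps) pairs for tests run from the
    same source, ordered from the nearest host to the farthest. The first
    host whose throughput is at least `slowdown_threshold` times lower than
    expected marks the far end of the suspect segment.
    """
    previous = "source"
    for host, mbps in results:
        if mbps * slowdown_threshold <= expected_mbps:
            return (previous, host)
        previous = host
    return None  # no hop looks degraded


if __name__ == "__main__":
    # Hypothetical measurement hosts along one end-to-end path.
    path = [
        ("campus-border", 9200.0),
        ("regional-net", 9100.0),
        ("backbone-hub", 1200.0),
        ("remote-site", 1100.0),
    ]
    print(first_degraded_segment(path, expected_mbps=9000.0))
```

With only the two end hosts, the same data would say no more than "the path is slow"; each intermediate host shrinks the portion of the path that has to be inspected.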
Here are some examples of the types of soft failure that we discovered only after bringing up a perfSONAR-based measurement host and collecting a few days' worth of active measurement data:
- multiple cases of bad fibers
- port-forwarding filter overloading a router and causing packet drops
- under-powered firewalls
- router output buffer tuning needed
- previously unnoticed asymmetric routing causing poor performance
- under-powered host (doubled performance by switching to jumbo frames)
The EPOC project has written up a set of "Roadside Assistance Case Studies" that include use of perfSONAR.
We also have a set of historical case studies that used older versions of perfSONAR, but the concepts all still apply.
If you would like to submit a use case, please mail it to [email protected]