The US ATLAS project installed perfSONAR measurement servers at a number of sites and configured bwctl to run tests every few hours. After a couple of days they noticed that throughput on the path from the University of Michigan to Brookhaven National Laboratory varied between 50 and 80 Mbps, although the path was expected to support 800 Mbps flows. The path traversed four networks (ESnet, Internet2, BNL, and UMich), any of which might have been the source of the trouble.
Luckily, there are several perfSONAR measurement hosts along the path, so it was easy to eliminate potential sources of trouble. Regular tests from bnl-pt1.es.net (Brookhaven) to chic-pt1.es.net (Chicago) showed no problems, and neither did tests from bnl-pt1.es.net to lhcmon.bnl.gov. However, tests from psum02.aglt2.org (Michigan) to chic-pt1.es.net showed that something was wrong on this segment of the path.
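The elimination step above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the tooling the project used: the function name suspect_segments, the 50% tolerance, and the throughput numbers are all assumptions made for the example. In particular, the per-segment values are placeholders, since the tests on the healthy segments were only reported as showing "no problems".

```python
from typing import Dict, List, Tuple

# A segment is a (source host, destination host) pair.
Segment = Tuple[str, str]


def suspect_segments(
    results_mbps: Dict[Segment, float],
    expected_mbps: float,
    tolerance: float = 0.5,
) -> List[Segment]:
    """Return segments whose throughput is below tolerance * expected_mbps.

    The 0.5 default is an arbitrary choice for this sketch, not a
    perfSONAR or bwctl setting.
    """
    threshold = expected_mbps * tolerance
    return [seg for seg, mbps in results_mbps.items() if mbps < threshold]


# Placeholder throughput values, NOT measurements from the case study.
results = {
    ("bnl-pt1.es.net", "chic-pt1.es.net"): 900.0,
    ("bnl-pt1.es.net", "lhcmon.bnl.gov"): 900.0,
    ("psum02.aglt2.org", "chic-pt1.es.net"): 65.0,
}

for src, dst in suspect_segments(results, expected_mbps=800.0):
    print(f"investigate segment {src} -> {dst}")
```

With these placeholder inputs, only the Michigan to Chicago segment is flagged, which mirrors the conclusion reached from the regular tests.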
This problem was not an easy one to find, because no error counters were registering the fault. It turned out that Cisco Express Forwarding on a Cisco 6509 had an IPv4 fault status, probably due to a routing table overflow. A hard reset of this switch fixed the problem, and performance went to 900 Mbps, as shown in this plot.