While many of the causes of poor performance reside in end hosts, network devices can certainly cause problems of their own. Some of the problems caused by routers and switches are simple configuration errors, while other problems are caused by hardware limitations. In addition to these steps, please see the troubleshooting quickstart guide to assist in tuning efforts.
Ethernet Flow Control
Flow control allows a device receiving Ethernet frames to notify the sender that it is having difficulty processing the frames it is being sent. This typically occurs when the receiver is temporarily overwhelmed and in danger of dropping packets. Flow control often helps avoid packet loss in high-speed environments, as illustrated by this example from ESnet's perfSONAR test infrastructure.
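To give a sense of scale, the following is a minimal sketch of how long a single pause request can hold back a sender. It assumes the IEEE 802.3x PAUSE mechanism, in which the receiver sends a 16-bit pause time measured in quanta of 512 bit times. The function names are hypothetical and chosen for illustration only.

```python
# Sketch under stated assumptions: IEEE 802.3x PAUSE semantics, where the
# pause_time field is 16 bits wide and one quantum equals 512 bit times.
# Function names are illustrative, not part of any real tool or library.

PAUSE_QUANTUM_BITS = 512      # one pause quantum, in bit times
MAX_PAUSE_QUANTA = 0xFFFF     # largest value the 16-bit field can carry


def pause_duration_seconds(quanta: int, link_bps: float) -> float:
    """Return how long a sender stays paused for a given quanta value."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("quanta must fit in 16 bits")
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


def bytes_paused(quanta: int) -> int:
    """Return how many bytes of transmission the pause holds back.

    This is independent of link speed, because the quantum is defined
    in bit times rather than in seconds.
    """
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("quanta must fit in 16 bits")
    return quanta * PAUSE_QUANTUM_BITS // 8


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        seconds = pause_duration_seconds(MAX_PAUSE_QUANTA, gbps * 1e9)
        print(f"{gbps:>3} Gbps: maximum pause is {seconds * 1e3:.4f} ms")
    print(f"bytes held back at maximum pause: {bytes_paused(MAX_PAUSE_QUANTA)}")
```

Because the quantum is defined in bit times, the same pause value holds back the same number of bytes at any speed, but the pause lasts a shorter time on faster links.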
Buffer Size Requirements
Choosing a buffer size is a balancing act that depends on the primary use case: is it short-lived, small flows, or long-lived, large flows? Will the box be aggregating switch-to-switch traffic, or will it sit directly in front of servers? Our presentation from NANOG 64 in 2016 discusses these needs.
The type of device, and the requirements for buffering, can be broken into four categories:
- Access Devices (internal to a campus/facility building, aggregating end user devices)
- Core Devices
- Data Center Devices (traffic verging toward enterprise rather than science)
- High Performance Computing (similar to data center, but with expectation of elephant flows)
Access devices should be able to support a mixture of traffic capacity needs, sometimes starting as low as 10Mbps and cresting to as much as 40Gbps, or possibly 100Gbps. This wide range of needs can strain the choice of buffers required, particularly on the links used to connect the devices back toward the core. During evaluation, pay attention to output queue buffering, and test fan-in extensively to ensure that congestion from smaller devices does not saturate uplinks. More information can be found in this paper.
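As a rough illustration of why fan-in matters, the sketch below estimates how oversubscribed an uplink is and how quickly an output buffer would fill if every downstream port burst at once. It is a back-of-the-envelope model only: it assumes all ports send at full rate simultaneously and that the whole buffer is available to the uplink queue. The function names and the example port counts and buffer size are hypothetical.

```python
# Back-of-the-envelope fan-in model. Assumptions (not from any vendor data):
# every downlink bursts at full rate at the same moment, and the entire
# buffer is available to the single uplink output queue.
from typing import Optional, Sequence


def oversubscription_ratio(downlinks_bps: Sequence[float], uplink_bps: float) -> float:
    """Ratio of total downstream capacity to uplink capacity."""
    if uplink_bps <= 0:
        raise ValueError("uplink speed must be positive")
    return sum(downlinks_bps) / uplink_bps


def seconds_until_buffer_full(
    buffer_bytes: int, downlinks_bps: Sequence[float], uplink_bps: float
) -> Optional[float]:
    """Time for the output buffer to fill during a worst-case burst.

    Returns None when the uplink can carry the full offered load, in
    which case the buffer never fills under this model.
    """
    excess_bps = sum(downlinks_bps) - uplink_bps
    if excess_bps <= 0:
        return None
    return buffer_bytes * 8 / excess_bps


if __name__ == "__main__":
    # Hypothetical access switch: 48 x 1 Gbps ports, one 10 Gbps uplink,
    # 12 MB of buffer usable by the uplink queue.
    ports = [1e9] * 48
    uplink = 10e9
    print(f"oversubscription: {oversubscription_ratio(ports, uplink):.1f}:1")
    fill = seconds_until_buffer_full(12_000_000, ports, uplink)
    if fill is not None:
        print(f"buffer fills in about {fill * 1e3:.2f} ms of sustained burst")
```

Real traffic rarely bursts on every port at once, so this is a worst-case bound rather than a prediction. Actual fan-in testing remains the way to confirm behavior.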
Core devices should have as much buffering as possible to handle the bursts coming from outside and inside of the campus network. Pay particular attention to points where bandwidth capacity changes (e.g. aggregation of multiple smaller links into a larger link), as these are the locations where buffering will matter the most.
Data center equipment is typically not geared for the task of handling large elephant flows. Top of rack devices are meant to aggregate a large number of like-speed devices that, most often, are not bursting at the same time. Some devices may have a shared memory infrastructure and not be capable of per-interface tuning. Consult the guide to see what the device you are targeting is, and is not, capable of: http://people.ucsc.edu/~warner/buffer.html
HPC needs are similar to data center needs (e.g. densely populated racks with similar networking requirements), but the burst characteristics should be carefully considered. It may be the case that many devices are transmitting simultaneously, putting extreme strain on the switches that are managing inter-cluster traffic. It may also be the case that WAN needs mix with LAN needs, meaning that congestion should be managed as much as possible.
Processing Packets at Line Rate
The switches/routers that are used for a Science DMZ implementation must be able to keep up with the demands of the flow profile. This often means Data Transfer Nodes (DTNs) sending packets at a very high rate for long periods of time. If the switches in the infrastructure cannot keep up, there is a risk of increased latency or packet drops. The Trident+ and Trident2 chipsets (found in many devices) have a hard time forwarding packets at line rate for a number of reasons, namely the use of unified buffer pools shared among multiple interfaces, and slower lookups for destination traffic.
Always verify that your device can handle line rate traffic (e.g. through perfSONAR or other forms of testing), and validate that the increase of cross traffic (e.g. fan-in from other devices) does not negatively impact the initial test flow.
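To make "line rate" concrete, the sketch below computes how many frames per second a link carries when fully loaded, and the time a device has to process each one. It assumes standard untagged Ethernet framing, where each frame is accompanied on the wire by 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap. The function names are hypothetical.

```python
# Sketch assuming standard untagged Ethernet framing. The frame size passed
# in includes the 14-byte header and 4-byte FCS (e.g. 64, 1518, 9018).
# Function names are illustrative only.

PREAMBLE_AND_SFD_BYTES = 8    # sent before every frame
INTERFRAME_GAP_BYTES = 12     # minimum idle time between frames


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Frames per second on a fully loaded link for a given frame size."""
    if link_bps <= 0 or frame_bytes <= 0:
        raise ValueError("link speed and frame size must be positive")
    wire_bits = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def per_packet_budget_ns(link_bps: float, frame_bytes: int) -> float:
    """Time available to forward each frame at line rate, in nanoseconds."""
    return 1e9 / line_rate_pps(link_bps, frame_bytes)


if __name__ == "__main__":
    for frame in (64, 1518, 9018):
        pps = line_rate_pps(10e9, frame)
        budget = per_packet_budget_ns(10e9, frame)
        print(f"10 Gbps, {frame:>4}-byte frames: {pps:,.0f} pps, {budget:,.1f} ns each")
```

At 10 Gbps this works out to roughly 14.88 million minimum-size frames per second, which is why small-packet tests stress lookup performance far more than large-frame bulk transfers do.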
Some tuning configurations are specific to particular vendors, or to a particular model of router or switch.
If you have information on other types of routers please let us know.