Network Switch Buffer Testing
Before you purchase a new network switch or router, we recommend running some tests to ensure the device has enough buffering to handle multiple high-speed flows. This page describes a test methodology that will help you verify the network switch can handle large science data flows.
You'll need 4 hosts capable of doing 10Gbps, and 2 network switches. Connect host1 and host3 to one switch and host2 and host4 to the other, with a single 10G link between the two switches, so that all test traffic shares that link. We recommend configuring everything with 9K MTUs.
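For example, on Linux hosts the MTU can be set with ip link (assuming eth1 is the test interface, as in the tc commands below):

host1> ip link set dev eth1 mtu 9000

Repeat this on all 4 hosts, and make sure the switch ports are also configured to pass jumbo frames.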
The basic idea is to generate a 2Gbps background load between 2 hosts, then use the other 2 hosts to oversubscribe the link and see how TCP behaves. If the switch has enough buffering, TCP will detect loss, back off, and still get reasonable performance. If the switch does not have enough buffer space, many more packets will be dropped, and TCP performance will be very unstable.
Testing
First, establish a performance baseline to make sure your test environment can achieve a full 10G:
host1> iperf3 -c host2 -P2 -t30 -O5
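iperf3 is a client/server tool, so each test here assumes a server is already listening on the receiving host; a minimal setup is:

host2> iperf3 -s

Do the same on host4 before starting the background traffic below.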
Next, use tc to add simulated latency to the path. Do this on the Ethernet interface of both host1 and host2, so that the path is symmetric.
tc qdisc del dev eth1 root
tc qdisc add dev eth1 root netem delay 25ms
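One caveat: netem's default queue limit is only 1000 packets, while 25ms of delay at 10Gbps is roughly 3500 packets in flight even with 9K frames. If your delayed baseline shows unexpected loss, netem itself may be the culprit; re-create the qdisc with a larger limit (10000 here is just an example value):

tc qdisc del dev eth1 root
tc qdisc add dev eth1 root netem delay 25ms limit 10000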
Rerun your baseline test to make sure you can still get 10G. Note that it will take a bit longer for TCP to ramp up at the higher latency.
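You can verify the simulated latency with ping; with 25ms of delay on each host's interface, the round-trip time should come out to roughly 50ms:

host1> ping -c 3 host2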
Next, use iperf3 with UDP to generate a constant amount of background traffic between host3 and host4:
host3> iperf3 -c host4 -u -b2G -t3000
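To watch the background stream while it runs, start the host4 server with periodic reports (the -i flag sets the reporting interval in seconds). Its UDP results include jitter and packet loss, which should be near zero at 2Gbps on an otherwise idle 10G path:

host4> iperf3 -s -i 10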
Then generate some TCP traffic from host1 to host2 and see what happens:
host1> iperf3 -c host2 -P2 -t30 -O5
If your switch has enough buffering, TCP throughput should ramp up and then hold steady at roughly 8Gbps: the 10G link capacity minus the 2Gbps of UDP background traffic.
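If you want a second view of what TCP is doing during the test, ss from iproute2 can dump the live congestion window and retransmission counters for connections to host2:

host1> ss -ti dst host2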
Sample Results:
Our testing has shown that 64MB of buffering is needed to get good performance. That figure lines up with the classic rule of thumb that a bottleneck needs about one bandwidth-delay product of buffering: 10Gbits/sec x 50ms RTT = 62.5MBytes. We'd love to hear about your test results. Please send them to fasterdata@es.net.
Here is sample output for a switch with 64MB on the output queue:
iperf3 -c sr-test-2 -t20
[ ID] Interval            Transfer     Bandwidth        Retr   Cwnd
[  4]  0.00-1.00   sec    155 MBytes   1.30 Gbits/sec     0   5.85 MBytes
[  4]  1.00-2.00   sec    115 MBytes    965 Mbits/sec     1   6.01 MBytes
[  4]  2.00-3.00   sec    125 MBytes   1.05 Gbits/sec     0   6.62 MBytes
[  4]  3.00-4.00   sec    142 MBytes   1.20 Gbits/sec     0   7.97 MBytes
[  4]  4.00-5.00   sec    178 MBytes   1.49 Gbits/sec     0   10.1 MBytes
[  4]  5.00-6.00   sec    224 MBytes   1.88 Gbits/sec     0   12.9 MBytes
[  4]  6.00-7.00   sec    286 MBytes   2.40 Gbits/sec     0   16.6 MBytes
[  4]  7.00-8.00   sec    368 MBytes   3.08 Gbits/sec     0   21.0 MBytes
[  4]  8.00-9.00   sec    469 MBytes   3.93 Gbits/sec     0   26.5 MBytes
[  4]  9.00-10.00  sec    565 MBytes   4.74 Gbits/sec     2   32.4 MBytes
[  4] 10.00-11.00  sec    691 MBytes   5.80 Gbits/sec     0   39.6 MBytes
[  4] 11.00-12.00  sec    789 MBytes   6.61 Gbits/sec     0   47.3 MBytes
[  4] 12.00-13.00  sec    882 MBytes   7.41 Gbits/sec     0   55.5 MBytes
[  4] 13.00-14.00  sec    924 MBytes   7.75 Gbits/sec     0   63.7 MBytes
[  4] 14.00-15.00  sec    925 MBytes   7.76 Gbits/sec     0   71.7 MBytes
[  4] 15.00-16.00  sec    926 MBytes   7.77 Gbits/sec     0   79.5 MBytes
[  4] 16.00-17.00  sec    928 MBytes   7.78 Gbits/sec     0   84.9 MBytes
[  4] 17.00-18.00  sec    924 MBytes   7.75 Gbits/sec     0   85.0 MBytes
Note there are minimal retransmits, and throughput is stable after ramping up.
Here is sample output for a switch with only 36MB of buffering:
iperf3 -c sr-test-2 -t15
[ ID] Interval            Transfer     Bandwidth        Retr   Cwnd
[  4]  0.00-1.00   sec    155 MBytes   1.30 Gbits/sec     0   5.86 MBytes
[  4]  1.00-2.00   sec    118 MBytes    986 Mbits/sec     1   6.02 MBytes
[  4]  2.00-3.00   sec    125 MBytes   1.05 Gbits/sec     0   6.74 MBytes
[  4]  3.00-4.00   sec    148 MBytes   1.24 Gbits/sec     0   8.25 MBytes
[  4]  4.00-5.00   sec    185 MBytes   1.55 Gbits/sec     3   10.5 MBytes
[  4]  5.00-6.00   sec    235 MBytes   1.97 Gbits/sec     1   13.6 MBytes
[  4]  6.00-7.00   sec    300 MBytes   2.52 Gbits/sec     0   17.4 MBytes
[  4]  7.00-8.00   sec    385 MBytes   3.23 Gbits/sec    44   13.6 MBytes
[  4]  8.00-9.00   sec    491 MBytes   4.12 Gbits/sec     0   27.8 MBytes
[  4]  9.00-10.00  sec    590 MBytes   4.95 Gbits/sec     2   34.1 MBytes
[  4] 10.00-11.00  sec    691 MBytes   5.80 Gbits/sec    99   20.4 MBytes
[  4] 11.00-12.00  sec    406 MBytes   3.41 Gbits/sec     0   20.6 MBytes
[  4] 12.00-13.00  sec    409 MBytes   3.43 Gbits/sec     0   20.9 MBytes
[  4] 13.00-14.00  sec    416 MBytes   3.49 Gbits/sec     1   22.0 MBytes
[  4] 14.00-15.00  sec    452 MBytes   3.80 Gbits/sec     0   24.0 MBytes
Note that there are periodic bursts of retransmissions, and throughput never stabilizes. Also note that these results are with 9K MTUs; TCP recovers more slowly with the default 1500 byte MTU. Finally, keep in mind that TCP throughput on a lossy path is inversely proportional to round-trip time, so performance would be much worse on a longer path.
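When you're done testing, remove the netem qdisc from host1 and host2 to restore normal latency:

tc qdisc del dev eth1 root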
More details are in this talk from NANOG, June 2015.