
Network Switch Buffer Testing

January 16, 2025

 

Before you purchase a new network switch or router, we recommend running some tests to ensure the device has enough buffering to handle multiple high-speed flows. This page describes a test methodology that will help you verify the network switch can handle large science data flows.

You'll need 4 hosts capable of doing 10Gbps, and 2 network switches, configured as shown in the diagram on the right. We recommend configuring everything with 9K MTUs.

The basic idea is to generate 2Gbps background load between 2 hosts, and then use the other 2 hosts to oversubscribe the link, and see how TCP behaves. If the switch has enough buffering, TCP will detect loss, back off, and still get reasonable performance. If the switch does not have enough buffer space, there will be many more packets dropped, and TCP performance will be very unstable.


Testing

First, establish a performance baseline to make sure your test environment can achieve a full 10G:

host1> iperf3 -c host2 -P2 -t30 -O5

Next, use tc to simulate a high-latency path. Do this for the Ethernet interface on both host1 and host2, so that the path is symmetric.

tc qdisc del dev eth1 root
tc qdisc add dev eth1 root netem delay 25ms
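As a sketch of how the two commands above might be wrapped so they can be previewed before they are run, the helper below prints the tc commands by default and only executes them when DRY_RUN=0. The function names run_cmd and netem_setup and the DRY_RUN variable are hypothetical, introduced here for illustration; it uses "tc qdisc replace" instead of del followed by add, so it does not fail when no root qdisc is installed yet.

```shell
# run_cmd: echo the command when DRY_RUN=1 (the default here), otherwise run it.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# netem_setup DEV DELAY: install a netem delay on DEV, then show the result.
# "replace" adds the qdisc if none exists and overwrites it if one does.
netem_setup() {
    run_cmd tc qdisc replace dev "$1" root netem delay "$2"
    run_cmd tc qdisc show dev "$1"
}

# Preview the commands for the interface used on this page.
netem_setup eth1 25ms
```

To apply the delay, run it as root on both host1 and host2 with DRY_RUN=0. A ping from host1 to host2 should then report a round-trip time roughly 50 ms higher than before.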

Rerun your baseline test to make sure you can still get 10G. Note that it should take a bit longer for TCP to ramp up.
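To see why the ramp-up is slower, it can help to work out the bandwidth-delay product of the emulated path. With 25ms of delay added on each host, the round-trip time is about 50ms, so TCP needs roughly that much data in flight to fill a 10G link. The helper name bdp_bytes below is hypothetical, and the calculation is a rough sketch that ignores the base latency of the test network.

```shell
# bdp_bytes RATE_GBPS RTT_MS: print the bandwidth-delay product in bytes.
# rate (Gbit/s) * rtt (ms) * 1e6 gives bits in flight; divide by 8 for bytes.
bdp_bytes() {
    awk -v rate="$1" -v rtt="$2" 'BEGIN { printf "%d\n", rate * rtt * 1000000 / 8 }'
}

# 10 Gbps with about 50 ms of round-trip delay.
bdp_bytes 10 50    # prints 62500000 (about 62.5 MB)
```

The TCP buffer limits on host1 and host2 need to allow a window of at least this size, or the baseline test will fall short of 10G for reasons unrelated to the switch.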

Next, use iperf3 with UDP to generate a constant amount of background traffic between host3 and host4:

host3> iperf3 -c host4 -u -b2G -t3000 

Then generate some TCP traffic from host1 to host2 and see what happens:

host1> iperf3 -c host2 -P2 -t30 -O5

If your switch has enough buffering, you should see a stable 5Gbps of TCP throughput.

Sample Results:

Our testing has shown that 64MB of buffer is needed to get good performance. We'd love to hear about your test results. Please send them to fasterdata@es.net.

Here is sample output for a switch with 64MB on the output queue:

iperf3 -c sr-test-2 -t20
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   155 MBytes  1.30 Gbits/sec    0   5.85 MBytes       
[  4]   1.00-2.00   sec   115 MBytes   965 Mbits/sec    1   6.01 MBytes       
[  4]   2.00-3.00   sec   125 MBytes  1.05 Gbits/sec    0   6.62 MBytes       
[  4]   3.00-4.00   sec   142 MBytes  1.20 Gbits/sec    0   7.97 MBytes       
[  4]   4.00-5.00   sec   178 MBytes  1.49 Gbits/sec    0   10.1 MBytes       
[  4]   5.00-6.00   sec   224 MBytes  1.88 Gbits/sec    0   12.9 MBytes       
[  4]   6.00-7.00   sec   286 MBytes  2.40 Gbits/sec    0   16.6 MBytes       
[  4]   7.00-8.00   sec   368 MBytes  3.08 Gbits/sec    0   21.0 MBytes       
[  4]   8.00-9.00   sec   469 MBytes  3.93 Gbits/sec    0   26.5 MBytes       
[  4]   9.00-10.00  sec   565 MBytes  4.74 Gbits/sec    2   32.4 MBytes       
[  4]  10.00-11.00  sec   691 MBytes  5.80 Gbits/sec    0   39.6 MBytes       
[  4]  11.00-12.00  sec   789 MBytes  6.61 Gbits/sec    0   47.3 MBytes       
[  4]  12.00-13.00  sec   882 MBytes  7.41 Gbits/sec    0   55.5 MBytes       
[  4]  13.00-14.00  sec   924 MBytes  7.75 Gbits/sec    0   63.7 MBytes       
[  4]  14.00-15.00  sec   925 MBytes  7.76 Gbits/sec    0   71.7 MBytes       
[  4]  15.00-16.00  sec   926 MBytes  7.77 Gbits/sec    0   79.5 MBytes       
[  4]  16.00-17.00  sec   928 MBytes  7.78 Gbits/sec    0   84.9 MBytes       
[  4]  17.00-18.00  sec   924 MBytes  7.75 Gbits/sec    0   85.0 MBytes   

Note there are minimal retransmits, and throughput is stable after ramping up.
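When comparing several switches, it may be easier to reduce each run to a few numbers than to read the interval lines by eye. The filter below is a minimal sketch that assumes single-stream iperf3 text output in the column layout shown on this page; the function name iperf3_summary is hypothetical. It totals the Retr column and reports the lowest and highest per-second throughput, skipping the final sender and receiver summary lines.

```shell
# iperf3_summary: read iperf3 text output on stdin and print one summary line.
iperf3_summary() {
    awk '
    /^\[ *[0-9]+\]/ {
        sub(/^\[ *[0-9]+\]/, "")              # drop the "[  4]" stream ID
        if ($NF == "sender" || $NF == "receiver") next
        if ($6 !~ /bits\/sec$/ || $7 !~ /^[0-9]+$/) next
        rate = $5 + 0                          # throughput for this interval
        if ($6 ~ /^M/) rate /= 1000            # Mbits/sec -> Gbits/sec
        else if ($6 ~ /^K/) rate /= 1000000    # Kbits/sec -> Gbits/sec
        n++
        retr += $7
        if (n == 1 || rate < min) min = rate
        if (n == 1 || rate > max) max = rate
    }
    END {
        if (n) printf "intervals=%d retransmits=%d min_gbps=%.2f max_gbps=%.2f\n", n, retr, min, max
    }
    '
}

# Example use (not run here):
#   iperf3 -c host2 -t30 | iperf3_summary
```

A low retransmit total with a narrow throughput range after ramp-up suggests the switch is buffering well. The minimum includes the slow-start intervals, so compare runs of the same length.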

Here is sample output for a switch with only 36MB buffering:

iperf3 -c sr-test-2 -t15
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   155 MBytes  1.30 Gbits/sec    0   5.86 MBytes       
[  4]   1.00-2.00   sec   118 MBytes   986 Mbits/sec    1   6.02 MBytes       
[  4]   2.00-3.00   sec   125 MBytes  1.05 Gbits/sec    0   6.74 MBytes       
[  4]   3.00-4.00   sec   148 MBytes  1.24 Gbits/sec    0   8.25 MBytes       
[  4]   4.00-5.00   sec   185 MBytes  1.55 Gbits/sec    3   10.5 MBytes       
[  4]   5.00-6.00   sec   235 MBytes  1.97 Gbits/sec    1   13.6 MBytes       
[  4]   6.00-7.00   sec   300 MBytes  2.52 Gbits/sec    0   17.4 MBytes       
[  4]   7.00-8.00   sec   385 MBytes  3.23 Gbits/sec   44   13.6 MBytes       
[  4]   8.00-9.00   sec   491 MBytes  4.12 Gbits/sec    0   27.8 MBytes       
[  4]   9.00-10.00  sec   590 MBytes  4.95 Gbits/sec    2   34.1 MBytes       
[  4]  10.00-11.00  sec   691 MBytes  5.80 Gbits/sec   99   20.4 MBytes       
[  4]  11.00-12.00  sec   406 MBytes  3.41 Gbits/sec    0   20.6 MBytes       
[  4]  12.00-13.00  sec   409 MBytes  3.43 Gbits/sec    0   20.9 MBytes       
[  4]  13.00-14.00  sec   416 MBytes  3.49 Gbits/sec    1   22.0 MBytes       
[  4]  14.00-15.00  sec   452 MBytes  3.80 Gbits/sec    0   24.0 MBytes   

Note the periodic bursts of retransmits, and that throughput is not stable. These results are with 9K MTUs; TCP recovers more slowly with the default 1500-byte MTU. Also, TCP throughput on a lossy path is inversely related to latency, so performance would be much worse on a longer path.
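The effect of MTU and latency on a lossy path can be illustrated with the simplified Mathis et al. model, in which throughput is bounded by roughly MSS / (RTT * sqrt(loss rate)). The sketch below is a rough estimate, not a prediction of what a given switch will do. The function name mathis_gbps is hypothetical, the constant factor in the model is taken as 1, the MSS values of 8960 and 1460 bytes approximate 9K and 1500-byte MTUs, and the loss rate of one packet in a million is an assumed figure, not a measurement.

```shell
# mathis_gbps MSS_BYTES RTT_MS LOSS_RATE: rough upper bound on TCP throughput
# in Gbit/s, using the simplified model MSS / (RTT * sqrt(loss)).
mathis_gbps() {
    awk -v mss="$1" -v rtt="$2" -v p="$3" \
        'BEGIN { printf "%.2f\n", (mss * 8) / ((rtt / 1000) * sqrt(p)) / 1e9 }'
}

mathis_gbps 8960 50 0.000001     # 9K MTU, 50 ms RTT: prints 1.43
mathis_gbps 1460 50 0.000001     # 1500-byte MTU, 50 ms RTT: prints 0.23
mathis_gbps 8960 100 0.000001    # 9K MTU, 100 ms RTT: prints 0.72
```

In this model, doubling the round-trip time halves the bound, and the smaller MSS lowers it by about a factor of six.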

More details are in this talk from NANOG, June 2015.