Fair Queuing Scheduler

Packet Pacing, TSO (TCP Segmentation Offload) sizing, and the FQ (Fair Queuing) scheduler

Starting with Linux kernel 3.11 (available in Fedora 20, Debian 8, and Ubuntu 13.10), there is a new 'fair queuing' scheduler, which includes code that does a much better job of pacing packets out of a fast host. See https://lwn.net/Articles/564978/ for more details. For RHEL-based OSes, FQ has been backported to the 3.10.0-327 kernel in RHEL/CentOS 7.2.
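Before enabling FQ, it can be worth confirming that the running kernel supports it. A minimal sketch (the module name `sch_fq` is the standard one, but on some kernels FQ is built in rather than a loadable module, in which case `modinfo` may report nothing):

```shell
# Check the running kernel version (FQ requires 3.11+, or RHEL's 3.10.0-327+).
uname -r

# Check whether the fq scheduler is available as a module.
# If it is compiled directly into the kernel, this may fail even
# though FQ is usable; "tc qdisc add ... fq" is the definitive test.
modinfo sch_fq 2>/dev/null | head -3
```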

More information on configuring FQ is available here and here.

On some long paths (50-80ms RTT), we've seen TCP performance improvements of 2-4X, as shown below. More experimental results are available here.

In particular, FQ helps if there is a network device in the path with less than 32MB of per-port buffering.

To enable Fair Queuing (which is off by default), do:

     tc qdisc add dev $ETH root fq

or to both pace and shape the bandwidth:

     tc qdisc add dev $ETH root fq maxrate Ngbit
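The commands above can be combined with the qdisc statistics output to confirm that pacing is active. A sketch, assuming the interface name `eth0` and an illustrative 5 Gbit/sec cap (both are placeholders; substitute your own interface and rate):

```shell
# Replace eth0 with your actual interface name (see "ip link" for a list).
ETH=eth0

# Pace and cap traffic on this interface at 5 Gbit/sec (requires root).
tc qdisc add dev $ETH root fq maxrate 5gbit

# Confirm the qdisc is installed and watch its statistics; a growing
# "throttled" counter indicates that packets are being paced.
tc -s qdisc show dev $ETH

# On recent kernels, fq can also be made the system-wide default qdisc
# so that interfaces pick it up automatically:
sysctl -w net.core.default_qdisc=fq
```

To remove the qdisc and revert to the default scheduler, use `tc qdisc del dev $ETH root`.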

A plot of the tcpdump traces for these transfers clearly shows why throughput with FQ is better. The plot on the left is with FQ, and the plot on the right is without FQ.

Details on these results.

Test path: Fermi National Lab (near Chicago) to NERSC (Oakland CA)

FNAL Sender —> FNAL-S —> FNAL-R —> FNAL-BR—> STARLIGHT-R ——> NERSC-R —> NERSC Receiver
     40G      100G       100G      100G         100G          100G            40G

S: switch; R: router; BR: border router

Both FNAL Sender and NERSC Receiver are configured with Mellanox 40GE NICs.

The FNAL-S 40GE line card has 4x10Gbps parallelism, instead of 1x40Gbps. Therefore, a single stream's throughput is limited to 10Gbps.

In a different test on the ESnet 40G testbed, the following results were produced. Here is the result with default settings:

iperf3 -c 10.20.1.20 -A2,2 -t50 -w512M
Connecting to host 10.20.1.20, port 5201
[  4] local 10.20.1.8 port 52812 connected to 10.20.1.20 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   718 MBytes  6.02 Gbits/sec    0   43.5 MBytes       
[  4]   1.00-2.00   sec  1.99 GBytes  17.1 Gbits/sec  738    213 MBytes       
[  4]   2.00-3.00   sec  2.34 GBytes  20.1 Gbits/sec    0    213 MBytes       
[  4]   3.00-4.00   sec  2.16 GBytes  18.6 Gbits/sec    0    214 MBytes       
[  4]   4.00-5.00   sec  2.24 GBytes  19.3 Gbits/sec    0    215 MBytes       
[  4]   5.00-6.00   sec  2.19 GBytes  18.8 Gbits/sec    0    218 MBytes       
[  4]   6.00-7.00   sec  2.30 GBytes  19.8 Gbits/sec    0    221 MBytes       
[  4]   7.00-8.00   sec  2.25 GBytes  19.3 Gbits/sec    0    226 MBytes       
[  4]   8.00-9.00   sec  2.40 GBytes  20.6 Gbits/sec    0    231 MBytes       
[  4]   9.00-10.00  sec  2.36 GBytes  20.3 Gbits/sec    0    238 MBytes       
[  4]  10.00-11.00  sec  2.53 GBytes  21.7 Gbits/sec    0    245 MBytes       
[  4]  11.00-12.00  sec  2.51 GBytes  21.6 Gbits/sec    0    254 MBytes       
[  4]  12.00-13.00  sec  2.69 GBytes  23.1 Gbits/sec    0    263 MBytes       
[  4]  13.00-14.00  sec  2.72 GBytes  23.3 Gbits/sec    0    274 MBytes       
[  4]  14.00-15.00  sec  2.88 GBytes  24.8 Gbits/sec    0    285 MBytes       
[  4]  15.00-16.00  sec  2.96 GBytes  25.4 Gbits/sec    0    297 MBytes       
[  4]  16.00-17.00  sec  3.11 GBytes  26.7 Gbits/sec    0    309 MBytes       
[  4]  17.00-18.00  sec  3.22 GBytes  27.7 Gbits/sec    0    322 MBytes       
[  4]  18.00-19.00  sec  3.38 GBytes  29.0 Gbits/sec    0    334 MBytes       
[  4]  19.00-20.00  sec  3.48 GBytes  29.8 Gbits/sec    0    348 MBytes       
[  4]  20.00-21.00  sec  3.58 GBytes  30.7 Gbits/sec    0    360 MBytes       


And here is with FQ on:

iperf3 -c 10.20.1.20 -A2,2 -t50 -w512M
Connecting to host 10.20.1.20, port 5201
[  4] local 10.20.1.8 port 52824 connected to 10.20.1.20 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   709 MBytes  5.95 Gbits/sec    0   35.0 MBytes       
[  4]   1.00-2.00   sec  2.50 GBytes  21.4 Gbits/sec    0    885 MBytes       
[  4]   2.00-3.00   sec  3.55 GBytes  30.5 Gbits/sec    0    885 MBytes       
[  4]   3.00-4.00   sec  3.55 GBytes  30.5 Gbits/sec    0    885 MBytes       
[  4]   4.00-5.00   sec  3.54 GBytes  30.4 Gbits/sec    0    885 MBytes       
[  4]   5.00-6.00   sec  3.54 GBytes  30.4 Gbits/sec    0    885 MBytes       
[  4]   6.00-7.00   sec  3.54 GBytes  30.4 Gbits/sec    0    885 MBytes       
[  4]   7.00-7.72   sec  2.54 GBytes  30.4 Gbits/sec    0    885 MBytes     


Note that with FQ on, there is no burst of retransmits at the beginning, and it ramps up to full speed quickly.

 

Also note that with a 1500-byte MTU, just disabling TSO (more information available here) can lead to a 2x improvement on this path.
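TSO can be toggled with ethtool. A minimal sketch, assuming the interface name `eth0` (a placeholder; substitute your own interface):

```shell
# Replace eth0 with your actual interface name.
ETH=eth0

# Check the current TCP segmentation offload setting.
ethtool -k $ETH | grep tcp-segmentation-offload

# Disable TSO (requires root); re-enable later with "tso on".
ethtool -K $ETH tso off
```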