Packet Pacing
When sending from a faster host to a slower host, it is easy to overrun the receiver, leading to packet loss and TCP backing off. Similar problems occur when a 10G host sends data to a sub-10G virtual circuit, when a 40G host sends to a 10G host, or when a 40G/100G host with a fast CPU sends to a 40G/100G host with a slower CPU. These issues are even more pronounced with tools that use parallel streams, such as GridFTP. On some long paths (50-80ms RTT), we've seen TCP performance improvements of 2-4x after enabling packet pacing.
Fair Queuing (FQ)-based pacing, described below, is a very effective way of dealing with this issue.
Packet pacing techniques should be considered carefully, as any changes made will impact all traffic on a host. The following information should be applied only to systems where the requirements and traffic patterns are well understood.
Packet Pacing using the FQ (Fair Queuing) scheduler
Linux kernel 3.11 and later (available starting in CentOS 7.2, Fedora 20, Debian 8, and Ubuntu 13.10) include a new 'fair queuing' (fq) scheduler, which does a much better job of pacing packets out of a fast host. See https://lwn.net/Articles/564978/ for more details.
Shortly after that, fq_codel was released. It builds upon fq by combining fair queuing with delay-based queue management, but does not support pacing. fq_codel became the default queuing discipline starting with the 4.12 kernel in 2017.
However, for high-throughput TCP, we recommend fq over fq_codel because fq supports pacing. fq is also required on kernel versions earlier than 4.20 if you want to experiment with BBR congestion control.
More information on configuring FQ is available here and here.
To confirm your host is configured to use fq:
sysctl -a | grep qdisc
If it is not, add this to /etc/sysctl.conf:
net.core.default_qdisc = fq
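The check above can be wrapped in a small script. This is a minimal sketch, not a definitive tool: check_qdisc is a hypothetical helper name, and the script assumes the standard sysctl utility is on the PATH (it falls back to "unknown" if the value cannot be read).

```shell
#!/bin/sh
# Sketch: report whether the default qdisc is fq.
# check_qdisc is a hypothetical helper name, not part of any standard tool.

check_qdisc() {
    # $1 is the qdisc name, for example the value of net.core.default_qdisc
    if [ "$1" = "fq" ]; then
        echo "fq is the default qdisc"
    else
        echo "default qdisc is '$1', not fq"
    fi
}

# 'sysctl -n' prints only the value; fall back to "unknown" if it cannot be read.
current=$(sysctl -n net.core.default_qdisc 2>/dev/null || echo unknown)
check_qdisc "$current"

# After editing /etc/sysctl.conf, the file can be reloaded as root with:
#   sysctl -p
```

Note that the default only applies to qdiscs created after the change. An interface that is already up keeps its current qdisc until it is replaced with tc.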
To enable packet pacing:
tc qdisc add dev $ETH root fq maxrate Ngbit
For example, for a 10G data transfer node (DTN) running GridFTP, which uses 4 parallel streams by default, we recommend setting FQ as follows:
tc qdisc add dev $ETH root fq maxrate 2gbit
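The arithmetic behind a setting like this can be sketched as follows. The fq maxrate option caps each flow separately, so with 4 streams at 2gbit each the aggregate stays at or below roughly 8 Gbit/s on a 10G link. That reading of the 2gbit figure is an assumption, as is an even split of load across streams. The helper names fq_pacing_cmd and aggregate_gbit and the interface name eth0 are hypothetical, and the script only prints the tc command rather than running it, since tc requires root.

```shell
#!/bin/sh
# Sketch only: prints the tc command instead of executing it.
# fq_pacing_cmd and aggregate_gbit are hypothetical helper names.

fq_pacing_cmd() {
    dev=$1
    rate_gbit=$2
    # Accept only whole numbers of Gbit/s in this sketch
    case $rate_gbit in
        ''|*[!0-9]*)
            echo "rate must be a whole number of Gbit/s" >&2
            return 1
            ;;
    esac
    echo "tc qdisc add dev $dev root fq maxrate ${rate_gbit}gbit"
}

aggregate_gbit() {
    # $1 = number of parallel streams, $2 = per-flow cap in Gbit/s
    echo $(( $1 * $2 ))
}

fq_pacing_cmd eth0 2    # prints: tc qdisc add dev eth0 root fq maxrate 2gbit
aggregate_gbit 4 2      # prints: 8
```

If a tool uses a different number of streams, the same multiplication gives a rough upper bound on the aggregate rate to compare against the slowest link or host on the path.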
Other useful tc commands include 'show' and 'delete'. For example:
tc qdisc show dev $ETH
tc qdisc del dev $ETH root
For a 100G data transfer node (DTN) sending data to mostly 10G hosts, pacing at 2gbit is also recommended.
You can also add FQ-based pacing to your application using the 'setsockopt' system call with the SO_MAX_PACING_RATE option. However, this only works if the host is configured to use fq as the qdisc. The iperf3 tool uses this option to implement its --fq-rate flag (v3.1.5 and higher).