When sending from a faster host to a slower host, it is easy to overrun the receiver, leading to packet loss and TCP backing off. Similar problems occur when a 10G host sends to a sub-10G virtual circuit, when a 40G host sends to a 10G host, or when a 40G/100G host with a fast CPU sends to a 40G/100G host with a slower CPU. These issues are even more pronounced with data transfer tools that use parallel streams. On some long paths (50-80ms RTT), we've seen TCP performance improvements of 2-4x after enabling packet pacing.
Fair Queuing (FQ)-based pacing, described below, is a very effective way of dealing with this issue.
Both iperf2 and iperf3 support FQ-based pacing via the '--fq-rate' option, allowing you to see if pacing makes a difference in your environment.
Packet Pacing using the FQ (Fair Queuing) scheduler
Linux kernel 3.11 and later (available starting with CentOS 7.2, Fedora 20, Debian 8, and Ubuntu 13.10) includes the 'fair queuing' (fq) scheduler, which does a much better job of pacing packets out of a fast host. See https://lwn.net/Articles/564978/ for more details.
A related scheduler, fq_codel, combines fair queuing with delay-based queue management (CoDel). fq_codel became the default queuing discipline starting with the 4.12 kernel in 2017. However, for high-throughput TCP we recommend fq over fq_codel, and fq is required on kernel versions earlier than 4.20 if you want to experiment with BBR congestion control.
To confirm your host is configured to use fq:
sysctl -a | grep qdisc
If it is not, add this to /etc/sysctl.conf:
net.core.default_qdisc = fq
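The same check can be scripted, for example in a host audit. The sketch below is a minimal illustration, assuming the sysctl is exposed at the usual procfs path; the function name read_default_qdisc is hypothetical. Note that it reports only the system default, not the qdisc currently attached to a particular interface.

```python
from pathlib import Path

# Usual procfs location of the net.core.default_qdisc sysctl on Linux.
DEFAULT_QDISC_PATH = "/proc/sys/net/core/default_qdisc"


def read_default_qdisc(path=DEFAULT_QDISC_PATH):
    """Return the default qdisc name, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    qdisc = read_default_qdisc()
    if qdisc == "fq":
        print("default qdisc is fq")
    else:
        print(f"default qdisc is {qdisc!r}; expected 'fq'")
```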
To enable packet pacing for all traffic from your host, capping each flow at N Gbit/s (fq's maxrate is a per-flow limit):
tc qdisc add dev $ETH root fq maxrate Ngbit
For example, for a 10G data transfer node (DTN) running GridFTP, which uses 4-8 parallel streams by default, we recommend setting FQ as follows:
tc qdisc add dev $ETH root fq maxrate 2gbit
Other useful tc commands include 'show' and 'delete'. For example:
tc qdisc show dev $ETH
tc qdisc del dev $ETH root
For a 100G data transfer node (DTN) sending data to mostly 10G hosts, pacing at 2gbit is also recommended.
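Because fq applies maxrate to each flow separately, the aggregate ceiling for a parallel-stream transfer is roughly the stream count times the per-flow rate, bounded by the sender's link speed. The sketch below works through that arithmetic for the 2gbit setting; the function name is hypothetical, and the result is an upper bound rather than a throughput prediction.

```python
def aggregate_ceiling_gbit(streams, maxrate_gbit, link_gbit):
    """Upper bound on total send rate when each flow is paced to maxrate_gbit."""
    return min(streams * maxrate_gbit, link_gbit)


if __name__ == "__main__":
    # 10G and 100G senders with 4 or 8 parallel streams, each paced at 2 Gbit/s.
    for link in (10, 100):
        for streams in (4, 8):
            ceiling = aggregate_ceiling_gbit(streams, 2, link)
            print(f"{link}G link, {streams} streams: at most {ceiling} Gbit/s")
```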
You can also add FQ-based pacing to your application using the 'setsockopt' system call with the SO_MAX_PACING_RATE option, which takes the rate in bytes per second. This only works if the host is configured to use fq or fq_codel as the qdisc.
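As a rough illustration of the per-socket approach, the Python sketch below sets a 2 Gbit/s pacing limit on a TCP socket. It makes several assumptions: Python's socket module may not export SO_MAX_PACING_RATE, so the code falls back to 47, the value used on most Linux architectures; the helper names are hypothetical; and the call is Linux-specific.

```python
import socket

# Linux option number on most architectures (asm-generic/socket.h).
# Assumption: fall back to 47 if Python does not export the constant.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)


def gbit_to_bytes_per_sec(gbit):
    """Convert Gbit/s (10**9 bits per second, as tc uses) to bytes per second."""
    return int(gbit * 1_000_000_000) // 8


def set_pacing_rate(sock, gbit):
    """Ask the kernel to pace this socket to at most `gbit` Gbit/s."""
    rate = gbit_to_bytes_per_sec(gbit)
    # Python passes the value as a 32-bit C int, so this sketch only
    # covers rates below 2**31 bytes/s (about 17 Gbit/s).
    sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, rate)
    return rate


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            rate = set_pacing_rate(s, 2)
            print(f"pacing limit set to {rate} bytes/s")
        except OSError as exc:
            print(f"could not set pacing rate: {exc}")
```

Set the option before sending data so the limit applies from the first packet; a successful setsockopt call does not by itself confirm that a pacing-capable qdisc is in place.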
More information on configuring FQ is available here and here.