100G Network Tuning
For hosts with 100G (or faster) Ethernet NICs, there are further settings beyond the sysctl.conf changes that you'll want to tune to maximize throughput.
The most important things to configure are:
- CPU governor to 'performance'
- Set the TCP buffer size to the maximum (2GB), and increase optmem_max.
- Make sure you are using the correct cores for both IRQ and user processes
- NIC tuning: Increase the ring buffer size to the max (8192), and confirm that interrupt coalescence is ON. Note that this recommendation is no longer true starting with the 6.11 kernel.
/usr/sbin/ethtool -G ethN rx 8192 tx 8192
/usr/sbin/ethtool -C ethN adaptive-rx on adaptive-tx on
- Disable Simultaneous Multithreading (SMT) (AKA Hyperthreads) in the BIOS. We've seen SMT lead to very inconsistent results, especially with AMD-based hosts.
- To confirm SMT is off, this command should return zero:
cat /sys/devices/system/cpu/smt/active
- To temporarily turn SMT on/off for testing, you can do this:
echo off > /sys/devices/system/cpu/smt/control
- Enable IOMMU if your hardware supports it. This is a very important setting, and can improve performance by up to 40%.
- Make sure that flow control (pause frames) is turned on, as not all NIC drivers enable it by default (e.g., the Intel ICE driver).
/usr/sbin/ethtool -A ethN rx on tx on
- Make sure that 'fair queuing' (fq) is enabled, and set a good pacing rate for your environment. Most newer versions of Linux set net.core.default_qdisc to fq_codel, which seems to work fine. Some older versions default to pfifo_fast and do not support fq_codel. These should be changed to fq.
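Several of the settings above can be verified without changing anything on the host. The following is a minimal read-only sketch, not a definitive tool: it reads the standard sysfs/procfs files for SMT state, CPU governor, and default qdisc. The function names and the SYSROOT variable are hypothetical, introduced here only so the checks can be pointed at a copy of /sys and /proc.

```shell
#!/bin/sh
# Read-only sanity checks for a few of the settings listed above.
# SYSROOT is a hypothetical override (not a kernel or distro convention);
# leave it empty on a real host so the real /sys and /proc are read.
SYSROOT="${SYSROOT:-}"

# Report whether SMT is active (the kernel exposes 0 when it is off).
check_smt() {
    f="$SYSROOT/sys/devices/system/cpu/smt/active"
    if [ ! -r "$f" ]; then
        echo "smt: unknown"
    elif [ "$(cat "$f")" = "0" ]; then
        echo "smt: off"
    else
        echo "smt: on"
    fi
}

# Count cores whose cpufreq governor is not 'performance'.
check_governor() {
    total=0
    bad=0
    for g in "$SYSROOT"/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$g" ] || continue   # skips the unexpanded pattern when nothing matches
        total=$((total + 1))
        [ "$(cat "$g")" = "performance" ] || bad=$((bad + 1))
    done
    if [ "$total" -eq 0 ]; then
        echo "governor: unknown"
    else
        echo "governor: $bad of $total cores not set to performance"
    fi
}

# Report the default queueing discipline (fq or fq_codel is expected).
check_qdisc() {
    f="$SYSROOT/proc/sys/net/core/default_qdisc"
    if [ -r "$f" ]; then
        echo "default_qdisc: $(cat "$f")"
    else
        echo "default_qdisc: unknown"
    fi
}

check_smt
check_governor
check_qdisc
```

Each check prints "unknown" when the file is missing (for example, in a VM without cpufreq support), so the script is safe to run on any Linux host.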
Other tuning options to consider trying can be found here.
No other tuning should be needed on modern Linux systems (those with a 5.x or later kernel).
For more details on 100G tuning on older systems, see this presentation from September 2016.
CPU clock rate matters a lot for 100G flows
If you care about the throughput of single flows, a higher CPU clock rate is important. In general, you need a CPU clock rate of at least 3GHz to achieve 30Gbps per flow.
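As a rough way to compare a host against the 3GHz guideline, the sketch below reads the maximum frequency that cpufreq reports for cpu0 (in kHz). It assumes the cpufreq sysfs interface is present, which is not the case on every system, and that cpu0 is representative of the other cores. The function name and the SYSROOT override are hypothetical.

```shell
#!/bin/sh
# MIN_KHZ encodes the 3 GHz guideline from the text; cpufreq reports kHz.
MIN_KHZ=3000000
# SYSROOT is a hypothetical override; leave it empty on a real host.
SYSROOT="${SYSROOT:-}"

# Compare cpu0's reported maximum frequency against MIN_KHZ.
check_cpu_clock() {
    f="$SYSROOT/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq"
    if [ ! -r "$f" ]; then
        echo "cpu clock: unknown"
        return 0
    fi
    khz=$(cat "$f")
    if [ "$khz" -ge "$MIN_KHZ" ]; then
        echo "cpu clock: ok ($khz kHz)"
    else
        echo "cpu clock: below 3 GHz ($khz kHz)"
    fi
}

check_cpu_clock
```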
On the ESnet 100G perfSONAR nodes we typically see around 30Gbps for a single stream, and can easily get over 95Gbps using 8 streams with both iperf2 and the threaded version (v3.16+) of iperf3. Pacing is helpful so that the streams do not step on each other.
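One way to pick a per-stream pacing rate is to divide the link speed evenly across the streams and leave a little headroom. The sketch below does that arithmetic and prints (but does not run) an iperf3 command using the --fq-rate option, which to my understanding sets socket-level pacing and therefore applies to each stream. The 95% headroom figure, the helper names, and the host name receiver.example.org are illustrative assumptions, not values from the text.

```shell
#!/bin/sh
# Per-stream pacing rate in Mbit/s: link speed scaled by a headroom
# percentage, divided evenly across the parallel streams (integer math).
# Arguments: link speed in Gbit/s, number of streams, headroom percent.
per_stream_rate_mbps() {
    link_gbps=$1
    streams=$2
    headroom_pct=$3
    echo $(( link_gbps * 1000 * headroom_pct / 100 / streams ))
}

# Print an iperf3 command line that uses the computed rate.
# Arguments: host, link speed in Gbit/s, number of streams, headroom percent.
print_iperf3_cmd() {
    rate=$(per_stream_rate_mbps "$2" "$3" "$4")
    echo "iperf3 -c $1 -P $3 --fq-rate ${rate}M"
}

# 100G link, 8 streams, 95% headroom (assumed value); placeholder host name.
print_iperf3_cmd receiver.example.org 100 8 95
```

With these inputs the computed rate is 11875 Mbit/s per stream, or 95Gbps in aggregate. The right headroom depends on your path and should be found by testing.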
For information on DTN file system tuning, see DTN Tuning.
Other sources for high speed network tuning:
More information on NIC vendor-specific tuning recommendations:
- Intel 800 series NICs
- NVIDIA/Mellanox NICs (also see Mellanox Tools)