40G/100G Network Tuning
For hosts with 40G/100G Ethernet NICs, there are some additional things you'll want to tune to maximize throughput.
The most important things to configure are:
- CPU governor to 'performance'
- TCP buffer size set to the maximum (2GB)
- Make sure you are using the correct cores for both IRQ and user processes
- Disable Simultaneous Multithreading (SMT), also known as Hyper-Threading, in the BIOS. We've seen SMT lead to very inconsistent results, especially on AMD-based hosts.
- Make sure that 'fair queuing' (fq) is enabled, and set a good pacing rate for your environment.
- Most newer versions of Linux set net.core.default_qdisc to fq_codel, which seems to work fine. Some older versions default to pfifo_fast and do not support fq_codel. On those systems, change the default to fq.
- NIC tuning: Increase the ring buffer size to the max (8192), and confirm that interrupt coalescence is ON.
/usr/sbin/ethtool -G ethN rx 8192 tx 8192
/usr/sbin/ethtool -C ethN adaptive-rx on adaptive-tx on
- Enable IOMMU if your hardware supports it.
- Make sure that flow control (pause frames) is turned on, as not all NIC drivers have this on by default (e.g., the Intel ICE driver).
/usr/sbin/ethtool -A ethN rx on tx on
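The host-level items in the list above (CPU governor, TCP buffer limits, default qdisc) could be applied roughly as sketched below. This is a minimal sketch, not a definitive recipe: the helper names run and tune_host are hypothetical, and the minimum and default values in tcp_rmem/tcp_wmem are assumptions. Only the 2GB maximum and the fq setting come from the list. By default the script only prints the commands; set APPLY=1 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch: print the host-level settings from the list above; run them only when APPLY=1.
# Helper names and the tcp_rmem/tcp_wmem min/default values are assumptions.
APPLY="${APPLY:-0}"

run() {
    # Always show the command; execute it only when APPLY=1 (needs root).
    echo "$*"
    if [ "$APPLY" = "1" ]; then
        "$@"
    fi
}

tune_host() {
    # CPU governor to 'performance'
    run cpupower frequency-set -g performance
    # TCP buffer limits: 2147483647 is 2GB minus one byte
    run sysctl -w net.core.rmem_max=2147483647
    run sysctl -w net.core.wmem_max=2147483647
    run sysctl -w "net.ipv4.tcp_rmem=4096 131072 2147483647"
    run sysctl -w "net.ipv4.tcp_wmem=4096 16384 2147483647"
    # Fair queuing as the default qdisc
    run sysctl -w net.core.default_qdisc=fq
}

tune_host
```

Settings made with sysctl -w do not survive a reboot; to make them persistent they would normally go into /etc/sysctl.conf or a file under /etc/sysctl.d/.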
No other tuning should be needed for modern Linux systems (those with a 5.x kernel).
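To see where a given host stands, a read-only check along the following lines may help. It is only a sketch: the function names kernel_major and qdisc_advice are hypothetical, and the advice strings simply restate the qdisc guidance given earlier (fq and fq_codel are fine, pfifo_fast should be changed to fq).

```shell
#!/bin/sh
# Sketch: report the kernel major version and judge the default qdisc.
# Function names are hypothetical; nothing here changes system state.

kernel_major() {
    # "5.14.0-362.el9.x86_64" -> "5"
    echo "${1%%.*}"
}

qdisc_advice() {
    case "$1" in
        fq|fq_codel) echo "ok" ;;
        pfifo_fast)  echo "change to fq" ;;
        *)           echo "review" ;;
    esac
}

release="$(uname -r)"
echo "kernel major version: $(kernel_major "$release")"

# The sysctl is exposed under /proc on Linux; skip quietly elsewhere.
if [ -r /proc/sys/net/core/default_qdisc ]; then
    qdisc="$(cat /proc/sys/net/core/default_qdisc)"
    echo "default qdisc $qdisc: $(qdisc_advice "$qdisc")"
fi
```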
For more details on 100G tuning on older systems, see this presentation from September 2016.
CPU clock rate matters a lot for 40G/100G flows
If you care about the throughput of single flows, a higher CPU clock rate is important. In general, you need a CPU clock rate of at least 3GHz to achieve 30Gbps per flow.
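One rough way to compare a host against the 3GHz figure is to look at the "cpu MHz" lines in /proc/cpuinfo, as in the sketch below. The function name min_cpu_mhz is hypothetical. Note the assumptions: the field reports the current clock of each core, not its rated maximum, and it is not present on every architecture, so treat the result as a hint only.

```shell
#!/bin/sh
# Sketch: find the slowest current core clock from /proc/cpuinfo-style input.
# min_cpu_mhz is a hypothetical helper; it reads the text on stdin.

min_cpu_mhz() {
    awk -F: '
        /^cpu MHz/ {
            v = $2 + 0
            if (min == "" || v < min) min = v
        }
        END { if (min != "") printf "%d\n", min }
    '
}

if [ -r /proc/cpuinfo ]; then
    mhz="$(min_cpu_mhz < /proc/cpuinfo)"
    if [ -n "$mhz" ]; then
        echo "slowest core is currently at ${mhz} MHz"
        # 3000 MHz corresponds to the 3GHz guideline above.
        if [ "$mhz" -lt 3000 ]; then
            echo "below 3GHz: single-flow throughput may be limited"
        fi
    fi
fi
```

With the governor set to 'performance', the reported value should stay close to the maximum clock the CPU supports.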
On the ESnet 100G perfSONAR nodes we typically see around 30 Gbps for a single stream, and can easily get over 95 Gbps using 8 streams with both iperf2 and the new threaded version of iperf3. Pacing helps keep the streams from stepping on each other.
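As an illustration of picking a pacing rate for a multi-stream test, the sketch below divides a target share of the link evenly across the streams and prints a matching iperf3 command line. The 95% target, the helper name pacing_rate_mbps, and the host receiver.example.org are assumptions for illustration. The --fq-rate option asks iperf3 to use fq-based socket pacing, which to our understanding is applied to each stream separately; it is worth confirming that against your iperf3 version.

```shell
#!/bin/sh
# Sketch: derive a per-stream pacing rate for a multi-stream test.
# The 95% target and receiver.example.org are assumptions for illustration.

pacing_rate_mbps() {
    # $1 = link rate in Mbps, $2 = number of streams, $3 = target utilization in percent
    echo $(( $1 * $3 / 100 / $2 ))
}

# 100G link, 8 streams, aiming for 95% of line rate in total
rate="$(pacing_rate_mbps 100000 8 95)"

# Print (do not run) the corresponding iperf3 invocation.
echo "iperf3 -c receiver.example.org -P 8 --fq-rate ${rate}M"
```

For the values shown this works out to 11875 Mbps per stream, a little under 12 Gbps each.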
For information on DTN file system tuning, see DTN Tuning.
More information on NIC vendor-specific tuning recommendations: