New network performance enhancements in the 6.X kernels

October 17, 2024

Page under construction!

There are a number of new options for improving network performance starting in the 6.x Linux kernels that are covered in detail in our 2024 INDIS workshop paper, "Recent Linux Improvements that Impact TCP Throughput: Insights from R&E Networks".

A short summary of these improvements follows:


Kernel upgrades

Using the newest Linux kernel (6.8 as of September 2024), we see up to 38% improved performance on the WAN and 30% on a LAN, compared with the 5.15 kernel.

Kernel 6.8 is the default on Ubuntu 24.04, and on Ubuntu 22.04 it is quite easy to upgrade to a newer kernel using apt:

# to install 6.5 on Ubuntu 22
apt install linux-generic-hwe-22.04

# to install 6.8 on Ubuntu 22
apt install linux-image-generic-hwe-22.04-edge

# to install 6.11 on Ubuntu 24 (coming soon)
apt install linux-image-generic-hwe-24.04-edge
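
After installing, reboot into the new kernel and confirm the running version:

# reboot into the newly installed kernel, then verify the version
reboot
uname -r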

You can find the latest 6.X kernel available for your version of Ubuntu using this command:

apt update; apt search linux-image-6 | grep generic

On RHEL-based systems, you can install the newest kernel from the ELRepo project.
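
For example, on a RHEL 9-based system, the ELRepo mainline kernel package (kernel-ml) can typically be installed as follows (check elrepo.org for the correct release package for your OS version; the URLs below are current as of this writing):

# import the ELRepo signing key and enable the repository (RHEL 9 example)
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
dnf install https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm

# install the latest mainline kernel from the elrepo-kernel repository
dnf --enablerepo=elrepo-kernel install kernel-ml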


Receiver HW GRO

New receive-side optimizations are available for NVIDIA ConnectX-7 network cards with firmware >= 28.42.1000 on Linux 6.11, including hardware-accelerated GRO and header-data split on the receiver. Other new NICs from Intel and Broadcom may support this as well.
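
You can check your NIC's firmware version with ethtool (the interface name eth100 used throughout this page is just an example):

# show driver and firmware information for the interface
/usr/sbin/ethtool -i eth100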

Preliminary results from the developer suggest up to a 60% throughput improvement for single-stream tests.

Our initial results show a 33% improvement on AMD hosts (40 Gbps vs 53 Gbps) and a 5% improvement on Intel hosts (62 Gbps vs 65 Gbps) after enabling hardware GRO on the receiver for single-stream tests with a 9K MTU.

For tests with a 1500-byte MTU on Intel hosts, we saw an impressive 160% improvement in throughput (24 Gbps vs 62 Gbps).

To enable HW GRO, do the following:

# Note: ring buffers larger than 4K don't work with HW GRO; 2K buffers seem to work well
/usr/sbin/ethtool -G eth100 tx 2048 rx 2048
/usr/sbin/ethtool -K eth100 rx-gro-hw on
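
To confirm that the offload is now active:

# verify that hardware GRO is enabled on the interface
/usr/sbin/ethtool -k eth100 | grep rx-gro-hw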

Also note that when using receiver HW GRO, an MSS of 8K was around 5% faster than the iperf3 default of MTU minus 40.
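
For example, to run a single-stream iperf3 test with an 8K MSS (the hostname below is a placeholder):

# request an 8K MSS instead of the default of MTU minus 40
iperf3 -c receiver.example.org -M 8192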

More details coming soon. 


BIG TCP

(More details coming soon; see the INDIS paper for results.)

To enable BIG TCP, here are the commands for IPv4 and IPv6:

# IPv4 BIG TCP: raise the GSO/GRO size limits above the 64KB default
/usr/sbin/ip link set dev ethN gso_ipv4_max_size 150000 gro_ipv4_max_size 150000
# IPv6 BIG TCP: gso_max_size/gro_max_size apply to IPv6
/usr/sbin/ip link set dev ethN gro_max_size 185000 gso_max_size 185000

If you are using VLANs, you'll need to run those commands for each VLAN interface as well.
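
For example, assuming a VLAN interface named ethN.3002 (a hypothetical name; substitute your own), the same settings would be applied like this, and the resulting limits can be checked with ip -d link:

# apply the same size limits to the VLAN interface
/usr/sbin/ip link set dev ethN.3002 gso_ipv4_max_size 150000 gro_ipv4_max_size 150000
/usr/sbin/ip link set dev ethN.3002 gro_max_size 185000 gso_max_size 185000

# verify: -d (details) shows the gso/gro max size attributes
/usr/sbin/ip -d link show dev ethN.3002 | grep -Eo '(gso|gro)[a-z4_]*max_size [0-9]+'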

More information at: https://lwn.net/Articles/884104/ and https://netdevconf.info/0x15/slides/35/BIG%20TCP.pdf

Some also recommend turning off the NIC's rx_striding_rq private flag when using BIG TCP:

/usr/sbin/ethtool --set-priv-flags eth100 rx_striding_rq off

Overall we have not seen big throughput improvements with BIG TCP, but CPU load is somewhat lower. One problem with BIG TCP is that it requires a custom-built kernel with a larger value of MAX_SKB_FRAGS to see its full advantage.
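
On kernels 6.5 and later, MAX_SKB_FRAGS is a build-time configuration option (CONFIG_MAX_SKB_FRAGS), so you can check what your kernel was built with (assuming your distribution ships the kernel config in /boot):

# show the MAX_SKB_FRAGS value the running kernel was built with
grep CONFIG_MAX_SKB_FRAGS /boot/config-$(uname -r)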