Interrupt Binding
To maximize single-stream performance (both TCP and UDP) on NUMA-architecture systems (e.g., Intel Sandy/Ivy Bridge motherboards), you need to pay attention to which CPU socket and cores are being used.
On these systems, the PCI slot for the NIC is directly attached to only one of the two CPU sockets. There is a large performance penalty if either the interrupts or the application runs on the wrong socket, because everything must then cross the inter-socket (QPI) bus. On a 40G/100G PCIe gen-3 host, TCP and UDP performance can be up to 2x slower if you are using cores on the wrong CPU socket, so it is important that both the NIC IRQs and the application use the socket the NIC is attached to.
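To see how the cores on your host are divided between the sockets, you can list the NUMA topology with the numactl tool:
numactl --hardware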
To control which cores handle the NIC interrupts, you need to disable irqbalance and then bind the interrupts to a specific CPU socket. To do this, run the vendor-supplied IRQ script at boot time. For example:
- Mellanox: /usr/sbin/set_irq_affinity_bynode.sh socket ethN
- Chelsio: /sbin/t4_perftune.sh
Where "ethN" is the name of your ethernet device. To find out which socket to pass to set_irq_affinity_bynode.sh, you can use this command:
cat /sys/class/net/ethN/device/numa_node
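Note that irqbalance will periodically rewrite these settings if it is left running, so it must be stopped and disabled first. On a systemd-based host, that looks like:
systemctl stop irqbalance
systemctl disable irqbalance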
To determine the best cores to use for a given NIC, use this command:
cat /sys/class/net/ethN/device/local_cpulist
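For example, on a hypothetical two-socket Mellanox host where eth2 reports a numa_node of 1 and a local_cpulist of 8-15 (your values will differ), you would bind the IRQs to socket 1 with:
/usr/sbin/set_irq_affinity_bynode.sh 1 eth2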
On Linux, you can use the sched_setaffinity() system call or the numactl command-line tool to bind a process to a core.
The network testing tools iperf3 and nuttcp both let you select the core from the command line. For iperf3, you can use the "-A" flag, and for nuttcp you can use the "-xc" flag to do this.
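For example, to pin both the sender and the receiver to core 4 (remotehost is a placeholder, and the exact affinity argument syntax can vary between tool versions, so check your man pages):
iperf3 -c remotehost -A 4,4
nuttcp -xc4/4 remotehost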
For other programs, use numactl. For example:
numactl -N socketID program_name
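For example, to run a program on the cores of socket 1 and also allocate its memory from that socket (the -m flag is optional, but helps keep memory accesses local):
numactl -N 1 -m 1 program_name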
If you are using perfSONAR 4.x and launching these tools with pScheduler, you don't need to worry about this, as pScheduler will automatically determine which CPU socket to use.
To verify the effect of IRQ binding, use mpstat and watch the %irq and %soft columns: interrupt load should be concentrated on the cores you bound and stay near zero on the other socket. For example:
mpstat -P ALL 1
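mpstat also accepts a list of specific CPUs, which is handy for watching just the bound cores (assuming cores 2 and 3 here):
mpstat -P 2,3 1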
Binding interrupts by hand
If your NIC does not come with a script for this, here is how to bind the interrupts by hand.
First, identify the IRQs for the receive queues of each interface:
grep eth2 /proc/interrupts
76: 23 50 245 66 20 125 10 0 0 0 5 0 PCI-MSI-X eth2-TxRx-0
84: 90 135 45 123 70 50 5 0 0 0 5 0 PCI-MSI-X eth2-TxRx-1
92: 165 65 55 65 128 35 0 0 0 5 0 5 PCI-MSI-X eth2-TxRx-2
100: 85 123 40 45 70 150 0 5 0 5 0 0 PCI-MSI-X eth2-TxRx-3
108: 105 40 20 153 110 80 0 0 0 0 10 5 PCI-MSI-X eth2-TxRx-4
116: 170 125 55 35 70 53 0 0 10 5 0 0 PCI-MSI-X eth2-TxRx-5
124: 85 115 43 45 70 150 0 15 0 0 0 0 PCI-MSI-X eth2-TxRx-6
132: 100 35 90 140 63 80 10 0 0 0 0 5 PCI-MSI-X eth2-TxRx-7
140: 165 108 50 95 55 35 0 0 0 5 0 10 PCI-MSI-X eth2-TxRx-8
172: 2 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-X eth2:lsc
The IRQ number is in the first column.
Then bind those IRQs to a given core:
echo "proc_number" > /proc/irq/irq_number/smp_affinity
where proc_number is a hexadecimal bit mask with the bit for the desired core set (e.g.: core 2 = 04, core 3 = 08, core 4 = 10; note the mask is interpreted as hex, so core 4 is 10, not 16), and irq_number is the IRQ from the first column (76, 84, ...).
echo 04 > /proc/irq/76/smp_affinity
This will bind eth2-TxRx-0 (IRQ 76) to core 2.
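On reasonably recent kernels you can skip the hex mask arithmetic and write a core number (or range) to smp_affinity_list instead:
echo 2 > /proc/irq/76/smp_affinity_list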
A small helper script, called at boot time, can then be used to assign NIC interrupts to cores. For example:
# Bind eth2/eth3 Myricom IRQs to cores 2 and 3
/usr/local/bin/myri-irq-bind.sh eth2 4
/usr/local/bin/myri-irq-bind.sh eth3 8
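A minimal sketch of such a helper script, assuming the two-argument interface used above (interface name, hex CPU mask) and that the driver labels its queue IRQs with the interface name, as in the /proc/interrupts output above:
#!/bin/bash
# myri-irq-bind.sh -- bind all of an interface's queue IRQs to a CPU mask
# Usage: myri-irq-bind.sh <interface> <hex_cpu_mask>
IFACE=$1
MASK=$2
# Walk every IRQ whose /proc/interrupts line mentions the interface
for IRQ in $(grep "$IFACE" /proc/interrupts | awk -F: '{print $1}'); do
    echo "$MASK" > /proc/irq/"$IRQ"/smp_affinity
done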
You will also want to bind interrupts for your RAID controllers to other unused cores.