Interrupt Binding

To maximize single-stream performance (both TCP and UDP), you will probably need to pay attention to which cores are being used. For the best performance, send the NIC interrupts to one core and run the application I/O thread on a nearby core, but not the same core.

For hosts with Intel Sandy/Ivy Bridge motherboards this is even more important. As you can see in the figure on the right, the PCI slot for the NIC is directly attached to only one of the two processors. There is a large performance penalty if either the interrupts or the application run on the wrong processor, because everything must then cross the QPI bus.
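To find out which processor a NIC is attached to, Linux exposes the device's NUMA node and local CPU list under sysfs. The helper below is a minimal sketch: the function name nic_locality and its optional second argument (an alternate sysfs root, useful only for testing) are illustrative assumptions, not part of any vendor tool.

```shell
# Print the NUMA node and the CPUs local to a network interface.
# $1 = interface name (e.g. eth2)
# $2 = sysfs root, defaults to /sys (hypothetical parameter, for testing only)
nic_locality() {
    dev="${2:-/sys}/class/net/$1/device"
    printf 'numa_node=%s local_cpus=%s\n' \
        "$(cat "$dev/numa_node")" "$(cat "$dev/local_cpulist")"
}
```

Running nic_locality eth2 on a dual-socket host should report the node the NIC hangs off and the cores that belong to it. A numa_node of -1 means the platform did not report NUMA locality for that device.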

On a system with slow processors, or on a 40G PCIe Gen3 host, TCP and UDP performance increases of up to 2x have been observed by ensuring that the NIC driver interrupts and application threads are handled by the right cores.

On Linux, you can use the sched_setaffinity() system call or the numactl command line tool to bind a process to a core. For iperf3, you can use the "-A" flag, and for nuttcp you can use the "-xc" flag to do this.
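As a rough sketch of how a binding might be applied and then checked from the shell: the commented commands show numactl and iperf3 pinned to core 3 (the core number, ./my_app, and receiver.example.org are placeholders), and the helper reads the kernel's view of a process's allowed cores from its status file. The function name allowed_cores and its optional second argument are assumptions made for illustration.

```shell
# Example launch commands (commented out: they need the tools and a remote host):
#   numactl --physcpubind=3 ./my_app
#   iperf3 -c receiver.example.org -A 3

# Print the cores a running process is allowed to use.
# $1 = process ID
# $2 = proc root, defaults to /proc (hypothetical parameter, for testing only)
allowed_cores() {
    awk '$1 == "Cpus_allowed_list:" { print $2 }' "${2:-/proc}/$1/status"
}
```

For a process that was bound to core 3, allowed_cores with its PID should print 3 rather than a range such as 0-11.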

To specify which core handles the NIC interrupts, you need to disable irqbalance and then bind the interrupts to a specific core.

Some vendors provide scripts to do this IRQ binding at boot time.

  • Mellanox: /usr/sbin/set_irq_affinity_bynode.sh 1 ethN
  • Chelsio: /sbin/t4_perftune.sh

Binding interrupts by hand

If you are using a NIC that does not come with a script for this, you can bind the interrupts by hand as follows.

First, identify the IRQs for the receive queues of each interface:

  grep eth2 /proc/interrupts
   76:         23         50        245         66         20        125         10          0          0          0          5          0       PCI-MSI-X  eth2-TxRx-0
   84:         90        135         45        123         70         50          5          0          0          0          5          0       PCI-MSI-X  eth2-TxRx-1
   92:        165         65         55         65        128         35          0          0          0          5          0          5       PCI-MSI-X  eth2-TxRx-2
  100:         85        123         40         45         70        150          0          5          0          5          0          0       PCI-MSI-X  eth2-TxRx-3
  108:        105         40         20        153        110         80          0          0          0          0         10          5       PCI-MSI-X  eth2-TxRx-4
  116:        170        125         55         35         70         53          0          0         10          5          0          0       PCI-MSI-X  eth2-TxRx-5
  124:         85        115         43         45         70        150          0         15          0          0          0          0       PCI-MSI-X  eth2-TxRx-6
  132:        100         35         90        140         63         80         10          0          0          0          0          5       PCI-MSI-X  eth2-TxRx-7
  140:        165        108         50         95         55         35          0          0          0          5          0         10       PCI-MSI-X  eth2-TxRx-8
  172:          2          0          0          0          0          0          0          0          0          0          0          0       PCI-MSI-X  eth2:lsc

The IRQ is the first column.
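Pulling the IRQ numbers out of that listing can be scripted. The following is a minimal sketch, assuming the queue vectors are named "<interface>-..." in the last column as in the output above; the function name iface_irqs and the optional file argument are illustrative.

```shell
# List the IRQ numbers of an interface's queue vectors, one per line.
# $1 = interface name (e.g. eth2)
# $2 = interrupts file, defaults to /proc/interrupts (for testing only)
iface_irqs() {
    awk -v pat="^$1-" '
        $NF ~ pat {              # last column looks like eth2-TxRx-0
            sub(/:$/, "", $1)    # strip the trailing colon from "76:"
            print $1
        }' "${2:-/proc/interrupts}"
}
```

With the listing above, iface_irqs eth2 would print 76 through 140 and skip IRQ 172, whose name (eth2:lsc) is the link-status vector rather than a queue.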

Then bind those IRQs to a given core:

  echo "proc_number" > /proc/irq/irq_number/smp_affinity

where proc_number is a hexadecimal bit mask selecting the core (e.g., core 2 = 04, core 3 = 08, core 4 = 10) and irq_number is the IRQ (76, 84, ...).

  echo 04 > /proc/irq/76/smp_affinity

This will bind eth2-TxRx-0 to core 2.
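Because smp_affinity is read as a hexadecimal mask with bit N set for core N, it can be less error-prone to compute the mask than to write it by hand. This is a small sketch for cores 0 to 31 only; the function name core_to_mask is an assumption. For higher core numbers the mask must be split into comma-separated 32-bit groups, and writing the core number to /proc/irq/irq_number/smp_affinity_list is usually simpler.

```shell
# Print the hexadecimal smp_affinity mask for a single core (0-31).
# $1 = core number
core_to_mask() {
    printf '%x\n' "$((1 << $1))"
}
```

For example, core_to_mask 2 prints 4 and core_to_mask 4 prints 10, so as root one could write: echo "$(core_to_mask 2)" > /proc/irq/76/smp_affinity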

This sample boot script can be used to assign NIC interrupts to cores at boot time, e.g.:

# Bind eth2/eth3 Myricom IRQs to cores 2 and 3
/usr/local/bin/myri-irq-bind.sh eth2 4
/usr/local/bin/myri-irq-bind.sh eth3 8
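The contents of the script called above are not reproduced here. As a hypothetical stand-in, a per-interface bind step might look like the sketch below, which takes the same two arguments (interface and hex mask). The function name bind_iface_irqs and the optional third argument are assumptions, and the real script may work differently.

```shell
# Bind every queue IRQ of an interface to the given hex mask.
# Hypothetical sketch; NOT the myri-irq-bind.sh script itself.
# $1 = interface name (e.g. eth2)
# $2 = hexadecimal CPU mask (e.g. 4 for core 2)
# $3 = proc root, defaults to /proc (for testing only)
bind_iface_irqs() {
    root="${3:-/proc}"
    awk -v pat="^$1-" '$NF ~ pat { sub(/:$/, "", $1); print $1 }' "$root/interrupts" |
    while read -r irq; do
        # Writing to the real /proc/irq files requires root.
        echo "$2" > "$root/irq/$irq/smp_affinity"
    done
}
```

Called as bind_iface_irqs eth2 4, this would steer all eth2 queue interrupts to core 2. It has no lasting effect unless irqbalance is disabled first, since irqbalance may rewrite the masks.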

You will also want to bind interrupts for your RAID controllers to other, unused, cores.

To test the effect of IRQ binding, use mpstat, e.g.:

  mpstat -P ALL 1

You can also use pidstat, e.g.:

  # show the CPU used by process ID 3000 at 2-second intervals
  pidstat -u -p 3000 2
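Another quick check, complementary to mpstat, is to see which cores the interrupt counters in /proc/interrupts are actually growing on. The sketch below sums the per-CPU counts over all vectors of one interface; the function name irq_counts_per_cpu and the optional file argument are assumptions for illustration.

```shell
# Sum per-CPU interrupt counts across all vectors of an interface.
# $1 = interface name (e.g. eth2)
# $2 = interrupts file, defaults to /proc/interrupts (for testing only)
irq_counts_per_cpu() {
    awk -v pat="^$1[-:]" '
        $NF ~ pat {
            # Fields 2..k are the per-CPU counters; stop at the first non-number.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                sum[i - 2] += $i
            if (i - 2 > n) n = i - 2
        }
        END {
            for (c = 0; c < n; c++)
                printf "CPU%d %d\n", c, sum[c]
        }' "${2:-/proc/interrupts}"
}
```

Running it twice a few seconds apart during a transfer and comparing the totals should show the counts increasing only on the cores the IRQs were bound to.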

For more information, see the Red Hat documentation and the Mellanox Performance Tuning Guide.