
Interrupt Binding

To maximize single-stream performance (both TCP and UDP) on NUMA systems (e.g., Intel Sandy Bridge/Ivy Bridge motherboards), you need to pay attention to which CPU socket and core are being used.

As you can see in the figure on the right, the PCI slot for the NIC is directly attached to only one of the two CPU sockets. There is a large performance penalty if either the interrupts or the application run on the wrong socket, because all traffic must then cross the QPI bus. On a 40G/100G PCI Gen3 host, TCP and UDP performance can be up to 2x slower if you are using cores on the wrong CPU socket. It is important that both the NIC IRQs and the application use the correct CPU socket.

To specify which cores handle the NIC interrupts, you need to disable irqbalance and then bind the interrupts to a specific CPU socket. To do this, run the vendor-supplied IRQ script at boot time. For example:

  • Mellanox: /usr/sbin/set_irq_affinity_bynode.sh socket ethN
  • Chelsio: /sbin/t4_perftune.sh

Where "ethN" is the name of your ethernet device. To find out which socket to pass to set_irq_affinity_bynode.sh, you can use this command:

cat /sys/class/net/ethN/device/numa_node

To determine the best cores to use for a given NIC, use this command:

cat /sys/class/net/ethN/device/local_cpulist
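
The two lookups above can be combined into one small helper. This is only a sketch: the function name nic_locality and its optional second argument (an alternate sysfs root, handy for trying it out without a real NIC) are illustrative and not part of any vendor tool.

```shell
# Print the NUMA node and local CPU list for a network interface.
# Usage: nic_locality ethN [sysfs_net_root]
# The second argument is a hypothetical override for testing; it defaults
# to /sys/class/net.
nic_locality() (
    iface=$1
    root=${2:-/sys/class/net}
    dev="$root/$iface/device"
    if [ ! -r "$dev/numa_node" ]; then
        echo "nic_locality: no NUMA information for $iface" >&2
        exit 1
    fi
    node=$(cat "$dev/numa_node")
    cpus=$(cat "$dev/local_cpulist")
    # A value of -1 means the kernel reports no NUMA affinity for the device.
    echo "numa_node=$node local_cpulist=$cpus"
)
```

Calling "nic_locality eth2" prints both values on one line, which is convenient to log from a boot script.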

On Linux, you can use the sched_setaffinity() system call or the numactl command line tool to bind a process to a core.

The network testing tools iperf3 and nuttcp both let you select the core from the command line: use the "-A" flag for iperf3 and the "-xc" flag for nuttcp.

For other programs, use numactl. For example:

numactl -N socketID program_name
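
As a rough sketch, the socket ID can be read from the NIC's numa_node file rather than typed by hand. The helper name numactl_cmd_for_nic and the SYSFS_NET override are hypothetical, and falling back to node 0 when the kernel reports -1 is an assumption, not a rule. The function only prints the command so you can inspect it before running it.

```shell
# Print (not run) a numactl command that pins a program to the NUMA node
# of the given NIC.
# Usage: numactl_cmd_for_nic ethN program [args...]
# SYSFS_NET is a hypothetical override for testing; it defaults to
# /sys/class/net.
numactl_cmd_for_nic() (
    iface=$1
    shift
    node=$(cat "${SYSFS_NET:-/sys/class/net}/$iface/device/numa_node") || exit 1
    # Assumption: if no NUMA node is reported (-1), fall back to node 0.
    [ "$node" -ge 0 ] 2>/dev/null || node=0
    echo "numactl -N $node $*"
)
```

For example, "numactl_cmd_for_nic eth2 my_program" prints a line such as "numactl -N 1 my_program" on a host where eth2 is attached to socket 1.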

If you are using perfSONAR's bwctl to launch these tools, you cannot pass those flags to the tools directly. Instead, modify /etc/bwctl-server/bwctl-server.conf to specify which cores to use:

test_cpu_affinity 1-3

If you are using perfSONAR 4.x and using pScheduler to launch these tools, you don't need to worry about this, as pScheduler will automatically determine which CPU socket to use.

 

To test the effect of doing IRQ binding, use mpstat. For example:

    mpstat -P ALL 1

Binding interrupts by hand

Here is information on how to bind the interrupts, in case you are using a NIC that does not come with a script for this.

First, identify the IRQs for the receive queues of each interface:

  grep eth2 /proc/interrupts
   76:         23         50        245         66         20        125         10          0          0          0          5          0       PCI-MSI-X  eth2-TxRx-0
   84:         90        135         45        123         70         50          5          0          0          0          5          0       PCI-MSI-X  eth2-TxRx-1
   92:        165         65         55         65        128         35          0          0          0          5          0          5       PCI-MSI-X  eth2-TxRx-2
  100:         85        123         40         45         70        150          0          5          0          5          0          0       PCI-MSI-X  eth2-TxRx-3
  108:        105         40         20        153        110         80          0          0          0          0         10          5       PCI-MSI-X  eth2-TxRx-4
  116:        170        125         55         35         70         53          0          0         10          5          0          0       PCI-MSI-X  eth2-TxRx-5
  124:         85        115         43         45         70        150          0         15          0          0          0          0       PCI-MSI-X  eth2-TxRx-6
  132:        100         35         90        140         63         80         10          0          0          0          0          5       PCI-MSI-X  eth2-TxRx-7
  140:        165        108         50         95         55         35          0          0          0          5          0         10       PCI-MSI-X  eth2-TxRx-8
  172:          2          0          0          0          0          0          0          0          0          0          0          0       PCI-MSI-X  eth2:lsc

The IRQ is the first column.
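
Pulling out just the IRQ numbers can be scripted. The sketch below assumes the queue interrupts are named "ethN-..." as in the output above; other drivers use different naming, so the match would need adjusting. The function name nic_irqs and its optional file argument are illustrative.

```shell
# List the IRQ numbers of the queue interrupts for one interface.
# Usage: nic_irqs ethN [interrupts_file]
# The second argument is a hypothetical override for testing; it defaults
# to /proc/interrupts.
nic_irqs() {
    awk -v dev="$1" '
        # The last field is the interrupt name, e.g. "eth2-TxRx-0".
        # Requiring the prefix "dev-" skips entries such as "eth2:lsc".
        index($NF, dev "-") == 1 {
            sub(/:$/, "", $1)   # strip the trailing colon from the IRQ column
            print $1
        }
    ' "${2:-/proc/interrupts}"
}
```

Running "nic_irqs eth2" prints one IRQ number per line, ready to feed into a loop.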

Then bind each IRQ to a given processor:

  echo "proc_number" > /proc/irq/irq_number/smp_affinity

where proc_number is a hexadecimal bit mask with one bit per core (e.g., core 2 = 04, core 3 = 08, core 4 = 10) and irq_number is the IRQ (76, 84, ...). For example:

  echo 04 > /proc/irq/76/smp_affinity

This will bind eth2-TxRx-0 to processor 2.
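
Because smp_affinity takes a hexadecimal mask, it is easy to get the value wrong for higher core numbers. A minimal sketch of the conversion follows; the function name core_to_mask is illustrative, and it is deliberately limited to cores 0-31, since masks for larger systems are written as comma-separated 32-bit groups. Kernels that provide /proc/irq/irq_number/smp_affinity_list accept plain core numbers instead.

```shell
# Convert a core number (0-31) to the hex mask expected by smp_affinity.
# Usage: core_to_mask core_number
core_to_mask() {
    if [ "$1" -lt 0 ] || [ "$1" -gt 31 ]; then
        echo "core_to_mask: core must be between 0 and 31" >&2
        return 1
    fi
    # Set bit number $1 and print the result in hexadecimal.
    printf '%x\n' $((1 << $1))
}
```

For instance, "core_to_mask 2" prints 4 and "core_to_mask 4" prints 10.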

This sample boot script can be used to assign NIC interrupts to cores at boot time. For example:

# Bind eth2/eth3 Myricom IRQs to cores 2 and 3
/usr/local/bin/myri-irq-bind.sh eth2 4
/usr/local/bin/myri-irq-bind.sh eth3 8
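
For NICs without such a script, the manual steps above can be wrapped in a loop. The following is a hypothetical sketch, not the myri-irq-bind.sh script itself. It assumes the queue interrupts are named "ethN-..." as in the earlier /proc/interrupts output and that the second argument is a hex mask. The PROC_ROOT variable is an illustrative override so the function can be tried against a scratch directory.

```shell
# Bind every queue IRQ of an interface to one hex CPU mask.
# Usage: bind_nic_irqs ethN hex_mask
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
bind_nic_irqs() (
    iface=$1
    mask=$2
    proc=${PROC_ROOT:-/proc}
    # Collect IRQ numbers whose name starts with "ethN-" (queue interrupts).
    irqs=$(awk -v dev="$iface" \
        'index($NF, dev "-") == 1 { sub(/:$/, "", $1); print $1 }' \
        "$proc/interrupts")
    if [ -z "$irqs" ]; then
        echo "bind_nic_irqs: no queue IRQs found for $iface" >&2
        exit 1
    fi
    for irq in $irqs; do
        # Writing the mask requires root on a real system.
        echo "$mask" > "$proc/irq/$irq/smp_affinity" || exit 1
        echo "IRQ $irq -> mask $mask"
    done
)
```

Run as root, "bind_nic_irqs eth2 4" would place all eth2 queue interrupts on core 2. Remember that irqbalance must be disabled first, or it may move them again.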

You will also want to bind interrupts for your RAID controllers to other unused cores.