IRQ Sharing Might Impact Performance (1290)
My server performance decreased when I upgraded from ESX Server 1 to ESX Server 2. What caused this?
I added new hardware to my existing server, and my performance decreased. What caused this?
Note: If you have a newer version of ESX, see ESX Server 3.5 Might Display Performance Issues Due to IRQ Sharing (1003710).
ESX Server may experience performance problems due to shared interrupt lines. This problem happens most often with USB controllers. Because service console USB drivers are loaded by default in ESX Server 2.x, but were not supported in ESX Server 1.x, you may see a performance degradation when you upgrade to ESX Server 2.x.
What Causes the Problem
The performance impact is due to extra context switches, resulting from shared interrupts.
Ideally, each controller has a dedicated interrupt line to the processors. However, due to hardware limitations, several controllers often have to share a single interrupt line. In general, interrupt sharing is not a performance concern, but some ESX Server configurations interact with the hardware in such a way that repeated context switches occur.
When an interrupt line is shared, the processors cannot tell which controller is interrupting and they have to execute the interrupt handlers for all the controllers sharing that interrupt line. In most cases, the interrupt handler for a controller is quickly able to determine if its controller is interrupting; if it is not, it can return without having consumed too much processor time.
This is only a slight performance issue if all the controllers sharing the interrupt line are managed by the same entity -- either the service console or the VMkernel. In those cases, no context switch occurs. Note that the VMkernel manages devices shared between the service console and the VMkernel.
The problem occurs when an interrupt line is shared by two or more controllers such that at least one controller is managed by the service console and another is managed by the VMkernel. In this case, calling the controller drivers in sequence results in context switches between the service console and the VMkernel -- every time there is an interrupt on that line. This can have a significant performance impact.
How to Tell if You Have the Problem
To determine whether you may be affected by this problem, start by listing the interrupt line usage. In the service console, type:
cat /proc/vmware/interrupts
This lists the interrupt usage. The output looks similar to this example:
Vector   PCPU 0   PCPU 1
0x21:       163        0  COS irq 1 (ISA edge)
0x29:         0        0  <cos>
0x31:         1        0  VMK serial
0x39:         0        0  <cos>
0x41:         0        0  <cos>
0x49:         0        0  <cos>
0x51:         0        0  <cos>
0x59:         1        0  COS irq 12 (ISA edge)
0x61:      1885        0  COS irq 14 (ISA edge)
0x69:         1        0  COS irq 15 (ISA edge)
0x71:        30        0  COS irq 19 (PCI level), VMK aic7xxx
0x79:         1    52596  <cos>, VMK vmnic0
0x81:     66860        0  COS irq 16 (PCI level)
0x89:        42       46  <cos>, VMK aic7xxx
0xdf:   3588590  3589262  VMK timer
0xe1:         0        0  VMK ipi
0xe9:         4        1  VMK resched
0xf1:         3        0  VMK tlb
0xf9:      2871        0  VMK noop
0xfc:         0        0  VMK thermal
0xfd:         0        0  VMK lint1
0xfe:         0        0  VMK error
0xff:         0        0  VMK spurious
Examine the output for controllers with shared interrupts. Ignore devices in angle brackets; the service console does not load drivers for those devices. Any line with both a VMK entry and a COS entry (without angle brackets) indicates a possible problem with a shared interrupt.
The above example contains a line at vector 0x71 with both VMK and COS devices, another line at vector 0x79 with both VMK and COS (in angle brackets), and a third line at vector 0x89 with both VMK and COS (in angle brackets). You can ignore the latter two lines with the COS devices in angle brackets, and focus on the line at vector 0x71.
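The check described above can be sketched as a small script. This is only an illustration, assuming the listing has the layout shown in the example (a hexadecimal vector, per-processor counts, then a comma-separated list of owners). The names SAMPLE and suspect_vectors are hypothetical and are not part of ESX Server.

```python
import re

# A few lines copied from the example interrupt listing above.
SAMPLE = """\
Vector PCPU 0 PCPU 1
0x71: 30 0 COS irq 19 (PCI level), VMK aic7xxx
0x79: 1 52596 <cos>, VMK vmnic0
0x81: 66860 0 COS irq 16 (PCI level)
0x89: 42 46 <cos>, VMK aic7xxx
0xdf: 3588590 3589262 VMK timer
"""

# Vector, one or more per-processor counts, then the owner list.
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+):\s+((?:\d+\s+)+)(.*)$")


def suspect_vectors(text):
    """Return vectors that list both a COS entry (not in angle
    brackets) and a VMK entry on the same line."""
    suspects = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # header or unrecognized line
        vector, _counts, owners = match.groups()
        entries = [entry.strip() for entry in owners.split(",")]
        # "<cos>" entries are ignored: no service console driver is loaded.
        has_cos = any(entry.startswith("COS") for entry in entries)
        has_vmk = any(entry.startswith("VMK") for entry in entries)
        if has_cos and has_vmk:
            suspects.append(vector)
    return suspects


print(suspect_vectors(SAMPLE))  # prints ['0x71']
```

Only vector 0x71 is reported, because 0x79 and 0x89 pair a VMK device with a COS entry in angle brackets.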
Next, list the PCI device assignments (VMkernel or service console) for the shared interrupt lines. In the service console, type:
cat /proc/vmware/pci
This lists all PCI devices present in the machine. The output looks similar to this:
Bus:Sl.F Vend:Dvid Subv:Subd Type     Vendor  ISA/irq/Vec P M Module  Name
                                              Spawned bus
000:00.0 8086:1a21 1028:0096 Host/PCI Intel                 C
000:01.0 8086:1a23 0000:0000 PCI/PCI  Intel   001           C
000:30.0 8086:2418 0000:0000 PCI/PCI  Intel   002           C
000:31.0 8086:2410 0000:0000 PCI/ISA  Intel                 C
000:31.1 8086:2411 8086:2411 IDE      Intel                 C
000:31.2 8086:2412 8086:2412 USB      Intel   11/ 19/0x71 D C
000:31.3 8086:2413 8086:2413 SMBus    Intel   11/ 17/0x79 B C
001:00.0 10de:0150 10de:002e Display  NVidia   9/ 16/0x81 A C
002:04.0 10b7:9200 1028:0096 Ethernet 3Com    16/ 16/0x81 A C
002:06.0 1013:6003 1028:0096 Audio    0x1013  10/ 18/0x89 A C
002:09.0 8086:1229 8086:000c Ethernet Intel   11/ 17/0x79 A V e100    vmnic0
002:14.0 1011:0024 0000:0000 PCI/PCI  DEC     003           C
003:10.0 9005:00cf 1028:0096 SCSI     Adaptec 10/ 18/0x89 A V aic7xxx vmhba0
003:10.1 9005:00cf 1028:0096 SCSI     Adaptec 11/ 19/0x71 B V aic7xxx vmhba1
Use the interrupt vectors of the shared interrupt lines to index into the output, and determine the assignment modes of the affected controllers. The interrupt vector is found in the ISA/irq/Vec column of the output. The assignment mode is found in the M column.
A mode value of C indicates the device is dedicated to the service console. V indicates the device is dedicated to the VMkernel. S indicates the device is shared between VMkernel and service console.
You are affected by the problem if you identify a group of controllers that share the same Vec number and both of these are true:
- One controller has mode C (managed by the service console).
- Another controller has mode V or S (managed by the VMkernel).
You are not affected by the problem if every group of controllers that share the same Vec number matches one of the following:
- All controllers in the group have mode C (assigned to the service console).
- All controllers in the group have mode V (assigned to the VMkernel).
- All controllers in the group have mode S (shared between the VMkernel and the service console, but managed by the VMkernel).
- Some controllers in the group have mode V and others have mode S.
To continue the example, the only interrupt vector of concern was 0x71, which is found in two rows of this output. The affected controllers are the Intel USB and the Adaptec SCSI controller, which share interrupt 19 and vector 0x71.
USB has mode C (meaning it is assigned to the service console), while vmhba1 has mode V (meaning it is assigned to the VMkernel). In this example, the controllers that share the interrupt line are managed by different entities. This is likely to impact performance.
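The mode comparison can also be sketched in code. The following is a rough illustration, assuming rows laid out as in the example PCI listing and a list of suspect vectors already taken from the interrupt listing. The names SAMPLE_PCI, ROW_RE, and conflicting_vectors are hypothetical. The check is restricted to the suspect vectors because devices shown in angle brackets in the interrupt listing are to be ignored even if their mode is C.

```python
import re
from collections import defaultdict

# Rows with an interrupt assignment, copied from the example PCI listing.
SAMPLE_PCI = """\
000:31.2 8086:2412 8086:2412 USB Intel 11/ 19/0x71 D C
000:31.3 8086:2413 8086:2413 SMBus Intel 11/ 17/0x79 B C
002:09.0 8086:1229 8086:000c Ethernet Intel 11/ 17/0x79 A V e100 vmnic0
003:10.0 9005:00cf 1028:0096 SCSI Adaptec 10/ 18/0x89 A V aic7xxx vmhba0
003:10.1 9005:00cf 1028:0096 SCSI Adaptec 11/ 19/0x71 B V aic7xxx vmhba1
"""

# Address, two ID fields, type, vendor, ISA/irq/Vec, pin (P), mode (M).
ROW_RE = re.compile(
    r"^(?P<addr>\d{3}:\d{2}\.\d)\s+\S+\s+\S+\s+"
    r"(?P<type>\S+)\s+(?P<vendor>\S+)\s+"
    r"\d+/\s*\d+/(?P<vec>0x[0-9a-fA-F]+)\s+"
    r"(?P<pin>[A-D])\s+(?P<mode>[CVS])\b"
)


def conflicting_vectors(pci_text, suspects):
    """Map each suspect vector to its devices when one device has
    mode C and another has mode V or S."""
    groups = defaultdict(list)
    for line in pci_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match and match.group("vec") in suspects:
            groups[match.group("vec")].append(
                (match.group("addr"), match.group("type"), match.group("mode"))
            )
    conflicts = {}
    for vec, devices in groups.items():
        modes = {mode for _addr, _type, mode in devices}
        # C is managed by the service console; V and S by the VMkernel.
        if "C" in modes and modes & {"V", "S"}:
            conflicts[vec] = devices
    return conflicts


print(conflicting_vectors(SAMPLE_PCI, ["0x71"]))
```

For the example, the result contains only vector 0x71, with the USB controller (mode C) and the second Adaptec SCSI function (mode V).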
What You Can Do About the Problem
If possible, avoid sharing an interrupt line among several controllers. If you can't avoid interrupt sharing, then configure ESX Server so that the controllers sharing an interrupt are all managed by the same entity.
Try the following:
- If a controller sharing the interrupt line is not in use (such as USB), disable it. This prevents its driver from loading.
- If two or more controllers share the same interrupt line, configure them to be managed by the same entity. This means doing one of the following:
  - Dedicate them all to the service console.
  - Dedicate them all to the VMkernel.
  - Share them all between the VMkernel and the service console.
  - Dedicate some to the VMkernel, and share some between the VMkernel and the service console.
- Alternatively, try moving a controller card to a different PCI slot; the interrupt line it uses is determined by its physical location. Be sure to recheck all controllers for interrupt sharing after making this change.