Knowledge Base

The VMware Knowledge Base provides support solutions, error messages and troubleshooting guides
 
 

vHBAs and other PCI devices may stop responding in ESXi 5.x and ESXi/ESX 4.1 when using Interrupt Remapping (1030265)

Symptoms

When using Interrupt Remapping on some servers, you may experience these symptoms on ESXi 5.x and ESXi/ESX 4.1 hosts:
  • ESXi hosts are non-responsive
  • Virtual machines are non-responsive
  • HBAs stop responding
  • Other PCI devices stop responding
  • You may receive "Degraded path for an Unknown Device" alerts in vCenter Server
  • You may see an illegal vector error in the VMkernel or messages logs shortly before an HBA stops responding to the driver. The error is similar to:

    vmkernel: 6:01:34:46.970 cpu0:4120)ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40

  • For systems with QLogic HBA cards, the VMkernel or messages logs show that a card has stopped responding to commands:

    vmkernel: 6:01:42:36.189 cpu15:4274)<6>qla2xxx 0000:1a:00.0: qla2x00_abort_isp: **** FAILED ****
    vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: Failed mailbox send register test


  • The VMkernel or messages logs show the QLogic HBA card is offline:

    vmkernel: 6:01:47:36.383 cpu14:4274)<4>qla2xxx 0000:1a:00.0: ISP error recovery failed - board disabled

  • For systems with Emulex HBA cards, the VMkernel or messages logs show a card has stopped responding to commands:

    vmkernel: 6:22:52:00.983 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:(0):2530 Mailbox command x23 cannot issue Data: xd00 x2
    vmkernel: 6:22:52:32.408 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:0310 Mailbox command x5 timeout Data: x0 x700 x0x4100a2811820
    vmkernel: 6:22:52:32.408 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:0345 Resetting board due to mailbox timeout
    vmkernel: 6:22:53:02.416 cpu2:4684)<3>lpfc820 0000:15:00.0: 0:2813 Mgmt IO is Blocked d00 - mbox cmd 5 still active
    vmkernel: 6:22:53:02.416 cpu2:4684)<3>lpfc820 0000:15:00.0: 0:(0):2530 Mailbox command x23 cannot issue Data: xd00 x2
    vmkernel: 6:22:53:33.833 cpu0:4684)<3>lpfc820 0000:15:00.0: 0:0310 Mailbox command x5 timeout Data: x0 x700


  • For systems with LSI1064/LSI1064E or LSI1068E series SCSI controllers connected to internal disks, the /var/log/vmkernel.log file on the ESXi host shows errors similar to:

    ScsiDeviceIO: 2316: Cmd(0x41240074e3c0) 0x1a, CmdSN 0x12ee to dev "mpx.vmhba0:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
    ScsiDeviceIO: 2316: Cmd(0x41240074e3c0) 0x4d, CmdSN 0x12f1 to dev "mpx.vmhba1:C0:T8:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x35 0x1.


  • For systems with MegaRAID 8480 SAS SCSI controllers, the VMkernel or messages logs show that the controller has stopped responding to commands:

    vmkernel: 12:14:17:35.206 cpu15:4247)megasas: ABORT sn 94489613 cmd=0x2a retries=0 tmo=0
    vmkernel: 12:14:17:35.206 cpu15:4247)<5>0 :: megasas: RESET sn 94489613 cmd=2a retries=0
    vmkernel: 12:14:17:35.206 cpu4:4435)WARNING: LinScsi: SCSILinuxQueueCommand: queuecommand failed with status = 0x1055 Host Busy vmhba0:2:0:0 (driver name: LSI Logic SAS based Mega RAID driver)


  • Because the messages log file rolls over quickly on an ESXi host, the alert may no longer be in the log. To view it, press Alt+F11 on the ESXi physical console, where this error message appears in red:

    ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40

    Note: This message is cleared after a reboot.
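The driver-specific log signatures above can be checked mechanically once a log has been copied off the host. This Python code is a minimal sketch. The names `SIGNATURES` and `classify_symptoms` are hypothetical, and the patterns are built only from the sample lines quoted in this article. Messages from other driver versions may be worded differently. A match points to a driver symptom but does not confirm this issue on its own.

```python
import re

# Hypothetical signature table built from the sample log lines in this article.
# Driver messages vary between versions, so treat "no match" as inconclusive.
SIGNATURES = {
    "qlogic": re.compile(
        r"qla2xxx .*(qla2x00_abort_isp: \*+ FAILED"
        r"|Failed mailbox send register test"
        r"|ISP error recovery failed - board disabled)"
    ),
    "emulex": re.compile(
        r"lpfc\d* .*(Mailbox command x[0-9a-f]+ (timeout|cannot issue)"
        r"|Resetting board due to mailbox timeout"
        r"|Mgmt IO is Blocked)"
    ),
    # Generic SCSI check-condition failures (D:0x2). These also occur for
    # unrelated reasons, so they matter only alongside the APIC alert.
    "lsi_sas": re.compile(r"ScsiDeviceIO: \d+: Cmd\(0x[0-9a-f]+\) .* failed H:0x0 D:0x2"),
    "megaraid": re.compile(r"megasas: (ABORT|RESET) sn \d+|queuecommand failed .*Host Busy"),
}


def classify_symptoms(lines):
    """Return {signature name: [matching lines]} for each signature that matches."""
    hits = {}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(line)
    return hits
```

For example, `with open("vmkernel.log") as f: print(classify_symptoms(f))` lists the matching lines grouped by controller family.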

Cause

ESXi/ESX 4.1 introduced interrupt remapping code, which is enabled by default in ESXi/ESX 4.1 and later. The vendor introduced this technology to route IRQs more efficiently, which should improve performance. However, this code is incompatible with some servers.

Note: If this issue occurs in the PCI device from which the ESXi/ESX host boots (either locally or using SCSI/RAID), or when the host boots from SAN using an iSCSI/FC HBA, the APIC errors are not logged. To troubleshoot the issue in this case, enable and configure remote syslog logging. For more information, see Configuring syslog on ESXi 5.0 (2003322). Alternatively, you can test whether this issue is the cause by disabling interrupt remapping.

Resolution

Note: This issue applies only if you see this specific alert in the vmkernel/messages log files:

    ALERT: APIC: 1823: APICID 0x00000000 - ESR = 0x40

If you do not see this message, you are not experiencing this issue.
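Because the resolution depends on this one alert, scanning for it is a useful first step. This Python code is a minimal sketch that assumes the log has been copied to a workstation. The names `find_apic_esr_alerts` and `issue_applies` are hypothetical helpers. The pattern keeps the fixed text of the alert quoted above. As an assumption, it lets the source line number (1823) and the APICID vary, because those values may differ between builds or CPUs.

```python
import re

# Fixed text of the alert quoted in this article. Assumption: the source line
# number and the APICID may vary, so only their format is constrained.
APIC_ESR_ALERT = re.compile(r"ALERT: APIC: \d+: APICID 0x[0-9a-fA-F]+ - ESR = 0x40\b")


def find_apic_esr_alerts(lines):
    """Return the lines that contain the APIC ESR 0x40 alert."""
    return [line for line in lines if APIC_ESR_ALERT.search(line)]


def issue_applies(lines):
    """True if the alert this article depends on appears at least once."""
    return bool(find_apic_esr_alerts(lines))
```

The alert is cleared after a reboot, and the messages log rolls over quickly. An empty result is therefore not conclusive. Also check the physical console (Alt+F11) or a remote syslog copy.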

Several server vendors have released fixes in the form of server BIOS updates. Contact your server vendor to check whether a fix is available. For IBM models, including but not limited to the IBM BladeCenter HS22 series and System x3400, x3500, and x3600 series systems, see the IBM Knowledge Base article MIGR-5086606 for a firmware update and additional information.

If a firmware fix is not available, work around this issue by disabling interrupt remapping on your ESXi/ESX 4.1 or ESXi 5.x host, then rebooting the host to apply the setting.
 
Note: Disabling Interrupt Remapping also disables the VMDirectPath I/O Pass-through feature.

ESXi/ESX 4.1

To disable interrupt remapping on ESXi/ESX 4.1, perform one of these options:
  • Run this command from a console or SSH session to disable interrupt remapping:

    # esxcfg-advcfg -k TRUE iovDisableIR

    To back up the current configuration, run this command twice:

    # auto-backup.sh

    Note: It must be run twice to save the change.

    Reboot the ESXi/ESX host:

    # reboot

    To confirm that interrupt remapping is disabled after the reboot, run this command:

    # esxcfg-advcfg -j iovDisableIR

    The output should be:

    iovDisableIR=TRUE


  • In the vSphere Client:

    1. Click Configuration > Advanced Settings (under Software) > VMkernel.
    2. Select VMkernel.Boot.iovDisableIR to enable the option, then click OK.
    3. Reboot the ESXi/ESX host.
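If you check many ESXi/ESX 4.1 hosts, you can capture the `esxcfg-advcfg -j iovDisableIR` output (for example, over SSH) and check it with a script. This Python code is a minimal sketch that assumes the output has the `name=value` form shown above. The names `parse_advcfg_get` and `ir_disabled` are hypothetical helpers and are not part of any VMware tool.

```python
def parse_advcfg_get(output):
    """Parse 'name=value' lines such as those printed by `esxcfg-advcfg -j`."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:  # skip lines that contain no '='
            values[name.strip()] = value.strip()
    return values


def ir_disabled(output):
    """True if the captured output reports iovDisableIR=TRUE (interrupt remapping off)."""
    return parse_advcfg_get(output).get("iovDisableIR", "").upper() == "TRUE"
```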

ESXi 5.x

ESXi 5.x does not expose this parameter as a configurable option in the GUI client. You can change it only with the esxcli command or with PowerCLI.
  • To disable interrupt remapping using the esxcli command:

    List the current setting by running the command:

    # esxcli system settings kernel list -o iovDisableIR

    The output is similar to:

    Name          Type  Description                              Configured  Runtime  Default
    ------------  ----  ---------------------------------------  ----------  -------  -------
    iovDisableIR  Bool  Disable Interrupt Routing in the IOMMU   FALSE       FALSE    FALSE


    Disable interrupt remapping on the host using this command:

    # esxcli system settings kernel set --setting=iovDisableIR -v TRUE

    Reboot the host after running the command.

    Note: If the hostd service fails or is not running, the esxcli command does not work. In that case, use localcli instead. However, changes made with localcli do not persist across reboots. After the host reboots and the hostd service responds again, repeat the configuration change with the esxcli command so that it persists.

  • To disable interrupt remapping through PowerCLI:

    Note: The PowerCLI commands do not work with ESXi 5.1. You must use the esxcli commands as detailed above.

    PowerCLI> Connect-VIServer -Server 10.21.69.233 -User Administrator -Password passwd
    PowerCLI> $myesxcli = Get-EsxCli -VMHost 10.21.69.111
    PowerCLI> $myesxcli.system.settings.kernel.list("iovDisableIR")

    Configured  : FALSE
    Default     : FALSE
    Description : Disable Interrupt Routing in the IOMMU
    Name        : iovDisableIR
    Runtime     : FALSE
    Type        : Bool

    PowerCLI> $myesxcli.system.settings.kernel.set("iovDisableIR","TRUE")
    true

    PowerCLI> $myesxcli.system.settings.kernel.list("iovDisableIR")

    Configured  : TRUE
    Default     : FALSE
    Description : Disable Interrupt Routing in the IOMMU
    Name        : iovDisableIR
    Runtime     : FALSE
    Type        : Bool


  • After the host finishes booting, this entry in the /var/log/boot.gz log file confirms that interrupt remapping is disabled:

    TSC: 543432 cpu0:0)BootConfig: 419: iovDisableIR = TRUE
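On ESXi 5.x, compare the Configured and Runtime values to see whether the change has taken effect. In the PowerCLI output above, Configured is TRUE while Runtime stays FALSE until the host reboots. This Python code is a minimal sketch that assumes the captured output of `esxcli system settings kernel list -o iovDisableIR` uses the dashed fixed-width layout shown above. The names `parse_esxcli_table`, `iov_disable_ir_status`, and `boot_log_confirms` are hypothetical helpers. The boot log pattern is based only on the sample `BootConfig` line above.

```python
import re


def parse_esxcli_table(output):
    """Parse esxcli's fixed-width table output into a list of dicts.

    Column start offsets come from the dashed separator line, so values that
    contain spaces (such as Description) stay in one column.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    header, dashes, rows = lines[0], lines[1], lines[2:]
    starts = [m.start() for m in re.finditer(r"-+", dashes)]
    # Each column runs from its start to the next column's start; the last
    # column runs to the end of the line.
    bounds = list(zip(starts, starts[1:] + [None]))
    names = [header[s:e].strip() for s, e in bounds]
    return [{n: row[s:e].strip() for n, (s, e) in zip(names, bounds)} for row in rows]


def iov_disable_ir_status(output):
    """Return the interrupt remapping state: 'enabled', 'disabled' or 'pending-reboot'."""
    row = next((r for r in parse_esxcli_table(output) if r.get("Name") == "iovDisableIR"), None)
    if row is None:
        raise ValueError("iovDisableIR not found in esxcli output")
    configured = row["Configured"].upper() == "TRUE"
    runtime = row["Runtime"].upper() == "TRUE"
    if configured != runtime:
        # The configured value differs from the running kernel: reboot needed.
        return "pending-reboot"
    # iovDisableIR=TRUE in the running kernel means interrupt remapping is off.
    return "disabled" if runtime else "enabled"


# Pattern based on the sample boot log entry quoted in this article.
BOOT_CONFIRM = re.compile(r"BootConfig: \d+: iovDisableIR = TRUE\b")


def boot_log_confirms(lines):
    """True if any boot log line records iovDisableIR = TRUE."""
    return any(BOOT_CONFIRM.search(line) for line in lines)
```

If /var/log/boot.gz is gzip-compressed, as its name suggests, you can pass `gzip.open(path, "rt")` directly to `boot_log_confirms` after copying the file off the host.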


Update History

04/21/2011 - Indicated that the issue occurs when using interrupt remapping.
08/03/2011 - Changed the command for checking whether interrupt remapping is disabled to "esxcfg-advcfg -j iovDisableIR".
02/24/2012 - Added a note in the Cause section and a link to KB 2003322.
04/10/2012 - Added an additional symptom showing output from an Emulex HBA with the issue. Updated the ESXi 4.1 command-line procedure.
10/09/2012 - Added ESXi 5.1 to Products.
10/29/2012 - Added a note regarding the PowerCLI command for ESXi 5.1.
03/15/2013 - Added a link to IBM Knowledge Base article MIGR-5086606.
