Controlling LUN queue depth throttling in VMware ESX/ESXi (1008113)

Symptoms

  • If the ESX host detects a queue full condition, it may abort the SCSI commands
  • Queue Full may show up as QFULL or Task Set Full state
  • If QFULL conditions exist, the ESX VMkernel log may contain entries similar to:
    • In ESX 3.x:

      status = 40/0 0x## 0x## 0x##

    • In ESX/ESXi 4.0 and later:
      • H:0x0 D:0x28 P:0x0 Valid sense data: 0x## 0x## 0x##
      • H:0x0 D:0x08 P:0x0 Valid sense data: 0x## 0x## 0x##
  • In ESX/ESXi 4.0 and later, you also see a device busy error (the D:0x08 entry above)

Note: The hexadecimal value 28 (decimal 40, as shown in the ESX 3.x entry) is the SCSI status code for the queue full state. The value 0x08 is the SCSI status code for the device busy state.
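
The status codes in the log entries above can be picked out programmatically. The following is a minimal Python sketch, not a VMware tool. It handles only the ESX/ESXi 4.0 and later entry format; the function names and the sample lines are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# SCSI device status codes named in the symptoms above.
SCSI_STATUS = {0x08: "BUSY", 0x28: "QUEUE FULL"}

# Matches the ESX/ESXi 4.0+ form, for example: H:0x0 D:0x28 P:0x0
STATUS_RE = re.compile(r"H:0x[0-9a-fA-F]+ D:0x([0-9a-fA-F]+) P:0x[0-9a-fA-F]+")


def classify(line: str) -> Optional[str]:
    """Return 'QUEUE FULL' or 'BUSY' if the line reports one, else None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return SCSI_STATUS.get(int(match.group(1), 16))


def count_congestion(lines: Iterable[str]) -> Counter:
    """Count QUEUE FULL and BUSY entries across an iterable of log lines."""
    return Counter(s for s in map(classify, lines) if s is not None)


# Hypothetical sample lines; real VMkernel entries carry more context.
sample = [
    "H:0x0 D:0x28 P:0x0 Valid sense data: 0x0 0x0 0x0",
    "H:0x0 D:0x8 P:0x0 Valid sense data: 0x0 0x0 0x0",
    "H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0",
]
print(count_congestion(sample))
```

A rising count of either status for a device suggests that the array is signalling congestion for it.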

Resolution

VMware ESX 3.5 Update 4 introduces an adaptive queue depth algorithm that adjusts the LUN queue depth in the VMkernel I/O stack. This algorithm is activated when the storage array indicates I/O congestion by returning a BUSY or QUEUE FULL status. These status codes may indicate congestion at the LUN level or at the port (or ports) on the array. When congestion is detected, the VMkernel throttles the LUN queue depth. The VMkernel then attempts to gradually restore the queue depth as congestion conditions subside.

This algorithm is activated by changing the values of the QFullSampleSize and QFullThreshold parameters. When the number of QUEUE FULL or BUSY conditions reaches the QFullSampleSize value, the LUN queue depth is reduced to half of the original value. When the number of good status conditions received reaches the QFullThreshold value, the LUN queue depth is increased by one.
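
The throttling rule described above can be illustrated with a small model. This is a sketch based only on the description in this article, not the VMkernel implementation. The class and method names are hypothetical. It also assumes that halving applies to the depth in effect at that moment, that counters reset after each adjustment, that a congestion status restarts the good-status count, and that a QFullSampleSize of 0 leaves the algorithm disabled.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveQueueDepth:
    """Toy model of the throttling rule described above (not VMkernel code)."""

    max_depth: int          # configured LUN queue depth, used as the ceiling
    qfull_sample_size: int  # QUEUE FULL/BUSY count that triggers halving
    qfull_threshold: int    # good-status count that triggers a +1 step
    depth: int = field(init=False, default=0)
    congested_count: int = field(init=False, default=0)
    good_count: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # Start at the configured depth.
        self.depth = self.max_depth

    def on_status(self, congested: bool) -> int:
        """Record one I/O completion status and return the current depth."""
        if self.qfull_sample_size == 0:
            return self.depth  # assumption: 0 means the algorithm is off
        if congested:
            self.congested_count += 1
            self.good_count = 0  # assumption: congestion restarts recovery
            if self.congested_count >= self.qfull_sample_size:
                self.depth = max(1, self.depth // 2)
                self.congested_count = 0
        else:
            self.good_count += 1
            if self.good_count >= self.qfull_threshold:
                self.depth = min(self.max_depth, self.depth + 1)
                self.good_count = 0
        return self.depth


q = AdaptiveQueueDepth(max_depth=32, qfull_sample_size=32, qfull_threshold=4)
for _ in range(32):
    q.on_status(congested=True)
print(q.depth)  # 16: halved after 32 congestion statuses
for _ in range(8):
    q.on_status(congested=False)
print(q.depth)  # 18: two +1 steps after 8 good statuses
```

The model shows why recovery is slower than throttling: one halving step removes many slots, while each group of QFullThreshold good statuses restores only one.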

Note: Careful consideration is needed if multiple hosts access the same LUN or array ports. For the adaptive queue depth algorithm to be effective, all hosts accessing the LUN/port must have some form of adaptive queue depth algorithm. If some hosts run the adaptive queue depth algorithm while other hosts do not, the hosts that are not running the algorithm may consume the resources/slots on the array that are freed up by the adaptive hosts. This causes the hosts running the algorithm to exhibit lower disk I/O throughput. This may also increase the I/O congestion that initially triggered the adaptive algorithm.

If hosts running operating systems other than ESX/ESXi are connected to array ports that are being accessed by ESX/ESXi hosts, and the ESX/ESXi hosts are configured to use the adaptive algorithm, either make sure the operating systems use an adaptive queue depth algorithm or isolate those hosts on different ports on the storage array.

VMware ESX versions 3.5-5.0

In ESX/ESXi versions 3.5-5.0, QFullSampleSize and QFullThreshold are system-wide configuration parameters. Even if storage arrays from different vendors are connected to the host, all of the arrays experience the throttling effect if any LUN or port on any array returns a QUEUE FULL or BUSY status.

To enable the algorithm:

  1. Use the vSphere Client or vSphere Web Client to navigate to the Configuration tab of the VMware ESX host you want to modify.

  2. Click Advanced Settings under the Software section.

  3. Click Disk in the left side pane.

  4. Set QFullSampleSize to a value greater than zero. The usable range is 0 to 64.
    • For 3PAR, NetApp and IBM XIV storage arrays, set the QFullSampleSize value to 32.
    • For other storage arrays, contact your storage vendor.

  5. Set QFullThreshold to a value less than or equal to QFullSampleSize. The usable range is 1 to 16.
    • For 3PAR storage arrays, set the QFullThreshold value to 4.
    • For NetApp and IBM XIV storage arrays, set the QFullThreshold value to 8.
    • For other storage arrays, contact your storage vendor.
The settings take effect immediately. You do not need to reboot the ESX/ESXi host.
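
The ranges and recommended values from steps 4 and 5 can be checked before they are applied. The following Python sketch encodes only what this article states. The table name, the function name, and the message wording are assumptions made for illustration.

```python
from typing import List

# Recommended (QFullSampleSize, QFullThreshold) pairs listed in this article.
RECOMMENDED = {
    "3PAR": (32, 4),
    "NetApp": (32, 8),
    "IBM XIV": (32, 8),
}


def check_settings(sample_size: int, threshold: int) -> List[str]:
    """Return a list of problems; an empty list means the pair is usable."""
    problems = []
    if not 0 <= sample_size <= 64:
        problems.append("QFullSampleSize must be in the range 0 to 64")
    elif sample_size == 0:
        problems.append("QFullSampleSize is 0, so the algorithm is not enabled")
    if not 1 <= threshold <= 16:
        problems.append("QFullThreshold must be in the range 1 to 16")
    # The comparison only makes sense once the algorithm is enabled.
    if sample_size > 0 and threshold > sample_size:
        problems.append("QFullThreshold must not exceed QFullSampleSize")
    return problems


for vendor, (sample_size, threshold) in RECOMMENDED.items():
    print(vendor, check_settings(sample_size, threshold))
```

For arrays that are not in the table, the values still need to come from the storage vendor.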

VMware ESXi 5.1

In ESXi releases earlier than ESXi 5.1, QFullSampleSize and QFullThreshold are set globally, that is, on all devices seen by the ESXi host. In VMware ESXi 5.1, these parameters are not set globally because different vendors have different optimal values for their arrays. Instead, you set QFullSampleSize and QFullThreshold on a per-device level.

Run the following ESXCLI command:

esxcli storage core device set --device device_name --queue-full-threshold Q --queue-full-sample-size S


Settings are persistent across reboots.
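
When many devices need the same values, the per-device command above can be assembled in a script. The following Python sketch only builds and prints the command line. The function name and the device identifier are hypothetical placeholders, and the options used are the ones shown in this article.

```python
import shlex
from typing import List


def build_set_command(device: str, threshold: int, sample_size: int) -> List[str]:
    """Build the per-device esxcli argument list shown in this article."""
    return [
        "esxcli", "storage", "core", "device", "set",
        "--device", device,
        "--queue-full-threshold", str(threshold),
        "--queue-full-sample-size", str(sample_size),
    ]


# Placeholder device identifier; substitute a real one from the list command.
cmd = build_set_command("naa.0123456789abcdef", threshold=4, sample_size=32)
print(shlex.join(cmd))
# esxcli storage core device set --device naa.0123456789abcdef --queue-full-threshold 4 --queue-full-sample-size 32
```

Keeping the arguments as a list avoids quoting problems if the command is later run from a script on a system where esxcli is available.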
You can retrieve the values for a device by using the corresponding list command.

esxcli storage core device list

The command supports an optional --device parameter.

esxcli storage core device list --device device
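
The output of the list command can be parsed to read back the two values. The following Python sketch assumes that each device starts with an unindented identifier line followed by indented "Key: Value" lines. The sample text is hypothetical and abbreviated, and the function name is an assumption made for illustration.

```python
from typing import Dict, Optional


def parse_device_list(text: str) -> Dict[str, Dict[str, str]]:
    """Map each device identifier to its 'Key: Value' fields."""
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            # An unindented line starts a new device section.
            current = devices.setdefault(line.strip(), {})
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return devices


# Hypothetical, abbreviated output for a single device.
sample_output = """\
naa.0123456789abcdef
   Display Name: Example Disk (naa.0123456789abcdef)
   Queue Full Sample Size: 32
   Queue Full Threshold: 4
"""

for name, fields in parse_device_list(sample_output).items():
    print(name, fields["Queue Full Sample Size"], fields["Queue Full Threshold"])
```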

The recommended values are the same as in earlier releases.
QFullSampleSize:
  • For 3PAR, NetApp and IBM XIV storage arrays, set the QFullSampleSize value to 32.
  • For other storage arrays, contact your storage vendor.
QFullThreshold:
  • For 3PAR storage arrays, set the QFullThreshold value to 4.
  • For NetApp and IBM XIV storage arrays, set the QFullThreshold value to 8.
  • For other storage arrays, contact your storage vendor.
The settings take effect immediately. You do not need to reboot the ESX/ESXi host.

VMware ESXi 5.1 Patch 1 and later versions

In ESXi 5.1, QFullSampleSize and QFullThreshold are set on a per-device level rather than globally, because different vendors have different optimal values for their arrays. With ESXi 5.1 Patch 1 and later versions, the earlier option to set QFullSampleSize and QFullThreshold globally, on all devices seen by the ESXi host, is available again. If you apply the global option and also set one of the parameters for a specific device, the setting for the specific device always takes precedence over the global setting.

Note: Devices managed by the global parameters return incorrect LUN queue depth values when queried using the command esxcli storage core device list. Instead, these devices return a value of zero:

Queue Full Sample Size: 0
Queue Full Threshold: 0

This is expected. The correct global LUN queue depth values are still applied to these devices, which you can confirm using esxtop. For details, see Checking the queue depth of the storage adapter and the storage device (1027901).
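
The precedence rule and the zero values reported by the list command can be combined into a small helper that estimates which values apply to a device. This Python sketch rests on two assumptions that this article does not confirm: that a reported value of zero means the parameter was not set for that device, and that the fallback to the global value happens per parameter. All names are hypothetical. Use esxtop to confirm the real behavior.

```python
from typing import NamedTuple


class QFullSettings(NamedTuple):
    sample_size: int
    threshold: int


def effective_settings(global_cfg: QFullSettings,
                       device_cfg: QFullSettings) -> QFullSettings:
    """Per-device values win; zero per-device values fall back to global."""
    return QFullSettings(
        # Assumption: 0 reported for a device means "not set per device".
        sample_size=device_cfg.sample_size or global_cfg.sample_size,
        threshold=device_cfg.threshold or global_cfg.threshold,
    )


global_cfg = QFullSettings(sample_size=32, threshold=8)
print(effective_settings(global_cfg, QFullSettings(0, 0)))   # global values
print(effective_settings(global_cfg, QFullSettings(32, 4)))  # device values
```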

Tags

queue-depth
