VMware
 

Knowledge Base


Controlling LUN queue depth throttling in VMware ESX for 3PAR storage arrays

Symptoms

When a Queue Full condition is detected, the host may abort SCSI commands.
 
A Queue Full condition may be reported as a QFULL or Task Set Full state.
The SCSI Status Code for this state is hexadecimal 28, or decimal 40.
 
When QFULL conditions exist, the ESX logs are likely to contain entries similar to:
"status = 40/0 0x## 0x## 0x##"  (ESX 3.x)
"H:0x0 D:0x28  P:0x0 Valid sense data: 0x## 0x## 0x##" (ESX 4.0)
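To make the two log formats above easier to spot in a large log file, the following is a minimal Python sketch that counts matching lines. The regular expressions are assumptions modeled only on the two sample entries quoted above; real log lines carry additional fields, and the function names are hypothetical.

```python
import re

# Patterns modeled on the two sample entries above (assumption: the
# status fields appear in exactly this form somewhere in the line).
ESX3_QFULL = re.compile(r"status = 40/0\b")        # ESX 3.x, decimal 40
ESX4_QFULL = re.compile(r"H:0x0 D:0x28\s+P:0x0")   # ESX 4.0, hex 28


def is_qfull(line: str) -> bool:
    """Return True if the line looks like a QFULL (SCSI status 0x28) entry."""
    return bool(ESX3_QFULL.search(line) or ESX4_QFULL.search(line))


def count_qfull(lines) -> int:
    """Count QFULL-looking entries in an iterable of log lines."""
    return sum(1 for line in lines if is_qfull(line))
```

Passing an open log file to count_qfull gives a rough indication of how often the array reported Queue Full; it does not distinguish between LUNs or paths.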
 
A queue throttling algorithm is available but is disabled by default.
When enabled, it examines the completed I/O operations that fall within a window of Disk.QFullSampleSize operations.
If the number of queue full errors within that window is equal to or greater than the value of Disk.QFullThreshold, the queue depth is throttled.
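The window check described above can be illustrated with a short Python sketch. This is not the VMkernel implementation; the class and method names are hypothetical, and the sketch assumes a sliding window that always holds the most recent Disk.QFullSampleSize completions.

```python
from collections import deque


class QFullWindow:
    """Illustrative model of the sample-window check (not VMkernel code)."""

    def __init__(self, sample_size: int, threshold: int):
        self.threshold = threshold
        # A bounded deque drops the oldest completion automatically.
        # With sample_size == 0 nothing is retained, so nothing triggers.
        self.window = deque(maxlen=sample_size)

    def record(self, qfull: bool) -> bool:
        """Record one completed I/O; return True if throttling would trigger."""
        self.window.append(qfull)
        return sum(self.window) >= self.threshold
```

For example, with a sample size of 4 and a threshold of 2, the check fires as soon as two of the last four completions were queue full errors, and stops firing once older errors slide out of the window.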

Purpose

How can I throttle the queue depth in VMware ESX 3.5 (U4 or later) or ESX 4.0 for LUNs on a 3PAR storage array?

Resolution

VMware ESX 3.5 Update 4 introduces an adaptive queue depth algorithm that adjusts the LUN queue depth in the VMkernel I/O stack. This algorithm is activated when the storage array indicates I/O congestion by returning a BUSY or QUEUE FULL status. These status codes may indicate congestion at the LUN level or at the port (or ports) on the array. When congestion is detected, VMkernel throttles the LUN queue depth. VMkernel then attempts to gradually restore the queue depth when the congestion subsides.
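The throttle-then-restore behavior can be pictured with the following Python sketch. The article does not say by how much the queue depth is reduced or how quickly it is restored, so halving on congestion and raising by one on recovery are assumptions chosen purely for illustration, as are the class and method names.

```python
class LunQueueDepth:
    """Illustrative model of adaptive queue depth (not VMkernel code)."""

    def __init__(self, configured_depth: int):
        self.configured = configured_depth  # depth before any throttling
        self.current = configured_depth

    def on_congestion(self) -> int:
        # Assumption: halve the depth, never going below 1.
        self.current = max(1, self.current // 2)
        return self.current

    def on_recovery(self) -> int:
        # Assumption: restore one slot at a time, up to the configured depth.
        self.current = min(self.configured, self.current + 1)
        return self.current
```

The sketch shows only the general shape of the behavior: a sharp reduction when the array signals congestion, followed by a slow climb back to the configured depth.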

By default, this algorithm is disabled. To enable it:
  1. Using the VI client, navigate to the Configuration tab of the VMware ESX host you want to modify.

  2. Click the Advanced Settings link under the Software section.

  3. Select Disk in the left-hand-side pane.

  4. Set QFullSampleSize to a value larger than zero. (The usable range is between 0 and 64.)

    This algorithm has been well tested against 3PAR arrays, so this configuration is limited to 3PAR arrays for now.

    For 3PAR arrays, the recommended value is 32.

  5. Set QFullThreshold to a value smaller than or equal to QFullSampleSize (the usable range is between 1 and 16).

    QFullSampleSize and QFullThreshold are system-wide configuration parameters. If the host is connected to storage arrays from different vendors, all of the arrays experience the throttling effect when a LUN or port on any one array returns QFULL or BUSY errors that meet the threshold within the sliding sample window.
Changes to these settings take effect immediately and do not require rebooting the VMware ESX host.
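Before applying the two values, it can help to check them against the ranges given in steps 4 and 5. The following Python sketch does only that; the function name is hypothetical and it does not read or change any host settings.

```python
def validate_qfull_settings(sample_size: int, threshold: int) -> list:
    """Return a list of problems with the proposed values (empty if none)."""
    problems = []
    if not 0 <= sample_size <= 64:
        problems.append("QFullSampleSize must be between 0 and 64")
    if not 1 <= threshold <= 16:
        problems.append("QFullThreshold must be between 1 and 16")
    # A sample size of 0 leaves the algorithm disabled, so the
    # threshold comparison only matters when the algorithm is enabled.
    if sample_size > 0 and threshold > sample_size:
        problems.append("QFullThreshold must not exceed QFullSampleSize")
    return problems
```

Calling validate_qfull_settings(32, 8) returns an empty list; the threshold of 8 here is an arbitrary in-range value, not a recommendation from this article.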

Note: Careful consideration is needed if multiple hosts access the same LUN or array ports. The adaptive queue depth algorithm works as expected if and only if all the hosts accessing the LUN/port are running the same form of algorithm. In the case where some hosts run the adaptive queue depth algorithm while others do not, the hosts not running the algorithm may consume the resources/slots on the array that are freed up by the adaptive hosts. This will cause the hosts running the algorithm to exhibit lower disk I/O throughput. This may also lead to worsening of the I/O congestion that initially triggered the adaptive algorithm.

If hosts running operating systems other than ESX are connected to array ports that are being accessed by ESX hosts, while the latter are configured to use the adaptive algorithm, make sure those operating systems use an adaptive queue depth algorithm as well or isolate them on different ports on the storage array.
