ESXi S.M.A.R.T. health monitoring for hard drives (2040405)

Symptoms

  • The server reports a hard drive warning in POST (Power On Self Test)
  • Virtual machines cannot power on due to VMFS corruption on local hard drives
  • Very poor performance on local hard drives

Purpose

This article provides steps to:
  • Help diagnose a local hard drive fault
  • Read the S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) status of a hard drive

Resolution

In ESXi 5.1, VMware added S.M.A.R.T. functionality to monitor hard drive health. S.M.A.R.T. records various operational parameters of physical hard drives attached to a local controller. The feature itself is part of the firmware on the circuit board of each physical drive (HDD or SSD).


To read the current data from a disk:

  1. Open a console or SSH session to the ESXi host. For more information, see Using ESXi Shell in ESXi 5.x (2004746).
  2. Determine the device parameter to use by running the command:

    # esxcli storage core device list

    The output lists all SCSI devices seen by the ESXi host. Each device is identified by a name such as:

    t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520

  3. Read the S.M.A.R.T. data from the device:

    # esxcli storage core device smart get -d device

    Where device is one of the device names found in step 2.

    Note: External FC/iSCSI LUNs or virtual disks from a RAID controller might not report a S.M.A.R.T. status.
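On a host with many local disks, the device-name lookup and the per-device command can be scripted. The following is a minimal Python sketch, assuming you have saved the output of esxcli storage core device list as text and that each device block starts with the device name on an unindented line, followed by indented property lines. The helper names parse_device_list and smart_get_command are hypothetical and are not part of ESXi.

```python
import shlex
from typing import List


def parse_device_list(output: str) -> List[str]:
    """Return device names from saved `esxcli storage core device list` output.

    Assumption: each device block begins with the device name on an
    unindented line; its properties follow on indented lines.
    """
    devices = []
    for line in output.splitlines():
        # Unindented, non-empty lines are treated as device names.
        if line and not line[0].isspace():
            devices.append(line.strip())
    return devices


def smart_get_command(device: str) -> str:
    """Build the S.M.A.R.T. read command for one device, quoted for the shell."""
    return "esxcli storage core device smart get -d " + shlex.quote(device)


if __name__ == "__main__":
    # Hypothetical sample; real output lists more properties per device.
    sample = (
        "t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520\n"
        "   Display Name: Local ATA Disk\n"
    )
    for name in parse_device_list(sample):
        print(smart_get_command(name))
```

The printed commands can then be run one by one in the ESXi Shell.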

The following table shows example output from step 3:

Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Media Wearout Indicator       0      0          0
Write Error Count             N/A    N/A        N/A
Read Error Count              118    50         118
Power-on Hours                0      0          0
Power Cycle Count             100    0          100
Reallocated Sector Count      100    3          100
Raw Read Error Rate           118    50         118
Drive Temperature             27     0          34
Drive Rated Max Temperature   N/A    N/A        N/A
Write Sectors TOT Count       N/A    N/A        N/A
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A

Note: A physical hard drive can report up to 30 different attributes; the drive in the example above reports 13. For more information, see How does S.M.A.R.T. function of hard disks Work?

Note: The preceding link was correct as of September 2, 2014. If you find the link is broken, provide feedback and a VMware employee will update the link.


A raw value can take one of two forms:
  • A number from 0 to 253
  • A word (for example, N/A or OK)
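Both forms matter when a script reads the smart get output. Because parameter names contain spaces, a row cannot simply be split on whitespace. The following is a minimal Python sketch, assuming the column layout of the example table (a header row, an optional dashed separator row, then one row per parameter). It splits each row from the right: the last three fields are Value, Threshold, and Worst, and everything before them is the parameter name. Numbers become integers and words such as N/A or OK stay as text. The helper names parse_cell and parse_smart_table are hypothetical.

```python
from typing import Dict, Tuple, Union

Cell = Union[int, str]


def parse_cell(text: str) -> Cell:
    """Convert numeric cells to int; keep words such as N/A or OK as text."""
    return int(text) if text.isdigit() else text


def parse_smart_table(output: str) -> Dict[str, Tuple[Cell, Cell, Cell]]:
    """Map each parameter name to its (Value, Threshold, Worst) cells."""
    rows: Dict[str, Tuple[Cell, Cell, Cell]] = {}
    for line in output.splitlines():
        fields = line.split()
        is_separator = bool(fields) and set("".join(fields)) == {"-"}
        # Skip blank lines, the header row and the dashed separator row.
        if len(fields) < 4 or fields[0] == "Parameter" or is_separator:
            continue
        # Parameter names contain spaces, so split the row from the right.
        name = " ".join(fields[:-3])
        value, threshold, worst = (parse_cell(f) for f in fields[-3:])
        rows[name] = (value, threshold, worst)
    return rows


SAMPLE = """Parameter                     Value  Threshold  Worst
----------------------------  -----  ---------  -----
Health Status                 OK     N/A        N/A
Read Error Count              118    50         118
Drive Temperature             27     0          34
"""

if __name__ == "__main__":
    for name, cells in parse_smart_table(SAMPLE).items():
        print(name, cells)
```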

Column descriptions

Note: The values returned and their meaning for each of these columns can vary by manufacturer. For more information, please consult your hardware supplier.
  • Parameter

    This is a translation from the attribute ID to human-readable text. For example:

    hex 0xE7 = decimal 231 = "Drive Temperature"

    For more information, see the Known ATA S.M.A.R.T. attributes section of the S.M.A.R.T. Wikipedia article.

  • Value

    This is the raw value reported by the disk. For example, in the output above, Drive Temperature is reported as 27, which means 27 degrees Celsius.

    A Value can be either a number (0-253) or a word (for example, N/A or OK).

  • Threshold

    The failure limit for the attribute, as set by the drive manufacturer.

  • Worst

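    The worst Value ever recorded for the parameter. For Drive Temperature in the example above, this is the highest temperature recorded (34 degrees Celsius).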
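The translation from attribute ID to Parameter name amounts to a lookup table keyed by the numeric ID. The following Python sketch uses a small, illustrative subset: IDs 1, 5, 9, 12, and 194 with the names commonly listed in the known-attributes section of the Wikipedia article, plus 231 named as in this article. This is an assumption for illustration only; it is not the table ESXi uses internally, and manufacturers can assign different meanings to the same ID. describe_attribute is a hypothetical helper name.

```python
# Illustrative subset of ATA S.M.A.R.T. attribute IDs (an assumption, not
# the lookup table ESXi uses). Meanings can differ between manufacturers.
ATTRIBUTE_NAMES = {
    0x01: "Read Error Rate",
    0x05: "Reallocated Sectors Count",
    0x09: "Power-On Hours",
    0x0C: "Power Cycle Count",
    0xC2: "Temperature",
    0xE7: "Drive Temperature",  # 231: some SSDs report "Life Left" here instead
}


def describe_attribute(attr_id: int) -> str:
    """Format an attribute ID as hex, decimal and human-readable name."""
    name = ATTRIBUTE_NAMES.get(attr_id, "Unknown attribute")
    return f'hex 0x{attr_id:02X} = decimal {attr_id} = "{name}"'


if __name__ == "__main__":
    print(describe_attribute(0xE7))  # hex 0xE7 = decimal 231 = "Drive Temperature"
    print(describe_attribute(int("C2", 16)))
```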
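The Value and Threshold columns can be combined into a simple check. The following is a minimal Python sketch that applies the conventional ATA rule for normalized attribute values, under which an attribute is considered failing when its Value has dropped to or below a nonzero Threshold. Treat this rule as an assumption: as noted above, column meanings vary by manufacturer, and some rows (such as Drive Temperature, reported in degrees Celsius) are not normalized values. Rows with a Threshold of 0 or with non-numeric cells are skipped. failing_attributes is a hypothetical helper name, and the sample rows are copied from the example table.

```python
from typing import Dict, List, Tuple, Union

Cell = Union[int, str]


def failing_attributes(rows: Dict[str, Tuple[Cell, Cell, Cell]]) -> List[str]:
    """Return parameters whose Value is at or below a nonzero Threshold.

    Assumes the conventional ATA rule for normalized values. Rows with a
    Threshold of 0 or with non-numeric cells (N/A, OK) are skipped.
    """
    failing = []
    for name, (value, threshold, _worst) in rows.items():
        if not (isinstance(value, int) and isinstance(threshold, int)):
            continue  # words such as N/A or OK cannot be compared
        if threshold > 0 and value <= threshold:
            failing.append(name)
    return failing


# Rows copied from the example table in this article.
EXAMPLE_ROWS: Dict[str, Tuple[Cell, Cell, Cell]] = {
    "Health Status": ("OK", "N/A", "N/A"),
    "Read Error Count": (118, 50, 118),
    "Power Cycle Count": (100, 0, 100),
    "Reallocated Sector Count": (100, 3, 100),
    "Raw Read Error Rate": (118, 50, 118),
    "Drive Temperature": (27, 0, 34),
}

if __name__ == "__main__":
    print(failing_attributes(EXAMPLE_ROWS))  # []
```

With the example rows, the function returns an empty list, which agrees with the reported Health Status of OK.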

smartd daemon

ESXi 5.1 also includes the /sbin/smartd daemon. The tool takes no command-line switches and does not interact with the console. If you run it from the ESXi Shell, it writes S.M.A.R.T. status messages to the /var/log/syslog.log file.

For example:

XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: below MEDIA WEAROUT threshold (0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: above TEMPERATURE threshold (27 > 0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520: above TEMPERATURE threshold (113 > 0)
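When reviewing syslog.log on many hosts, it can help to turn these smartd lines into structured records. The following is a minimal Python sketch, assuming the message format shown in the example above: device name, then above or below, the attribute name in capitals, and the threshold in parentheses, optionally preceded by the current value. Other ESXi builds may log differently. The names SmartdWarning and parse_smartd_log are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class SmartdWarning(NamedTuple):
    device: str
    direction: str          # "above" or "below"
    attribute: str          # for example "TEMPERATURE"
    value: Optional[int]    # None when the line reports only the threshold
    threshold: int


# Assumed format: "smartd: [level] <device>: <above|below> <ATTR> threshold (<n> > <t>)"
LINE_RE = re.compile(
    r"smartd: \[\w+\] (?P<device>\S+): "
    r"(?P<direction>above|below) (?P<attribute>[A-Z ]+?) threshold "
    r"\((?:(?P<value>\d+) [<>] )?(?P<threshold>\d+)\)"
)


def parse_smartd_log(text: str) -> List[SmartdWarning]:
    """Extract smartd threshold warnings from syslog.log text."""
    warnings = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a smartd threshold message
        value = match.group("value")
        warnings.append(SmartdWarning(
            device=match.group("device"),
            direction=match.group("direction"),
            attribute=match.group("attribute"),
            value=int(value) if value is not None else None,
            threshold=int(match.group("threshold")),
        ))
    return warnings


SAMPLE_LOG = """\
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: below MEDIA WEAROUT threshold (0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: above TEMPERATURE threshold (27 > 0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520: above TEMPERATURE threshold (113 > 0)
"""

if __name__ == "__main__":
    for warning in parse_smartd_log(SAMPLE_LOG):
        print(warning)
```

In the example log, every record has a threshold of 0, so filtering on that field is one quick way to set such warnings aside.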


Notes:
  • You can stop the daemon by pressing Ctrl+C.
  • Interpret logged events with caution. In the example above, all three warnings are irrelevant; note that each one is compared against a threshold of 0. The output can vary greatly between manufacturers and disk models.

Additional Information

The vm-support bundle also captures S.M.A.R.T. details in the smartinfo.sh.txt file. The file can be found in the commands/ directory.
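If you analyze support bundles offline, the file can be read without unpacking the whole archive. The following is a minimal Python sketch, assuming the bundle is a compressed tar archive and that its top-level directory name varies by host, so members are matched by the commands/smartinfo.sh.txt suffix. read_smartinfo is a hypothetical helper name.

```python
import tarfile
from typing import Optional


def read_smartinfo(bundle_path: str) -> Optional[str]:
    """Return the text of commands/smartinfo.sh.txt from a vm-support bundle.

    Assumes a (compressed) tar archive; "r:*" lets tarfile detect gzip or
    other supported compression automatically. Returns None if not found.
    """
    with tarfile.open(bundle_path, "r:*") as bundle:
        for member in bundle.getmembers():
            if member.isfile() and member.name.endswith("commands/smartinfo.sh.txt"):
                extracted = bundle.extractfile(member)
                if extracted is not None:
                    return extracted.read().decode("utf-8", errors="replace")
    return None
```

For example, read_smartinfo("bundle.tgz") returns the captured S.M.A.R.T. output as a string, which can then be passed to your own parsing code.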

Update History

09/02/2014 - Added Step 1 under the Resolution section
