Troubleshooting issues resulting from locked virtual disks

Products

VMware vSphere ESXi

Issue/Introduction

This article provides information on troubleshooting locked virtual disks. Locked disks are a recurring issue with snapshot-based backup solutions and unsupported disk formats, and can also result from general storage issues.

Because locks can have several causes, this article guides you through a step-by-step process with several points at which your troubleshooting may end successfully. These points are marked with the line "Consolidation/Power On should now be possible." If you reach one of these points and the issue is still present, continue with the next steps.

Symptoms:

Starting in vSphere 8.0 U2, when viewing a failed "Power On virtual machine" task status, the UI shows:

Unable to access file <file path to vmdk> since it is locked
KB 2107795
filePath: <file path to vmdk>
host: <hostname> , <host IP>
mac: ['<mac address>']
id: <id number>
worldName: vmx
lockMode: Exclusive

Powering on a virtual machine fails with errors similar to:

Unable to access a file <filename> since it is locked

Unable to access virtual machine configuration

Consolidating virtual machine snapshots fails with errors similar to:

An error occurred while consolidating disks: msg.snapshot.error-DISKLOCKED

An error occurred while consolidating disks: msg.fileio.lock.

After a snapshot-based backup completes successfully, the virtual machine's Summary tab shows a message similar to:

Virtual Disk Consolidation is required

Environment

VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 7.0.0
VMware vSphere ESXi 8.0.x

Cause

Causes of locked virtual disks include, but are not limited to, the following:

The ESXi host that runs a powered-on virtual machine holds locks on all files the virtual machine uses, to control read and write access.
Snapshot-based backup appliances may create additional locks when they hot-add the virtual machine's disks during the backup process.
Creating a lock, and therefore starting a virtual machine, can fail if an unsupported disk format is used or if a lock is already present.

Resolution

Troubleshooting Locked Virtual Disks on VMFS Volumes

Step 1: Write details to the logs.

Snapshot and power-on operations are logged to the /vmfs/volumes/<vm-datastore>/<vm-name>/vmware.log file. Snapshot operations are logged to this file only while the virtual machine is powered on.

If there is a consolidation issue and the virtual machine is powered on, trigger a consolidation task so that a new set of log entries is written to help troubleshoot the problem.
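
After the consolidation task has run, the relevant entries can be pulled from vmware.log with a case-insensitive search for "lock". This pattern covers the messages listed under Symptoms, such as msg.snapshot.error-DISKLOCKED and "since it is locked". The following is a minimal sketch. The function name show_lock_entries is hypothetical, and the single search word is an assumption, not an exhaustive filter.

```sh
# show_lock_entries (hypothetical helper): print every line of a vmware.log
# that mentions a lock (case-insensitive), prefixed with its line number.
# Usage: show_lock_entries /vmfs/volumes/<vm-datastore>/<vm-name>/vmware.log
show_lock_entries() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 2
  fi
  # -n prefixes line numbers so nearby entries can be located afterwards
  grep -n -i 'lock' "$log_file"
}
```

The line numbers make it easy to open the log at the matching position and read the surrounding entries.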

If the virtual machine fails to power on, skip this item and Step 3.

Step 2: Clear all expected locks on the disks, so that any remaining lock can be identified.

A powered-on virtual machine is protected by an exclusive lock held by the ESXi host that runs it. If applicable, power down the virtual machine.

All remaining locks are considered unexpected and need investigation.

Step 3: Find the locked file.

Note: Starting in vSphere 8.0 U2, the owner of the lock may be found by viewing the Status of the failed 'Power On virtual machine' task.

For VMFS datastores, refer to VMware virtual machine file lock on VMFS datastore (84475) before moving on.
For NFS datastores, refer to Understanding the NFS .lck lock file to understand the ESX host and NFS filename it refers to (2136521).
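
On VMFS, lock details are commonly read with vmkfstools -D <path-to-file>. Its lock line includes an owner UUID whose last 12 hex digits are the MAC address of the ESXi host holding the lock. The helper below turns such a line into a colon-separated MAC address. This is a sketch under these assumptions:

- The helper name is hypothetical.
- The sample line in the test only illustrates the expected shape.
- An all-zero owner is treated as "no lock holder".

Check the exact output on your ESXi build before relying on it.

```sh
# owner_mac (hypothetical helper): extract the lock owner's MAC address from a
# "vmkfstools -D" lock line. Assumes the owner UUID ends with the 12 hex
# digits of the owning host's MAC address.
owner_mac() {
  # Pull the token that follows "owner " (hex digits and dashes only)
  uuid=$(printf '%s\n' "$1" | sed -n 's/.*owner \([0-9a-fA-F-]*\).*/\1/p')
  [ -n "$uuid" ] || return 1          # no owner field on this line
  mac=${uuid##*-}                     # keep the last dash-separated group
  if [ "$mac" = "000000000000" ]; then
    echo "none"                       # all zeros: treated here as no lock holder
    return 0
  fi
  # Insert a colon after every pair of hex digits, then drop the trailing one
  printf '%s\n' "$mac" | sed 's/\(..\)/\1:/g; s/:$//'
}
```

Pass the function the line of vmkfstools -D output that contains "owner". Then compare the result with the MAC addresses of your ESXi hosts' management interfaces.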

If the troubleshooting above does not resolve your issue, the lock may be of another type, not caused by a hot-add or by another running virtual machine with access to the same disk. Proceed to Step 4.

Step 4: Find the VM/process/service holding the lock

Connect over SSH as root to the ESXi host identified by the MAC address returned in Step 3.
Find the process responsible for the lock by running:

ps | grep -i <vm-name>

If this command returns any output, a process is locking the disks. If it returns no output, skip to Step 4L.
In this output, find the line whose third column contains vmm0, note the ID in the first column of that line, and run this command to display the World ID:

esxcli vm process list | grep -i <ID> -B1

The output shows the World ID of the virtual machine that corresponds to the process ID. This virtual machine holds the lock.
Remove the disk from this virtual machine or power down the virtual machine.
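
Picking the vmm0 ID out of the ps output can be scripted with awk. It follows the layout described above: the world name is in the third column and the ID is in the first. This is a sketch. The helper name is hypothetical, and the sample lines in the test are made up in that assumed layout.

```sh
# vmm0_ids (hypothetical helper): read "ps" output on stdin and print the
# first-column ID of every line whose third column contains "vmm0".
# Usage: ps | grep -i <vm-name> | vmm0_ids
vmm0_ids() {
  awk '$3 ~ /vmm0/ { print $1 }'
}
```

Each printed ID can then be passed to esxcli vm process list | grep -i <ID> -B1, as in the step above.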

Consolidation/Power On should now be possible.

If the step above fails, find the service or task holding the lock by running:

lsof | grep -i <vm-name>

If this returns any output, a service or task is locking the disks. This should only happen on the ESXi host where the virtual machine is registered.
To find more information about the task, run:

vim-cmd vmsvc/getallvms | grep -i <vm-name>

Note the VM ID (the number at the beginning of the line) and run:

vim-cmd vmsvc/get.tasklist <ID>

The output should look like:

vim-cmd vmsvc/get.tasklist 109
(ManagedObjectReference) [
'vim.Task:haTask-109-vim.vm.Snapshot.remove-92744898'
]

Note the task ID (the text after "vim.Task:", without the quotes).
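
The VM ID and the task IDs can be extracted from the vim-cmd output with two small filters. This is a sketch, and both function names are hypothetical. vm_id_for_name assumes the VM name is the second whitespace-separated column and contains no spaces. task_ids relies on the 'vim.Task:<Task ID>' form shown in the example output above.

```sh
# vm_id_for_name (hypothetical helper): read "vim-cmd vmsvc/getallvms" output
# on stdin and print the Vmid (first column) of the VM whose Name (second
# column) matches exactly. Assumes the VM name contains no spaces.
vm_id_for_name() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# task_ids (hypothetical helper): read "vim-cmd vmsvc/get.tasklist <ID>" output
# on stdin and print each task ID, i.e. the text between "vim.Task:" and the
# closing quote. Prints nothing when no tasks are listed.
task_ids() {
  sed -n "s/.*'vim\.Task:\([^']*\)'.*/\1/p"
}
```

Example usage: vim-cmd vmsvc/getallvms | vm_id_for_name <vm-name>, then vim-cmd vmsvc/get.tasklist <ID> | task_ids. After the management agents are restarted, an empty result from the second command corresponds to the empty output expected later in this procedure.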
To get more information about this task, run:

vim-cmd vimsvc/task_info <Task ID>

For example:

vim-cmd vimsvc/task_info haTask-109-vim.vm.Snapshot.remove-92744898

The output (shortened here) contains information that identifies the task:

task = 'vim.Task:haTask-109-vim.vm.Snapshot.remove-92744898',
state = "running",
cancelled = false,
cancelable = true,
progress = 75,
startTime = "2014-12-06T03:47:26.30322Z",
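
To check individual fields such as state or progress without reading the whole task_info dump, a small awk filter can look up a key. This is a sketch, and the helper name is hypothetical. It assumes each field is on its own line in the "key = value," form shown above, and it returns only the first match. Values containing spaces would be truncated.

```sh
# task_field (hypothetical helper): read "vim-cmd vimsvc/task_info <Task ID>"
# output on stdin and print the value of the first line whose key matches,
# with double quotes and the trailing comma removed.
task_field() {
  awk -v key="$1" '
    $1 == key && $2 == "=" {
      value = $3
      sub(/,$/, "", value)        # drop trailing comma
      gsub(/"/, "", value)        # drop double quotes
      print value
      exit
    }'
}
```

For example, vim-cmd vimsvc/task_info <Task ID> | task_field state prints running for the task shown above.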

Restart the management agents of this ESXi host to clear the service or task. For more information, see Restarting the Management agents on an ESXi or ESX host (1003490).

Note: Before restarting the management agents, deactivate HA Host Monitoring to prevent an unwanted VM failover.

After a few seconds, run vim-cmd vmsvc/get.tasklist <ID> again; it should return an empty output.

Consolidation/Power On should now be possible.

Sometimes more clean-up work is needed. Run vim-cmd vmsvc/getallvms again to verify that the virtual machine is still registered (alternatively, check in the vCenter Server inventory whether the VM's state is displayed as invalid):

vim-cmd vmsvc/getallvms | grep <ID>

Example:

Skipping invalid VM '109'
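
The invalid-VM message can be checked for a specific VM ID with grep. This is a sketch with a hypothetical helper name. It matches the quoted ID exactly, so '1091' does not count as '109'. It is not certain whether vim-cmd writes this message to standard output or standard error, so the usage below merges both streams.

```sh
# is_invalid_vm (hypothetical helper): read "vim-cmd vmsvc/getallvms" output
# on stdin and succeed if it contains "Skipping invalid VM '<ID>'" for the
# given VM ID.
# Usage: vim-cmd vmsvc/getallvms 2>&1 | is_invalid_vm <ID>
is_invalid_vm() {
  grep -q "Skipping invalid VM '$1'"
}
```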

Note: If the output does not report an invalid VM, skip the rest of this section.

This output shows that there is a conflict. The virtual machine might still be running, but the ID is unassigned. Run the following command:

esxcli vm process list | grep -i <vm-name> -B5

The output lists the virtual machine together with additional information. Note the World ID of this virtual machine.
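
The World ID can be read from the esxcli vm process list output with awk instead of relying on grep context lines. This is a sketch with a hypothetical helper name. It assumes each VM block starts with an unindented display-name line, followed by indented "Key: value" lines that include "World ID: <number>". The sample in the test is made up in that assumed layout.

```sh
# world_id_for_vm (hypothetical helper): read "esxcli vm process list" output
# on stdin and print the World ID of the VM with the given display name.
world_id_for_vm() {
  awk -v target="$1" '
    /^[^ \t]/ { name = $0 }                            # start of a VM block
    /^[ \t]+World ID:/ && name == target { print $3 }  # third field is the ID
  '
}
```

For example, wid=$(esxcli vm process list | world_id_for_vm <vm-name>) stores the ID for the kill command below. The warning about a hard shutdown still applies.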

To kill the virtual machine's process (hard shutdown), run the below command:

esxcli vm process kill -t force -w <World ID>

Note: This kills the virtual machine process (hard shutdown); use at your own risk. Alternatively, if the virtual machine is responsive, connect to it (for example, over RDP) and shut it down from the guest operating system.

After a few seconds, run the esxcli vm process list command above again; the output should now be empty. Then remove the virtual machine from the vCenter Server inventory and add it back.

Consolidation/Power On should now be possible.

If you still cannot consolidate or power on the virtual machine, open a support request with VMware. For more information, see How to file a Support Request in Customer Connect (2006985).

Troubleshooting Locked Virtual Disks on NFS Volumes

Locking issues on NFS datastores differ from those on VMFS datastores because the locking mechanism is different. NFS does not provide block-level access, so SCSI-based locking is not available; instead, NFS locks are implemented as lock files on the NFS server. If you browse an NFS datastore with hidden files shown, you will see a number of .lck-#### files. Because of this mechanism, you cannot use the same command-line tools as on VMFS to determine lock holders.

Power down the virtual machine, backup appliances, and other virtual machines that could access the virtual disks.

Find the lock

SSH as root to the ESXi host where the affected virtual machine is registered and change to the virtual machine's folder on the datastore.
Run this command to show the hidden .lck-#### files:

ls -lha

Note: If the VM is powered down and there is no other access to any of the virtual disks, there should be no .lck-#### file.
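
To list only the lock files rather than scanning the full ls -lha listing, the output of ls -a can be filtered. This is a sketch with a hypothetical helper name. Empty output matches the expected state for a powered-off VM with no other access to its disks.

```sh
# list_lck_files (hypothetical helper): list the hidden .lck-#### files in a
# directory (default: the current directory). Prints nothing when there are
# no lock files.
list_lck_files() {
  dir="${1:-.}"
  ls -a "$dir" | grep '^\.lck-'
}
```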

To get more information about the lock

If there is a .lck-#### file, you can find out where it came from by viewing its contents:

hexdump -C .lck-#### (replace with correct filename)

The output contains the hostname of the lock owner, for example: esxi2.domain.local
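
If a strings utility is not available on the host, a rough equivalent built from tr and awk prints only the readable runs of text. This makes the hostname easier to spot than in the hexdump columns. This is a sketch with a hypothetical helper name. It assumes the hostname is stored as plain ASCII, as the hexdump step implies. The minimum run length of 4 characters is an arbitrary choice.

```sh
# lck_strings (hypothetical helper): a rough "strings" equivalent that prints
# runs of at least 4 printable characters from a file, one run per line.
# Usage: lck_strings .lck-#### (replace with correct filename)
lck_strings() {
  # Turn every run of non-printable bytes into a single newline,
  # then keep only the runs that are at least 4 characters long
  tr -cs '[:print:]' '\n' < "$1" | awk 'length($0) >= 4'
}
```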

Remove the lock

Only if the virtual machine is powered off, you can then delete this file with the rm command:

rm .lck-#### (replace with correct filename)

A few seconds later, run ls -lha again to check whether the lock file has been rewritten.
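
The delete-and-recheck step can be wrapped in one function that removes the file, waits, and reports whether it came back. This is a sketch under these assumptions:

- The helper name is hypothetical.
- The 5-second default delay is an arbitrary assumption.
- As stated above, run it only when the virtual machine is powered off.

```sh
# lock_cleared (hypothetical helper): delete a .lck-#### file, wait, and check
# whether it was rewritten. Returns 0 if the file stayed gone, 1 if it came
# back, 2 if it could not be removed. Use ONLY with the VM powered off.
# Usage: lock_cleared .lck-#### 5
lock_cleared() {
  lock_file="$1"
  delay="${2:-5}"                  # seconds to wait; arbitrary default
  rm -f -- "$lock_file" || return 2
  sleep "$delay"
  if [ -e "$lock_file" ]; then
    echo "rewritten: another host or VM still holds the lock"
    return 1
  fi
  echo "not rewritten"
  return 0
}
```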

If it is rewritten, check which virtual machines are running on the ESXi host that owns the lock to find the one causing this issue (usually a backup appliance, or a virtual machine with an ISO image from the NFS datastore mounted as a CD/DVD drive).

If it is not rewritten, Consolidation/Power On should now be possible.

Issue is not due to .lck-#### files but due to general connectivity issues

This article does not consider any issues that might arise due to general NFS connectivity issues. For general troubleshooting, refer to Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts (1003967).

Additional Information

ESXi 6.x and later support NFS version 4.1, which offers Kerberos 5 authentication among other benefits.

For vSphere 6.x and later, VMware recommends the following when mounting NFS datastores on different ESXi hosts:

Do not mount the same NFS share using different NFS protocol versions on different ESXi hosts.
Configure the Network Attached Storage (NAS) to use only one protocol version.
Use either IPv4 or IPv6, not a mix of both, for all ESXi host connections to the NFS server.

Failed to power on virtual machine

Determining if a virtual disk is attached to another virtual machine
Unable to delete the virtual machine snapshots
Unable to perform operations on a virtual machine with a locked disk
Restarting the Management agents in ESXi
Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts
Committing snapshots in vSphere when more than 32 levels of snapshots are present fails with the error: Too many levels of redo logs
Investigating virtual machine file locks on ESXi
Snapshot removal task stops at 99% in ESXi/ESX
Powering on a virtual machine on NFS or trying to remove an NFS Datastore fails with errors "Unable to access a file since it is locked" or "Resource is in use"
Types of supported Virtual Disks on ESXi/ESX hosts
Delete all Snapshots and Consolidate Snapshots feature FAQ
How to file a Support Request in Customer Connect
Powering on a virtual machine on an upgraded host fails with the error: File [VMFS volume] VM-name/VM-name.vmdk was not found
Cannot power on a virtual machine with mounted twoGbMaxExtentSparse disks
Estimate the time required to consolidate virtual machine snapshots