Prolonged switchover and Storage vMotion failure with large virtual memory size

Article ID: 313861

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Reduced application availability during Storage vMotion switchover, when the switchover takes longer than 1 second.
  • Storage vMotion failure for virtual machines with hundreds of GB of memory, when the switchover time approaches 2 minutes.
  • After a Storage vMotion failure, the vmkernel log shows "Swap copy failed during switchover".
  • In vmware.log, "MigrateSetState: Transitioning from state 3 to 4" does not appear immediately (within a few seconds) after "SVMotionMirroredMode: Disk copy phase completed".
  • If Storage vMotion fails, vmware.log instead shows "MigrateSetState: Transitioning from state 3 to 6" 2 minutes after "SVMotionMirroredMode: Disk copy phase completed".
  • If Storage vMotion succeeds but the switchover is prolonged, vmware.log shows "MigrateSetState: Transitioning from state 3 to 4" more than 10 seconds after "SVMotionMirroredMode: Disk copy phase completed".
  • Storage vMotion succeeds on retry, especially if it is retried within 10 minutes of the failure.
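The vmware.log checks above can be scripted. The Python sketch below scans vmware.log lines for "SVMotionMirroredMode: Disk copy phase completed" and the next "MigrateSetState: Transitioning from state 3 to 4" or "state 3 to 6" message, and reports the delay between them. It assumes each relevant line begins with an ISO-8601 timestamp such as 2022-06-01T12:34:56.789Z; verify this against your own logs. The function names and the 10-second threshold (taken from the symptom list) are illustrative choices, not part of any VMware tool.

```python
import re
from datetime import datetime, timedelta

# Assumption: relevant vmware.log lines start with a timestamp such as
# "2022-06-01T12:34:56.789Z"; only this leading timestamp is parsed.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})")

COPY_DONE = "SVMotionMirroredMode: Disk copy phase completed"
SWITCHOVER_OK = "MigrateSetState: Transitioning from state 3 to 4"
SWITCHOVER_FAILED = "MigrateSetState: Transitioning from state 3 to 6"


def parse_timestamp(line):
    """Return the line's leading timestamp as a naive datetime, or None."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def check_switchover(lines, slow_after=timedelta(seconds=10)):
    """Classify the first Storage vMotion switchover found in vmware.log lines.

    Returns (verdict, delay): delay is the time from the disk copy completing
    to the next state 3 -> 4 (success) or 3 -> 6 (failure) transition.
    """
    copy_done_at = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip lines without the assumed timestamp prefix
        if COPY_DONE in line:
            copy_done_at = ts
        elif copy_done_at is not None and SWITCHOVER_FAILED in line:
            return "failed", ts - copy_done_at
        elif copy_done_at is not None and SWITCHOVER_OK in line:
            delay = ts - copy_done_at
            return ("prolonged" if delay > slow_after else "normal"), delay
    return "not found", None
```

Passing the lines of a vmware.log file to check_switchover returns "failed" with a delay of about 2 minutes in the failure case described above, and "prolonged" when the state 3 to 4 transition arrives more than 10 seconds after the disk copy completes.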


Environment

VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 7.0.0

Cause

A virtual machine might experience a prolonged Storage vMotion switchover (longer than 1 second), which temporarily reduces application availability. For a virtual machine with hundreds of GB of memory, the switchover time can approach 2 minutes, which causes the first such migration attempt to fail. A second Storage vMotion attempt within 10 minutes of the first failure should succeed.

Resolution

This issue is resolved in VMware vSphere ESXi 7.0 U3f.

Workaround:

There are multiple possible workarounds:

  1. Migrate the virtual machine while it is powered off; powered-off virtual machines do not encounter these issues.
  2. In environments that do *not* use memory overcommit, disable memory sharing on the affected virtual machines by setting the advanced virtual machine configuration option "sched.mem.pshare.enable" to "FALSE".

* Advanced Virtual Machine Attributes
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.resmgmt.doc/GUID-F8C7EF4D-D023-4F54-A2AB-8CF840C10939.html

* Set Advanced Virtual Machine Attributes
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.resmgmt.doc/GUID-4C27B0A4-0064-4C21-9208-CEBBB689A093.html

The virtual machine should be powered off before you set this configuration parameter.
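An advanced configuration option such as "sched.mem.pshare.enable" is stored in the virtual machine's .vmx file as a key = "value" line. The Python sketch below only illustrates what that entry looks like by adding or replacing it in the file's text. set_vmx_option is a hypothetical helper that assumes the plain key = "value" line format; the documented procedure linked above remains the supported way to change the setting.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with the line key = "value" set.

    Hypothetical helper: replaces the first existing entry for key, or
    appends a new line if key is not present.
    """
    new_line = f'{key} = "{value}"'
    # Match a whole line that starts with exactly this key followed by "=".
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(vmx_text):
        # A function replacement keeps backslashes in new_line literal.
        return pattern.sub(lambda _match: new_line, vmx_text, count=1)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"


# Example with an illustrative (hypothetical) .vmx fragment.
vmx = 'displayName = "db01"\nmemSize = "262144"\n'
print(set_vmx_option(vmx, "sched.mem.pshare.enable", "FALSE"))
```

Replacing an existing entry rather than always appending avoids leaving two conflicting values for the same key in the file.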

  3. To avoid this issue without powering off the virtual machine, change both the host and the datastore in one migration, then use vMotion to move the virtual machine back to the original host in a second migration.
  4. If no other host is available, migrate the home datastore and the disks in two separate migrations (a home-only Storage vMotion and a disks-only live migration), in either order. This avoids waiting for a lengthy disk copy twice if the first Storage vMotion fails.
  5. If Storage vMotion fails because of a prolonged (2-minute) switchover, retry it immediately or within approximately 10 minutes.
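The retry workaround depends on timing. A minimal Python sketch, assuming vmware.log lines begin with an ISO-8601 timestamp such as 2022-06-01T12:34:56.789Z, finds the last "MigrateSetState: Transitioning from state 3 to 6" message and adds the approximate 10-minute window from this article. It assumes the failure has already been attributed to this issue (the 3 to 6 transition about 2 minutes after the disk copy completed); retry_deadline is a hypothetical name, and the result is expressed in the log's own time zone (UTC when timestamps end in Z).

```python
import re
from datetime import datetime, timedelta

# Assumption: relevant vmware.log lines start with a timestamp such as
# "2022-06-01T12:34:56.789Z".
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})")
FAILED = "MigrateSetState: Transitioning from state 3 to 6"
# "Approximately 10 minutes" per this article: a guideline, not a hard limit.
RETRY_WINDOW = timedelta(minutes=10)


def retry_deadline(lines):
    """Return the approximate retry deadline after the last logged failure, or None."""
    failed_at = None
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match and FAILED in line:
            failed_at = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return None if failed_at is None else failed_at + RETRY_WINDOW
```

Retrying immediately, well before the returned deadline, matches the guidance above; the deadline only marks where that guidance stops applying.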