Troubleshooting vmkernel ports used for vMotion


Article ID: 342818



Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Symptoms:
Cross vCenter vMotion allows a running VM to be moved from one vCenter instance to another. Long Distance vMotion (LDVM) is a feature of vSphere 6.x that optimizes the move of a running VM between two hosts with latency below 150ms. This is often desired at greater-than-campus distances. At a mechanical level, this means "moving" the running VM instance from a sending ESXi instance to a receiving ESXi instance. For more information, see Understand vMotion networking requirements.

When such migrations fail, typically because of intermediate routers or firewalls, it may be necessary to troubleshoot the specific vmkernel ports used by LDVM.

Environment

VMware vSphere ESXi 6.0
VMware vCenter Server 6.0.x
VMware vSphere ESXi 6.5
VMware vCenter Server 6.5.x

Cause

Troubleshooting LDVM transfer failures requires specific command syntax that directs test packets out of a chosen vmkernel port and TCP/IP stack.

Resolution

An LDVM is carried out by transferring two distinct kinds of data. These flows may use two different communications paths. For more information, see Understand vMotion networking requirements.
 
When LDVM does not succeed, verify that the required vmkernel ports can communicate properly.
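
Before running the pings described in this article, it may help to confirm which vmkernel interface is attached to which TCP/IP stack, because the -I and --netstack arguments must match. The following is a minimal sketch, assuming that esxcli network ip interface list reports a "Name" and a "Netstack Instance" field for each interface. The function name vmk_netstack_map is hypothetical, and the field labels should be checked against the output on your ESXi build.

```shell
#!/bin/sh
# Sketch: map each vmkernel interface to its TCP/IP stack.
# Assumption: the interface listing contains "Name:" and
# "Netstack Instance:" lines for every interface.

# Reads interface-list text on stdin and prints "<vmk> <netstack>" per interface.
vmk_netstack_map() {
    awk -F': *' '
        /^[ \t]*Name:/              { name = $2 }
        /^[ \t]*Netstack Instance:/ { print name, $2 }
    '
}

# On an ESXi host, feed the function the live listing.
# On any other machine this step is skipped.
if command -v esxcli >/dev/null 2>&1; then
    esxcli network ip interface list | vmk_netstack_map
fi
```

Each printed pair can then be used as the -I and --netstack values in the tests that follow.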

From a command line on a given ESXi host, verify that basic routing is correctly configured on the network between ESXi hosts. Use esxcli network diag ping (the successor to vmkping) between ESXi-Seattle and ESXi-Dallas, for example:

esxcli network diag ping -I vmk0 --netstack=defaultTcpipStack -H 10.81.20.x

In this case, vmk0 on the default TCP/IP stack is used for Management of the ESXi host, and it serves as the egress interface for the "pings." These packets are aimed at the Management vmkernel port of a remote ESXi server, 10.81.20.x in this example. Use the real IP address of your target ESXi host. Perform this test in both directions. Note: Management vmkernel connectivity is NOT required directly between ESXi servers for LDVM, but the success or failure of this "ping" is still a useful data point.

esxcli network diag ping -I vmk1 --netstack=vmotion -H 10.81.30.x

In this case, vmk1 on the vmotion stack is used for vMotion, and it serves as the egress interface for the "pings." These packets are aimed at the vMotion vmkernel port of a remote ESXi server, 10.81.30.x in this example. Use the real IP address of your target ESXi host. Perform this test in both directions. This netstack is used to transfer the memory state of the running VM, plus certain other "hot" items.
Note: The real IP address of your target ESXi host is the vMotion vmkernel IP of the target ESXi host.

esxcli network diag ping -I vmk2 --netstack=vSphereProvisioning -H 10.81.40.x
 
In this case, vmk2 on the vSphereProvisioning stack is used for provisioning of VMs, and it serves as the egress interface for the "pings." These packets are aimed at the vSphereProvisioning vmkernel port of a remote ESXi server, 10.81.40.x in this example. Use the real IP address of your target ESXi host. Perform this test in both directions.

This netstack is used to transfer the bulk of disk blocks of the running VM, plus certain other "cold" items.
Note: vmkernel diagnostic "pings" are really kernel-directed TCP/IP packets, not ICMP.
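
The three checks above can be collected into one small script so that they are run the same way on each host. This is a sketch under stated assumptions: the vmk0/vmk1/vmk2 interface names and the netstack names simply mirror the examples in this article, the function names are hypothetical, and the script defaults to a dry run that only prints the commands. It does not interpret the ping results; read the output of each command to judge success.

```shell
#!/bin/sh
# Sketch: build (and optionally run) the three LDVM reachability checks.
# Assumption: interface and netstack names match this article's examples;
# adjust them to your environment.

# Print the esxcli command for one check: <vmk> <netstack> <target-ip>.
build_ping_cmd() {
    printf 'esxcli network diag ping -I %s --netstack=%s -H %s\n' "$1" "$2" "$3"
}

# Print the command when DRY_RUN is 1 (the default); otherwise run it.
run_check() {
    cmd=$(build_ping_cmd "$1" "$2" "$3")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        echo "== $cmd"
        $cmd   # intentional word splitting into command and arguments
    fi
}

# Arguments: remote host's <management-ip> <vmotion-ip> <provisioning-ip>.
check_ldvm_paths() {
    run_check vmk0 defaultTcpipStack "$1"
    run_check vmk1 vmotion "$2"
    run_check vmk2 vSphereProvisioning "$3"
}

# Dry-run demonstration using this article's placeholder addresses.
check_ldvm_paths 10.81.20.x 10.81.30.x 10.81.40.x
```

To execute the pings on an ESXi host, set DRY_RUN=0 and pass the real vmkernel IP addresses of the remote host, then repeat from the other host to cover both directions.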

"VMs located on datastores other than internal storage (including Fibre Channel, iSCSI, NFS, or a vMSC stretched cluster) will fail to copy their VM disks over a non-default vSphereProvisioning vmkernel port using NFC, regardless of hypervisor configuration. As such, in any hardened architecture with VLANs that are air-gapped by design, a long-distance vMotion (LDVM) may fail over non-default vmkernel ports. This is a known bug."