NUMA imbalance after vMotion of virtual machines

Article ID: 309217

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • After performing vMotion operations on virtual machines, NUMA imbalance occurs on servers with a large number of cores per processor
  • All virtual machines appear to be running on a single NUMA node
  • Slow performance of virtual machines on a lightly-loaded ESX/ESXi host
  • esxtop reports high CPU ready (%RDY) values for virtual machines
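
The "all virtual machines on a single NUMA node" symptom can be checked mechanically once each virtual machine's home node is known. The following is a minimal sketch, not a VMware tool: the function name numa_imbalance, the 0.8 threshold, and the sample data are all hypothetical, and the home-node mapping is assumed to have been collected by hand (for example from the NUMA statistics in esxtop's memory view, if your version exposes them).

```python
from collections import Counter

def numa_imbalance(home_nodes, num_nodes, threshold=0.8):
    """Return (busiest_node, share) if one node is home to at least
    `threshold` of the VMs on a multi-node host, otherwise None.

    home_nodes: dict mapping VM name -> home NUMA node index (assumed input).
    threshold:  hypothetical cut-off, not a VMware-defined value.
    """
    if not home_nodes or num_nodes < 2:
        return None
    counts = Counter(home_nodes.values())
    node, vm_count = counts.most_common(1)[0]
    share = vm_count / len(home_nodes)
    return (node, share) if share >= threshold else None

# Hypothetical host with two NUMA nodes where every VM landed on node 0.
sample = {"vm-a": 0, "vm-b": 0, "vm-c": 0, "vm-d": 0}
print(numa_imbalance(sample, num_nodes=2))  # prints (0, 1.0)
```

A result other than None only indicates a skewed distribution; it should be read together with the %RDY values before concluding that this issue applies.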


Environment

VMware ESXi 4.1.x Embedded
VMware ESX 4.1.x
VMware ESXi 4.1.x Installable

Cause

To avoid performance problems, the ESX/ESXi NUMA scheduler initially places each virtual machine on the node with the most available memory. If the nodes have very little free memory, the scheduler instead uses a round-robin algorithm to distribute virtual machines across NUMA nodes. This initial placement typically occurs only when a virtual machine is powered on.
With vMotion, the virtual machine's memory is copied before its worlds are completely initialized. A single node is chosen as the home node to preserve memory locality, so the migrated virtual machines are all placed on that one node. Performance issues can occur when the ESX/ESXi host is lightly loaded: under light workloads there is not enough load to warrant a NUMA node migration, so the virtual machines remain on the single node.
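
The difference between the two placement paths can be illustrated with a toy model. This is a simplified sketch of the behavior described above, not the ESX/ESXi scheduler's actual code: the Node and Placer classes, the LOW_MEMORY_MB threshold, and the landing_node parameter are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

LOW_MEMORY_MB = 1024  # hypothetical "very little memory" threshold

@dataclass
class Node:
    free_mb: int
    vms: int = 0

class Placer:
    """Toy model of initial NUMA home-node selection."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next = 0  # round-robin cursor

    def _assign(self, idx, vm_mb):
        node = self.nodes[idx]
        node.free_mb = max(0, node.free_mb - vm_mb)
        node.vms += 1
        return idx

    def power_on(self, vm_mb):
        # Power-on path: prefer the node with the most free memory,
        # falling back to round robin when every node is low on memory.
        if all(n.free_mb < LOW_MEMORY_MB for n in self.nodes):
            idx = self._next
            self._next = (self._next + 1) % len(self.nodes)
        else:
            idx = max(range(len(self.nodes)),
                      key=lambda i: self.nodes[i].free_mb)
        return self._assign(idx, vm_mb)

    def vmotion_in(self, vm_mb, landing_node=0):
        # vMotion path (assumed simplification): memory is copied before
        # the worlds are initialized, so the home node follows the memory.
        return self._assign(landing_node, vm_mb)

powered_on = Placer([Node(8192), Node(8192)])
print([powered_on.power_on(2048) for _ in range(4)])   # prints [0, 1, 0, 1]

migrated = Placer([Node(8192), Node(8192)])
print([migrated.vmotion_in(2048) for _ in range(4)])   # prints [0, 0, 0, 0]
```

In this model nothing rebalances the migrated virtual machines afterwards, which corresponds to the lightly loaded case where no NUMA node migration is triggered.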

Resolution

This issue is resolved in ESX/ESXi 4.1 Update 2. For more information, see the Resolved Issues section of the VMware ESX/ESXi 4.1 Update 2 Release Notes.

Note: In some instances, this issue may still be encountered in ESX/ESXi 4.1 Update 2. VMware is aware of this issue.

To work around this issue if you cannot or do not want to upgrade to ESX/ESXi 4.1 Update 2, disable NUMA on the system by enabling Node Interleaving in the BIOS of the server. For more information on node interleaving, see Performance Best Practices for VMware vSphere 4.1.