VMware vSAN Network Health Check for MTU check fails in a Stretched Cluster

Article ID: 317670


Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:

In a VMware vSAN Stretched Cluster where the vSAN data nodes use an MTU of 9000 and the Witness host/appliance sits across a WAN limited to an MTU of 1500, you experience these symptoms:

  • vSAN Network Health Check for MTU check in a Stretched Cluster fails

  • vSAN Network Health status shows that the Hosts large ping test (MTU check) has failed



Cause

This is expected behavior caused by the MTU mismatch.

This issue occurs because the MTU set on the vSAN Witness ESXi host/appliance network does not match the MTU of the vSAN data node network in the vSAN cluster. vSAN data and witness traffic share the same vSAN vmknic and therefore share a single MTU setting.
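
To make the mismatch concrete, the following Python sketch computes the largest ping payload that fits in a single packet for a given MTU. It assumes IPv4 without header options (20 bytes), an 8-byte ICMP header, and that the don't-fragment bit is set. The function names and the example paths are hypothetical and are not part of any VMware tool.

```python
from typing import Iterable

# Assumed header sizes: IPv4 without options, plus the ICMP echo header.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits_without_fragmentation(payload: int, path_mtus: Iterable[int]) -> bool:
    """True if a ping with this payload can cross every hop unfragmented.

    The smallest MTU along the path is the limiting one.
    """
    return payload <= max_icmp_payload(min(path_mtus))


if __name__ == "__main__":
    data_site_path = [9000, 9000]        # hypothetical: data node to data node
    witness_path = [9000, 1500, 1500]    # hypothetical: data node, WAN, witness
    large_ping = max_icmp_payload(9000)  # payload sized for a 9000 MTU
    print(large_ping)
    print(fits_without_fragmentation(large_ping, data_site_path))
    print(fits_without_fragmentation(large_ping, witness_path))
```

Under these assumptions, a ping sized for the 9000 MTU data network passes between data nodes but cannot cross a 1500 MTU WAN link to the witness unfragmented, which matches the failed large ping test.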

Resolution

To resolve this issue, use one of the following options:
  • Reconfigure the vSAN Witness host/appliance network MTU to 9000.

    Note: This may not be feasible due to WAN limitations. Consult your Network team and seek advice.
  • Change the vSAN data node network MTU to 1500.
 
  • Separate witness and vSAN data traffic: create new VMkernel adapters (vmks) with an MTU of 1500, or use an existing vmk that has connectivity to the witness and is already set to an MTU of 1500, and tag them for witness traffic. The health check to the witness then uses that vmk instead of the vSAN vmks that are set to 9000 MTU, which resolves the issue.
    Tagging for witness traffic:
    esxcli vsan network ip add -i vmkx -T=witness
    (x = the number of the vmk you want to tag)
  • The vSAN MTU health check sends a large packet (for example, 9000 bytes) to the target host. With the additional headers, the actual packet size exceeds 9000 bytes, so the packet is fragmented. In some scenarios, WAN providers do not support packet fragmentation. This is a known issue, and the packet size has been reduced to 8952 since the vSphere 7.0 P01 release. If this health check warning is observed, run the following vmkping command on the ESXi host:
    host1> vmkping -I <target_vmk> <target_IP> -s 8952 -d
    As long as vmkping succeeds, no performance issues or I/O errors are expected, and the warning can be safely ignored.
  • MTU changes require careful planning. An MTU misconfiguration may cause network disconnects and I/O failures. Before making any changes, consult your Network Team.
  • Incorrectly configured MTUs may cause performance issues or I/O errors and can also lead to virtual machine deployment failures on vSAN. For stability of the vSAN cluster, VMware recommends a consistent MTU configuration across the vSAN network, including the WAN and the Witness host/appliance.
  • For more information, see vSAN Health Service - Network Health - Hosts small ping test (connectivity check) and Hosts large ping test (MTU check) (2108285)
  • As of vSAN 6.7 U1, mixed MTU is supported when using witness traffic separation. For more information, see the link below.
  • If the witness VM is connected to the vSAN data nodes over a WAN, the vSAN administrator should ensure that the WAN supports 9000 MTU (if the data-witness traffic is configured for 9000 MTU). If the WAN does not support 9000 MTU, the vSAN witness traffic should be set to 1500 MTU end to end across both sites. The data nodes can continue to communicate with each other over 9000 MTU for data traffic, provided the end-to-end path between the data sites supports 9000 MTU. Note that vSAN witness traffic and data traffic should be configured on different VMkernel ports when mixed MTU is used. Refer to the link below for vSAN stretched cluster network design and requirements.
  • When using witness traffic separation (data traffic with jumbo frames and witness traffic with the default packet size), to avoid issues:
    • Set the data nodes to communicate over 9000 MTU (9000 must be set along the entire path, end to end).
    • The separate VMkernel adapter used for witness traffic can communicate over 1500 MTU (1500 must be set along the entire path, end to end).
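
The end-to-end guidance for mixed MTU can be sketched as a small planning check in Python. This is a minimal illustration, not a VMware tool: the `TrafficPath` class, its field names, and all MTU values in the example are hypothetical, and the check simply flags any hop whose MTU differs from the VMkernel adapter carrying that traffic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrafficPath:
    """One traffic type and the MTUs along its path (all values hypothetical)."""
    name: str
    vmk_mtu: int         # MTU set on the VMkernel adapter carrying this traffic
    hop_mtus: List[int]  # MTU of each switch port, router, or WAN link on the path


def inconsistent_hops(path: TrafficPath) -> List[int]:
    """Return the hop MTUs that differ from the VMkernel adapter MTU."""
    return [mtu for mtu in path.hop_mtus if mtu != path.vmk_mtu]


def is_consistent(path: TrafficPath) -> bool:
    """True if every hop uses the same MTU as the VMkernel adapter."""
    return not inconsistent_hops(path)


if __name__ == "__main__":
    paths = [
        # Data traffic between the two data sites, jumbo frames end to end.
        TrafficPath("vsan-data", 9000, [9000, 9000, 9000]),
        # Separated witness traffic on its own vmk, default MTU end to end.
        TrafficPath("witness-separated", 1500, [1500, 1500, 1500]),
        # Witness traffic left on the 9000 MTU vSAN vmk, crossing a 1500 MTU WAN.
        TrafficPath("witness-shared", 9000, [9000, 1500, 1500]),
    ]
    for path in paths:
        status = "consistent" if is_consistent(path) else "MISMATCH"
        print(f"{path.name}: {status} {inconsistent_hops(path)}")
```

In this example, only the path that keeps witness traffic on the 9000 MTU vSAN adapter is flagged, which is the situation that witness traffic separation is meant to avoid.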


Additional Information