Poor performance for VMs on Overlay Logical Switch when TCP Segmentation Offload (TSO) is enabled on the VMs

Article ID: 318708

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • Poor performance for VMs on Overlay Logical Switch when TCP Segmentation Offload (TSO) is enabled on the VMs.
  • When the same VMs are connected to a VLAN Logical Switch or to DVS/VSS port groups, no performance issues are seen.
  • The hosts use QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter network cards with the qedentv driver, for example:
#esxcfg-nics -l
Name    Driver      ENS Capable   ENS Driven    MAC Address       Description
vmnic0  qedentv     False         False         aa:aa:aa:aa:0b:bc QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic1  qedentv     False         False         aa:aa:aa:aa:0b:bd QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic2  qedentv     False         False         aa:aa:aa:aa:0b:be QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
vmnic3  qedentv     False         False         aa:aa:aa:aa:0b:bf QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter
  • ESXi host logs (vmkernel.log and/or vmkwarning.log) display message(s) similar to:
2019-10-12T01:46:08.994Z cpu38:2098179)WARNING: [qedentv_set_pkt_csum:1928(vmnic1)]Error in Outer L4 Checksum
2019-10-12T01:46:08.994Z cpu38:2098179)WARNING: [qedentv_set_pkt_csum:1975(vmnic1)]RX: Invalid checksum
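When several hosts or uplinks are involved, the symptom checks above can be scripted. This is a minimal sketch, assuming a POSIX shell with awk and sort available (as in the ESXi Shell). The function names qedentv_nics and qedentv_csum_warnings are hypothetical helpers. They only parse saved command output or log files and change nothing on the host.

```shell
# Hypothetical triage helpers: they read text only and change nothing.

# Print the vmnic names whose Driver column is "qedentv".
# stdin: output of `esxcfg-nics -l` (the first line is the header row).
qedentv_nics() {
  awk 'NR > 1 && $2 == "qedentv" { print $1 }'
}

# Count qedentv checksum warnings per vmnic in a vmkernel.log or vmkwarning.log file.
# $1: path to the log file. Output: "<vmnic> <count>" lines, sorted by name.
qedentv_csum_warnings() {
  awk '/qedentv_set_pkt_csum/ {
         # The uplink appears in parentheses, e.g. ...:1928(vmnic1)]Error ...
         if (match($0, /\(vmnic[0-9]+\)/)) {
           nic = substr($0, RSTART + 1, RLENGTH - 2)
           count[nic]++
         }
       }
       END { for (n in count) print n, count[n] }' "$1" | sort
}
```

For example, esxcfg-nics -l | qedentv_nics lists the qedentv uplinks, and qedentv_csum_warnings /var/log/vmkernel.log shows which of them report the checksum warnings.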


Environment

VMware NSX-T Data Center 2.x
VMware NSX-T
VMware NSX-T Data Center
VMware NSX-T Data Center 2.5.x

Cause

When a VM on an Overlay Logical Switch sends a large segment using TSO, the physical NIC must perform two operations as the segment is transmitted: TCP segmentation and Overlay (Geneve) encapsulation.

With the QLogic Corp. QLogic FastLinQ QL41xxx 1/10/25 GbE Ethernet Adapter and qedentv driver versions earlier than 3.11.16.0, the resulting packets (after TCP segmentation and Overlay/Geneve encapsulation) end up with an incorrect UDP checksum.
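To decide whether a given qedentv version falls in the affected range, the dotted version has to be compared field by field, because a plain string comparison would place 3.9.x after 3.11.x. A minimal sketch, assuming awk is available; version_lt and qedentv_affected are hypothetical helper names, and you supply the installed version string yourself.

```shell
# Return success (0) if dotted version $1 is strictly lower than $2.
# Fields are compared numerically, so 3.9.x sorts below 3.11.x.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi < yi) exit 0
      if (xi > yi) exit 1
    }
    exit 1                           # equal versions are not lower
  }'
}

# Return success if a qedentv version predates the fixed release 3.11.16.0.
qedentv_affected() {
  version_lt "$1" 3.11.16.0
}
```

For example, qedentv_affected 3.9.19.0 succeeds (affected), while qedentv_affected 3.11.16.0 fails (fixed).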

When these Overlay packets with an incorrect UDP checksum are received, they are dropped with a checksum error. The guest VM then has to retransmit the lost data, which results in poor performance.
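The drop happens because the receiver recomputes the checksum. UDP uses the Internet checksum (RFC 768, RFC 1071): the sender stores the ones' complement of the ones' complement sum of 16-bit words, and a receiver that adds up all words, including the checksum field, expects 0xffff. The sketch below is a simplified illustration with made-up words. It does not build the IP pseudo-header, UDP header and payload that a real UDP checksum covers, and the function names are hypothetical.

```shell
# Simplified Internet checksum illustration; arguments are 16-bit words in hex.

# Ones' complement sum of the words, with end-around carry.
ocsum16() {
  sum=0
  for w in "$@"; do
    sum=$(( sum + 0x$w ))
    sum=$(( (sum & 0xFFFF) + (sum >> 16) ))   # fold the carry back into the low 16 bits
  done
  printf '%04x\n' "$sum"
}

# Sender side: the checksum is the bitwise complement of that sum.
csum16() {
  s=$(ocsum16 "$@")
  printf '%04x\n' $(( ~0x$s & 0xFFFF ))
}

# Receiver side: the data words plus the checksum must sum to ffff.
csum16_ok() {
  [ "$(ocsum16 "$@")" = ffff ]
}
```

For example, csum16 0001 0002 prints fffc, so csum16_ok 0001 0002 fffc succeeds, while csum16_ok 0001 0002 fffd fails: a receiver validating that checksum would treat the packet as corrupt.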

Resolution

The issue is resolved in qedentv driver version 3.11.16.0, available from the VMware Hardware Compatibility List.
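Before and after the upgrade, the installed driver version can be read from the VIB list. A minimal sketch, assuming the driver VIB is named qedentv and that its Version column may carry a -<build> suffix that should be dropped before comparing; qedentv_vib_version is a hypothetical helper, so check the Name column on your host first.

```shell
# Print the installed qedentv driver version without any "-<build>" suffix.
# stdin: output of `esxcli software vib list` (columns: Name, Version, Vendor, ...).
# Assumption: the driver VIB is named "qedentv".
qedentv_vib_version() {
  awk '$1 == "qedentv" { v = $2; sub(/-.*/, "", v); print v }'
}
```

For example, esxcli software vib list | qedentv_vib_version prints the bare version, which can then be compared with 3.11.16.0.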

Workaround:
As a workaround, either disable TSO in the guest OS, or disable Hardware Geneve offload and enable Software Geneve offload.

To disable TSO in the guest OS, refer to the guest OS documentation.
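On Linux guests, one common way to do this is ethtool: ethtool -K <interface> tso off disables TSO until the next reboot, and ethtool -k <interface> shows the current state. A minimal sketch of checking that state, assuming ethtool is installed in the guest; tso_state is a hypothetical helper and eth0 a placeholder interface name. Other guest operating systems use different tools.

```shell
# Linux guests only. Print the top-level TSO state ("on" or "off").
# stdin: output of `ethtool -k IFACE`; a trailing "[fixed]" marker is ignored.
tso_state() {
  awk -F': *' '$1 == "tcp-segmentation-offload" { split($2, f, " "); print f[1] }'
}

# Example, run as root inside the guest (eth0 is a placeholder):
#   ethtool -K eth0 tso off        # disable TSO (not persistent across reboots)
#   ethtool -k eth0 | tso_state    # should now print: off
```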

To disable Hardware Geneve offload and enable Software Geneve offload, follow the steps below:
Note 1: Enabling Software Geneve offload reduces performance, so upgrading the driver to a version that includes the fix is always preferred.
Note 2: When Software (vmkernel) Geneve offload is enabled, it takes precedence over Hardware Geneve offload, but you should still perform step 1 and reboot the ESXi host.

1. Disable Geneve offload on the qedentv driver:
#esxcli system module parameters set -m qedentv -p "disable_tpa=0 enable_geneve_ofld=0"
To verify:
#esxcli system module parameters list -m qedentv

2. Enable Software/vmkernel Geneve offload:
#esxcli network nic software set -n vmnicX --geneveoffload=1
To verify:
#vsish -e get /net/pNics/vmnicX/hwCapabilities/CAP_GENEVE_OFFLOAD

3. Reboot the ESXi host so the driver settings are applied.
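On hosts with several qedentv uplinks, steps 1 to 3 can be sketched as a script. This assumes a POSIX shell on the ESXi host. run_or_print, apply_geneve_workaround and module_param_value are hypothetical helper names, the esxcli commands are the ones from the steps above, and the script defaults to a dry run that only prints them. The column layout assumed by module_param_value (Name, Type, Value, Description) should be checked against your host's output.

```shell
# Dry run by default: DRY_RUN=1 prints each command, DRY_RUN=0 executes it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    for a in "$@"; do
      case $a in
        *' '*) printf '"%s" ' "$a" ;;   # keep quotes visible for multi-word arguments
        *) printf '%s ' "$a" ;;
      esac
    done
    echo
  else
    "$@"
  fi
}

# stdin: vmnic names, one per line.
apply_geneve_workaround() {
  # Step 1: module parameters apply to the whole driver, so set them once.
  run_or_print esxcli system module parameters set -m qedentv -p "disable_tpa=0 enable_geneve_ofld=0"
  # Step 2: enable Software/vmkernel Geneve offload on each uplink.
  while read -r nic; do
    [ -n "$nic" ] || continue
    run_or_print esxcli network nic software set -n "$nic" --geneveoffload=1
  done
  # Step 3: the driver settings only take effect after a reboot.
  echo "Reboot the ESXi host to apply the qedentv module parameters."
}

# Print a parameter's Value column from `esxcli system module parameters list -m qedentv`.
# Assumes columns Name, Type, Value, Description; prints "unset" if Value is empty.
module_param_value() {
  awk -v p="$1" '$1 == p { v = ($3 ~ /^[0-9]+$/) ? $3 : "unset"; print v }'
}
```

For example, esxcfg-nics -l | awk 'NR > 1 && $2 == "qedentv" { print $1 }' | apply_geneve_workaround prints the commands for every qedentv uplink; rerun it with DRY_RUN=0 to apply them, then reboot. After the reboot, esxcli system module parameters list -m qedentv | module_param_value enable_geneve_ofld is expected to print 0.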