In ESXi 6.7 U2 or later, Linux VMs fail unmap and Windows VMs can hang or have slow I/O performance

Article ID: 317651

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

This article explains how to resolve slow guest I/O performance and unmap failures in VMs running on ESXi 6.7 U2 or later.

Symptoms:
  • In ESXi 6.7 U2 or later, Linux VMs fail unmap and Windows VMs can hang or have slow I/O performance.
  • After the snapshot is deleted and a rescan is performed in a Linux VM, the DISC-MAX size is 32M.
  • After a snapshot is created and the VM is rebooted, the DISC-MAX size is 2G.
  • Once the snapshot is deleted and an unmap is attempted, the Linux guest OS logs in /var/log/messages show entries similar to the following:
May  8 23:08:25 promb-gen-dhcp33 kernel: [ 3654.168449] sd 2:0:1:0: [sdb] tag#0 Send: scmd 0x00000000a4fc20d0
May  8 23:08:25 promb-gen-dhcp33 kernel: [ 3654.168450] sd 2:0:1:0: [sdb] tag#0 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
May  8 23:08:25 promb-gen-dhcp33 kernel: [ 3654.168491] sd 2:0:1:0: [sdb] tag#0 Done: ADD_TO_MLQUEUE Result: hostbyte=DID_OK driverbyte=DRIVER_OK
May  8 23:08:25 promb-gen-dhcp33 kernel: [ 3654.168509] sd 2:0:1:0: [sdb] tag#0 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
May  8 23:08:25 promb-gen-dhcp33 kernel: [ 3654.168511] sd 2:0:1:0: [sdb] tag#0 scsi host busy 1 failed 0
May  8 23:08:25 promb-gen-dhcp33 kernel: [ 3654.168512] sd 2:0:1:0: [sdb] tag#0 Inserting command 00000000a4fc20d0 into mlqueue
 
  • Running the command time fstrim -av returns output similar to the following:
       fstrim: /mnt/sdb: FITRIM ioctl failed: Input/output error
  • On Windows, the VM slows down after migration from VVol to VMFS6 or vice versa, causing the VM to hang, slow I/O performance, and frequent VSCSI resets.
  • No latency is seen in esxtop in the datastore view "u" or the VM view "v".
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
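
The DISC-MAX values in the symptoms correspond to the DISC-MAX column of lsblk --discard, which the guest kernel also exposes as /sys/block/<disk>/queue/discard_max_bytes. The following is a minimal sketch for reading those limits inside a Linux guest. The function name show_discard_limits and the SYSFS_ROOT override are hypothetical helpers introduced for illustration, and sdb is simply the disk from the log excerpt.

```shell
#!/bin/sh
# Sketch only: print the discard (unmap) limits the guest kernel currently
# holds for one disk. SYSFS_ROOT is a hypothetical override that exists only
# so the function can be tried against a fake directory tree.
show_discard_limits() {
    sdl_dev="$1"
    sdl_queue="${SYSFS_ROOT:-/sys}/block/$sdl_dev/queue"

    if [ ! -r "$sdl_queue/discard_max_bytes" ]; then
        echo "no discard information for $sdl_dev" >&2
        return 1
    fi

    # discard_max_bytes is the value lsblk reports as DISC-MAX;
    # discard_granularity is the value lsblk reports as DISC-GRAN.
    echo "$sdl_dev discard_max_bytes=$(cat "$sdl_queue/discard_max_bytes") discard_granularity=$(cat "$sdl_queue/discard_granularity")"
}
```

For example, show_discard_limits sdb can be run before and after a snapshot operation. A discard_max_bytes of 33554432 corresponds to the 32M value and 2147483648 to the 2G value described in the symptoms.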

Cause

Typically, guest operating systems do not refresh their unmap granularity values and keep sending unmap requests based on the older values. These unmaps are rejected at the VSCSI layer because the descriptor limit is reached, which causes the guest OS to retry the unmap. This slows unmap and trim performance, as well as regular I/O.

Resolution

This issue is resolved in VMware vSphere 6.7 Patch ESXi670-202008001. To download the patch, go to the Customer Connect Patch Downloads page.

Workaround:
To work around this issue, reboot the VM.

For Linux VMs that use partitions, refresh the guest OS with the command echo 1 > /sys/block/sd<XXX>/device/rescan
Note: If you are using LVM, you can mark the LV offline and online again to get the refreshed values.
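
The rescan workaround can be wrapped in a small helper that checks the sysfs entry exists before writing to it. This is a sketch under stated assumptions, not a supported tool: the function name rescan_disk and the SYSFS_ROOT override are hypothetical, and the command must be run as root inside the guest.

```shell
#!/bin/sh
# Sketch only: ask the guest kernel to re-read a disk's properties, which is
# the same action as: echo 1 > /sys/block/sd<XXX>/device/rescan
# SYSFS_ROOT is a hypothetical override used only for trying the function
# against a fake directory tree.
rescan_disk() {
    rd_dev="$1"
    rd_target="${SYSFS_ROOT:-/sys}/block/$rd_dev/device/rescan"

    if [ ! -w "$rd_target" ]; then
        echo "cannot rescan $rd_dev: $rd_target is not writable" >&2
        return 1
    fi

    echo 1 > "$rd_target"
}
```

For example, rescan_disk sdb refreshes the disk from the log excerpt. For the LVM case, one common way to take a logical volume offline and online again is lvchange -an followed by lvchange -ay on the volume; treating that as the procedure the note refers to is an assumption.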

Additional Information

Impact/Risks:
Requires VM reboot.