Guest Introspection VM disconnect on VDI infrastructure


Article ID: 336540


Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
After upgrading NSX from 6.4.0 to 6.4.1, 6.4.2 or 6.4.3, you may encounter issues with Guest Introspection deployment status and with VM connectivity to the antivirus (AV) solution.
Intermittent error messages may be seen in the "Guest Introspection Host Status" of ESXi hosts.
If you have not recently upgraded, these errors may still occur if there is a large number of VMs on each host and DRS is enabled.

Environment

VMware NSX for vSphere 6.4.x

Cause

After a configuration update has been applied, the NSX Manager allows 30 seconds to receive the update from the MUX process on the ESXi host.
The configuration update is applied to the namespace database, which each VM has had since NSX 6.4.1. These updates can take a long time, around 45 seconds or more, to reach all the VMs on that host.
Because this exceeds 30 seconds, the NSX Manager reschedules the same update.
The MUX process on the ESXi host has no way of knowing that this is the same update, so it pushes the update to all the VMs once again.
The repeated push again takes longer than the 30-second NSX Manager limit, so it times out as well and the sequence turns into a loop.

This issue depends on the load and activity on the hosts, so it does not occur consistently.
While this loop is running, the MUX process is constantly busy and cannot process events from VMs, which is what causes the random disconnects.
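
The timeout loop described above can be illustrated with a small simulation. This is only a sketch of the behavior as this article describes it, not NSX code: the function name, its parameters, and the attempt cap are hypothetical, and the 30-second and 45-second figures are the ones quoted in this section.

```python
def simulate_update_attempts(push_seconds, timeout_seconds=30, max_attempts=5):
    """Model one configuration update being retried by the manager.

    Hypothetical model: the manager waits `timeout_seconds` for the host-side
    push to finish. If the push takes longer, the manager reschedules the same
    update, and the host pushes it to every VM again because it cannot tell
    that it is a repeat. `max_attempts` only exists to stop the simulation.
    """
    outcomes = []
    for attempt in range(1, max_attempts + 1):
        if push_seconds <= timeout_seconds:
            outcomes.append((attempt, "acknowledged"))
            break
        # The push did not finish in time, so the same update is queued again.
        outcomes.append((attempt, "timed out, rescheduled"))
    return outcomes


if __name__ == "__main__":
    # A push that fits inside the timeout completes on the first attempt.
    print(simulate_update_attempts(push_seconds=20))
    # A 45-second push never fits, so every attempt times out.
    print(simulate_update_attempts(push_seconds=45))
```

In this model, no number of retries helps while the push time stays above the timeout, which matches the looping behavior described here.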

Resolution

This issue has been resolved in NSX 6.4.4.

Improvements were also made in ESXi 6.5 P03 and 6.7, so to gain the full benefit it is recommended to upgrade to one of these ESXi versions as well.

Additional Information

Impact/Risks:
The affected VMs may not be scanned by AV.
This can be checked by running the EICAR test on the affected VMs: http://2016.eicar.org/86-0-Intended-use.html
Remote users may be unable to log in to VDI desktops.
Remote users may observe slower VM connectivity.
Affected hosts usually have more than 120 VMs each. Note that this figure includes powered-off VMs.

Note: With NSX 6.4.4 and ESXi 6.5 P03 and 6.7, the following limits apply:
- Max Powered on Virtual Machines per Host = 150
- Max Concurrent vMotions to or from a Host = 50
- Max Concurrent Virtual Machines Booting on a Host = 50
- Co-existence of powered off/on VMs: 250, of which 150 may be in the powered-on state (when there is no boot storm; bulk power-on is recommended in batches of 50)
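
The limits in the note above can be checked against a host inventory with a short script. The following is a minimal sketch, assuming the per-host counts have already been collected by some other means. The `HostLoad` class and both function names are hypothetical and not part of any VMware tool; only the numeric limits come from this article.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the note for NSX 6.4.4 with ESXi 6.5 P03 and 6.7.
MAX_POWERED_ON = 150
MAX_CONCURRENT_VMOTIONS = 50
MAX_CONCURRENT_BOOTS = 50
MAX_TOTAL_VMS = 250       # powered-on plus powered-off VMs on one host
POWER_ON_BATCH = 50       # recommended bulk power-on batch size


@dataclass
class HostLoad:
    """Hypothetical container for per-host counts gathered elsewhere."""
    powered_on: int
    powered_off: int
    concurrent_vmotions: int = 0
    concurrent_boots: int = 0


def limit_violations(host: HostLoad) -> List[str]:
    """Return a description of every limit the host exceeds."""
    problems = []
    if host.powered_on > MAX_POWERED_ON:
        problems.append(f"powered-on VMs: {host.powered_on} > {MAX_POWERED_ON}")
    if host.concurrent_vmotions > MAX_CONCURRENT_VMOTIONS:
        problems.append(
            f"concurrent vMotions: {host.concurrent_vmotions} > {MAX_CONCURRENT_VMOTIONS}")
    if host.concurrent_boots > MAX_CONCURRENT_BOOTS:
        problems.append(
            f"concurrent boots: {host.concurrent_boots} > {MAX_CONCURRENT_BOOTS}")
    total = host.powered_on + host.powered_off
    if total > MAX_TOTAL_VMS:
        problems.append(f"total VMs: {total} > {MAX_TOTAL_VMS}")
    return problems


def power_on_batches(vm_names: List[str],
                     batch_size: int = POWER_ON_BATCH) -> List[List[str]]:
    """Split a list of VM names into consecutive power-on batches."""
    return [vm_names[i:i + batch_size]
            for i in range(0, len(vm_names), batch_size)]


if __name__ == "__main__":
    host = HostLoad(powered_on=160, powered_off=100, concurrent_boots=60)
    for problem in limit_violations(host):
        print(problem)
```

The check returns an empty list for a host within all four limits, and `power_on_batches` can be used to plan a bulk power-on in groups of 50.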

Attachments

NSX-V GI 6.4.4 Improvements