Troubleshooting NSX Guest Introspection (Linux)

Article ID: 343346

Products

VMware NSX Networking

Issue/Introduction

VMware Technical Support routinely requests diagnostic information or a support bundle when a support request is handled. This diagnostic information contains logs and configuration files for your virtual machines.

 

This article provides the procedures for obtaining this diagnostic information.

 

The diagnostic information collected is then uploaded to VMware Technical Support. To uniquely identify your information, use the Support Request (SR) number you received when you opened your Support Request.

Overview

The Endpoint Security solution consists of 3 primary components:

 

Thin Agent

 

The Linux Thin Agent is a user-space daemon that runs as a system service. It is a multi-threaded application and runs only in privileged mode. It uses Linux kernel fanotify calls, registering itself to intercept file events, and passes those events to EPSecLib through the Mux. The Thin Agent communicates with the Mux over VMCI (a proprietary host-only communication mechanism similar to IP).

 

Mux

 

The Mux is an ESXi User World component (analogous to a Unix process) that passes events from the Thin Agent to SVMs using EPSecLib. The Mux gets its configuration through REST API calls to the vShield Manager. Communication from the Mux to EPSecLib is done over TCP/IP. The Mux is essentially just another driver that is installed on the ESXi host.

 

EPSecLib

 

EPSecLib is a library used by partner solution SVMs to receive events from Thin Agents and perform operations (read/modify) on the files involved. EPSecLib can also block file operations. EPSecLib exists inside a virtual machine that is deployed by NSX and runs on each ESXi host that is prepared for Endpoint. This virtual machine is called the SVA (Security Virtual Appliance).

 

General flow of a Guest introspection File scan:

  1. An action needs to be performed on a file, such as open, close, read, or write.

  2. The Thin Agent receives the open or close event and holds it, scans the file, and sends it up to the SVA for further investigation.

  3. The Thin Agent communicates to the epsec-mux driver on the ESXi host through VMCI to pass this information onward.

  4. The SVA communicates to the epsec-mux driver on the ESXi host through TCP/IP and scans the file, provides information on the contents, then sends back information.

  5. Once information is gathered on the SVA, the SVA tells the Thin Agent to either delete or ignore the file.


Environment

VMware NSX for vSphere 6.1.x
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.4.x
VMware NSX for vSphere 6.0.x

Resolution

Troubleshooting the Thin Agent, Mux, and EPSecLib components

 

Thin Agent

 

If a particular virtual machine is slow for file read and write operations (for example, it is slow to unzip files, save files, or run backup jobs), you may be having issues specifically with the Thin Agent.
 

  • When troubleshooting Endpoint, first check the compatibility of all the components involved. Compatibility is one of the main sources of Endpoint issues. You need the build numbers for ESXi, vCenter Server, NSX Manager, and whichever security solution you have chosen (Trend Micro, McAfee, Kaspersky, Symantec, and so on). Once you have collected this data, you can check the compatibility of the vSphere components. For more information, see the VMware Product Interoperability Matrixes.

  • After ensuring that every VMware component works, check the Partner Compatibility Matrix to determine full compatibility.

  • Trend Micro
  • McAfee
  • Symantec
  • Sophos
  • Kaspersky
  • Bitdefender

    Note: If you cannot find your Vendor compatibility information listed above, create a support ticket with your vendor.

 

  1. Ensure that File Introspection is installed on the system.

  2. Verify that the Thin Agent is running by running the command service vsepd status. The output should show the vsep service in the running state.

  3. If you believe that the Thin Agent is causing a performance issue with the system, stop the service by running the command:

 

            service vsepd stop

 

  • Then perform a test to get a baseline. You can then start the vsep service and perform another test by running this command: 
    service vsepd start

  • If you confirm that there is a performance problem with the Thin Agent, report it to VMware with all the required logs collected.

  • You can enable Debug logging for the Linux Thin agent.
    To turn on full logging:

 

  1. Open /etc/vsep/vsep.conf file
  2. Change DEBUG_LEVEL=4 to DEBUG_LEVEL=7 for all logs
  3. This can be set to DEBUG_LEVEL=6 for moderate logs
  4. The default log destination (DEBUG_DEST=2) is the vmware.log file on the host. To send the logs to the guest instead (that is, /var/log/messages or /var/log/syslog), set DEBUG_DEST=1

 

The thin agent periodically checks for this level and logs accordingly. Hence there is no need to restart the thin agent after changing the log level.
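
The log-level change above can be scripted. The following is a minimal sketch, not a VMware-supplied tool: the function name set_vsep_debug_level is hypothetical, and it assumes the setting appears in the file as DEBUG_LEVEL=<n> at the start of a line, as shown in the steps above. It accepts only the levels named in this article (4, 6, and 7).

```shell
# Hypothetical helper: set DEBUG_LEVEL in a vsep.conf-style file.
# Assumes the key appears as DEBUG_LEVEL=<n> at the start of a line.
set_vsep_debug_level() {
    conf=$1
    level=$2
    # Accept only the levels named in this article.
    case "$level" in
        4|6|7) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Rewrite the DEBUG_LEVEL line and leave every other line untouched.
    # Writing back with cat keeps the original file's owner and permissions.
    sed "s/^DEBUG_LEVEL=.*/DEBUG_LEVEL=$level/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, set_vsep_debug_level /etc/vsep/vsep.conf 7 turns on full logging, and set_vsep_debug_level /etc/vsep/vsep.conf 4 returns to the default once the logs are collected.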

   

Note: Do not enable full logging unless absolutely necessary. Full logging can flood the vmware.log file with heavy log activity, and the file can grow very large. Exercise caution when enabling full logging, and disable it as soon as you are done.
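
For the baseline comparison described earlier in this section (test with vsepd stopped, then test again with vsepd started), a repeatable file workload helps. The sketch below is one possible workload, not a VMware benchmark: the function name time_file_workload, the file names, and the default of 200 files are all assumptions chosen for illustration.

```shell
# Hypothetical helper: time a small create/read/delete workload in a directory.
# Prints the elapsed wall-clock time in whole seconds.
time_file_workload() {
    dir=$1
    count=${2:-200}
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        f="$dir/gi_baseline_$i.tmp"
        # Each write and read opens and closes the file, which are the
        # events the Thin Agent is described as intercepting.
        printf 'baseline test data %s\n' "$i" > "$f"
        cat "$f" > /dev/null
        rm -f "$f"
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}
```

Run it once after service vsepd stop and once after service vsepd start, using the same directory and count, and compare the two results. Because the timer has one-second resolution, raise the count until the run takes long enough for the difference to be meaningful.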

 

Mux

 

If you see that all virtual machines on an ESXi host are not working with Endpoint, or you see alarms on a particular host regarding communication to the SVA, then it could be a problem with the MUX module on the ESXi host.

  1. Check to see if the service is running on the ESXi host by running this command:
    # /etc/init.d/vShield-Endpoint-Mux status
    For example:
    # /etc/init.d/vShield-Endpoint-Mux status
    vShield-Endpoint-Mux is running

  2. If the service is not running, start or restart it with one of these commands:
    /etc/init.d/vShield-Endpoint-Mux start
    or
    /etc/init.d/vShield-Endpoint-Mux restart 

    Note: It is safe to restart this service during production hours as it does not have any great impact, and restarts in a couple of seconds.

 
  3. To see what the Mux module is doing, or to check its communication status, review the logs on the ESXi host. Mux logs are written to the /var/log/syslog file on the host. This file is also included in the ESXi host support logs.

    For more information, see Collecting diagnostic information for ESX/ESXi hosts and vCenter Server using the vSphere Web Client (2032892).

 
  4. The default logging level for the Mux is info. It can be raised to debug to gather more information.

    For more information, see Collecting diagnostic information for the NSX Guest Introspection MUX VIB (2094267).

 
  5. Re-installing the Mux module can also fix many issues, especially if the wrong version is installed or if an ESXi host that previously had Endpoint installed was brought into the environment. In these cases, the module must be removed and re-installed.
    To remove the VIB, run this command: esxcli software vib remove -n epsec-mux

    Note: You must reboot the ESXi host for this change to take effect. After the ESXi host has been rebooted, re-prepare the host again for Endpoint.

 
  6. If you run into issues with the VIB installation, check the /var/log/esxupdate.log file on the ESXi host. This log gives the clearest information about why the driver was not installed successfully, and it is the usual place to start with Mux installation issues. For more information, see Installing NSX Guest Introspection services (MUX VIB) on the ESXi host fails in VMware NSX for vSphere 6.x (2135278).

    Another common reason for an installation failure is a corrupt ESXi image. If this is the case:

  2. Look for an error message similar to:
    esxupdate: esxupdate: ERROR: Installation Error: (None, 'No image profile is found on the host or image profile is empty. An image profile is required to install or remove VIBs. To install an image profile, use the esxcli image profile install command.')

 
  1. You can verify if there is corruption:

a. Run this command on the ESXi host:
     cd /vmfs/volumes

b. Search for the imgdb.tgz file by running this command:
     find * | grep imgdb.tgz
Note: This command normally results in two matches.

For example:   
0ca01e7f-cc1ea1af-bda0-1fe646c5ceea/imgdb.tgz or edbf587b-da2add08-3185-3113649d5262/imgdb.tgz

c. On each match, run this command:

  ls -l match_result

    For example:

  > ls -l 0ca01e7f-cc1ea1af-bda0-1fe646c5ceea/imgdb.tgz
  -rwx------   1 root root  26393 Jul 20 19:28 0ca01e7f-cc1ea1af-bda0-1fe646c5ceea/imgdb.tgz

  > ls -l edbf587b-da2add08-3185-3113649d5262/imgdb.tgz
  -rwx------   1 root root   93 Jul 19 17:32 edbf587b-da2add08-3185-3113649d5262/imgdb.tgz

 

   If one imgdb.tgz file is far larger than the other, or if one of the files is only a few bytes in size, the small file is corrupt. The only supported way to resolve this is to re-install ESXi on that particular ESXi host.
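
The size comparison above can be done with a small helper instead of reading each ls -l line by eye. This is a sketch only: the function name flag_small_imgdb is hypothetical, and the threshold is a value you pick yourself, since this article gives no documented minimum size. It only reports sizes and makes no changes.

```shell
# Hypothetical helper: print each given file whose size is below a threshold.
# The threshold is the caller's assumption, not a documented limit.
flag_small_imgdb() {
    threshold=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips any padding that wc may add.
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -lt "$threshold" ]; then
            echo "$f: $size bytes (suspiciously small)"
        fi
    done
}
```

With the two example files shown above and a threshold of 1024 bytes, only the 93-byte file would be reported. Pass the matches from step b as arguments, for example flag_small_imgdb 1024 0ca01e7f-cc1ea1af-bda0-1fe646c5ceea/imgdb.tgz edbf587b-da2add08-3185-3113649d5262/imgdb.tgz.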

 

EPSecLib

 

The NSX Manager handles the deployment of this virtual machine. In the past (with vShield), the third-party SVA solution handled the deployment. That solution now connects to the NSX Manager, and the NSX Manager deploys the SVA. If there are alarms on the SVAs in the environment, try re-deploying them through the NSX Manager.

 

Notes:

  • Any configuration is lost as this is all stored inside the NSX Manager.

  • It is better to re-deploy the SVA virtual machines, instead of rebooting them.

  • NSX relies on EAM for deploying and monitoring VIBs and SVMs, such as the SVA, on hosts.

  • EAM is the source of truth for determining the Install Status.

  • The Install status in NSX User Interface (UI) can only tell if the VIBs are installed or not, or if the SVM is powered on.

  • The Service status in the NSX UI indicates whether the functionality in the virtual machine is working.

 

SVA deployment and relationship between NSX and vCenter Server Process

  1. When the Cluster is selected to be prepared for Endpoint, an Agency is created on EAM to deploy the SVA.

  2. EAM then deploys the OVF to the ESXi host with the agency information it created.

  3. NSX Manager verifies that the OVF was deployed by EAM.

  4. NSX Manager verifies if virtual machine was powered on by EAM.

  5. NSX Manager communicates to the Partner SVA Solution Manager that the virtual machine was powered on and registered.

  6. EAM sends an event to NSX to indicate that installation was complete.

  7. Partner SVA Solution Manager sends an event to NSX to indicate that the service inside the SVA virtual machine is up and running.

  8. If you are having an issue with the SVA, there are two places you can look at the logs. You can check the EAM logs, as EAM handles the deployment of these virtual machines. For more information, see Collecting diagnostic information for VMware vCenter Server 4.x, 5.x and 6.0 (1011641). Alternatively, look at the SVA logs. 
    For more information, see Collecting logs in VMware NSX for vSphere 6.x Guest Introspection Universal Service Virtual Machine (USVM) (2144624).

  9. If there is a problem with the SVA deployment, there may be an issue with EAM and its communication with NSX Manager. Check the EAM logs. The simplest fix to try is restarting the EAM service. For more information, see Troubleshooting vSphere ESX Agent Manager (EAM) with NSX (2122392).

  10. If all of the above seems to be working but you actually want to test the Endpoint functionality, you can test this with an Eicar Test file.

    1. Create a new text file with any name.
      For example: eicar.test.

    2. The contents of the file should only be the following string: 
      X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*.

    3. Save the file. Upon saving, you should see that the file is deleted. This verifies that the Endpoint solution is working. For more information about eicar, see the Eicar page.
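
On a Linux guest, the Eicar test above can be run from a shell. The following is a sketch under stated assumptions: the function names eicar_string and eicar_was_removed are hypothetical, and how long the partner solution takes to act on the file is not specified in this article, so any wait time is a guess. The string is written without a trailing newline so that the file contains only the test string.

```shell
# Hypothetical helper: print the Eicar test string with no trailing newline.
# Single quotes keep the shell from interpreting $, !, and the backslash.
eicar_string() {
    printf '%s' 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*'
}

# Hypothetical helper: succeed if the given path no longer exists,
# which is the outcome this article describes for a working setup.
eicar_was_removed() {
    [ ! -e "$1" ]
}
```

A possible sequence is eicar_string > eicar.test, followed by a short wait and then eicar_was_removed eicar.test && echo "file was removed". If the write itself is blocked, the file is never created and the check also succeeds.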

Collect environment and workload details

  • Determine if NSX Guest Introspection is used in the customer environment. If it is not, remove the Guest Introspection service for the virtual machine, and confirm the issue is resolved. Troubleshoot a Guest Introspection issue only if Guest Introspection is required.

  • Collect environment details:

  • ESXi build version - Run the command uname -a on the ESXi host, or click a host in the vSphere Web Client and look for the build number at the top of the right-hand pane.

  • Linux product version and build number

    Product version
    ------------------
    /usr/sbin/vsep -v

    Build number
    ------------------
    Ubuntu:
    dpkg -l | grep vmware-nsx-gi-file
    SLES 12 and RHEL 7:
    rpm -qa | grep vmware-nsx-gi-file
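
When the same collection steps are run across several distributions, the choice between the two build-number queries above can be made from the distribution ID. This sketch assumes the ID values ubuntu, sles, and rhel as found in the ID= field of /etc/os-release; the function name gi_build_query is hypothetical. It only prints the command to run.

```shell
# Hypothetical helper: print the build-number query from this article that
# matches a distribution ID (the ID= value in /etc/os-release).
gi_build_query() {
    case "$1" in
        ubuntu)    echo "dpkg -l | grep vmware-nsx-gi-file" ;;
        sles|rhel) echo "rpm -qa | grep vmware-nsx-gi-file" ;;
        *)         echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}
```

For example, after sourcing /etc/os-release with . /etc/os-release, calling gi_build_query "$ID" prints the query for the current guest.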

  1. NSX for vSphere version.

    1. Partner solution name and version number.

    2. EPSec Library version number used by the partner solution: Log in to the SVM and run strings <path to EPSec library>/libEPSec.so | grep BUILD.

    3. Guest operating system in the virtual machine.

    4. Any other third-party applications or file system drivers.

 
  1. ESX host component (MUX) version - run the command esxcli software vib list | grep epsec-mux.

  2. Collect workload details, such as the type of server.
    For example:
    Web or database

  3. Collect ESXi host logs. For more information, see Collecting diagnostic information for VMware ESX/ESXi (653).

  4. Collect service virtual machine (SVM) logs from the partner solution. Reach out to your partner for more details on SVM log collection.

  5. Collect a suspend state file while the problem is occurring. For more information, see Suspending a virtual machine on ESX/ESXi to collect diagnostic information (2005831).

Collect diagnostic information

See these links for more information:

Troubleshoot specific issues

These sections describe how to isolate and troubleshoot specific issues.

 

In general, NSX GI partners provide the first level of technical support. VMware recommends contacting the partner where possible, particularly for performance and interoperability issues.

Troubleshooting Thin Agent crash


If the Thin Agent crashes, its core file is generated in the / directory. Collect the core dump file (core) from the / directory.
Note: Use the file command to check whether the core was generated by vsep.

For example:

# file core

core: ELF 64-bit LSB  core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/vsep'
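
The check on the file output can be automated when several core files need to be sorted. This is a minimal sketch that matches the output format shown in the example above; the function name core_is_from_vsep is hypothetical, and other versions of the file command may word their output differently.

```shell
# Hypothetical helper: succeed if the given `file` output describes a core
# file that came from /usr/sbin/vsep.
core_is_from_vsep() {
    case "$1" in
        *"core file"*"from '/usr/sbin/vsep"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, core_is_from_vsep "$(file /core)" && echo "collect /core for the support request".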


Virtual machine crash
Enable crash dumps on the Linux system using the vendor-specific method published for Ubuntu 14.04 LTS, RHEL 7, and SLES 12 respectively. After crash dumps are enabled, the system crash dump is saved at the location specified by the vendor.

Virtual machine hang or freeze

Collect the VMware vmss file of the virtual machine in a suspended state, see Suspending a virtual machine on ESX/ESXi to collect diagnostic information (2005831) or crash the virtual machine and collect the full memory dump file. VMware offers a utility to convert an ESXi vmss file to a core dump file. See the Vmss2core fling from VMware.