HTAware Mitigation Tool Overview and Usage

Article ID: 328935

Products

VMware vCenter Server

Issue/Introduction

Complete Mitigation of the L1 Terminal Fault (L1TF) vulnerability CVE-2018-3646 requires enabling the ESXi Side-Channel-Aware Scheduler as documented in KB55806. There are two versions of the Side-Channel Aware Scheduler (SCA). The initial version of the scheduler, SCAv1, will schedule on only one logical processor (Hyper-Thread) of a Hyper-Thread-enabled core. As a result, when enabling this version of the scheduler, available host capacity may be reduced, and VM performance may be impacted, depending on current available host capacity. Starting with ESXi 6.7u2, SCAv2 was introduced and offers performance improvements over SCAv1 while protecting from VM to VM and VM to Hypervisor information leakage. Please refer to KB55806 for guidance on choosing between SCAv1 and SCAv2, including the security and performance characteristics of the two schedulers.

For the purposes of this article, we will describe enabling the ESXi Side-channel Aware Scheduler as enabling the Hyper-Thread-aware portion of the L1TF mitigation and name this “HTAware Mitigation”.

You should assess the impact of enabling this scheduler on your vSphere hosts and clusters before enabling it. The HTAware Mitigation Tool is intended to assist in determining the potential impact of subsequently enabling the Side-Channel-Aware Scheduler v1 (SCAv1). The tool performs the following checks:
  • Scans the virtual infrastructure for CPU utilization across Clusters, Hosts, and VMs to identify heavily utilized resources.
  • Identifies VMs which may be unable to run on their current host after the mitigation is applied.
  • Identifies hosts that are likely safe candidates for mitigation. This list of hosts can be provided as input to the second stage of the tool to enable the HTAware Mitigation.
As indicated above, the HTAware Mitigation Tool analyzes the impact of enabling SCAv1 and does NOT analyze the impact of enabling SCAv2. In general, SCAv2 is expected to offer greater system throughput over SCAv1. However, the magnitude of the improvement varies by workload. Please refer to KB55767 for impact analysis comparing the two schedulers.

It is important to note that the information provided by the Tool is advisory. The Tool is not intended to replace your own analysis of CPU utilization across the infrastructure.

Key features of the HTAware Mitigation Tool
  • Collect and output historical CPU utilization information stored by vCenter for the Cluster and Host
  • Identify the load impact of enabling the HTAware Mitigation on the scanned hosts. The tool also considers the load impact of reduced host capacity during rolling cluster upgrades
  • Identify VMs whose total count of vCPUs is greater than the number of physical cores on the running host. Such VMs will be too “wide” to run on that host when the HTAware Mitigation is enabled.
  • Identify VMs which utilize the vCPU pinning feature. The PCPU (physical CPU) numbers may no longer be valid once the scheduler is enabled.
  • Provide automation functionality to apply HTAware Mitigation across vSphere clusters and/or individual hosts.
The HTAware Mitigation Tool is fully supported by VMware and issues can be reported through Service Requests.

The Update History section of this article will be revised when there is a significant change. Please click Subscribe to Article in the Actions box to be alerted when new information is added to this document and sign up at our Security-Announce mailing list to receive new and updated VMware Security Advisories.

Resolution

The remainder of the KB is divided into two parts. Part one will outline the usage of the tool, the PowerCLI functions available, and how the generated output files can be used. Part two of the KB will go into detail on the factors which the tool uses to determine whether a host may be recommended for enabling the HTAware Mitigation, or whether workloads may need to be moved off the host before the scheduler can be enabled.

Part 1 RUNNING THE HTAWARE MITIGATION TOOL

Prerequisites to using the HTAware Mitigation Tool

Requirements:
  • PowerShell 3.0 or greater
  • PowerCLI 6.3 or greater (supported on Windows, Linux, and macOS)
  • Get-* commands indicated below require System.View privilege
  • Set-* commands indicated below require privileges of Get-* commands plus Host.Config.AdvancedConfig privilege on the host being modified
Installation:
  1. Extract the HTAware Mitigation PowerCLI module from the attached archive HTAwareMitigation.zip
  2. Import the HTAware Mitigation PowerCLI module by running Import-Module .\HTAwareMitigation.psd1
To list the available functions in the HTAware Mitigation PowerCLI module, run Get-Command -Module HTAwareMitigation
 
Note: This command can also be used to display the current version of the HTAware Mitigation PowerCLI module

For detailed usage of each function including examples, run Get-Help <Name of Function> -Detailed 

HTAware Mitigation Load Analysis Usage:
  1. Connect to one or more vCenter Servers by using the Connect-VIServer cmdlet
  2. Run the analysis tool in one of three ways:
  • Scan all connected vCenter Server(s), no arguments required: Get-HTAwareMitigationAnalysis
Note: Users will be prompted to confirm the vCenter Server(s) before proceeding
  • Scan a specific vCenter Server by specifying the name of the vCenter Server: Get-HTAwareMitigationAnalysis -Server vCenter_Server_Name
  • Scan a specific vSphere Cluster by specifying the name of a vSphere Cluster: Get-HTAwareMitigationAnalysis -ClusterName vSphere_Cluster_Name
The analysis may take some time depending on the size of the vSphere environment and retention period of the historical statistics.
 Once the script has completed, the following files will be generated:
  • VC_NAME.json.gz - Raw collected data
  • output.csv - Processed results in CSV format
  • output.html - Detailed report
  • output.json.gz - Processed raw data
Review the results found in the detailed output.html report to help determine whether the HTAware Mitigation can be enabled without impacting existing workloads. 

For hosts that are not impacted by enabling HTAware Mitigation, the output.csv can be used as input to the remediation functions which will be covered in the next section. 
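The compressed JSON outputs can also be inspected outside of PowerCLI. The following is a minimal Python sketch that only decompresses a .json.gz file and lists its top-level structure. The helper names are illustrative, and no schema is assumed because this article does not describe the layout of the files. The utf-8-sig encoding is an assumption chosen to tolerate a byte order mark if one is present.

```python
import gzip
import json


def load_gzipped_json(path):
    """Decompress a .json.gz file and parse it as JSON."""
    # utf-8-sig also reads plain UTF-8; the actual encoding is an assumption.
    with gzip.open(path, "rt", encoding="utf-8-sig") as handle:
        return json.load(handle)


def summarize(data):
    """Describe the top level of the parsed data without assuming a schema."""
    if isinstance(data, dict):
        return sorted(data.keys())
    if isinstance(data, list):
        return f"list of {len(data)} items"
    return type(data).__name__
```

For example, summarize(load_gzipped_json("output.json.gz")) returns the sorted top-level keys when the file holds a JSON object.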

HTAware Mitigation Remediation Usage:
Retrieve HTAware Mitigation configuration

The current HTAware Mitigation configuration can be retrieved by specifying either a vSphere Cluster or an individual ESXi host.
  • Retrieve HTAware Mitigation settings for a cluster by specifying the name of a vSphere Cluster: Get-HTAwareMitigationConfig -ClusterName vSphere_Cluster_Name
  • Retrieve HTAware Mitigation settings for a host by specifying the name of an individual ESXi host: Get-HTAwareMitigationConfig -VMHostName ESXi_Host_Name
Here is an example output of running against an ESXi 6.7u2 host.
The output includes the following information:
  • Hostname or IP Address of the ESXi host
  • Configured value of the SCA scheduler version to enable
  • Runtime value of the SCA scheduler currently in use
  • Versions of the scheduler available for selection on the host
  • Configured value on whether any SCA scheduler is enabled (default is false, which means mitigation is disabled)
  • Runtime value on whether any SCA scheduler is enabled (default is false, which means mitigation is disabled)
  • Whether the HTAware Mitigation warning message is suppressed in the UI (default False, message is displayed in the UI)
  • ESXi version and build details
Enable or Disable the HTAware Mitigation configuration

The HTAware Mitigation can be enabled or disabled by specifying either a) the CSV output file that was generated from Get-HTAwareMitigationAnalysis function, b) the name of the vSphere Cluster, or c) an individual ESXi host. 
  • Enable the HTAware Mitigation setting by specifying the CSV output file: Set-HTAwareMitigationConfig -InputFile path_to_CSV_file -Enable
  • Enable the HTAware Mitigation setting by specifying the name of a vSphere Cluster: Set-HTAwareMitigationConfig -ClusterName vSphere_Cluster_Name -Enable
  • Enable the HTAware Mitigation setting by specifying the name of an individual ESXi host: Set-HTAwareMitigationConfig -VMHostName ESXi_Host_Name -Enable
Notes:
  • When using the CSV input file method, you will be prompted to confirm the list of ESXi hosts on which to attempt enabling the HTAware Mitigation. Not all hosts listed in the output will be remediated; for example, hosts that are not applicable or that do not contain the HTAware Mitigation settings are ignored. To disable the confirmation prompt, specify the -Confirm:$false argument on the command line.
  • You can perform a dry run to see which ESXi hosts would be modified by specifying the -WhatIf argument on the command line.
Here is an example output of enabling the HTAware Mitigation setting using the CSV input file:
To confirm the changes were made before rebooting the ESXi host (required for the changes to go into effect) we can run the Get-HTAwareMitigationConfig function. 

Here is an example output of retrieving the HTAware Mitigation setting after enabling mitigation:
Using Set-HTAwareMitigationConfig -Enable as noted above will configure the host to use SCAv1. This is the scheduler which will only schedule on one of the two Hyper-Threads on the core. The same thing can be accomplished by using Set-HTAwareMitigationConfig -SCAv1.

Using Set-HTAwareMitigationConfig -SCAv2 will configure the host to use SCAv2, the updated scheduler introduced in ESXi 6.7u2. The Set command will scan through the provided host names and set the ConfiguredScheduler field to SCAv2. A reboot will be required for the host to switch to the new scheduler.

Using Set-HTAwareMitigationConfig -Disable will configure the host to use the Unmitigated Scheduler. The same thing can be accomplished by using Set-HTAwareMitigationConfig -Unmitigated. This will switch ESXi to the original scheduler behavior, which does NOT offer protection against L1TF. While this scheduler can provide full Hyper-Threading throughput, you are advised to review the security implications of the L1TF disclosure in KB55806.

Here are some examples of selecting the various schedulers and their outputs:

Enable or Disable HTAware Mitigation UI warning message

To assist administrators, a User Interface (UI) warning message can be shown to flag ESXi hosts which have not yet enabled HTAware Mitigation. The message can be enabled or disabled (suppressed) by using the Set-HTAwareMitigationSuppression function and specifying either a vSphere Cluster or an individual ESXi host. 
  • Enable HTAware Mitigation warning message suppression by specifying the name of a vSphere Cluster: Set-HTAwareMitigationSuppression -ClusterName vSphere_Cluster_Name -Enable
  • Enable HTAware Mitigation warning message suppression by specifying the name of an individual ESXi host: Set-HTAwareMitigationSuppression -VMHostName ESXi_Host_Name -Enable
Here is an example of enabling the HTAware Mitigation warning message suppression:
To confirm our changes, we can run the Get-HTAwareMitigationConfig function. A reboot is not required for making changes to the UI warning suppression message. 
 

Part 2 UNDERSTANDING THE OUTPUT OF THE HTAWARE MITIGATION TOOL

Issues detected by the HTAware Mitigation Tool
The following section describes the issues that the HTAware Mitigation Tool will scan for and advise on. It also summarizes the assumptions and analysis performed to arrive at these conclusions. The HTAware Mitigation Tool will scan through the infrastructure and advise on several classes of problems that may occur after enabling the HTAware Mitigation:

Note: For scenarios 1-4 listed below, the Tool will display a red warning message to notify the user of these conditions. For scenario 5, a red or yellow warning message will be displayed depending on the conditions triggered.
  1. A VM whose total count of vCPUs is greater than the number of physical cores on the running host will not power on after the HTAware Mitigation has been enabled. The VM is considered too wide to run on the existing host and must be corrected by reconfiguring the VM to use fewer vCPUs or by moving it to a host with more cores prior to mitigation. Reconfiguration of the VM must be done while the VM is powered off.
  2. VMs pinned to specific PCPUs on the host may not power on after the HTAware Mitigation has been enabled. The PCPU numbers may need to be altered prior to enabling the HTAware Mitigation as the former physical designations used may no longer be valid. Reconfiguration of the VM must be done while the VM is powered off.
  3. After enabling the HTAware Mitigation, the cluster may not have the spare capacity for rolling upgrades. The Tool looks at the CPU utilization across the clusters and tries to estimate whether it will exceed the cluster capacity during a rolling upgrade once the HTAware Mitigation is enabled. To do this, it assumes a uniform capacity from each host in the cluster and looks at the average usage of the hosts. If the current usage would require using a second Hyper-Thread of any core during a rolling upgrade, this issue is flagged.
  4. The high latency sensitivity setting of a VM cannot be honored once the HTAware Mitigation is enabled if the number of high-latency-sensitive vCPUs >= (number of cores - 1). This is due to the host having insufficient capacity to run its own jobs.
  5. Hosts that are using the second Hyper-Thread on a core to satisfy their load may not be able to satisfy the load without suffering throughput or response time degradation after enabling the HTAware Mitigation. Here the Tool tries to estimate the load on the host that may occur once the HTAware Mitigation is enabled. By examining the CPU utilization metric, cpu.usage, it can determine if a second thread is required to meet the demands of the existing load.
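Scenarios 1 and 4 reduce to simple comparisons against the physical core count, which can be sketched in Python as follows. This is an illustration under stated assumptions, not the Tool's implementation: the VM tuple and both function names are hypothetical, and summing the high-latency-sensitivity vCPUs across all VMs on the host is one reading of scenario 4.

```python
from typing import NamedTuple


class VM(NamedTuple):
    """Hypothetical record holding only the fields these checks need."""
    name: str
    vcpus: int
    latency_sensitivity_high: bool = False


def too_wide_vms(vms, physical_cores):
    """Scenario 1: VMs with more vCPUs than the host has physical cores."""
    return [vm.name for vm in vms if vm.vcpus > physical_cores]


def latency_sensitive_overflow(vms, physical_cores):
    """Scenario 4: high-latency-sensitivity vCPUs >= (cores - 1).

    Assumption: the vCPU count is summed over every such VM on the host.
    """
    ls_vcpus = sum(vm.vcpus for vm in vms if vm.latency_sensitivity_high)
    return ls_vcpus > 0 and ls_vcpus >= physical_cores - 1
```

On a 16-core host, a 24-vCPU VM would be reported by too_wide_vms, and a single 15-vCPU VM with high latency sensitivity would make latency_sensitive_overflow return True.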
 
Upon examination of the cpu.usage metric, the Tool will categorize each host as Green, Yellow, Red: 
  • Green: the workload running on the host is not expected to experience performance degradation after enabling HTAware Mitigation based on the historical usage information provided
  • Yellow: workload running on the host may experience limited or some performance degradation after enabling HTAware Mitigation based on the historical usage information provided
  • Red:  workload running on the host is highly likely to experience some or significant performance degradation after enabling HTAware Mitigation based on the historical usage information provided
The Tool arrives at the color advisories based on the calculations listed in the table below:
 
Collection Period | Collection Trigger | CPU Usage | Host Color Advisory
Yearly | 1 x Daily Sample | x ≥ 90% | Red
Yearly | 1 x Daily Sample | 70% ≤ x < 90% | Yellow
Monthly | 2 x 2-hour Samples | x ≥ 90% | Red
Monthly | 2 x 2-hour Samples | 70% ≤ x < 90% | Yellow
Weekly | 4 x half-hour Samples | x ≥ 90% | Red
Weekly | 4 x half-hour Samples | 70% ≤ x < 90% | Yellow
Daily | 8 x 5-minute Samples | x ≥ 90% | Red
Daily | 8 x 5-minute Samples | 70% ≤ x < 90% | Yellow
Hourly | 15 x 20-second Samples | x ≥ 90% | Red
Hourly | 15 x 20-second Samples | 70% ≤ x < 90% | Yellow

 

The host is marked green if none of the above conditions are triggered.
 
Example: When the Tool examines the cpu.usage metric consolidated in the Yearly collection period, and it detects there is 1 daily sample with CPU usage between 70%-90%, it will flag the host as Yellow. Note that a daily sample at 80% means that CPU usage averaged over the full 24 hours was 80%.

Example: When the Tool examines the cpu.usage metric consolidated in the Monthly collection period, and it detects there are 2 or more 2-hour samples with CPU usage of 90% or greater, it will flag the host as Red.

Example: When the Tool examines the cpu.usage metric consolidated in the Daily collection period, and it detects there are 8 or more 5-minute samples with CPU usage between 70-90%, it will flag the host as Yellow.

Example: The host is flagged as Green if none of the Yellow or Red heuristics are triggered during any of the collection periods.
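The threshold table and the examples above can be expressed as a short Python sketch. It is a literal reading of the table, not the Tool's actual code. Three assumptions are made here: samples are counted per band (a sample of 90% or more counts only toward Red), the samples need not be consecutive, and the most severe color across all collection periods wins. The function names are illustrative.

```python
# Number of samples that must fall in a band before the advisory fires,
# taken from the "Collection Trigger" column of the table.
TRIGGER_COUNTS = {
    "Yearly": 1,    # daily samples
    "Monthly": 2,   # 2-hour samples
    "Weekly": 4,    # half-hour samples
    "Daily": 8,     # 5-minute samples
    "Hourly": 15,   # 20-second samples
}

RED_THRESHOLD = 90.0
YELLOW_THRESHOLD = 70.0


def classify_period(period, samples):
    """Return 'Red', 'Yellow' or 'Green' for one collection period."""
    trigger = TRIGGER_COUNTS[period]
    red_hits = sum(1 for x in samples if x >= RED_THRESHOLD)
    yellow_hits = sum(
        1 for x in samples if YELLOW_THRESHOLD <= x < RED_THRESHOLD
    )
    if red_hits >= trigger:
        return "Red"
    if yellow_hits >= trigger:
        return "Yellow"
    return "Green"


def classify_host(samples_by_period):
    """Assumption: the most severe color over all periods is reported."""
    severity = {"Green": 0, "Yellow": 1, "Red": 2}
    worst = "Green"
    for period, samples in samples_by_period.items():
        color = classify_period(period, samples)
        if severity[color] > severity[worst]:
            worst = color
    return worst
```

With this reading, classify_period("Yearly", [80.0]) gives "Yellow", and seven 5-minute samples at 75% in the Daily period stay "Green" because the trigger of 8 samples is not reached.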

A note on cpu.usage: VMware vSphere has been computing the cpu.usage statistic in a way that can identify systems that might encounter performance problems after enabling the HTAware Mitigation. This statistic is derived by weighting how many cycles are used by each Hyper-Thread. When Intel Turbo Boost is active on a core, cpu.usage can be higher than the actual processor core utilization, and can exceed 100% as seen in esxtop running locally on the host (cpu.usage values retrieved by the Tool through the vSphere API are capped at 100%). During times where cpu.usage approaches 100%, enabling the HTAware Mitigation will impact the throughput of the host.

The specific load boundaries defined in the table above are:
  • If cpu.usage is over 90%: HTAware Mitigation will degrade the throughput or response times of the virtual machines running on the system. The value of 90% was chosen as close to the maximum: a system with such usage is already using all available resources, and perhaps could use more even without HTAware Mitigation.
  • If cpu.usage is between 70% and 90%: HTAware Mitigation will likely degrade the throughput or response times of the virtual machines running on the system. The value of 70% was chosen with the assumption that the workload benefits 25% from Hyper-Thread, and that once HTAware Mitigation is enabled, the workload will be affected.
  • If cpu.usage is under 70%: the system should have sufficient headroom for enabling HTAware Mitigation without affecting workload performance.
 
The HTAware Mitigation Tool will skip hosts or clusters for further analysis once the HTAware Mitigation has been applied to the respective hosts or clusters. Customers are advised to consult VMware’s vSphere Performance and Monitoring documentation, and/or other system monitoring utilities to perform ongoing resource analysis.
Limitations of the HTAware Mitigation Tool
 
The tool primarily leverages the cpu.usage metric available through vCenter. This limits the type and scope of analysis that is possible:
  1. The assumptions used by the Tool to designate hosts as Green, Yellow or Red are based on collection triggers and CPU usage. Individual workloads may warrant more conservative (or less conservative) rules. It is important that the guidance provided by the Tool is used as additional input into the analysis of desired infrastructure capacity and utilization.
  2. Host utilization does not take DRS or manual load balancing into account. This may cause the Tool to be conservative or to miss load spikes.
  3. The Tool performs estimates of compute throughput. Hyper-Threading does help responsiveness, but the impact on responsiveness cannot be estimated.
Table of informational and warning advisories displayed by the Tool and their description
Cluster Advisories
Message ID | Description | Advisory Color
ok.cluster.noht | The hosts in this cluster are not configured to use Hyper-Thread. Enabling HTAware Mitigation on these hosts is not expected to have performance or functionality impact. | Green
ok.cluster.nocve | The hosts in this cluster do not use Intel processors and are not impacted by CVE-2018-3646. HTAware Mitigation does not need to be enabled on these hosts. | Green
ok.cluster.mitigated | The hosts in this cluster are patched and have already enabled HTAware Mitigation. After the cluster is mitigated, the HTAware Mitigation Tool will no longer consider it for further analysis. | Green
ok.cluster | Enabling HTAware Mitigation on the hosts in this cluster is not expected to have performance or functionality impact. | Green
warning.cluster.upgrade.capacity | Enabling HTAware Mitigation on this cluster in a rolling upgrade will cause the cpu.usage of the cluster to be between 70% and 90%. Workload running on the cluster may experience limited or some performance degradation during the upgrade process. | Yellow
warning.cluster.hosts-with-issues | This message summarizes the number of hosts in this cluster that are expected to experience limited or some performance degradation if HTAware Mitigation is enabled on these hosts. These hosts have been flagged yellow. Additional detail on each of these hosts can be expanded and viewed. | Yellow
error.cluster.hosts-with-issues | This message summarizes the number of hosts in this cluster that are highly likely to experience some or significant performance degradation if HTAware Mitigation is enabled on these hosts. These hosts have been flagged red. Additional detail on each of these hosts can be expanded and viewed. | Red
error.cluster.upgrade.capacity | Enabling HTAware Mitigation on this cluster in a rolling upgrade will cause the cpu.usage of the cluster to exceed 90%. Workload running on the cluster is highly likely to experience some or significant performance degradation during the upgrade process. | Red
info.cluster.upgrade.capacity | This message indicates that a rolling upgrade is not possible because either this cluster has only 1 host or it is a standalone host. VMs will need to be powered off prior to upgrading. | Gray
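The rolling-upgrade capacity advisories can be illustrated with a rough Python estimate. This article states only that the Tool assumes uniform host capacity and looks at average host usage; the projection formula below (average usage scaled by n / (n - 1) while one host is out of service) is an assumption made for illustration, as is the function name. The 70% and 90% boundaries follow the advisory descriptions above.

```python
def rolling_upgrade_advisory(host_usages):
    """Estimate the cluster advisory color during a rolling upgrade.

    host_usages: average cpu.usage (percent) of each host in the cluster.
    Assumption: with one host out of service, its load is spread evenly
    over the remaining hosts, which all have the same capacity.
    """
    n = len(host_usages)
    if n < 2:
        # A single host cannot be upgraded in a rolling fashion.
        return "Gray"
    average = sum(host_usages) / n
    projected = average * n / (n - 1)
    if projected > 90.0:
        return "Red"
    if projected >= 70.0:
        return "Yellow"
    return "Green"
```

For example, four hosts averaging 60% project to 80% on the remaining three hosts, which falls in the Yellow band.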
Host Advisories
Message ID | Description | Advisory Color
ok.host.noht | The current host is not configured to use Hyper-Thread. Enabling HTAware Mitigation is not expected to have performance or functionality impact on this host. | Green
ok.host.nocve | The current host does not use an Intel processor. HTAware Mitigation does not need to be enabled on this host. | Green
ok.host.mitigated | The current host is patched and has already enabled HTAware Mitigation. After the host has been mitigated, the HTAware Mitigation Tool will no longer consider it for further analysis. | Green
ok | Enabling HTAware Mitigation on the current host is not expected to have performance or functionality impact. | Green
warning.host.vms-with-issues | This message summarizes the number of hosts which are running VMs that may encounter issues when HTAware Mitigation is enabled. Additional detail on each of these VMs can be expanded and viewed. | Yellow
warning.host.needs-ht | The cpu.usage metric on the host is between 70% and 90% during one of the collection periods as defined in the Tool threshold table. Workload running on the current host may experience limited or some performance degradation after enabling HTAware Mitigation based on the historical usage information provided. | Yellow
warning.host.no-data | The current host is inaccessible. It may be powered off or unreachable. Please recheck the host. | Yellow
error.host.vms-with-issues | This message summarizes the number of hosts which are running VMs that will encounter issues when HTAware Mitigation is enabled. Additional detail on each of these VMs can be expanded and viewed. | Red
error.host.ls-high.overflow | This message summarizes the total number of VMs with high latency sensitivity setting which can no longer run on the current host when HTAware Mitigation is enabled. | Red
error.host.needs-ht | The cpu.usage metric on the host is over 90% during one of the collection periods as defined in the Tool threshold table. Workload running on the current host is highly likely to experience some or significant performance degradation after enabling HTAware Mitigation based on the historical usage information provided. | Red
info.host.needs-upgrade | This message indicates the current host will need to be patched with VMSA-2018-0020 before HTAware Mitigation can be enabled. This patch puts in place ESXi Side-channel Aware Scheduler but does not enable it by default. | Gray
info.host.ls-high | This message summarizes the total number of VMs with high latency sensitivity setting running on the current host. | Gray
info.host.no.vms | The current host has no running VMs. An assessment of the impact of enabling HTAware Mitigation on this host has not been made. | Gray
VM Advisories
Message ID | Description | Advisory Color
warning.vm.affinity | This VM uses vCPU affinity (pinning) setting. The assigned pCPU numbers will be valid after HTAware Mitigation is enabled on the current host. However, it is recommended that the assignments be reviewed again given the reduced pCPUs available. | Yellow
error.vm.too-wide | This VM is configured with more vCPUs than can be supported on the current host when HTAware Mitigation is enabled. In order to continue running this VM on the current host, the number of vCPUs will need to be reduced to be <= the number of physical cores, or the VM will need to be migrated to another host with more cores. | Red
error.vm.ls-high.overflow | This VM is configured to use high latency sensitivity setting. The vCPU resources needed by this VM cannot be provided by the current host when HTAware Mitigation is enabled. To continue running this VM on the current host, the number of vCPUs will need to be reduced to <= (the number of cores minus 1), or the VM will need to be migrated to another host with more cores. | Red
error.vm.vcpu-affinity | This VM uses vCPU affinity (pinning) setting on an individual vCPU basis. This message summarizes the number of vCPUs using affinity setting of the current VM and these settings may be incompatible with enabling HTAware Mitigation. Please review the current settings. | Red
error.vm.affinity | This VM uses vCPU affinity (pinning) setting. The assigned pCPU numbers will be invalid after HTAware Mitigation is enabled on the current host. The pCPU numbers must be reassigned for the VM to run on the current host. | Red
 
HTAware Mitigation Tool Attachment

4/11/2019: v1.0.0.19

Changelog:

  • Added ability to scan through the environment and display the current scheduler in use, including SCAv2 introduced as part of ESXi 6.7u2
  • Added ability to enable SCAv2 on the selected hosts

MD5: 2d7ee19e81a04c520a84e7cbb7f9be64
SHA1: 183f57c0d8cd944604e9e2f0eb6ed369bbaa58cf
SHA256: c99f706c7a419ae346ec993b08e8f60dfe35d1ca07eeb7d887e7954f0cfa9f51
SHA512: 90bbe082c3c50721f70812e73706bca84bc25ff5bf22c2778cf1f3d1372714354071807beaffd56a0b7be17e968770d1209c89230f681d0e9915ec1a48305f41
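
The published hashes can be used to check the downloaded archive before the module is extracted. The following Python sketch computes a SHA256 digest and compares it with the value listed above. The function names are illustrative, and the file path passed in is whatever location the archive was saved to.

```python
import hashlib

# SHA256 value published in this article for v1.0.0.19.
EXPECTED_SHA256 = (
    "c99f706c7a419ae346ec993b08e8f60dfe35d1ca07eeb7d887e7954f0cfa9f51"
)


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the expected value, ignoring case."""
    return sha256_of_file(path).lower() == expected.lower()
```

For example, matches_published_hash("HTAwareMitigation-1.0.0.19.zip") should return True for an intact download of that version.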

9/26/2018: v1.0.0.16 - No longer available for download

Changelog:

  • Enforcement of availability and minimum version PowerCLI 6.3
  • Introduced signed PowerShell modules
  • Fixed issue of Set-HTAwareMitigationConfig not working for individual hosts or hosts in maintenance mode
  • Fixed version number in report for selected PowerShell versions


Additional Information

For more information about L1TF, see: L1 Terminal Fault (L1TF)

Attachments

HTAwareMitigation-1.0.0.19.zip