Aggregating and Filtering Alerts in vCenter Operations Manager 5.6 (Custom UI) (2039020)
Defining the Consolidation Rules
You define the alert consolidation rules in the file /usr/lib/vmware-vcops/user/conf/analytics/consolidated-alert-definitions.xml. The file contains a readme section with explanations and an example.
For the vCenter Operations Manager vApp, use the consolidated-alert-definitions.xml file located on the Analytics VM.
In the consolidated-alert-definitions.xml file, consolidation rules are called definitions. A definition consists of two parts: alert filtering criteria and trigger conditions. The filter selects alerts, for example critical KPI HT alerts on virtual machines. The trigger defines the critical mass above which the consolidated alert is generated, for example when more than 50% of the virtual machines have active alerts. vCenter Operations Manager supports the following filtering criteria for consolidated alerts:
- Criticality (for example, critical, warning …)
- Type (for example, Classical, Resource, …)
- Subtype (for example, KPI HT breach, Notification …)
- Duration (for example, more than 10 minutes)
- Alert info, matched against a regular expression (for example, contains "capacity")
- Attribute key (for example, cpu|usage_average)
- Resource kind key of the alert's resource (for example, VirtualMachine)
The following trigger conditions are supported:
- Number of affected resources
- Percentage of affected resources
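As an illustration of how the filtering criteria above combine, the following Python sketch treats a filter as a set of optional criteria that must all match an alert. It is not the product's implementation or the XML schema of consolidated-alert-definitions.xml: the classes `Alert` and `AlertFilter` and their field names are hypothetical, and the assumption that every criterion that is set must match (logical AND) is this sketch's own.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    resource_kind: str   # e.g. "VirtualMachine"
    criticality: str     # e.g. "critical"
    alert_type: str      # e.g. "Classical"
    subtype: str         # e.g. "KPI HT breach"
    duration_min: float  # how long the alert has been active, in minutes
    info: str            # alert info text
    attribute_key: str   # e.g. "cpu|usage_average"


@dataclass
class AlertFilter:
    # Every criterion is optional; None means "do not filter on this criterion".
    criticality: Optional[str] = None
    alert_type: Optional[str] = None
    subtype: Optional[str] = None
    min_duration_min: Optional[float] = None  # "more than N minutes"
    info_regex: Optional[str] = None
    attribute_key: Optional[str] = None
    resource_kind: Optional[str] = None

    def matches(self, alert: Alert) -> bool:
        """Return True only if the alert satisfies every criterion that is set."""
        exact = [
            (self.criticality, alert.criticality),
            (self.alert_type, alert.alert_type),
            (self.subtype, alert.subtype),
            (self.attribute_key, alert.attribute_key),
            (self.resource_kind, alert.resource_kind),
        ]
        if any(want is not None and want != have for want, have in exact):
            return False
        if self.min_duration_min is not None and alert.duration_min <= self.min_duration_min:
            return False
        if self.info_regex is not None and re.search(self.info_regex, alert.info) is None:
            return False
        return True
```

For example, `AlertFilter(criticality="critical", subtype="KPI HT breach", resource_kind="VirtualMachine")` selects critical KPI HT alerts on virtual machines, the filter used as an example earlier in this article.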
As part of the trigger, you can define the number of cycles to wait before triggering the consolidated alert, and the number of cycles to wait before canceling it after the conditions are no longer satisfied. These wait and cancel cycles work the same way as for hard thresholds.
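The wait and cancel cycles can be pictured as a small state machine. This sketch assumes that a wait or cancel count means that many consecutive collection cycles, and that an interrupted streak restarts from zero; `TriggerState` and its fields are hypothetical names, not product settings.

```python
from dataclasses import dataclass, field


@dataclass
class TriggerState:
    wait_cycles: int    # consecutive cycles the condition must hold before the alert is raised
    cancel_cycles: int  # consecutive cycles the condition must fail before the alert is canceled
    active: bool = False
    streak: int = field(default=0, init=False)  # consecutive cycles pointing toward a state change

    def update(self, condition_met: bool) -> bool:
        """Evaluate one collection cycle and return whether the consolidated alert is active."""
        if condition_met == self.active:
            self.streak = 0  # no pending change; reset the counter
            return self.active
        self.streak += 1
        needed = self.wait_cycles if condition_met else self.cancel_cycles
        if self.streak >= needed:
            self.active = condition_met
            self.streak = 0
        return self.active
```

With `TriggerState(wait_cycles=3, cancel_cycles=2)`, the alert is raised on the third consecutive cycle that satisfies the trigger and canceled on the second consecutive cycle that does not.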
You can specify both a number of resources and a percentage of resources that trigger a consolidated alert. In that case, the alert is triggered when either condition is satisfied. For example, generate an alert if more than 20 resources or more than 10% of the resources are affected.
If you specify a resource kind in the filter criteria and use a percentage-based threshold, vCenter Operations Manager calculates the percentage relative to the resources of that resource kind only.
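A minimal sketch of the combined count-or-percentage trigger, including the rule that a resource kind in the filter also defines the population used for the percentage. The function `consolidated_trigger` and its parameters are hypothetical. This article phrases thresholds both as "more than" and as "at least"; the sketch treats them as inclusive, which is an assumption.

```python
from typing import Iterable, Optional, Set, Tuple


def consolidated_trigger(
    affected: Set[str],
    children: Iterable[Tuple[str, str]],
    min_count: Optional[int] = None,
    min_percent: Optional[float] = None,
    resource_kind: Optional[str] = None,
) -> bool:
    """Return True if either the count or the percentage threshold is reached.

    affected:      IDs of resources that have a matching active alert
    children:      (resource_id, resource_kind) pairs under the rule's resource
    resource_kind: the filter's resource kind; if set, it also defines the
                   population used for the percentage
    """
    population = [rid for rid, kind in children
                  if resource_kind is None or kind == resource_kind]
    hit = sum(1 for rid in population if rid in affected)
    # Thresholds are treated as inclusive here (an assumption of this sketch).
    by_count = min_count is not None and hit >= min_count
    by_percent = (min_percent is not None and len(population) > 0
                  and 100.0 * hit / len(population) >= min_percent)
    return by_count or by_percent
```

Because the two thresholds are combined with `or`, a rule with both set fires as soon as either one is reached.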
- The triggers are based on the number of affected resources, not on the number of alerts. A single virtual machine can have two KPI HT alerts, one on CPU and one on memory. Therefore, you could have 10 KPI HT alerts but only 5 affected virtual machines.
- Only the active alerts are considered.
- If a filter is based on a super metric, you must use the name of the super metric instead of its attribute key. For example, Super Metrics|Average VM CPU Usage instead of Super Metrics|sm_2.
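To illustrate counting by resource rather than by alert, and ignoring alerts that are no longer active, here is a short sketch. `AlertRecord` and `affected_resources` are hypothetical names, and `mem|usage_average` is an illustrative attribute key.

```python
from typing import Iterable, NamedTuple, Set


class AlertRecord(NamedTuple):
    resource_id: str
    attribute_key: str
    active: bool


def affected_resources(alerts: Iterable[AlertRecord]) -> Set[str]:
    """Distinct resources with at least one active alert; inactive alerts are skipped."""
    return {a.resource_id for a in alerts if a.active}


alerts = [
    AlertRecord("vm-1", "cpu|usage_average", True),
    AlertRecord("vm-1", "mem|usage_average", True),   # second KPI HT alert on the same VM
    AlertRecord("vm-2", "cpu|usage_average", True),
    AlertRecord("vm-3", "cpu|usage_average", False),  # no longer active: ignored
]
print(len(alerts), len(affected_resources(alerts)))  # 4 alerts, 2 affected resources
```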
Each rule definition must have a unique name and can be assigned to resources. You can assign a rule either explicitly or by specifying a resource kind. If you specify a resource kind, the rule is assigned to all the resources of this resource kind.
When applying the filter and counting the affected resources, vCenter Operations Manager accounts for all descendants of the resource to which the rule is assigned, including indirect children.
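Rule assignment and the walk over descendants can be sketched as follows. The parent-to-children map, the resource IDs, and the resource kind key `ClusterComputeResource` are illustrative assumptions, and `descendants` and `rule_targets` are hypothetical helpers. The visited set makes a resource reachable through two parents count once, which is also an assumption about how the product counts.

```python
from typing import Dict, Iterable, List, Optional, Set


def descendants(tree: Dict[str, List[str]], root: str) -> Set[str]:
    """All direct and indirect children of root; each resource is counted once."""
    seen: Set[str] = set()
    stack = list(tree.get(root, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(tree.get(node, []))
    return seen


def rule_targets(kinds: Dict[str, str], rule_kind: Optional[str] = None,
                 explicit: Iterable[str] = ()) -> Set[str]:
    """Resources a rule applies to: explicit assignments plus every resource of rule_kind."""
    targets = set(explicit)
    if rule_kind is not None:
        targets |= {rid for rid, kind in kinds.items() if kind == rule_kind}
    return targets


tree = {"cluster-1": ["host-1", "host-2"], "host-1": ["vm-1", "vm-2"], "host-2": ["vm-3"]}
kinds = {"cluster-1": "ClusterComputeResource", "host-1": "HostSystem", "host-2": "HostSystem",
         "vm-1": "VirtualMachine", "vm-2": "VirtualMachine", "vm-3": "VirtualMachine"}
for target in rule_targets(kinds, rule_kind="ClusterComputeResource"):
    print(target, sorted(descendants(tree, target)))
# cluster-1 ['host-1', 'host-2', 'vm-1', 'vm-2', 'vm-3']
```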
Note: The consolidated-alert-definitions.xml file is reloaded at predefined intervals, so you do not have to restart analytics for changes to take effect. The reload period is set by the consolidatedAlertDefinitionsUpdateInterval property in the file /usr/lib/vmware-vcops/user/conf/analytics/advanced.properties. The default value is 15 (minutes).
Example: Triggering of Consolidated Alerts
A cluster resource contains 10 hosts and 80 virtual machines, for a total of 90 child resources. Consider the following rules:
- Alert triggering rule: critical KPI HT alerts; threshold 10%
A consolidated alert is triggered if at least 10% of the child resources (9 resources) have a critical KPI HT alert. These can be 9 hosts, 9 virtual machines, or any combination such as 5 hosts and 4 virtual machines.
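- Alert triggering rule: critical KPI HT alerts on VM resources; threshold 10%
A consolidated alert is triggered if at least 10% of the virtual machines (8 virtual machines) have a critical KPI HT alert. Hosts are not counted, because the filter is restricted to the VirtualMachine resource kind.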
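The arithmetic in this example can be checked with a few lines of Python. The helper `min_affected_to_trigger` is hypothetical and assumes an inclusive threshold, as "at least 10%" suggests; it uses integer ceiling division so that a non-integer result, such as 10% of 91, rounds up.

```python
def min_affected_to_trigger(total: int, percent: int) -> int:
    """Smallest number of affected resources that reaches percent% of total."""
    return -(-percent * total // 100)  # integer ceiling division avoids float rounding


cluster = {"HostSystem": 10, "VirtualMachine": 80}

# Rule 1: no resource kind in the filter, so all 90 child resources count.
print(min_affected_to_trigger(sum(cluster.values()), 10))  # 9
# Rule 2: filter restricted to virtual machines, so only the 80 VMs count.
print(min_affected_to_trigger(cluster["VirtualMachine"], 10))  # 8
```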
Filtering Consolidated Alerts
You can separate consolidated alerts from other alert types by using the designated buttons and check boxes in the Custom UI.
Consolidated alerts, a new alert type, have a designated icon so that you can quickly spot them in a list of alerts.
Viewing Summary and Details for Consolidated Alerts
You can view a summary of a consolidated alert by double-clicking the alert in the Alerts Overview list. The Alert Summary page displays the name of the alert consolidation rule, the filter criteria and trigger condition, and the actual number of resources with active alerts that triggered the consolidated alert.
You can view a list of all resources that triggered the alert, with the following information and actions for each resource:
- Date added: The date and time when the resource was first considered impacted, that is, when an alert on this resource was first included in the consolidated alert.
- Date removed: The date and time when the resource was no longer considered impacted, that is, when all of the alerts on this resource that were included in the consolidated alert were canceled or no longer satisfied the conditions (for example, because of a change in criticality). If the resource is still impacted, the Date removed column is empty.
- Resource details: Opens the details page of the resource selected in the resource list.
- Plot metric: Plots the metrics for the selected resources. The plotted metric is either the health or the metric specified in the rule's filter.
This button is available only in the alert details mode. To activate this mode, click the Troubleshooting button on the Alert Summary page.
- Plot the metrics of the currently impacted resources: Plots the metrics for all resources that are still impacted by the alert (the Date removed column is empty).
Note: Because a consolidated alert could contain thousands of impacted resources, the number of metrics that can be plotted is limited to 25 by default. You can modify this number by changing the value of the maxMetricsPerAlertToPlot property in the /usr/lib/vmware-vcops/user/conf/web/web.properties file.
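How the Date removed column, the currently impacted resources, and the plotting limit fit together can be sketched as below. The sketch assumes one plotted metric per resource and that the earliest-added resources are plotted first; neither detail is stated in this article. `ImpactedResource` and `resources_to_plot` are hypothetical, and the `max_metrics_per_alert` parameter only mirrors the role of maxMetricsPerAlertToPlot.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ImpactedResource:
    resource_id: str
    date_added: datetime
    date_removed: Optional[datetime] = None  # None corresponds to an empty Date removed column


def resources_to_plot(rows: List[ImpactedResource],
                      max_metrics_per_alert: int = 25) -> List[ImpactedResource]:
    """Currently impacted resources, capped at the limit (one metric per resource assumed)."""
    current = [r for r in rows if r.date_removed is None]
    current.sort(key=lambda r: r.date_added)  # ordering is an assumption of this sketch
    return current[:max_metrics_per_alert]


start = datetime(2024, 1, 1)
rows = [ImpactedResource(f"vm-{i}", start + timedelta(minutes=i)) for i in range(30)]
rows += [ImpactedResource(f"old-{i}", start, start + timedelta(hours=1)) for i in range(5)]
print(len(resources_to_plot(rows)))  # 25
```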
Changing the State of Consolidated Alerts
Canceling a consolidated alert also cancels all the alerts included in it.
Changing the state of a consolidated alert, such as taking ownership, propagates the new state to all of the included alerts.
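A sketch of the propagation described above, in which an action on the consolidated alert is applied to every included alert. The classes, status strings, and method names are hypothetical and only model the behavior, not the product's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    alert_id: str
    status: str = "active"
    owner: Optional[str] = None


@dataclass
class ConsolidatedAlert(Alert):
    included: List[Alert] = field(default_factory=list)

    def cancel(self) -> None:
        """Cancel the consolidated alert and every alert it includes."""
        self.status = "canceled"
        for alert in self.included:
            alert.status = "canceled"

    def take_ownership(self, user: str) -> None:
        """Assign an owner to the consolidated alert and propagate it to included alerts."""
        self.owner = user
        for alert in self.included:
            alert.owner = user
```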
Several default email templates are added to the templates folder. All previously supported placeholders also work with consolidated alerts, and the following new placeholders are available:
- number of currently affected resources
- number of resources added with the last update
- number of resources removed with the last update
- change in the number of the affected resources
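The four new placeholder values can be derived by comparing the set of affected resources before and after an update, as in this sketch. The function `placeholder_values` and its dictionary keys are hypothetical and are not the actual placeholder names used in the email templates.

```python
from typing import Dict, Set


def placeholder_values(previous: Set[str], current: Set[str]) -> Dict[str, int]:
    """Values for the four new placeholders, computed from two snapshots of affected resources."""
    added = current - previous
    removed = previous - current
    return {
        "affected_now": len(current),
        "added": len(added),
        "removed": len(removed),
        "change": len(current) - len(previous),  # equals added minus removed
    }
```

For example, going from `{"vm-1", "vm-2", "vm-3"}` to `{"vm-2", "vm-3", "vm-4", "vm-5"}` gives 4 affected resources, 2 added, 1 removed, and a change of +1.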