Assessing commonalities of an outage affecting multiple virtual machines (1019000)
Symptoms
- Services running in one or more virtual machines are no longer accessible to connected clients.
- Applications dependent on services running in one or more virtual machines are reporting errors.
- One or more virtual machines are no longer responding to network connections.
- One or more virtual machines are no longer responding to user interaction at the console.
Purpose
An outage affecting multiple virtual machines may have a broader scope than is first apparent, because the root cause may lie in some aspect of their common infrastructure. Identifying a pattern of affected components is helpful when attempting to narrow down potential causes.
This article provides guidance to determine what infrastructure multiple virtual machines have in common.
Note: A correlation of multiple issues does not always imply causation of one by another, but may instead suggest a common cause. It is also possible that two issues are unrelated and have no common cause. These results are merely a guide, rather than a certain indication.
Resolution
Process Overview
1. Compile two lists:
   - A list of all virtual machines that are experiencing the outages
   - A list of virtual machines that are not experiencing the outages
2. Use the vCenter maps to review the relationships between virtual machines on the two lists and their backing infrastructure.
   A vCenter map is a visual representation of the vCenter Server topology. Maps show the relationships between virtual and physical resources available to vCenter Server. This can be used to relate objects to each other. For more information, see the map documentation for your version of vCenter:
   - vCenter 5.0: The Using vCenter Maps section of the vCenter Server and Host Management Guide
   - vCenter 4.1: The Using vCenter Maps section of the vSphere Datacenter Administration Guide
   - vCenter 4.0: The Using vCenter Maps section of the vSphere Basic Administration Guide
   - VirtualCenter 2.x: The Resource Maps section of the Basic Administration Guide
   To navigate to the Maps tab in vCenter Server:
   - Open the vSphere Client and connect to the vCenter Server.
   - Provide administrator credentials when prompted.
   - Ensure that you are in the Hosts & Clusters view.
   - Select the root of the tree on the left pane (the hostname or domain name of the vCenter Server).
   - Click the Maps tab.
3. Use the intersection of the two lists from step 1, and the maps from step 2, to identify which components of your infrastructure are fully functional and which must be investigated. This chart shows the idea:

Any components that are used by both virtual machines experiencing the outages and virtual machines not experiencing the outages can be deduced to be functional. Any components which are used only by virtual machines experiencing the outages cannot be determined to be functional, and must be investigated further.
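The rule above amounts to simple set arithmetic, which can help when there are too many virtual machines to compare by eye. The following Python sketch is only an illustration: the function name, the inventory dictionary, and every virtual machine and component name in it are hypothetical, and the data is assumed to have been transcribed by hand from the vCenter maps rather than retrieved through any vSphere API.

```python
def classify_components(affected, unaffected, uses):
    """Apply the rule above to a VM -> components mapping.

    affected, unaffected: iterables of virtual machine names (the two lists).
    uses: dict mapping each virtual machine name to the set of components
          (hosts, datastores, port groups) it depends on.
    """
    used_by_affected = set().union(*(uses[vm] for vm in affected))
    used_by_unaffected = set().union(*(uses[vm] for vm in unaffected))
    # Also used by at least one healthy VM: can be deduced to be functional.
    presumed_functional = used_by_affected & used_by_unaffected
    # Used only by affected VMs: functionality unknown, investigate further.
    needs_investigation = used_by_affected - used_by_unaffected
    return presumed_functional, needs_investigation


# Hypothetical inventory, for illustration only.
uses = {
    "vm-a": {"host:esx01", "datastore:ds1", "portgroup:prod"},
    "vm-b": {"host:esx02", "datastore:ds1", "portgroup:prod"},
    "vm-c": {"host:esx03", "datastore:ds2", "portgroup:prod"},
}
affected = ["vm-a", "vm-b"]
unaffected = ["vm-c"]

presumed_functional, needs_investigation = classify_components(
    affected, unaffected, uses
)
print("Presumed functional:", sorted(presumed_functional))
print("Investigate further:", sorted(needs_investigation))
```

In this made-up inventory the port group is shared with a healthy virtual machine, so it is presumed functional, while both hosts and datastore ds1 remain candidates. As with the maps themselves, the result is a guide for where to look first, not proof of a cause.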
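Examine each of the components of unknown functionality in turn. This article includes these sections:
- Common Element: Host Compute Infrastructure
- Common Element: Storage Infrastructure
- Common Element: Network Infrastructure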
Common Element: Host Compute Infrastructure
Note: Host Compute refers to the common elements of one physical host server, where the CPU and RAM are considered together.
- Create a vCenter map showing the host tier:
- Open the Maps tab.
- Ensure that only the Host to VM option is selected.
- Click Apply Relationships.
- Identify the affected and unaffected virtual machines in the map.
- Determine whether the affected virtual machines rely on the same hosts.
- If multiple virtual machines experiencing the outages are all on the same host, investigate further:
Note: If there are virtual machines on the same host that are not experiencing the outage, then the host is not at fault.
- If the host with the affected virtual machines is itself unresponsive, the scope is larger than initially assumed. Troubleshoot the unresponsive host instead. For more information, see Determining why a host is labeled as Not Responding and multiple virtual machines are labeled as Disconnected (1019082).
- Validate whether the problem is specific to a host. Try to migrate the virtual machine to another host that is known to be functional, using vMotion, and observe whether the problem follows the virtual machine. For more information, see Migrating Virtual Machines in the Basic System Administration Guide for your version of ESX/ESXi.
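- If troubleshooting multiple virtual machine failures, determine whether they happened on the same physical CPU or CPU package. For more information, see Virtual machine and ESX/ESXi host outage pattern analysis across physical CPUs, listed under See Also.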
Multiple virtual machines on the same host may all experience similar symptoms if there is an upstream network or storage issue that only affects the one host, such as a network or SCSI interface connectivity issue. Continue with the Storage and Network Infrastructure sections of this article.
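The vMotion check described in this section is a small decision procedure, and writing it down can make the outcome easier to record consistently. The Python sketch below is an assumption-laden illustration: the function name, the result labels, and the guidance strings are hypothetical, and it does not perform or detect a migration itself. It only interprets what was observed before and after one.

```python
def interpret_migration_test(symptoms_before, symptoms_after):
    """Interpret one vMotion test of an affected virtual machine.

    symptoms_before: True if the VM showed the problem on the original host.
    symptoms_after:  True if the VM still shows the problem on the
                     known-functional destination host.
    """
    if not symptoms_before:
        # Nothing to compare against, so the test says nothing about the host.
        return "inconclusive"
    if symptoms_after:
        # The problem followed the virtual machine to a healthy host.
        return "follows-vm"
    # The problem stayed behind with the original host.
    return "host-specific"


# Hypothetical guidance text keyed by the labels returned above.
GUIDANCE = {
    "inconclusive": "Repeat the test with a virtual machine that shows the symptoms.",
    "follows-vm": "Look at the virtual machine itself or at shared storage and network.",
    "host-specific": "Investigate the original host and its storage and network connectivity.",
}

result = interpret_migration_test(symptoms_before=True, symptoms_after=False)
print(result, "->", GUIDANCE[result])
```

A "host-specific" result does not by itself distinguish a faulty host from a faulty link between that host and its storage or network, which is why the Storage and Network Infrastructure sections still apply.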
Common Element: Storage Infrastructure
- Create a vCenter map showing the storage tier:
- Open the Maps tab.
- Ensure that only the VM to Datastore option is checked.
- Click Apply Relationships.
- Identify the affected and unaffected virtual machines in the map.
- Determine whether the affected virtual machines rely on the same datastores. If the affected virtual machines rely on multiple datastores, determine whether those datastores use a common storage fabric, array, or spindles.
- If multiple virtual machines experiencing similar symptoms are all on the same datastore(s), investigate further:
Note: If there are virtual machines on the same datastore(s) that are not experiencing the outage, then the common datastore is not at fault.
- If the virtual machines are all on the same datastore and host, and that datastore is shared among multiple hosts, examine the connectivity from that host to the shared storage infrastructure first. The connectivity itself (for example, Fibre Channel HBAs, iSCSI initiators, NFS network interfaces) could be at fault.
- Determine whether the affected virtual machines are all on the same storage array, storage group/pool, or spindles.
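- Investigate and troubleshoot storage infrastructure issues. For more information, see these articles, listed under See Also:
- Identifying Fibre Channel, iSCSI, and NFS storage issues on ESX/ESXi hosts
- Verifying that ESX/ESXi virtual machine storage is accessible
- Troubleshooting VMFS-3 datastore issues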
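When the affected virtual machines sit on different datastores, the question of whether those datastores share a fabric, array, or spindles can be answered by intersecting the backing components of each affected virtual machine. The following Python sketch assumes the datastore-to-backing relationships have been gathered by hand from the storage administrator or array management tools. The function name and all datastore, array, and fabric names are hypothetical.

```python
def common_backing(affected_vms, vm_datastores, datastore_backing):
    """Return the backing components that every affected VM depends on.

    vm_datastores:     dict of VM name -> set of datastore names.
    datastore_backing: dict of datastore name -> set of backing components
                       (for example fabric, array, storage group).
    """
    per_vm = []
    for vm in affected_vms:
        backing = set()
        for datastore in vm_datastores[vm]:
            # A VM depends on the backing of every datastore it uses.
            backing |= datastore_backing[datastore]
        per_vm.append(backing)
    # Components present in every affected VM's backing set.
    return set.intersection(*per_vm) if per_vm else set()


# Hypothetical layout, for illustration only.
vm_datastores = {
    "vm-a": {"ds1"},
    "vm-b": {"ds2"},
    "vm-c": {"ds3"},
}
datastore_backing = {
    "ds1": {"array:A", "fabric:F1"},
    "ds2": {"array:A", "fabric:F2"},
    "ds3": {"array:B", "fabric:F2"},
}

shared = common_backing(["vm-a", "vm-b"], vm_datastores, datastore_backing)
print("Backing shared by all affected virtual machines:", sorted(shared))
```

Here the two affected virtual machines use different datastores and different fabrics but the same array, which makes the array the first thing to examine. Any shared component should still be checked against the unaffected virtual machines: if a healthy virtual machine depends on the same array, the array as a whole is less likely to be at fault.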
Common Element: Network Infrastructure
Consider whether the affected virtual machines utilize a common network infrastructure, and whether there are any unaffected virtual machines using the same network infrastructure:
- Create a vCenter map showing the network tier:
- Open the Maps tab.
- Ensure that only the VM to Network option is checked.
- Click Apply Relationships.
- Identify the affected and unaffected virtual machines in the map.
- Determine whether the affected virtual machines rely on the same network port groups.
- If multiple virtual machines experiencing similar symptoms are utilizing the same port groups, investigate further:
Note: If there are virtual machines on the same network port groups that are not experiencing the outage, then those port groups are not at fault.
- Consider whether the affected virtual machines all use the same upstream physical network connection, and whether there are any unaffected virtual machines, hosts or other physical servers which use the same network link.
- Investigate and troubleshoot network infrastructure issues. For more information, see Troubleshooting virtual machine network connection issues (1003893).
Additional Information
If you have gone through all of the steps and cannot identify a common shared resource between all of the virtual machines, troubleshoot each virtual machine independently. For more information, see Troubleshooting a virtual machine that has stopped responding (1007819).
See Also
- Determining if virtual machine and ESX host unresponsiveness is caused by hardware issues
- Identifying Fibre Channel, iSCSI, and NFS storage issues on ESX/ESXi hosts
- Verifying that ESX/ESXi virtual machine storage is accessible
- Troubleshooting virtual machine network connection issues
- Troubleshooting VMFS-3 datastore issues
- Troubleshooting a virtual machine that has stopped responding
- Determining why a single virtual machine is inaccessible on an ESX/ESXi host or vCenter Server system
- Troubleshooting an unresponsive host and multiple Disconnected virtual machines
- Virtual machine and ESX/ESXi host outage pattern analysis across physical CPUs

