Implementing VMware vSphere Metro Storage Cluster (vMSC) using EMC VPLEX (2007545)

Purpose

This article provides information about deploying a Metro Storage Cluster across two data centers using EMC VPLEX Metro 5.0 and above. With vSphere 5.x/6.0, a Storage Virtualization Device can be supported in a Metro Storage cluster configuration.

Resolution

What is VPLEX?

EMC VPLEX is a federated solution that provides simultaneous access to storage devices at two geographically separate sites. One or more VPLEX Distributed Virtual Volumes can be provisioned for sharing between the ESXi hosts at the two sites. These volumes can be used as Raw Device Mapping (RDM) disks or as a shared VMFS datastore. An RDM can be used for exclusive access by a virtual machine, and a VMFS datastore can be used for provisioning virtual machines and carving out additional vDisks.

The VPLEX cluster at each site itself is designed to be highly available. A VPLEX cluster can scale from two directors to eight directors. Each director is protected by redundant power supplies, fans, and interconnects, making the VPLEX highly resilient.

What is vMSC?

vSphere Metro Storage Cluster (vMSC) is a new configuration. A storage device configured in an MSC configuration is supported after successful vMSC certification. All supported storage devices are listed in the VMware Storage Compatibility Guide.

VPLEX Witness

VPLEX Witness is a VPLEX component provisioned as a virtual machine on an ESXi host that is typically deployed at a third site. Deploying a VPLEX Metro Solution with VPLEX Witness gives continuous availability to the storage volumes in case of a site failure or inter-cluster link failure.

When a VPLEX Distributed Virtual Volume is provisioned, a per-volume preferred-site flag may be set, or Distributed Virtual Volumes with the same preferred-site setting may be placed in the same consistency group. The preference can be based on criteria such as site availability, the presence of monitoring staff, or location. VPLEX Witness failure-handling semantics apply only to Distributed Virtual Volumes within a consistency group.
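As an illustrative sketch only: on VPLEX, the preferred site for a consistency group is expressed as a detach rule set from the VPlexcli on the management server. The exact context paths, command names, and options vary by GeoSynchrony release, and the consistency-group name below is a placeholder, so verify the syntax against the CLI guide for your release:

```shell
# VPlexcli session on the VPLEX management server (illustrative syntax;
# verify against the GeoSynchrony CLI guide for your release).
# Navigate to the consistency group holding the distributed volumes
# (group name is a placeholder):
cd /clusters/cluster-1/consistency-groups/cg_esx_datastores

# Declare cluster-1 the preferred ("winner") site for this group, with a
# 5-second delay before the losing site suspends I/O:
set-detach-rule winner --cluster cluster-1 --delay 5s
```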

Configuration Requirements

These requirements must be satisfied to support this configuration:
  • The maximum round-trip latency on both the IP network and the inter-cluster network between the two VPLEX clusters must not exceed 5 milliseconds for a non-uniform host access configuration and must not exceed 1 millisecond for a uniform host access configuration. The IP network supports the VMware ESXi hosts and the VPLEX Management Console. The interconnect between the two VPLEX clusters can be Fibre Channel or IP. With VPLEX GeoSynchrony 5.2 and later and ESXi 5.5 and later (NMP and PowerPath), round-trip time for a non-uniform host access configuration is supported up to 10 milliseconds. For detailed supported configurations, see the latest VPLEX EMC Simple Support Matrix (ESSM) on support.emc.com.
  • For management and vMotion traffic, the ESXi hosts in both data centers must have a private network on the same IP subnet and broadcast domain. Preferably management and vMotion traffic are on separate networks.
  • Any IP subnet used by a virtual machine must be accessible from ESXi hosts in both data centers. This requirement is important so that clients accessing virtual machines running on ESXi hosts on both sides continue to function smoothly after any VMware HA triggered virtual machine restart events.
  • The data storage locations, including the boot device used by the virtual machines, must be active and accessible from ESXi hosts in both data centers.
  • vCenter Server must be able to connect to ESXi hosts in both data centers.
  • The VMware datastores for the virtual machines running in the ESX Cluster must be provisioned on Distributed Virtual Volumes.
  • The maximum number of hosts in the HA cluster must not exceed 32 hosts for 5.x and 64 hosts for 6.0.
  • The configuration option auto-resume for VPLEX Cross-Connect consistency groups must be set to true.
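The latency and subnet requirements above can be spot-checked from an ESXi shell. This is an illustrative sketch only; the vmkernel interface name, IP address, and device ID below are placeholders:

```shell
# From an ESXi host at site-A, measure round-trip time to a site-B host
# over the vMotion vmkernel interface (vmk1 and the IP are placeholders):
vmkping -I vmk1 -c 10 192.168.10.22

# Confirm the management and vMotion vmkernel ports and their subnets:
esxcli network ip interface ipv4 get

# Confirm the host sees the shared VPLEX distributed volume
# (the naa. device ID is a placeholder):
esxcli storage core device list -d naa.60001440000000000000000000000001
```

Run the same checks from a host at each site; the reported round-trip times should stay within the limits listed above.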

Notes:

  • The ESXi hosts forming the VMware HA cluster can be distributed across the two sites. The HA cluster can start a virtual machine on a surviving ESXi host, and that host accesses the Distributed Virtual Volume through the storage paths at its own site.
  • VPLEX 5.0 and later versions with ESXi 5.x/6.0 have been tested in this configuration with the VPLEX Witness.

For information on any additional requirements for VPLEX Distributed Virtual Volumes, see the EMC VPLEX best practices document.

Note: The preceding links were correct as of June 17, 2015. If you find a link is broken, provide feedback and a VMware employee will update the link.

Solution Overview

A VMware HA/DRS cluster is created across the two sites using ESXi 5.0 hosts and managed by vCenter Server 5.0. The vSphere Management, vMotion, and virtual machine networks are connected using redundant networks between the two sites. It is assumed that the vCenter Server managing the HA/DRS cluster can connect to the ESXi hosts at both sites. This diagram provides an overview:
 
 
Based on the host SAN connections to the VPLEX storage cluster, there are two different types of deployments possible:
  • Non-uniform Host Access – In this type of deployment, hosts at each site see the storage volumes only through the storage cluster at their own site. This diagram provides an example:


  • Uniform Host Access (Cross-Connect) – This deployment involves establishing a front-end SAN across the two sites, so that hosts at one site can see the storage cluster at both the local site and the remote site. These best practices must be followed for this type of deployment:
    • The front-end zoning should be done in such a manner that an HBA port is zoned to either the local or the remote VPLEX cluster.
    • The path policy should be set to FIXED to avoid writes to both legs of the distributed volume by the same host.
This diagram provides an example:


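For a uniform (cross-connect) deployment, the FIXED path policy and a preferred path can be set per device with esxcli. This is a sketch, not prescriptive guidance; the device ID and path name below are placeholders for your environment:

```shell
# Set the path selection policy of a VPLEX distributed volume to FIXED
# (device ID is a placeholder; list devices with: esxcli storage nmp device list)
esxcli storage nmp device set \
  --device naa.60001440000000000000000000000001 --psp VMW_PSP_FIXED

# Pin the preferred path to an HBA port zoned to the local VPLEX cluster
# (path name is a placeholder; list paths with: esxcli storage core path list)
esxcli storage nmp psp fixed deviceconfig set \
  --device naa.60001440000000000000000000000001 --path vmhba1:C0:T0:L0
```

Pinning the preferred path to the local cluster keeps each host writing to only one leg of the distributed volume, as required by the best practice above.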
A VPLEX Metro solution federated across the two data centers provides the distributed storage to the ESXi hosts. It is assumed that the ESXi boot disk is located on the internal drives specific to the hosts and not on the Distributed Virtual Volume itself.
Ideally, a virtual machine runs at the preferred site of its Distributed Virtual Volume.

This table outlines tested scenarios:
 

Scenario

VPLEX Behavior

Impact/Observed VMware HA Behavior

Single VPLEX back-end (BE) path failure

VPLEX continues to operate using an alternate path to the same BE array. There is no impact to the Distributed Virtual Volumes exposed to the ESXi hosts.

None.

Single front-end (FE) path failure

The ESXi server is expected to use alternate paths to the Distributed Virtual Volumes.

None.

BE Array failure at site-A

VPLEX continues to operate using the array at site-B. When the array is recovered from the failure, the storage volume at site-A is resynchronized from site-B automatically.

None.

BE array failure at site-B

VPLEX continues to operate using the array at site-A. When the array is recovered from the failure, the storage volume at site-B is resynchronized from site-A automatically.

None.

VPLEX director failure

VPLEX continues to provide access to the Distributed Virtual Volume through other directors on the same VPLEX cluster.

None.

Complete site-A failure
 
(The failure includes all ESXi hosts and the VPLEX cluster at site-A.)

VPLEX continues to serve I/O on the surviving site (site-B). When the VPLEX at the failed site (site-A) is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-B).

Virtual machines running at the failed site fail. VMware HA automatically restarts them on the surviving site.

Complete site-B failure

(The failure includes all ESXi hosts and the VPLEX cluster at site-B.)

VPLEX continues to serve I/O on the surviving site (site-A). When the VPLEX at site-B is restored, the Distributed Virtual Volumes are synchronized automatically from the active site (site-A).

Virtual machines running at the failed site fail. VMware HA automatically restarts them on the surviving site.

Multiple ESXi host failure(s) – Power off

None.

VMware HA restarts the virtual machines on any of the surviving ESXi hosts within the VMware HA Cluster.

Multiple ESXi host failure(s) – Network disconnect

None.

HA continues to exchange cluster heartbeat through the shared datastore. No virtual machine failovers occur.

ESXi host experiences APD (All Paths Down)
 
(Encountered when the ESXi host loses access to its storage volumes, in this case VPLEX volumes.)

None.

In an APD scenario, the ESXi host must be rebooted to recover. Rebooting the host causes VMware HA to restart the failed virtual machines on other surviving ESXi hosts within the VMware HA cluster.

VPLEX inter-site link failure; vSphere cluster management network intact

VPLEX transitions Distributed Virtual Volumes on the non-preferred site to the I/O failure state. On the preferred site, the Distributed Virtual Volumes continue to provide access.

Virtual machines running at the preferred site are not impacted.

Virtual machines running at the non-preferred site experience I/O failure and show a PDL error. HA fails these virtual machines over to the other site.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi host can still access the distributed volume through the preferred site.

VPLEX cluster failure
 
(The VPLEX at either site-A or site-B has failed, but ESXi and other LAN/WAN/SAN components are intact.)

The I/O continues to be served on all the volumes on the surviving site.

The ESXi hosts located at the failed site experience an APD condition and need to be rebooted to recover from the failure.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi hosts can still access the distributed volume through the surviving site.

Complete dual site failure

Upon restoration of the two sites, the VPLEX continues to serve I/O. The best practice is to bring up the BE storage arrays first, followed by VPLEX.

All virtual machines fail since both sites are down.
The ESXi hosts should be brought up only after the VPLEX is fully recovered and the Distributed Virtual Volumes are synchronized.
On powering on the ESXi hosts at each site, the virtual machines are restarted and resume normal operations.
The same impact occurs in a uniform host access configuration since both sites are down.

Director failure at one site
(preferred site for a given Distributed Virtual Volume) and BE array failure at the other site (secondary site for a given Distributed Virtual Volume)

The surviving VPLEX directors within the VPLEX cluster with the failed director continue to provide access to the Distributed Virtual Volumes.

VPLEX continues to provide access to the Distributed Virtual Volumes using the preferred site BE array.

None.

VPLEX inter-site link intact; vSphere cluster management network failure

None.

Virtual machines on each site continue running on their respective hosts since the HA cluster heartbeats are exchanged through the shared datastore.

VPLEX inter-site link failure; vSphere cluster management network failure

VPLEX fails I/O on the non-preferred site for a given Distributed Virtual Volume. I/O to each Distributed Virtual Volume continues at its preferred site.

Virtual machines running at the preferred site continue to run.

This is an HA split-brain situation. The non-preferred site considers the hosts of the preferred site dead and tries to restart the powered-on virtual machines of the preferred site.

Virtual machines running at the non-preferred site see their I/O fail, and the virtual machines fail. These virtual machines can be registered and restarted on the preferred site.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi host can still access the distributed volume through the preferred site. The HA heartbeats are exchanged through the datastore.

VPLEX Storage volume is unavailable (for example, it is accidentally removed from the storage view or the ESXi initiators are accidentally removed from the storage view)

VPLEX continues to serve I/O on the other site where the Volume is available.

If the I/O is running on the lost device, ESXi detects a PDL (Permanent Device Loss) condition. The virtual machine is killed by the virtual machine monitor and restarted by HA on the other site.

VPLEX inter-site WAN link failure and simultaneous Cluster Witness to site-B link failure

The VPLEX fails I/O on the Distributed Virtual Volumes at site-B and continues to serve I/O at site-A.

It has been observed that the virtual machines at site-B fail. They can be restarted at site-A.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi hosts at site-B can still access the distributed volume through site-A.

VPLEX inter-site WAN link failure and simultaneous Cluster Witness to site-A link failure

The VPLEX fails I/O on the Distributed Virtual Volumes at site-A and continues to serve I/O on site-B.

It has been observed that the virtual machines at site-A fail. They can be restarted at site-B.
In a uniform host access configuration, the virtual machines run without any impact since the ESXi hosts at site-A can still access the distributed volume through site-B.

VPLEX Cluster Witness failure

VPLEX continues to serve I/O at both sites.

None.

VPLEX Management Server failure

None.

None.

vCenter Server failure

None.

No impact to the running virtual machines or HA. However, the DRS rules and virtual machine placements are not in effect.
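Several of the PDL scenarios above depend on the host terminating affected virtual machines so that HA can restart them at the other site. In vSphere 5.x/6.0 this behavior is controlled by advanced settings; the commands below are a hedged sketch reflecting common vMSC guidance, and the option path should be verified for your specific release:

```shell
# ESXi 5.5/6.0: terminate a virtual machine when its datastore device
# enters PDL. (On ESXi 5.0 U1/5.1 the equivalent is the
# disk.terminateVMOnPDLDefault entry in /etc/vmware/settings on each host.)
esxcli system settings advanced set -o /VMkernel/Boot/terminateVMOnPDL -i 1

# Verify the setting:
esxcli system settings advanced list -o /VMkernel/Boot/terminateVMOnPDL
```

Alongside the host setting, the HA cluster advanced option das.maskCleanShutdownEnabled is typically set to True (via the vSphere Client) so that HA restarts virtual machines that were terminated on PDL.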

Additional Information


Update History

02/06/2015 - Added ESXi 5.1 and 5.5 to Products
06/12/2015 - Added ESXi 6.0 to Products
12/01/2015 - Updated Configuration Requirements section with 6.0 maximum HA cluster host limit
