Implementing vSphere Metro Storage Cluster using HP LeftHand Multi-Site (2020097)
Purpose
This article provides information about deploying a vSphere Metro Storage Cluster (vMSC) across two datacenters or sites using HP LeftHand Multi-Site storage. With vSphere 5.0, a Storage Virtualization Device can be supported in a Metro Storage Cluster configuration.
Details
What is vMSC?
vSphere Metro Storage Cluster (vMSC) is a new certified configuration for stretched storage cluster architectures. A vMSC configuration is designed to maintain data availability beyond a single physical or logical site. A storage device configured in a vMSC configuration is supported after successful vMSC certification. All supported storage devices are listed in the VMware Storage Compatibility Guide.
What is HP LeftHand Multi-Site?
HP LeftHand storage is a scale-out, clustered, iSCSI storage solution. HP LeftHand Multi-Site is a feature of the LeftHand operating system, commonly known as SAN/iQ software, which is included with all HP LeftHand SANs. This technology allows storage clusters to be stretched across sites to provide high availability beyond failure domains defined by the administrator. Traditionally, in Metro Storage Cluster configurations, these failure domains are distinct geographic locations. However, the technology can also be used to protect against the failure of a logical site that may be a rack, room, or floor in the same building, as well as buildings within a campus or data centers that are separated by as much as 100 km or more, provided the link satisfies the bandwidth and latency requirements established by VMware and HP.
HP LeftHand Failover Manager
The HP LeftHand Failover Manager (FOM) is a SAN/iQ component provisioned as a virtual machine that is typically deployed at a third site. In HP LeftHand Multi-Site solutions, the Failover Manager allows access to the storage volumes to be maintained in the event of a site failure or inter-site link (ISL) failure.
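The role of the FOM can be illustrated with a simplified majority-quorum model. The sketch below assumes that volumes stay online only while a strict majority of the configured managers can reach one another, and that each data site runs one manager with the FOM acting as the third. The manager names and both helper functions are hypothetical; this is an illustration, not SAN/iQ code.

```python
def has_quorum(total_managers: int, reachable_managers: int) -> bool:
    """Return True when a strict majority of managers is reachable."""
    return reachable_managers > total_managers // 2


# Hypothetical layout: one manager per data site plus the FOM at a third site.
managers = {"site1-mgr": "site1", "site2-mgr": "site2", "fom": "site3"}


def survivors_after_site_loss(failed_site: str) -> int:
    """Count the managers that remain reachable when one site fails."""
    return sum(1 for site in managers.values() if site != failed_site)
```

Under these assumptions, losing either data site leaves two of three managers, which is still a majority. With only two managers and no FOM, a single site failure leaves one of two, which is not.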
Configuration Requirements
- ESXi hosts in vMSC configurations should be configured with at least two distinct, isolated IP networks. One of these networks should be dedicated as the storage network. The storage network handles iSCSI traffic between ESXi hosts and the LeftHand SAN, as well as replication traffic between the storage nodes in the cluster to support Network RAID replication. The second network (VM network) supports virtual machine traffic as well as management functions for the ESXi hosts. Users may choose to configure additional networks for other functionality such as vMotion. This is recommended as a best practice but is not a strict requirement of a Multi-Site/vMSC configuration. Additionally, users may choose to further segregate IP traffic, for example by separating host management from virtual machine traffic.
- The round-trip time (RTT) on the storage network between sites should not exceed 2 milliseconds (ms).
- The storage network must support a minimum of 1 Gbps throughput between the sites. Refer to the HP LeftHand Multi-Site User’s Guide for details on recommended sizing for inter-site links in Multi-Site configurations.
- Network connectivity between the FOM and the storage nodes should support bandwidth of at least 100 Mbps, and the RTT should not exceed 50 ms.
- The ESXi hosts in both data centers must have a private network on the same IP subnet and broadcast domain.
- Any IP subnet used by the virtual machine must be accessible from ESXi hosts in both datacenters. This requirement is important so that clients accessing virtual machines running on ESXi hosts on both sides are able to function smoothly upon any VMware HA triggered virtual machine restart events.
- When one or more nodes fail at the back end, the I/O response time must be less than 60 seconds.
- For vMSC certified configurations, sites should be connected via a redundant storage network consisting of two physical links.
- The data storage locations, including the boot device used by the virtual machines, must be active and accessible from ESXi hosts in both datacenters.
- vCenter Server must be able to connect to ESXi hosts in both datacenters.
- The VMware datastores for the virtual machines running in the ESXi Cluster are provisioned on Network RAID-10 volumes.
- vMSC configurations with HP LeftHand should use a single-subnet, single-VIP network design.
- The maximum number of hosts in the HA cluster must not exceed 32 hosts.
- An HP LeftHand Failover Manager virtual machine should be configured in a third site and must be able to communicate with the LeftHand storage nodes at both sides of the cluster. To survive the total failure of either site in a two-site Multi-Site configuration, a FOM must be deployed in a third site.
- vMSC certification testing for HP LeftHand was conducted with SAN/iQ 9.5 and ESXi 5.0.
- This document describes requirements and supported configurations specifically for HP LeftHand Multi-Site in a vMSC environment. HP may support Multi-Site configurations beyond those outlined in this document.
- All management, vMotion, and VM networks should be configured per VMware best practices.
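The numeric limits in the list above can be collected into a simple pre-deployment check. The following is a minimal sketch, assuming the measurements have already been gathered by other means; the `SiteLinkMeasurements` class and the `check_requirements` function are hypothetical names introduced only for illustration and are not part of any VMware or HP tool.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements listed above.
MAX_STORAGE_RTT_MS = 2.0   # storage network RTT between sites
MIN_STORAGE_GBPS = 1.0     # storage network throughput between sites
MIN_FOM_MBPS = 100.0       # bandwidth between the FOM and storage nodes
MAX_FOM_RTT_MS = 50.0      # RTT between the FOM and storage nodes
MAX_HA_HOSTS = 32          # hosts in the HA cluster
MIN_ISL_LINKS = 2          # redundant physical storage links between sites


@dataclass
class SiteLinkMeasurements:
    """Hypothetical container for values measured in the environment."""
    storage_rtt_ms: float
    storage_gbps: float
    fom_mbps: float
    fom_rtt_ms: float
    ha_hosts: int
    isl_links: int


def check_requirements(m: SiteLinkMeasurements) -> list[str]:
    """Return a list of violated requirements; an empty list means none."""
    problems = []
    if m.storage_rtt_ms > MAX_STORAGE_RTT_MS:
        problems.append("storage network RTT exceeds 2 ms")
    if m.storage_gbps < MIN_STORAGE_GBPS:
        problems.append("storage network throughput is below 1 Gbps")
    if m.fom_mbps < MIN_FOM_MBPS:
        problems.append("FOM bandwidth is below 100 Mbps")
    if m.fom_rtt_ms > MAX_FOM_RTT_MS:
        problems.append("FOM RTT exceeds 50 ms")
    if m.ha_hosts > MAX_HA_HOSTS:
        problems.append("HA cluster has more than 32 hosts")
    if m.isl_links < MIN_ISL_LINKS:
        problems.append("fewer than two physical inter-site storage links")
    return problems
```

For example, `check_requirements(SiteLinkMeasurements(1.5, 1.0, 100.0, 40.0, 16, 2))` returns an empty list, because every value is within the stated limits.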
Solution Overview
The HP LeftHand Multi-Site solution uses SAN/iQ Network RAID technology to stripe two copies of data across a storage cluster. When deployed in a Multi-Site configuration, SAN/iQ ensures that a full copy of the data resides on each site, or each side of the cluster. In Multi-Site/vMSC configurations, data remains available in the event of a site failure or loss of the link between sites.
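One way to picture how two striped copies can yield a full copy per site is the toy placement model below. It assumes an even number of nodes ordered so that the sites alternate, with each block written to two adjacent nodes in that order. The node names and the `place_block` function are hypothetical, and the actual SAN/iQ placement logic is not described in this article.

```python
# Hypothetical cluster order: nodes alternate between the two sites.
nodes = [
    ("node-A1", "site1"),
    ("node-B1", "site2"),
    ("node-A2", "site1"),
    ("node-B2", "site2"),
]


def place_block(block_index: int, cluster: list[tuple[str, str]]):
    """Return the two adjacent nodes that hold the copies of one block."""
    first = block_index % len(cluster)
    second = (first + 1) % len(cluster)
    return cluster[first], cluster[second]


def sites_for_block(block_index: int, cluster: list[tuple[str, str]]) -> set[str]:
    """Return the set of sites that hold a copy of the given block."""
    a, b = place_block(block_index, cluster)
    return {a[1], b[1]}
```

In this model every block lands on one node in each site, so either site alone holds a complete copy of the data.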

Managing Inter-Site Links
The inter-site link (ISL) is a crucial component of any vMSC solution. The minimum required bandwidth for HP LeftHand Multi-Site is 1 Gbps, and latency should not exceed 2 ms RTT for optimal performance. Larger configurations may require additional bandwidth for the ISL. For more information on recommended sizing for inter-site links in Multi-Site configurations, see the HP LeftHand Multi-Site User’s Guide.
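To get a rough feel for why larger configurations may need more than the minimum bandwidth, the arithmetic below estimates how long it would take to move a given amount of data across the ISL. This is a back-of-the-envelope sketch only: the `transfer_hours` function and the `usable_fraction` parameter are hypothetical, decimal units are assumed (1 GB = 8 gigabits), and real sizing should follow the HP guide.

```python
def transfer_hours(data_gb: float, link_gbps: float, usable_fraction: float = 1.0) -> float:
    """Estimate the hours needed to move data_gb gigabytes over the link.

    usable_fraction is an assumed share of the link available for this
    traffic (1.0 means the full theoretical line rate).
    """
    if link_gbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("link_gbps must be positive and usable_fraction in (0, 1]")
    gigabits = data_gb * 8                          # gigabytes to gigabits
    seconds = gigabits / (link_gbps * usable_fraction)
    return seconds / 3600
```

At the full theoretical rate of a 1 Gbps link, 450 GB takes one hour; the same data over a 10 Gbps link takes about six minutes.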
Sample tested scenarios
| Scenario | HP LeftHand P4000 Array Behavior | VMware HA Behavior |
| Single storage node single path failure | P4000 node path failover occurs. All volumes remain connected. All ESXi sessions remain active. | No impact observed |
| ESXi Single storage path failure | No impact on volume availability. ESXi storage path fails over to the alternative path. All sessions remain active. | No impact observed |
| Site-1 Single Storage node failure | Volume availability remains unaffected. ESXi iSCSI sessions affected by the node failure fail over to surviving nodes. After the failed node comes back online, all affected volumes resync automatically. Quorum is maintained. Note: Volumes associated with the failed node may or may not show as unprotected in the Centralized Management Console, depending on the Data Protection Level configured for the volume. | No impact observed |
| Site-2 Single Storage node failure | Volume availability remains unaffected. ESXi iSCSI sessions affected by the node failure fail over to surviving nodes. After the failed node comes back online, all affected volumes resync automatically. Quorum is maintained. Note: Volumes associated with the failed node may or may not show as unprotected in the Centralized Management Console, depending on the Data Protection Level configured for the volume. | No impact observed |
| Site-1 All storage node failure | Volume availability remains unaffected. ESXi iSCSI sessions affected by the node failures fail over to surviving nodes. After the failed nodes come back online, all affected volumes resync automatically. Quorum is maintained. Note: Volumes associated with the failed nodes may or may not show as unprotected in the Centralized Management Console, depending on the Data Protection Level configured for the volume. | No impact observed |
| Site-2 All storage node failure | Volume availability remains unaffected. ESXi iSCSI sessions affected by the node failures fail over to surviving nodes. After the failed nodes come back online, all affected volumes resync automatically. Quorum is maintained. Note: Volumes associated with the failed nodes may or may not show as unprotected in the Centralized Management Console, depending on the Data Protection Level configured for the volume. | No impact observed |
| Failover Manager Failure | No impact on volume availability. All sessions remain active. | No impact observed |
| Complete Site 1 failure, including ESXi and storage arrays | Volume availability remains unaffected. Quorum is maintained. iSCSI sessions to surviving ESXi nodes remain active. After failed node comes back online, all affected volumes resync automatically. | Virtual machines on failed ESXi nodes fail. HA restarts failed virtual machines on ESXi hosts on Site 2. |
| Complete Site 2 failure, including ESXi and storage arrays | Volume availability remains unaffected. Quorum is maintained. iSCSI sessions to surviving ESXi nodes remain active. After failed node comes back online, all affected volumes resync automatically. | Virtual machines on failed ESXi nodes fail. HA restarts failed virtual machines on ESXi hosts on Site 1. |
| Single ESXi failure (shutdown) | No impact. Array continues to function normally. | Virtual machines on failed ESXi node fail. HA restarts failed virtual machines on surviving ESXi hosts. |
| Multiple ESXi host management network failure | No impact. Array continues to function normally. | No impact. As long as the storage heartbeat is on and virtual machines are accessible, HA does not initiate failover. |
| Single Storage Inter-Site Link failure | No impact. Array continues to function normally. Note: Redundant Inter-Site Links for storage network are required for this use case. | No Impact observed |
| Site 1 and Site 2 simultaneous failure (shutdown) and restoration | Arrays boot up and resync. All volumes become available. All iSCSI sessions to ESXi hosts are re-established and virtual machines restart successfully. As a best practice, power on the P4000 arrays first and allow the LUNs to become available before powering on the ESXi hosts. | No impact observed |
| Management ISL failure | No impact to P4000 array. Volumes remain available. | If the HA host isolation response is set to Leave Powered On, virtual machines at each site continue to run because the storage heartbeat is still active. Partitioned hosts on the site that does not have a Fault Domain Manager master elect a new master. |
| CMC-Management Server failure | No impact. Array continues to function normally. Array management functions however cannot be performed until CMC is up and running again. | No Impact observed |
| vCenter Server failure | No impact. Array continues to function normally | No Impact on HA. However, the DRS rules cannot be applied. |

