VMware vSAN – Stretched Cluster & 2 Node Guide
VMware Virtual SAN 6.1, shipping with vSphere 6.0 Update 1, introduced a new feature called VMware Virtual SAN Stretched Cluster. Virtual SAN Stretched Cluster is a specific configuration implemented in environments where disaster/downtime avoidance is a key requirement. This guide was developed to provide additional insight and information for installation, configuration and operation of a Virtual SAN Stretched Cluster infrastructure in conjunction with VMware vSphere. This guide will explain how vSphere handles specific failure scenarios and discuss various design considerations and operational procedures.
Virtual SAN Stretched Clusters with Witness Host refers to a deployment where a user sets up a Virtual SAN cluster with 2 active/active sites with an identical number of ESXi hosts distributed evenly between the two sites. The sites are connected via a high bandwidth/low latency link.
The third site hosting the Virtual SAN Witness Host is connected to both of the active/active data sites. This connectivity can be via low bandwidth/high latency links.
Each site is configured as a Virtual SAN Fault Domain. The nomenclature used to describe a Virtual SAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C. Data sites are where virtual machines are deployed. The minimum supported configuration is 1+1+1 (3 nodes). The maximum configuration is 15+15+1 (31 nodes). In Virtual SAN Stretched Clusters, there is only one witness host in any configuration.
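The X+Y+Z nomenclature and its limits can be illustrated with a small validation helper. This is a minimal sketch, not a VMware tool: the function names are hypothetical, and the only rules it encodes are the ones stated above (one witness, between 1 and 15 hosts per data site).

```python
from typing import Tuple


def parse_stretched_config(spec: str) -> Tuple[int, int, int]:
    """Parse an 'X+Y+Z' string and check it against the limits in this guide."""
    parts = spec.split("+")
    if len(parts) != 3:
        raise ValueError(f"expected X+Y+Z, got {spec!r}")
    site_a, site_b, witness = (int(p) for p in parts)
    # There is only one witness host in any configuration.
    if witness != 1:
        raise ValueError("exactly one witness host is allowed")
    # Minimum is 1+1+1, maximum is 15+15+1.
    if not (1 <= site_a <= 15 and 1 <= site_b <= 15):
        raise ValueError("each data site must have between 1 and 15 hosts")
    return site_a, site_b, witness


def total_nodes(spec: str) -> int:
    """Total node count, including the witness host."""
    return sum(parse_stretched_config(spec))
```

The sketch does not enforce an identical host count on both data sites; add that check if the deployment requires an even distribution.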
A virtual machine deployed on a Virtual SAN Stretched Cluster will have one copy of its data on site A, a second copy of its data on site B and any witness components placed on the witness host in site C. This configuration is achieved through fault domains, together with host and VM groups and affinity rules. In the event of a complete site failure, there will be a full copy of the virtual machine data as well as greater than 50% of the components available. This will allow the virtual machine to remain available on the Virtual SAN datastore. If the virtual machine needs to be restarted on the other site, vSphere HA will handle this task.
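The availability rule described above (a full copy of the data plus more than 50% of the components) can be sketched as follows. This is a simplified model under stated assumptions: each object has one replica component per data site and one witness component, and every component counts equally. The function and site names are hypothetical.

```python
def object_available(components, failed_sites):
    """Return True if an object stays accessible after the given sites fail.

    components: list of (site, kind) tuples, kind is 'replica' or 'witness'.
    failed_sites: set of site names that are down.
    Assumption: every component carries equal weight.
    """
    surviving = [(site, kind) for site, kind in components
                 if site not in failed_sites]
    # A full copy of the data must survive.
    has_full_copy = any(kind == "replica" for _, kind in surviving)
    # Strictly more than 50% of the components must survive.
    has_majority = 2 * len(surviving) > len(components)
    return has_full_copy and has_majority


layout = [("A", "replica"), ("B", "replica"), ("C", "witness")]
site_a_down = object_available(layout, {"A"})
site_a_and_witness_down = object_available(layout, {"A", "C"})
```

With this layout, losing any single site leaves two of three components and at least one replica, while losing two sites does not.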
Virtual SAN Stretched Cluster configurations require vSphere 6.0 Update 1 (U1) or greater. This implies both vCenter Server 6.0 U1 and ESXi 6.0 U1. This version of vSphere includes Virtual SAN version 6.1. This is the minimum version required for Virtual SAN Stretched Cluster support.
vSphere & Virtual SAN
Virtual SAN version 6.1 introduced features including both All-Flash and Stretched Cluster functionality. There are no limitations on the edition of vSphere used for Virtual SAN. However, for Virtual SAN Stretched Cluster functionality, vSphere DRS is very desirable. DRS will provide initial placement assistance, and will also automatically migrate virtual machines to their correct site in accordance with Host/VM affinity rules. It can also help with locating virtual machines to their correct site when a site recovers after a failure. Otherwise the administrator will have to manually carry out these tasks. Note that DRS is only available in Enterprise edition and higher of vSphere.
Hybrid and All-Flash Support
Virtual SAN Stretched Cluster is supported on both hybrid configurations (hosts with local storage comprised of both magnetic disks for capacity and flash devices for cache) and all-flash configurations (hosts with local storage made up of flash devices for capacity and flash devices for cache).
VMware supports Virtual SAN Stretched Cluster with the v2 on-disk format only. The v1 on-disk format is based on VMFS and is the original on-disk format used for Virtual SAN. The v2 on-disk format is the version which comes by default with Virtual SAN version 6.x. Customers that upgraded from the original Virtual SAN 5.5 to Virtual SAN 6.0 may not have upgraded the on-disk format from v1 to v2, and are thus still using v1. VMware recommends upgrading the on-disk format to v2 for improved performance and scalability, as well as stretched cluster support. In Virtual SAN 6.2 clusters, the v3 on-disk format allows for additional features, discussed later, specific to 6.2.
Witness Host as an ESXi VM
Both physical ESXi hosts and virtual ESXi hosts (nested ESXi) are supported for the witness host. VMware provides a Witness Appliance for those customers who wish to use the ESXi VM. A witness host/VM cannot be shared between multiple Virtual SAN Stretched Clusters.
Features Supported on VSAN but not VSAN Stretched
The following is a list of products and features supported on Virtual SAN but not on a stretched cluster implementation of Virtual SAN.
- SMP-FT, the new Fault Tolerant VM mechanism introduced in vSphere 6.0, is supported on standard VSAN 6.1 deployments, but it is not supported on stretched cluster VSAN deployments at this time. The exception to this rule is a 2 Node configuration in the same physical location.
- The maximum value for NumberOfFailuresToTolerate in a Virtual SAN Stretched Cluster configuration is 1. This is the limit due to the maximum number of Fault Domains being 3.
- In a Virtual SAN Stretched Cluster, there are only 3 Fault Domains. These are typically referred to as the Preferred, Secondary, and Witness Fault Domains. Standard Virtual SAN configurations can be comprised of up to 32 Fault Domains.
- The Erasure Coding feature introduced in Virtual SAN 6.2 requires 4 Fault Domains for RAID5 type protection and 6 Fault Domains for RAID6 type protection. Because Stretched Cluster configurations only have 3 Fault Domains, Erasure Coding is not supported on Stretched Clusters at this time.
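The fault domain arithmetic behind these limits can be sketched in a few lines. The RAID5 and RAID6 figures come from the list above. The 2n+1 rule for mirroring (tolerating n failures needs 2n+1 fault domains) is not stated in this section and is included here as an assumption about how the limit of 1 follows from 3 fault domains. All names are hypothetical.

```python
# Fault domains required for erasure coding, as stated in this guide.
ERASURE_CODING_FAULT_DOMAINS = {"RAID5": 4, "RAID6": 6}

# A stretched cluster has exactly three fault domains:
# Preferred, Secondary, and Witness.
STRETCHED_FAULT_DOMAINS = 3


def max_mirroring_ftt(fault_domains: int) -> int:
    """Highest NumberOfFailuresToTolerate for mirroring.

    Assumption: tolerating n failures needs 2n + 1 fault domains.
    """
    return max((fault_domains - 1) // 2, 0)


def erasure_coding_supported(raid_type: str, fault_domains: int) -> bool:
    """True if there are enough fault domains for the given RAID type."""
    return fault_domains >= ERASURE_CODING_FAULT_DOMAINS[raid_type]
```

Under these assumptions, three fault domains give a maximum of 1 for mirroring and are too few for either erasure coding type.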
Features Supported on vMSC but not VSAN Stretched
The following is a list of products and features supported on vSphere Metro Storage Cluster (vMSC) but not on a stretched cluster implementation of Virtual SAN.
- RR-FT, the original (and now deprecated) Fault Tolerant mechanism for virtual machines, is supported on vSphere 5.5 for vMSC. It is not supported on stretched cluster Virtual SAN.
- Note that the new SMP-FT, introduced in vSphere 6.0, is not supported on either vMSC or stretched cluster VSAN, but does work on standard VSAN deployments.
Virtual SAN Stretched Clusters Versus Fault Domain
A common question is how stretched cluster differs from Fault Domains, which is a Virtual SAN feature that was introduced with Virtual SAN version 6.0. Fault domains enable what might be termed “rack awareness”, where the components of virtual machines can be distributed amongst multiple hosts in multiple racks, and should a rack failure event occur, the virtual machine continues to be available. However, these racks would typically be hosted in the same data center, and if there was a data center wide event, fault domains would not be able to assist with virtual machine availability.
Stretched clusters essentially build on what fault domains did, and now provide what might be termed “data center awareness”. Virtual SAN Stretched Clusters can now provide availability for virtual machines even if a data center suffers a catastrophic outage.
The Witness Host
The witness host is a dedicated ESXi host (or appliance) whose purpose is to host the witness components of virtual machine objects. The witness must have a connection to both the master Virtual SAN node and the backup Virtual SAN node to join the cluster. In steady state operations, the master node resides in the “preferred site”; the backup node resides in the “secondary site”. Unless the witness host connects to both the master and the backup nodes, it will not join the Virtual SAN cluster.
Read Locality in Virtual SAN Stretched Cluster
In traditional Virtual SAN clusters, a virtual machine’s read operations are distributed across all replica copies of the data in the cluster. With a policy setting of NumberOfFailuresToTolerate=1, which results in two copies of the data, 50% of the reads will come from replica1 and 50% will come from replica2. With a policy setting of NumberOfFailuresToTolerate=2 in non-stretched Virtual SAN clusters, which results in three copies of the data, 33% of the reads will come from replica1, 33% from replica2 and 33% from replica3.
In a Virtual SAN Stretched Cluster, we wish to avoid increased latency caused by reading across the inter-site link. To ensure that 100% of reads occur in the site the VM resides on, the read locality mechanism was introduced. Read locality overrides the NumberOfFailuresToTolerate=1 policy’s default behavior of distributing reads across the two data sites.
DOM, the Distributed Object Manager in Virtual SAN, takes care of this. DOM is responsible for the creation of virtual machine storage objects in the Virtual SAN cluster. It is also responsible for providing distributed data access paths to these objects. There is a single DOM owner per object. There are 3 roles within DOM: Client, Owner and Component Manager. The DOM Owner coordinates access to the object, including reads, locking and object configuration and reconfiguration. All object changes and writes also go through the owner. In a Virtual SAN Stretched Cluster configuration, the DOM owner of an object now takes into account which fault domain it runs in, and reads from the replica that is in the same fault domain.
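The difference between the default read distribution and read locality can be sketched as a small model. This is illustrative only and does not reflect DOM internals. The function name is hypothetical, and the fallback to an even split when no replica is in the owner's fault domain is an assumption.

```python
from fractions import Fraction


def read_distribution(replica_sites, owner_site=None):
    """Return the share of reads served by each replica, in list order.

    replica_sites: fault domain name for each replica of the object.
    owner_site: fault domain of the DOM owner in a stretched cluster,
                or None for a standard (non-stretched) cluster.
    """
    even = [Fraction(1, len(replica_sites))] * len(replica_sites)
    if owner_site is None or owner_site not in replica_sites:
        # Standard behavior: reads are spread across all replicas.
        # (Also used as an assumed fallback if no local replica exists.)
        return even
    # Read locality: all reads go to the replica in the owner's site.
    return [Fraction(1) if site == owner_site else Fraction(0)
            for site in replica_sites]


standard_ftt1 = read_distribution(["replica1", "replica2"])
stretched = read_distribution(["Preferred", "Secondary"], owner_site="Secondary")
```

In the standard case each of two replicas serves half of the reads, while in the stretched case the replica in the owner's site serves all of them.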
There is now another consideration with this read locality. One must avoid unnecessary vMotion of the virtual machine between sites. Since the read cache blocks are stored on one site, if the VM moves around freely and ends up on the remote site, the cache will be cold on that site after the move. Now there will be sub-optimal performance until the cache is warm again. To avoid this situation, soft affinity rules are used to keep the VM local to the same site/fault domain where possible. The steps to configure such rules will be shown in detail in the vSphere DRS section of this guide.
Virtual SAN 6.2 introduced Client Cache, a mechanism that allocates 0.4% of host memory, up to 1GB, as an additional read cache tier. Virtual machines leverage the Client Cache of the host they are running on. Client Cache is not associated with Stretched Cluster read locality, and runs independently.
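The Client Cache sizing rule (0.4% of host memory, capped at 1GB) works out as shown below. This is a minimal sketch: the function name is hypothetical, and treating 1GB as 1024^3 bytes is an assumption.

```python
GIB = 1024 ** 3  # assumption: "1GB" is interpreted as 1 GiB


def client_cache_bytes(host_memory_bytes: int) -> int:
    """0.4% of host memory, capped at 1 GiB (integer arithmetic)."""
    return min(host_memory_bytes * 4 // 1000, GIB)


small_host = client_cache_bytes(128 * GIB)   # below the cap
large_host = client_cache_bytes(512 * GIB)   # capped at 1 GiB
```

With this interpretation, the cap is reached once host memory is 250 GiB or more, since 0.4% of 250 GiB is exactly 1 GiB.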
VMware vCenter Server
A Virtual SAN Stretched Cluster configuration can be created and managed by a single instance of VMware vCenter Server. Both the Windows version and the Virtual Appliance version (Linux) are supported for configuration and management of a Virtual SAN Stretched Cluster.
A Witness Host
In a Virtual SAN Stretched Cluster, the witness components are only ever placed on the witness host. Either a physical ESXi host or a special witness appliance provided by VMware can be used as the witness host.
If a witness appliance is used for the witness host, it will not consume any of the customer’s vSphere licenses. A physical ESXi host that is used as a witness host will need to be licensed accordingly, as this can still be used to provision virtual machines should a customer choose to do so.
It is important that the witness host is not added to the VSAN cluster. The witness host is selected during the creation of a Virtual SAN Stretched Cluster.
The witness appliance will have a unique identifier in the vSphere web client UI to assist with identifying that a host is in fact a witness appliance (ESXi in a VM). It is shown as a “blue” host.
Note this is only visible when the appliance ESXi witness is deployed. If a physical host is used as the witness, then it does not change its appearance in the web client. A witness host is dedicated to each stretched cluster.
When Virtual SAN is deployed in a stretched cluster across multiple sites using fault domains, there are certain networking requirements that must be adhered to.
Layer 2 and Layer 3 Support
Both Layer 2 (same subnet) and Layer 3 (routed) configurations are used in a recommended Virtual SAN Stretched Cluster deployment.
- VMware recommends that Virtual SAN communication between the data sites be over stretched L2.
- VMware recommends that Virtual SAN communication between the data sites and the witness site is routed over L3.
Note: A common question is whether L2 for Virtual SAN traffic across all sites is supported. There are some considerations with the use of a stretched L2 domain between the data sites and the witness site, and these are discussed in further detail in the design considerations section of this guide. Another common question is whether L3 for VSAN traffic across all sites is supported. While this can work, it is not the VMware recommended network topology for Virtual SAN Stretched Clusters at this time.
Virtual SAN traffic between data sites is multicast. Witness traffic between a data site and the witness site is unicast.
Supported Geographical Distances
For VMware Virtual SAN Stretched Clusters, geographical distances are not a support concern. The key requirement is the actual latency numbers between sites.
Data Site to Data Site Network Latency
Data site to data site network refers to the communication between non-witness sites, in other words, sites that run virtual machines and hold virtual machine data. Latency or RTT (Round Trip Time) between sites hosting virtual machine objects should not be greater than 5msec (< 2.5msec one-way).
Data Site to Data Site Bandwidth
Bandwidth between sites hosting virtual machine objects will be workload dependent. For most workloads, VMware recommends a minimum of 10Gbps bandwidth between sites. In use cases such as 2 Node configurations for Remote Office/Branch Office deployments, dedicated 1Gbps bandwidth can be sufficient with fewer than 10 virtual machines.
Please refer to the Design Considerations section of this guide for further details on how to determine bandwidth requirements.
Data Site to Witness Network Latency
This refers to the communication between non-witness sites and the witness site.
In most Virtual SAN Stretched Cluster configurations, latency or RTT (Round Trip Time) between sites hosting VM objects and the witness nodes should not be greater than 200msec (100msec one-way).
In typical 2 Node configurations, such as Remote Office/Branch Office deployments, this latency or RTT is supported up to 500msec (250msec one-way).
The latency to the witness is dependent on the number of objects in the cluster. For Virtual SAN Stretched Cluster configurations up to 10+10+1, a latency of less than or equal to 200 milliseconds is acceptable, although a latency of less than or equal to 100 milliseconds is preferred where possible. For configurations greater than 10+10+1, a latency of less than or equal to 100 milliseconds is required.
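The latency limits in this section can be collected into a simple check. This is a minimal sketch with hypothetical function names. It encodes only the RTT figures given above: 5msec between data sites, 200msec to the witness for up to 10+10+1, 100msec for larger configurations, and 500msec for typical 2 Node deployments.

```python
DATA_SITE_MAX_RTT_MS = 5


def witness_max_rtt_ms(hosts_per_data_site: int, two_node: bool = False) -> int:
    """Maximum supported RTT (ms) between a data site and the witness."""
    if two_node:
        if hosts_per_data_site != 1:
            raise ValueError("a 2 Node configuration is 1+1+1")
        return 500
    # Up to 10+10+1: 200 ms. Larger configurations: 100 ms.
    return 200 if hosts_per_data_site <= 10 else 100


def latency_within_limits(data_rtt_ms: float, witness_rtt_ms: float,
                          hosts_per_data_site: int,
                          two_node: bool = False) -> bool:
    """Check measured RTT values against the limits in this guide."""
    limit = witness_max_rtt_ms(hosts_per_data_site, two_node)
    return data_rtt_ms <= DATA_SITE_MAX_RTT_MS and witness_rtt_ms <= limit
```

For example, a 12+12+1 cluster with a 150msec RTT to the witness fails the check, while the same RTT is within limits for an 8+8+1 cluster.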
Data Site to Witness Network Bandwidth
Bandwidth between sites hosting VM objects and the witness nodes is dependent on the number of objects residing on Virtual SAN. It is important to size data site to witness bandwidth appropriately for both availability and growth. A standard rule of thumb is 2Mbps for every 1000 objects on Virtual SAN.
Please refer to the Design Considerations section of this guide for further details on how to determine bandwidth requirements.
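The rule of thumb of 2Mbps for every 1000 objects translates into a one-line calculation. This is a rough sketch with a hypothetical function name. It scales linearly with the object count and adds no headroom for growth.

```python
def witness_bandwidth_mbps(object_count: int) -> float:
    """Rule-of-thumb witness bandwidth: 2 Mbps per 1000 objects."""
    return object_count * 2 / 1000


estimate = witness_bandwidth_mbps(25_000)  # 25,000 objects on Virtual SAN
```

For 25,000 objects this gives 50 Mbps. A real design would size above this figure to allow for growth.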
Inter-Site MTU Consistency
It is important to maintain a consistent MTU size between data nodes and the witness in a Stretched Cluster configuration. Ensuring that each VMkernel interface designated for Virtual SAN traffic is set to the same MTU size will prevent traffic fragmentation. The Virtual SAN Health Check checks for a uniform MTU size across the Virtual SAN data network, and reports on any inconsistencies.
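The kind of uniformity check described above can be sketched over a plain mapping of interface names to MTU values. This is not the Virtual SAN Health Check implementation. The function name and interface labels are hypothetical, and the input is assumed to have been collected by some other means.

```python
from collections import Counter


def mtu_mismatches(mtu_by_interface):
    """Return the interfaces whose MTU differs from the most common value.

    mtu_by_interface: dict mapping an interface label to its MTU.
    """
    if not mtu_by_interface:
        return {}
    # Treat the most common MTU as the expected value.
    expected, _ = Counter(mtu_by_interface.values()).most_common(1)[0]
    return {name: mtu for name, mtu in mtu_by_interface.items()
            if mtu != expected}


# Hypothetical inventory: two data nodes at 9000, witness left at 1500.
observed = {"esx01:vmk1": 9000, "esx02:vmk1": 9000, "witness:vmk1": 1500}
mismatched = mtu_mismatches(observed)
```

An empty result means every Virtual SAN VMkernel interface in the mapping uses the same MTU.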
Virtual Machines Per Host
The maximum number of virtual machines per ESXi host is unaffected by the Virtual SAN Stretched Cluster configuration. The maximum is the same as for normal VSAN deployments.
VMware recommends that customers run their hosts at 50% of the maximum number of virtual machines supported in a standard Virtual SAN cluster to accommodate a full site failure. In the event of a full site failure, the virtual machines on the failed site can be restarted on the hosts in the surviving site.
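The reasoning behind the 50% recommendation can be checked with simple arithmetic. This is a minimal sketch with hypothetical function names. The per-host maximum is passed in as a parameter because this section does not state it, and the value 200 in the example is purely illustrative.

```python
def recommended_vms_per_host(max_vms_per_host: int) -> int:
    """Half of the per-host maximum, leaving room for a full site failure."""
    return max_vms_per_host // 2


def surviving_site_can_absorb(vms_site_a: int, vms_site_b: int,
                              hosts_per_site: int,
                              max_vms_per_host: int) -> bool:
    """True if one site's hosts can run every VM from both sites."""
    return vms_site_a + vms_site_b <= hosts_per_site * max_vms_per_host


# Illustrative numbers only: 4 hosts per site, assumed maximum of 200 VMs per host.
per_host = recommended_vms_per_host(200)
fits = surviving_site_can_absorb(4 * per_host, 4 * per_host, 4, 200)
```

Running every host at half of its maximum means the surviving site ends up exactly at its limit after a full site failure, which is why the recommendation is 50% and not higher.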
Hosts Per Cluster
The minimum number of hosts in a Virtual SAN Stretched Cluster is 3. In such a configuration, site 1 will contain a single ESXi host, site 2 will contain a single ESXi host and then there is a witness host at the third site, the witness site. The nomenclature for such a configuration is 1+1+1. This is commonly referred to as a 2 Node configuration.
The maximum number of hosts in a Virtual SAN Stretched Cluster is 31. Site 1 contains 15 ESXi hosts, site 2 contains 15 ESXi hosts, and the witness host on the third site makes 31. This is referred to as a 15+15+1 configuration.
There is a maximum of 1 witness host per Virtual SAN Stretched Cluster. The witness host requirements are discussed in the design considerations section of this guide. VMware provides a fully supported witness virtual appliance, in Open Virtual Appliance (OVA) format, for customers who do not wish to dedicate a physical ESXi host as the witness. This OVA is essentially a pre-licensed ESXi host running in a virtual machine, and can be deployed on a physical ESXi host on the third site.
Number Of Failures to Tolerate
Because Virtual SAN Stretched Cluster configurations effectively have 3 fault domains, the Number Of Failures To Tolerate (FTT) policy setting has a maximum of 1 for objects. Virtual SAN cannot comply with FTT values that are greater than 1 in a stretched cluster configuration.
Other policy settings are not impacted by deploying VSAN in a stretched cluster configuration and can be used as per a non-stretched VSAN cluster.
Fault domains play an important role in Virtual SAN Stretched Cluster. Similar to the Number Of Failures To Tolerate (FTT) policy setting discussed previously, the maximum number of fault domains in a Virtual SAN Stretched Cluster is 3. The first FD is the “preferred” data site, the second FD is the “secondary” data site and the third FD is the witness host site.