Chapter 3 - vSAN Installation and Configuration
This chapter describes in detail the installation and configuration process, as well as all initial preparation steps that you might need to consider before proceeding with a vSAN cluster deployment. You will find information on how to correctly set up network and storage devices, as well as some helpful tips and tricks on how to deploy the most optimal vSAN configuration.
Network connectivity is the heart of any vSAN cluster. vSAN cluster hosts use the network for virtual machine (VM) I/O and also communicate their state between one another. Consistent and correct network configuration is key to a successful vSAN deployment. Because the majority of disk I/O will either come from a remote host, or will need to go to a remote host, VMware recommends leveraging a 10 GbE infrastructure. Note that although 1 GbE is fully supported in hybrid configurations, it could become a bottleneck in large-scale deployments.
VMware vSphere provides two different types of virtual switch, both of which are fully supported with vSAN:
- The VMware standard virtual switch (VSS) provides connectivity from VMs and VMkernel ports to external networks but is local to an ESXi host.
- A vSphere Distributed Switch (VDS) gives central control of virtual switch administration across multiple ESXi hosts. A VDS can also provide additional networking features over and above what a VSS can offer, such as network I/O control (NIOC), which can provide quality of service (QoS) on your network. Although a VDS normally requires a particular vSphere edition, vSAN includes VDS functionality independent of the vSphere edition you are running.
VMkernel Network for vSAN
All ESXi hosts participating in a vSAN cluster need to communicate with one another. A new VMkernel traffic type called vSAN traffic was introduced in vSphere 5.5 for this purpose. The vSphere administrator must create a vSAN VMkernel port on each ESXi host in the cluster; the cluster will not form successfully until such a port is available on every participating host (see Figure 3.1).
Figure 3.1 - VMkernel interfaces used for intra-vSAN cluster traffic
Without a VMkernel network for vSAN, the cluster will not form successfully. If communication is not possible between the ESXi hosts in the vSAN cluster, only one ESXi host will join the vSAN cluster; the others will be unable to join. This still results in a single vSAN datastore, but each host can see only itself as part of that datastore. Whether the VMkernel ports are missing or merely misconfigured, a warning message is displayed about communication difficulties between the ESXi hosts in the cluster. Once the VMkernel ports are created and communication is established, the cluster forms successfully.
vSAN Network Configuration: VMware Standard Switch
With a VSS, creating a port group for vSAN network traffic is relatively straightforward. Installing ESXi automatically creates a VSS to carry ESXi management network traffic and VM traffic. You can use this existing standard switch and its associated uplinks to external networks to create a new VMkernel port for vSAN traffic. Alternatively, you may choose to create a new standard switch for the vSAN traffic VMkernel port (see Figure 3.2), selecting new uplinks for the new standard switch.
Figure 3.2 - Add networking wizard: virtual switch selection/creation
In this example, we have decided to create a new standard switch, or vSwitch. As you progress through the "add networking" wizard, after selecting the appropriate uplinks for the new standard switch, you will reach the port properties shown in Figure 3.3. This is where the appropriate network service for the VMkernel port is selected. For vSAN, the "vSAN traffic" service should be enabled.
Figure 3.3 - Enabling the vSAN traffic service on a port
Complete the wizard and you will have a standard switch configured with a VMkernel port group to carry the vSAN traffic. Of course, this step will have to be repeated for every ESXi host in the vSAN cluster.
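For administrators who prefer the command line, the equivalent configuration can be sketched with esxcli directly on each host. This is only an illustration: the vSwitch, uplink, port group, and VMkernel interface names (vSwitch1, vmnic2, vSAN, vmk2) and the IP address are placeholders you would adjust for your environment.

```shell
# Create a new standard switch and attach a dedicated uplink
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic2

# Create a port group on the new switch and a VMkernel interface on that port group
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=vSAN
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=vSAN

# Give the VMkernel interface a static IPv4 configuration
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=172.16.10.11 --netmask=255.255.255.0 --type=static

# Tag the interface for vSAN traffic (equivalent to enabling the service in the wizard)
esxcli vsan network ipv4 add --interface-name=vmk2
```

As with the wizard, these commands must be run on every ESXi host in the cluster, each with its own IP address.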
vSAN Network Configuration: vSphere Distributed Switch
In the case of a VDS, a distributed port group needs to be configured to carry the vSAN traffic. Once the distributed port group is created, VMkernel interfaces on the individual ESXi hosts can then be created to use that distributed port group. The sections that follow describe this process in greater detail.
Step 1: Create the Distributed Switch
Although the official VMware documentation makes no distinction regarding which version of Distributed Switch you should use, the authors recommend using the latest version of the Distributed Switch with vSAN; this is the version the authors used in their vSAN tests. Note that once a Distributed Switch version has been selected, all ESXi hosts attaching to that switch must be running a version of ESXi at least as recent; ideally, the Distributed Switch version should match the ESXi/vSphere version. Earlier versions of ESXi will not be able to use the Distributed Switch.
One of the steps when creating a Distributed Switch is to select whether NIOC is enabled or disabled. We recommend leaving this at the default option of enabled. Later on, we discuss the value of NIOC in a vSAN environment.
Step 2: Create a Distributed Port Group
The steps to create a distributed port group are relatively straightforward:
1. Using the vSphere web client, navigate to the VDS object in the vCenter Server inventory.
2. Select the option to create a new distributed port group.
3. Provide a name for the distributed port group.
4. Set the characteristics of the port group, such as the VLAN type, the VLAN ID, the type of binding, the allocation, and the number of ports that can be attached to the port group, as shown in Figure 3.4.
Figure 3.4 - Distributed port group settings
One important consideration when creating the port group is the port allocation setting and the number of ports associated with the port group. Note that the default number of ports is eight and that the allocation setting is elastic by default: when all ports are assigned, a new set of eight ports is created, so a port group with an elastic allocation type automatically grows as more devices connect to it. With port binding set to static, a port is assigned to the VMkernel port when it connects to the distributed port group. If you plan to build a vSAN cluster of 16 hosts or more, consider configuring a greater number of ports for the port group than the default of eight. That way, during maintenance and outages the ports always remain available to a host until it is ready to rejoin the cluster, and the switch does not incur the overhead of deleting and re-adding ports.
When creating a Distributed Switch and distributed port groups, there are many additional options to choose from, such as the port binding type. Although we discussed port allocation in a little detail here, most of these settings are beyond the scope of this book; readers who are unfamiliar with them can find explanations in the official VMware vSphere documentation. In practice, you can simply leave the Distributed Switch and port groups at their default settings, and vSAN will deploy just fine.
Step 3: Build VMkernel Ports
Once the distributed port group has been created, you can now proceed with building the VMkernel ports on the ESXi hosts. The first step when adding networking to an ESXi host is to select an appropriate connection type. For vSAN network traffic, VMkernel network adapter is the connection type, as shown in Figure 3.5.
Figure 3.5 - VMkernel connection type
The next step is to select the correct port group or distributed port group with which to associate this VMkernel network adapter. We have previously created a distributed port group, so we select that distributed port group, as shown in Figure 3.7.
Once the distributed port group has been selected, it is time to choose the appropriate connection settings for this VMkernel port. In the first part of the connection settings, the port properties are populated. This is where the services associated with the VMkernel port are selected. In this case, we are creating a VMkernel port for vSAN traffic, so that is the service to select, as shown in Figure 3.8. By default, there are three TCP/IP stacks to choose from when creating a VMkernel adapter, but only one of them, the default TCP/IP stack, can be used for the vSAN network. The provisioning TCP/IP stack can be used only for provisioning traffic, and the vMotion TCP/IP stack only for vMotion; you will not be able to select these stacks for vSAN traffic. Options for configuring different network stacks can be found in the official VMware documentation and are beyond the scope of this book, but suffice it to say that different network stacks can be configured on the ESXi host, each with its own properties, such as its own default gateway. The different TCP/IP stacks are shown in Figure 3.6.
Figure 3.6 - VMkernel TCP/IP stacks
Currently, there is no dedicated TCP/IP stack for vSAN traffic, nor is there support for creating a custom vSAN TCP/IP stack. For normal vSAN configurations, this needs no consideration. However, when stretched clusters are discussed in Chapter 8, we will cover vSAN network considerations in more detail, including how ESXi hosts in a stretched vSAN cluster can communicate over L3 networks.
Figure 3.7 - VMkernel target device
Figure 3.8 - VMkernel port properties
Once the correct service (vSAN traffic) has been chosen, the next step is to populate the IPv4 settings of the VMkernel adapter, as shown in Figure 3.9. IPv6 is fully supported as of vSAN 6.2. For both IPv4 and IPv6 settings, you have two options: DHCP or static. The Dynamic Host Configuration Protocol (DHCP) is a standardized network protocol used to provide network configuration details to devices over the network; if DHCP is chosen, a valid DHCP server must exist on the network to provide valid IPv4/IPv6 information to the ESXi host for this VMkernel port. In this example, we have chosen a static IPv4 configuration, as that is most common, so a valid IP address and subnet mask must be provided.
Figure 3.9 - VMkernel IPv4 settings
With all the details provided for the VMkernel port, you can double-check the configuration before finally creating the port, as illustrated in Figure 3.10.
This VMkernel port configuration must be repeated for each of the ESXi hosts in the vSAN cluster. When this configuration is complete, the network configuration is now in place to allow for the successful formation of the vSAN cluster.
Figure 3.10 - VMkernel ready to complete
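Once the wizard completes on a host, you can verify the result from the command line. The interface name (vmk2) and the peer address below are placeholders for your environment.

```shell
# List the VMkernel interfaces tagged for vSAN traffic on this host
esxcli vsan network list

# Confirm the IPv4 configuration of the vSAN VMkernel interface
esxcli network ip interface ipv4 get --interface-name=vmk2

# Check connectivity to the vSAN VMkernel interface of another host in the cluster
vmkping -I vmk2 172.16.10.12
```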
Possible Network Configuration Issues
If the vSAN VMkernel is not properly configured, a warning is displayed in the vSAN > health section on the monitor tab of your vSAN cluster object. If you click the warning for the particular tests that have failed, further details on the network status of all hosts in the cluster are displayed, as shown in Figure 3.11. In this scenario, a single host in an eight-host cluster is part of a different IP subnet, causing connectivity issues as expected.
Figure 3.11 - Network configuration warning
For vSAN 6.1 and earlier, another place to observe vSAN communication issues is the summary view of the ESXi host, as shown in Figure 3.12. If the host cannot communicate with the rest of the vSAN cluster, the message displayed in the summary tab reads: "Host cannot communicate with all other nodes in the vSAN enabled cluster." At this point, you need to revisit the VMkernel port properties and ensure that they have been set up correctly.
Figure 3.12 - Host cannot communicate
Another issue that has surprised a number of customers is the reliance on multicast traffic. One of the requirements for vSAN is to allow multicast traffic on the vSAN network between the ESXi hosts participating in the vSAN cluster; however, multicast is used only for relatively infrequent operations. For example, multicast is used for the initial discovery of hosts in the vSAN cluster and for the ongoing “liveness” checks among the hosts in the cluster.
So, how does this lack of multicast support on the network manifest itself? What you will see after enabling vSAN on the cluster is a warning at the cluster level. If you go to the vSAN > health section on the monitor tab of your vSAN cluster object and look at the network health test, it will display a warning on "Multicast assessment based on other checks" and/or "All hosts have matching multicast settings," even though you can ping/vmkping all the vSAN interfaces on all the hosts. Another symptom is that multiple single-host vSAN clusters may form, with each ESXi host in its own unique cluster partition.
How do you resolve it? Well, a number of our vSAN customers discussed some options on the VMware community forum for vSAN, and these were the recommendations:
- Option 1: Disable Internet group management protocol (IGMP) snooping for the vSAN traffic VLAN. This will allow all multicast traffic through, but if the only traffic on the VLAN is vSAN, this should be a negligible amount of traffic and safe to allow.
- Option 2: Configure an IGMP snooping querier. If there is other multicast traffic and you are concerned that disabling IGMP snooping might open the network up to a flood of multicast traffic, this is the preferred option.
Customers who ran into this situation stated that both methods worked for them; however, we recommend that you refer to your switch provider documentation on how to handle multicast configurations. Some switches convert multicasts to broadcasts, and the packets will be sent to all ports. VMware recommends that customers should avoid using these switches with vSAN if at all possible. The smarter switches that use IGMP snooping have the capability to send the multicast packets on ports where the multicast has been requested, and these switches are more desirable in vSAN deployments. The reason for this recommendation is that simple switches that turn multicast traffic into broadcast traffic can flood a network and affect other non-vSAN hosts attached to the switch.
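One way to see whether multicast traffic is actually flowing between the hosts is to capture it on the vSAN VMkernel interface with the tcpdump-uw utility that ships with ESXi. The interface name below (vmk2) is a placeholder; check the multicast group addresses configured for your hosts first.

```shell
# Show the multicast group addresses and ports this host uses for vSAN
esxcli vsan network list

# Capture IP multicast traffic on the vSAN VMkernel interface;
# you should see periodic packets arriving from the other hosts in the cluster
tcpdump-uw -i vmk2 -n ip multicast
```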
One final point is to explain how you can figure out which host or hosts are partitioned from the cluster. The easiest way is to use the disk management view under the vSAN manage tab and then the disk groups view. This contains a column called network partition groups. This column will show a group number to highlight which partition a particular host resides in. If the cluster is successfully formed and all hosts are communicating, all hosts in this view will have the same network partition number as shown in the example in Figure 3.13. Note that it also shows whether the hosts are healthy and currently connected to the vSAN cluster.
Figure 3.13 - Network partition group
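From the command line of any individual host, `esxcli vsan cluster get` gives the same information. On a healthy cluster, the "Sub-Cluster Member Count" matches the number of hosts; a partitioned host typically reports a member count of 1 and elects itself master of its own partition.

```shell
# Show this host's view of vSAN cluster membership,
# including its role (master/backup/agent) and the sub-cluster member count
esxcli vsan cluster get
```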
Network I/O Control Configuration Example
As previously mentioned, Network I/O Control (NIOC) can be used to guarantee bandwidth for vSAN cluster communication and I/O. NIOC is available only on a VDS, not on a VSS. Indeed, a VDS is only available with some of the higher vSphere editions; however, vSAN includes VDS functionality irrespective of the vSphere edition used.
Although not explicitly called out in the vSphere documentation, if you are using a Distributed Switch version earlier than your vSphere version, we recommend upgrading it to the most recent version if you plan to use it with vSAN. This is simply a cautionary recommendation, as we did all of our vSAN testing with the most recent version of the Distributed Switch.
As of vSphere 5.5, NIOC has a traffic type called vSAN traffic, and thus can provide QoS for vSAN traffic. Although this QoS configuration might not be necessary in most vSAN cluster environments, it is a good feature to have available if vSAN traffic appears to be impacted by other traffic types sharing the same 10 GbE network interface card (NIC). An example of a traffic type that could impact vSAN is vMotion. By its very nature, vMotion traffic is "bursty" and might claim the full available bandwidth of a NIC port, impacting other traffic types sharing the NIC, including vSAN traffic. Leveraging NIOC in those situations avoids a self-imposed denial-of-service (DoS) attack.
Setting up NIOC is quite straightforward, and once configured it will guarantee a certain bandwidth for the vSAN traffic between all hosts. NIOC is enabled by default when a VDS is created. If the feature was disabled during the initial creation of the Distributed Switch, it may be enabled once again by editing the Distributed Switch properties via the vSphere web client. To begin with, use the vSphere web client to select the VDS in the vCenter Server inventory. From there, navigate to the manage tab and select the resource allocation view. This displays the NIOC configuration options, as shown in Figure 3.14.
Figure 3.14 - NIOC resource allocation
To change the resource allocation for the vSAN traffic in NIOC, simply edit the properties of the vSAN traffic network resource pool. Figure 3.15 shows the modifiable configuration options.
Figure 3.15 - NIOC configuration
By default, the limit is set to unlimited, physical adapter shares are set to 50, and there is no reservation. The unlimited value means that vSAN network traffic is allowed to consume all the network bandwidth when there is no congestion. With a reservation, you can configure the minimum bandwidth that must be available for this particular traffic stream; reservations must not exceed 75% of the available bandwidth. We recommend leaving the reservation untouched and prefer using the shares mechanism. If congestion arises, the physical adapter shares come into play: they are compared with the share values assigned to other traffic types to determine which traffic type gets priority.
With vSAN deployments, VMware is recommending a 10 GbE network infrastructure. In these deployments, two 10 GbE network ports are usually used, and are connected to two physical 10 GbE capable switches to provide availability. The various types of traffic will need to share this network capacity, and this is where NIOC can prove invaluable.
We do not recommend setting a limit on the vSAN traffic, because a limit is a "hard" setting: if a 2 Gbps limit is configured on vSAN traffic, that traffic is limited even when additional bandwidth is available on the network. Instead, use shares to "artificially limit" your traffic types based on resource usage and demand.
Design Considerations: Distributed Switch and Network I/O Control
To provide QoS and performance predictability, vSAN and NIOC should go hand in hand. Before discussing the configuration options, consider the following types of network traffic:
- Management network
- vMotion network
- vSAN network
- VM network
This design consideration assumes 10 GbE redundant networking links and a redundant switch pair for availability. Two scenarios will be described. These scenarios are based on the type of network switch used:
- Redundant 10 GbE switch setup without "link aggregation" capability
- Redundant 10 GbE switch setup with "link aggregation" capability
Note: Link aggregation (IEEE 802.3ad) allows users to use more than one connection between network devices. It basically combines multiple physical connections into one logical connection, and provides a level of redundancy and bandwidth improvement.
In both configurations, recommended practice dictates that you create the following port groups and VMkernel interfaces:
- 1 × management network VMkernel interface
- 1 × vMotion VMkernel interface (with all interfaces in the same subnet)
- 1 × vSAN VMkernel interface
- 1 × VM port group
To simplify the configuration, you should have a single vSAN and vMotion VMkernel interface.
To ensure traffic types are separated on different physical ports, we will leverage standard Distributed Switch capabilities. We will also show how to use shares to avoid noisy neighbor scenarios.
Scenario 1: Redundant 10 GbE Switch Without “Link Aggregation” Capability
In this configuration, two individual 10 GbE uplinks are available. It is recommended to separate traffic and designate a single 10 GbE uplink to vSAN for simplicity reasons. The recommended minimum amount of bandwidth per traffic type is as follows:
- Management network: 1 GbE
- vMotion VMkernel interface: 5 GbE
- VM network: 2 GbE
- vSAN VMkernel interface: 10 GbE
Various traffic types will share the same uplink. The management network, VM network, and vMotion network traffic are configured to share uplink 1, and vSAN traffic is configured to use uplink 2. With the network configuration done this way, sufficient bandwidth exists for all the various types of traffic when the vSAN cluster is in a normal or standard operating state.
To make sure that no single traffic type can impact other traffic types during times of contention, NIOC is configured, and the shares mechanism is deployed.
When defining traffic type network shares, this scenario works under the assumption that there is only one physical port available and that all traffic types share that same physical port for this exercise.
This scenario also takes a worst-case scenario approach into consideration. This will guarantee performance even when a failure has occurred. By taking this approach, we can ensure that vSAN always has 50% of the bandwidth at its disposal while leaving the remaining traffic types with sufficient bandwidth to avoid a potential self-inflicted DoS.
Table 3.1 outlines the recommendations for configuring shares for the traffic types.
| Traffic Type | Shares | Limit |
| --- | --- | --- |
| Management Network | 20 | N/A |
| vMotion VMkernel Interface | 50 | N/A |
| VM Port Group | 30 | N/A |
| vSAN VMkernel Interface | 100 | N/A |

Table 3.1 - Recommended Share Configuration by Traffic Type (Scenario 1)
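The effect of these share values during contention can be worked out with simple arithmetic. The sketch below assumes the worst case described above, with all four traffic types contending on a single 10 GbE port, and assumes a management network share value of 20 (so that the vSAN share of 100 corresponds to the 50% mentioned earlier):

```shell
# Effective bandwidth per traffic type under full contention on one 10 GbE port:
#   bandwidth = shares / total_shares * 10 Gbit/s
total=$((20 + 50 + 30 + 100))
for entry in management:20 vmotion:50 vm:30 vsan:100; do
  name=${entry%%:*}
  share=${entry##*:}
  awk -v n="$name" -v s="$share" -v t="$total" \
    'BEGIN { printf "%-10s %4.1f Gbit/s\n", n, s / t * 10 }'
done
```

Under this assumption, vSAN receives 5 Gbit/s (50% of the port), vMotion 2.5 Gbit/s, the VM traffic 1.5 Gbit/s, and the management network 1 Gbit/s.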
When selecting the uplinks used for the various types of traffic, you should separate traffic types to provide predictability and avoid noisy neighbor scenarios. The following configuration is recommended:
- Management network VMkernel interface = Explicit failover order = Uplink 1 active/Uplink 2 standby
- vMotion VMkernel interface = Explicit failover order = Uplink 1 active/Uplink 2 standby
- VM port group = Explicit failover order = Uplink 1 active/Uplink 2 standby
- vSAN VMkernel interface = Explicit failover order = Uplink 2 active/Uplink 1 standby
Setting an explicit failover order in the teaming and failover section of the port groups is recommended for predictability (see Figure 3.16). The explicit failover order always uses the highest-order uplink from the list of active adapters that passes failover detection criteria.
Figure 3.16 - Using explicit failover order
Separating traffic types allows for optimal storage performance while also providing sufficient bandwidth for the vMotion and VM traffic (see Figure 3.17). Although this could also be achieved by using the load-based teaming (LBT) mechanism, note that the LBT load-balancing period is 30 seconds, potentially causing a short period of contention when "bursty" traffic shares the same uplinks. Also, when troubleshooting network issues, it might be difficult to keep track of the relationship between the physical NIC port and the VMkernel interface. The explicit failover approach therefore also brings a level of simplicity to the network configuration.
Figure 3.17 - Distributed Switch, failover order, and NIOC configuration
Scenario 2: Redundant 10 GbE Switch with Link Aggregation Capability
In this next scenario, there are two 10 GbE uplinks set up in a teamed configuration (often referred to as EtherChannel or link aggregation). Because of the physical switch capabilities, the configuration of the virtual layer will be extremely simple. We will take the previous recommended minimum bandwidth requirements into consideration for the design:
- Management network: 1 GbE
- vMotion VMkernel: 5 GbE
- VM port group: 2 GbE
- vSAN VMkernel interface: 10 GbE
When the physical uplinks are teamed (link aggregation), the Distributed Switch load-balancing mechanism must be configured with one of the following options:
- Link aggregation control protocol (LACP)
- IP-Hash

IP-Hash is a load-balancing option available to VMkernel interfaces that are connected to multiple uplinks on an ESXi host. An uplink is chosen based on a hash of the source and destination IP addresses of each packet; for non-IP packets, whatever is located at those IP address offsets in the packet is used to compute the hash.
LACP is supported on vSphere 5.5 and higher distributed switches. This feature allows you to connect ESXi hosts to physical switches by means of dynamic link aggregation. LAGs (link aggregation groups) are created on the Distributed Switch to aggregate the bandwidth of the physical NICs on the ESXi hosts that are in turn connected to LACP port channels.
LACP support was introduced in vSphere Distributed Switch version 5.1, and enhanced LACP support was introduced in version 5.5. If you are running an earlier version of the Distributed Switch, you should upgrade to version 5.5 at a minimum.
The official vSphere networking guide has much more detail on IP-hash and LACP support and should be referenced for additional details.
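If you opt for LACP, the state of the link aggregation configuration can be checked from each host with esxcli. These commands are available on vSphere 5.5 and later; the output will vary with your LAG configuration.

```shell
# Show the LACP configuration of the distributed switch as seen by this host
esxcli network vswitch dvs vmware lacp config get

# Show the runtime LACP status of the LAGs and their member uplinks
esxcli network vswitch dvs vmware lacp status get
```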
It is recommended to configure all port groups and VMkernel interfaces to use either LACP or IP-Hash depending on the type of physical switch being used:
- Management network VMkernel interface = LACP/IP-Hash
- vMotion VMkernel interface = LACP/IP-Hash
- VM port group = LACP/IP-Hash
- vSAN VMkernel interface = LACP/IP-Hash
Because various traffic types will share the same uplinks, you also want to make sure that no traffic type can affect the others during times of contention. For that, the NIOC shares mechanism is once again used. Table 3.2 outlines the recommendations for configuring shares for the traffic types defined in NIOC.
| Traffic Type | Shares | Limit |
| --- | --- | --- |
| Management Network | 20 | N/A |
| vMotion VMkernel Interface | 50 | N/A |
| VM Port Group | 30 | N/A |
| vSAN VMkernel Interface | 100 | N/A |

Table 3.2 - Recommended Share Configuration by Traffic Type (Scenario 2)
Working under the same assumptions as before that there is only one physical port available and that all traffic types share the same physical port, we once again take a worst-case scenario approach into consideration. This approach will guarantee performance even in a failure scenario. By taking this approach, we can ensure that vSAN always has 50% of the bandwidth at its disposal while giving the other traffic types sufficient bandwidth to avoid a potential self-inflicted DoS situation arising.
When both uplinks are available, this equates to 10 GbE for vSAN traffic. When only one uplink is available (because of a NIC failure or for maintenance reasons), the available bandwidth is cut in half, leaving vSAN with 5 GbE.
Figure 3.18 depicts this configuration scenario.
Figure 3.18 - Distributed switch configuration for link aggregation
Either of the scenarios discussed here should provide an optimal network configuration for your vSAN cluster.
Creating a vSAN Cluster
The creation of a vSAN cluster is identical in many respects to how a vSphere administrator might set up a vSphere Distributed Resource Scheduler (DRS) or vSphere High Availability (HA) cluster. A cluster object is created in the vCenter Server inventory, and one can either enable the vSAN cluster functionality first and then add the hosts to the cluster, or add the hosts first and then enable vSAN on the cluster. The net result of enabling vSAN on a cluster is that all of the ESXi hosts in the vSAN cluster access a shared, distributed vSAN datastore. At the time of writing, only a single vSAN datastore can be created per cluster; all local storage is therefore consumed by this single vSAN datastore.
The vSAN datastore is made up from the local storage of each of the ESXi hosts in the cluster. The size of the vSAN datastore is entirely dependent on the number of hosts in the vSAN cluster and the number of magnetic disks (hybrid) or flash devices used for capacity (all-flash) in the ESXi hosts participating in the vSAN cluster.
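As a back-of-the-envelope illustration of that dependency, consider a hypothetical eight-host hybrid cluster in which every host contributes five 4 TB magnetic disks to vSAN; the flash cache devices contribute no capacity in a hybrid configuration:

```shell
# Raw vSAN datastore capacity for a hypothetical 8-host hybrid cluster
hosts=8
disks_per_host=5
disk_size_tb=4
raw_tb=$((hosts * disks_per_host * disk_size_tb))
# Usable capacity is lower: the number-of-failures-to-tolerate policy
# multiplies the space consumed by each VM's objects
echo "Raw capacity: ${raw_tb} TB"
```

This prints a raw capacity of 160 TB; how much of it is effectively usable depends on the availability policies assigned to the VMs.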
In the initial version of vSAN, only a single option was displayed when enabling the feature, asking the administrator to choose between a manual and an automatic cluster. As of version 6.2, additional options are displayed during the initial configuration of vSAN, as shown in Figure 3.19.
Figure 3.19 - New vSAN configuration screen
The automatic or manual aspect of disk claiming is still available in version 6.2; if you enable "Deduplication and Compression," however, this option will always be set to "manual." Disk claiming refers to whether vSAN should discover all the local disks on the hosts and automatically add them to the vSAN datastore, or whether the vSphere administrator would like to manually select which disks to add to the cluster. Note that when configuring vSAN on an existing cluster, vSphere HA needs to be disabled before enabling vSAN. Before we look at all the different vSAN constructs and configuration aspects, let's take a side step and look at what changes for vSphere HA with the introduction of vSAN, and discuss some configuration options that are recommended for vSAN.
vSphere HA is fully supported on a vSAN cluster to provide additional availability to VMs deployed in the cluster; however, a number of significant changes have been made to vSphere HA to ensure correct interoperability with vSAN. These changes are important to understand as they will impact the way you configure vSphere HA.
vSphere HA Communication Network
In non-vSAN deployments, vSphere HA agent communication takes place over the management network. In a vSAN environment, vSphere HA agents communicate over the vSAN network. The reasoning behind this is that, in the event of a network failure, we want vSphere HA and vSAN to observe the same partition. This avoids possible conflicts that could arise if vSphere HA and vSAN observed different partitions after a failure, with different partitions holding subsets of the storage components and objects.
By default, vSphere HA in vSAN environments continues to use the default gateway of the management network for isolation detection. We suspect that most vSAN environments will have the management network and the vSAN network sharing the same physical infrastructure (especially in 10 GbE environments). However, if the vSAN and management networks are on different physical infrastructures, it is recommended to change the vSphere HA isolation address from the management network to the vSAN network; VMware's recommendation when using vSphere HA with vSAN is to use an IP address on the vSAN network as the isolation address. To prevent vSphere HA from using the default gateway and have it use an IP address on the vSAN network instead, the following setting must be changed in the advanced options for vSphere HA:
- das.isolationAddress0=<ip address on vSAN network>
However, if there is no suitable isolation address on the vSAN network, then leave the isolation address on the management network as per the default.
One other notable difference relates to network reconfiguration. Changes are not automatically detected by vSphere HA if they are made at the vSAN layer to the vSAN networks. Therefore, a vSphere HA cluster reconfiguration must be manually initiated by the vSphere administrator for these changes to be detected.
vSphere HA Heartbeat Datastores
Another noticeable difference with vSphere HA on vSAN is that the vSAN datastore cannot be used for datastore heartbeats. With traditional SAN or NAS datastores, these heartbeats play a significant role in determining VM ownership in the event of a vSphere HA cluster partition, because they allow some level of coordination between partitions. vSphere HA does not use the vSAN datastore for heartbeating and will not let a user designate it as a heartbeat datastore. Instead, vSAN uses its clustering service over the network, which allows for very fast failure detection. The key reason for this is that vSAN typically leverages the same network interfaces and switches as vSphere HA, so the result of datastore heartbeats and network heartbeats would be the same.
Note, however, that if ESXi hosts participating in a vSAN cluster also have access to shared storage, either VMFS (Virtual Machine File System) or NFS (Network File System), these traditional datastores are used for vSphere HA heartbeats.
vSphere HA Admission Control
There is another consideration to discuss regarding vSphere HA and vSAN interoperability. When configuring vSphere HA, one of the decisions that needs to be made concerns admission control. Admission control ensures that vSphere HA has sufficient resources at its disposal to restart VMs after a failure by setting aside resources.
Note that vSAN itself is not admission control-aware when it comes to failure recovery. There is no way to automatically set aside spare capacity on vSAN to ensure that overcommitment does not occur.
If a failure occurs, vSAN will try to use all the remaining space on the remaining nodes in the cluster to bring the VMs back to a compliant state. Caution and advance planning are imperative when combining vSAN with vSphere HA, as multiple failures in the vSAN cluster may fill up all the available space on the vSAN datastore due to overcommitment of resources.
Recommended practice dictates that you take "rebuild capacity" into consideration when planning and designing a vSAN environment. In Chapter 9, "Designing a vSAN Cluster," you learn how to achieve this. For simplicity, it is recommended to align this form of (manual) vSAN admission control with the selected vSphere HA admission control settings. Do note that the health check section in the Web Client reports the current state of the cluster, as well as the state after a full host failure.
vSphere HA Isolation Response
When a host isolation event occurs in a vSAN cluster with vSphere HA enabled, vSphere HA will apply the configured isolation response. With vSphere HA, you can select from three different responses to an isolation event:
- Leave power on
- Power off, then fail over
- Shut down, then fail over
The recommendation is to have vSphere HA automatically power off the VMs running on a host when a host isolation event occurs. Therefore, the isolation response should be set to "power off, then fail over" rather than the default setting, which is "leave power on."
Note that “power off, then fail over” is similar to pulling the power cable from a physical host. The VM process is literally stopped—this is not a clean shutdown. In the case of an isolation event, however, it is unlikely that vSAN can write to the disks on the isolated host and as such powering off is recommended. If the ESXi host is partitioned, it is also unlikely that the VM will be able to access a quorum of components of the storage object.
vSphere HA Component Protection
In a traditional environment it is possible to configure a response to an all paths down (APD) and permanent device loss (PDL) scenario within HA. This is part of new functionality that was introduced with vSphere 6.0 called VM Component Protection. In the current version this is not supported for vSAN and as such a response to APD and/or PDL does not have to be configured for vSphere HA in a vSAN only cluster.
Now that we know what has changed for vSphere HA, let’s take a look at some core constructs of vSAN.
The Role of Disk Groups
vSAN uses the concept of a disk group as a container for magnetic disks and flash devices. VMs that have their storage deployed on a device in a particular disk group will leverage the caching capabilities of the flash device in the same disk group only. The disk group can be thought of as an aggregate of storage devices that uses flash for performance and magnetic disk drives or flash for capacity. You must take into account a number of considerations for disk groups, which we will look at in detail now. In the future, when we refer to a cache device in the context of vSAN disk groups, we refer to SSDs, PCIe flash, and NVMe flash devices. When we refer to a capacity device we refer to magnetic disks (SATA, SAS or NL-SAS), SSDs, PCIe flash, and NVMe flash devices.
Disk Group Maximums
In vSAN 6.2, there is a maximum of five disk groups per host. Each disk group contains at least one caching device and one capacity device to persistently store VMs. vSAN supports both hybrid and all-flash configurations, but a single vSAN cluster cannot mix hybrid and all-flash disk groups at the time of writing.
Each of these disk groups can contain a maximum of one caching device and seven capacity devices. This means the maximum vSAN datastore size is seven capacity devices × five disk groups × number of ESXi hosts in the cluster × size of the capacity device. As you can see, this is quite scalable and can produce a very large distributed datastore.
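That arithmetic can be captured in a short helper. The sketch below is purely our own illustration (the function and constants are not part of any vSAN tooling), using hypothetical numbers of 8 hosts fully populated with 4 TB capacity devices:

```python
# Back-of-the-envelope maximum raw vSAN datastore size (vSAN 6.2 limits).
MAX_DISK_GROUPS_PER_HOST = 5
MAX_CAPACITY_DEVICES_PER_GROUP = 7

def max_raw_capacity_tb(hosts, device_size_tb,
                        disk_groups=MAX_DISK_GROUPS_PER_HOST,
                        devices_per_group=MAX_CAPACITY_DEVICES_PER_GROUP):
    """Raw capacity in TB, before metadata overhead and failures to tolerate."""
    if disk_groups > MAX_DISK_GROUPS_PER_HOST:
        raise ValueError("vSAN 6.2 allows at most 5 disk groups per host")
    if devices_per_group > MAX_CAPACITY_DEVICES_PER_GROUP:
        raise ValueError("at most 7 capacity devices per disk group")
    return devices_per_group * disk_groups * hosts * device_size_tb

# Hypothetical example: 8 hosts, 5 disk groups each, 4 TB capacity devices
print(max_raw_capacity_tb(hosts=8, device_size_tb=4))  # 7 * 5 * 8 * 4 = 1120 TB
```

Remember that this yields raw capacity only; metadata overhead and the number of failures to tolerate will reduce the usable figure.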
Why Configure Multiple Disk Groups in vSAN?
Disk groups allow a vSphere administrator to define a failure domain and a deduplication/compression domain in the case of an all-flash configuration. How deduplication and compression works is described in Chapter 5, “Architectural Details.” For now it is sufficient to understand that it happens on a per disk group basis as it can impact the design of your host and the disk groups within the host.
There are different ways of designing a disk group; the most important factor is whether the vSAN cluster is all-flash or hybrid. For both, however, the failure domain concept applies. Let's look at that first.
With multiple disk groups with a single caching device and a few capacity devices, should the caching device in a disk group fail, the failure domain is limited to only those capacity devices in that particular disk group. With one very large disk group containing lots of capacity devices, a caching device failure can impact a greater number of VMs as it will impact the full disk group. The failure domain should be a consideration when designing disk group configurations.
Figure 3.20 shows two vSAN hosts. The first host contains two disk groups, each with one caching device and three capacity devices. The second host contains a single disk group with one caching device and six capacity devices. On the first host, a caching device failure does not impact the other disk group, which means that 50% of the capacity and performance is still available. The second vSAN host would be impacted to a greater extent: all six capacity devices become unavailable when the caching device fails. This is what we mean when we say that a disk group can be used to define a failure domain.
Figure 3.20 - Disk groups define failure domains
In most cases the failure domain or performance arguments will lead to multiple disk groups instead of a single disk group. There is, however, another interesting consideration. For all-flash configurations, space efficiency functionality (deduplication and compression) can be enabled at the cluster level. Although enabled per cluster, deduplication and compression happen on a per-disk-group basis. This means that for any given block that needs to be stored on a capacity device in an all-flash cluster, vSAN checks whether an identical block has already been stored on that particular disk group. If so, there is no need to store it again; otherwise, vSAN compresses the data and stores it as a new unique block on the disk group. It also means that in an all-flash configuration, depending on your workload, it may be beneficial to create a few larger disk groups rather than many smaller ones, as the deduplication process is more effective across a larger group. This needs to be weighed against the fact that when deduplication and compression are enabled on a disk group, a failure of any device in that disk group, be it a cache device or a capacity device, impacts the whole disk group. This is because the space efficiency metadata is distributed across all the capacity devices in the disk group. As always, it is up to the administrator to find the right balance between risk and benefit.
Cache Device to Capacity Device Sizing Ratio
When designing your vSAN environment from a hardware perspective, realize that vSAN relies heavily on the caching device (flash) for performance. As a rule of thumb, VMware recommends sizing the cache at 10% of the expected consumed virtual disk capacity, before "failures to tolerate" is taken into account. VMware also supports lower ratios. Larger ratios will, in fact, improve the performance of VMs by virtue of the fact that more I/O can be cached. In a hybrid vSAN cluster, the caching device functions as both a read cache and a write buffer for VMs: 70% of the device is used as a read cache and 30% as a write buffer. In an all-flash configuration, 100% of the caching device is used as a write buffer; the 10% rule still applies, but increasing the size beyond that will not lead to greater performance, as all I/O is already served from flash.
The 10% value is based on the assumption that most working data sets are about 10% of the total data set. Using this rule of thumb (and it is just a rule of thumb) to cover the majority of workloads means that the live data of the applications running in your VMs should reside in flash.
For example, assume that we have 100 VMs. Each VM has a 100 GB virtual disk, of which anticipated usage is 50 GB on average. In this scenario, this would result in the following:
10% of (100 × 50 GB) = 500 GB
This total amount of cache capacity should then be divided by the number of ESXi hosts in the vSAN cluster. With five hosts, this example would lead to a recommendation of 100 GB of cache capacity per host.
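The worked example above can be expressed as a small helper. This is our own illustration of the 10% rule of thumb, not an official VMware sizing tool:

```python
def recommended_cache_per_host_gb(num_vms, avg_used_gb_per_vm, num_hosts,
                                  cache_ratio=0.10):
    """Apply the 10% rule of thumb: cache = 10% of anticipated consumed
    capacity (before failures to tolerate), divided across the hosts."""
    total_consumed_gb = num_vms * avg_used_gb_per_vm
    total_cache_gb = cache_ratio * total_consumed_gb
    return total_cache_gb / num_hosts

# The example from the text: 100 VMs, 50 GB anticipated usage each, 5 hosts
print(recommended_cache_per_host_gb(100, 50, 5))  # 100.0 GB per host
```

Note that the result is a guideline only; actual working set sizes vary per workload.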
Automatically Add Disks to vSAN Disk Groups
Automatic versus manual mode has been a topic of hot debate over the past releases. We have found that in the majority of cases customers like to keep control over which device is used for what, and which disk group it is part of. Manual mode allows you to do this. In some cases, however, customers prefer to have vSAN handle disk management, and this is fully supported.
If automatic mode is chosen during the vSAN cluster creation workflow, vSAN will automatically discover local magnetic disks and local SSDs on each host and build disk groups on each host in the cluster. Note that these SSDs and magnetic disks will be claimed by vSAN only if they are empty and contain no partition information. vSAN will not claim disks that are already being used or have been used in the past and contain residual data. For vSAN to claim these disks, they will first have to be wiped clean.
Each host with valid storage will have a disk group containing its local magnetic disks and/or SSDs. Suffice it to say that a disk group can be thought of as a container of magnetic disks and/or SSDs. As stated previously, each disk group can contain only a single caching device and a maximum of seven capacity devices, but there may be multiple disk groups defined per ESXi host. Finally, after all of this is completed, the vSAN datastore is created, and its size reflects the capacity of all the capacity devices across all the hosts in the cluster, less some metadata overhead.
ESXi hosts that are part of the vSAN cluster but do not contribute storage to the vSAN datastore can still access the vSAN datastore. This is a very advantageous feature of vSAN, because a vSAN cluster can now be scaled not just on storage requirements, but also on compute requirements. Note, however, that VMware recommends uniformly configured clusters for better load balancing, availability, and overall performance.
Although automatic mode will claim “local” disks, most ESXi hosts with SAS controllers will have their disks show up as “remote,” and vSAN will not auto-claim these disks. In this case, the vSphere administrator must manually create the disk groups, even though the cluster is set up in automatic mode, as explained in Chapter 2, “vSAN Prerequisites and Requirements for Deployment.”
Manually Adding Disks to a vSAN Disk Group
As mentioned earlier, as you create the vSAN cluster, you have the option to manually add disks. If this option is selected, administrators are given the opportunity to select multiple cache devices and multiple capacity devices manually via the vSAN configuration wizard. The administrator can select between one and seven capacity devices and at most one caching device per disk group. After each disk group is created on a per-host basis, the size of the vSAN datastore grows according to the number of capacity devices added. Note that the SSDs functioning as caching devices are not included in the capacity of the vSAN datastore.
You might wonder when this manual option would be used. First and foremost, it is a requirement for all-flash environments, to ensure that the right type of flash is selected for the caching layer and the capacity layer. Another reason relates to device identification. When vSAN constructs disk groups, it will always try to do so in a consistent manner; however, due to the many different server configurations, especially those using SAS for disk connectivity, the manual method may be preferable to the automatic method. SAS reports devices with unique identifiers instead of on a port-by-port basis. Therefore, the disk in disk slot 1 may be part of disk group 1 on ESXi host 1, while the disk in disk slot 1 may become part of disk group 2 on ESXi host 2. When disks need to be replaced, for whatever reason, it is of the utmost importance that the correct disk is removed and replaced with a new one. A vSphere administrator may therefore want to configure the disk groups manually so that the disks are easily identifiable; we have found that this is what most vSAN customers do, as it makes life easier when replacing devices. Note, though, that in an all-flash configuration with functionality like deduplication and compression enabled, all disks should be claimed at the same time, not one by one.
Disk Group Creation Example
Disk group creation is necessary only if the cluster is created in manual mode. If the cluster is created in automatic mode, the disk groups are automatically created for you, using all available disks on the host. The mechanism to create a disk group is quite straightforward. You need to remember some restrictions, however, as mentioned previously:
- At most, there can be one caching device per disk group.
- At most, there can be seven capacity devices per disk group.
Multiple disk groups may be created if a host has more than seven capacity devices or more than one caching device. To create a disk group, the cluster must first be configured in manual mode, as shown in Figure 3.21. This can be done during the configuration of vSAN, but it can also be changed to manual after vSAN has been configured.
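The two restrictions above lend themselves to a simple sanity check when planning disk group layouts. The function below is a hypothetical illustration of the rules, written by us for this chapter, and is not part of the vSphere API:

```python
MAX_CACHE_DEVICES = 1      # at most one caching device per disk group
MAX_CAPACITY_DEVICES = 7   # at most seven capacity devices per disk group

def validate_disk_group(cache_devices, capacity_devices):
    """Return a list of rule violations for a proposed disk group layout."""
    problems = []
    if cache_devices != MAX_CACHE_DEVICES:
        problems.append("exactly one caching device is required per disk group")
    if not 1 <= capacity_devices <= MAX_CAPACITY_DEVICES:
        problems.append("a disk group needs between 1 and 7 capacity devices")
    return problems

# A host with one cache device and ten capacity devices needs two disk groups:
print(validate_disk_group(1, 10))  # capacity limit exceeded -> one violation
print(validate_disk_group(1, 7))   # valid layout -> empty list
```

A host with ten capacity devices and two cache devices, for instance, would be split into two disk groups of five capacity devices each, each fronted by one caching device.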
Figure 3.21 - Turning on vSAN
Once the cluster is in manual mode, there will be no storage devices claimed by vSAN. The next step is to manually create disk groups. Navigate to the disk management section under vSAN management in the vSphere Web Client. From here, you select a host in the cluster and click the icon to create a new disk group. This will display all available disks (SSD and magnetic disks) in the host, as shown in Figure 3.22.
Figure 3.22 - vSAN Disk Management
At this point, vSphere administrators have a number of options available. They can claim all disks from all hosts at once, or they can build disk groups one host at a time. The first option is useful if disks show up as not local, such as disks behind a SAS controller. Administrators who want more granular control may prefer to set up disk groups one host at a time.
When you decide to configure disk groups manually, the vSphere Web Client provides a very intuitive user interface (UI) to do this. From the UI, you can select the capacity devices and flash devices that form the disk group in a single step, as shown in Figure 3.23.
Figure 3.23 - Claiming disks for vSAN
If the first icon (claim disks) is chosen, all hosts and disks may be selected in one step. If the second icon (create disk groups) is chosen, the wizard steps through the hosts one at a time, claiming disks for that host only. Note the guidance provided in the wizard. Hosts that contribute storage to the vSAN cluster must contribute at least one caching device and one capacity device. In reality, you would expect a much higher number of capacity devices than caching devices. And just to reiterate the configuration maximums for vSAN: a disk group may contain only one caching device but up to seven capacity devices.
After the disk groups have been created, the vSAN datastore is created. This vSAN datastore can now be used for the deployment of VMs.
vSAN Datastore Properties
The raw size of a vSAN datastore is governed by the number of capacity devices per ESXi host and the number of ESXi hosts in the cluster. There is also some metadata overhead to consider. For example, if each host has seven 2 TB magnetic disks, and there are eight hosts in the cluster, the raw capacity is as follows:
7 × 2 TB × 8 = 112 TB raw capacity
Now that we know how to calculate how much raw capacity we will have available, how do we know how much effective capacity we will have? That depends on various factors, but it all begins with the hardware configuration: all-flash or hybrid. When creating your vSAN cluster, an all-flash configuration gives you the option to enable "deduplication and compression," which will play a big factor in the available capacity. Note that these data services are not available in a hybrid configuration. We will discuss deduplication and compression in more detail in Chapter 5.
Besides deduplication and compression, the number of copies of each VM that we need to store must also be factored in. This is controlled through the policy-based management framework.
After creating the disk groups, your vSAN cluster is configured. Once the vSAN datastore is formed, a number of datastore capabilities are surfaced up to vCenter Server. These capabilities will be used to create the appropriate VM storage policies for VMs and their associated virtual machine disk (VMDK) storage objects deployed on the vSAN datastore. They include stripe width, number of failures to tolerate, force provisioning, provisioned capacity, and whether the replication mechanism should be optimized for performance or for capacity. Before deploying VMs, however, you first need to understand how to create appropriate VM storage policies that meet the requirements of the application running in the VM.
VM storage policies and vSAN capabilities will be discussed in greater detail later in Chapter 4, “VM Storage Policies on vSAN,” but suffice it to know for now that these capabilities form the VM policy requirements. These allow a vSphere administrator to specify requirements based on performance, availability, and data services when it comes to VM provisioning. The next chapter discusses VM storage policies in the context of vSAN and how to correctly deploy a VM using vSAN capabilities.
If everything is configured and working as designed, vSAN can be configured in just a few clicks. However, it is vitally important that the infrastructure is ready in advance. Identifying appropriate magnetic disk drives for capacity, sizing your flash resources for performance, and verifying that your networking is configured to provide the best availability and performance are all tasks that must be configured and designed up front.
Now that the vSAN cluster is up and running, let's make use of it. We touched on the topic of VM storage policies; these should be created to reflect the requirements of the applications running in your VMs. We look at how to do this in the next chapter.