Chapter 7 - Management and Maintenance

This chapter covers the common vSAN management and maintenance procedures and tasks. It also provides some generic workflows and examples related to day-to-day management. Management and maintenance of vSAN has changed considerably since the initial version. This chapter will look at the original techniques involved, but will also discuss in detail the enhancements made along the way to vSAN 6.2.

Health Check

We will begin this chapter with a look at what has become the most valuable tool in an administrator’s arsenal when it comes to management and maintenance of vSAN: the vSAN health check. The health check was introduced in vSAN 6.0 as a plugin for vCenter Server, which in turn pushed out a VIB (vSphere installation bundle) to each of the ESXi hosts in the vSAN cluster. This had to be installed independently from vSphere and vSAN. Since vSAN 6.1, however, the feature has been embedded in both vCenter Server and ESXi, so very little administrative action is required to leverage it.

Health Check Tests

Possibly the most useful part of the health check is the number of tests that it performs on all aspects of the cluster. It tests to ensure that all of the hardware devices are on the VMware compatibility guide (VCG) (including driver versions), that the network is functioning correctly, that the cluster is formed properly, and that the storage devices do not have any errors. This is invaluable when it comes to troubleshooting vSAN issues and can quickly lead administrators to the root cause of an issue. Administrators should always refer to the health check tests and make sure that vSAN is completely healthy before embarking on any management or maintenance tasks. Figure 7.1 shows a complete set of health checks taken from a 6.1 vSAN cluster. Enhancements are being added to the health check with each release, so expect a different list of health checks depending on the vSAN version. There are also additional health checks for use cases such as vSAN stretched cluster.

Figure 7.1 - Health check tests listing

vSAN health check also includes an alerting/alarm mechanism. This means that if a test fails in the health check, an alarm is raised to bring it to the administrator’s attention. The other really nice feature of the health check tests is that, through the AskVMware mechanism, all tests are linked to a VMware knowledgebase article which provides details about the nature of the test, what it means when it fails, and how you can remediate the situation. To run the health check tests, first select the vSAN cluster object in the vCenter inventory, then select monitoring, and then select vSAN > Health. The tests can be re-run at any time by clicking the “retest” button. However, the tests are run automatically every 60 minutes. Health check tests will be revisited in Chapter 10, “Troubleshooting, Monitoring, and Performance,” when we look at troubleshooting vSAN in greater detail.

Proactive Health Checks

Along with the set of health check tests introduced previously, vSAN health check also provides a set of proactive tests. Typically, one would not run these proactive tests during production. However, these tests can be very useful if you wish to implement a proof-of-concept (PoC) with vSAN, or even as part of the initial deployment of vSAN on site. These proactive tests can give you peace-of-mind that everything is working correctly before putting vSAN into production. The proactive tests include:

  • VM creation test
  • Multicast performance test
  • Storage performance test

Simply select the test that you wish to run, and click the green “start” arrow symbol to begin the test. Figure 7.2 shows the tests as they appear in the vSphere web client.

Figure 7.2 - Health check proactive tests

The actual tests are well described in the Web Client. The “VM creation test” quickly verifies that virtual machines can be deployed on the vSAN datastore, and once that verification is complete, the sample VMs are removed. The VMs are created with whatever policy is the default policy for the vSAN datastore. The “multicast performance test” simply verifies that the network infrastructure can handle a particular throughput using multicast traffic, and highlights if the network is unable to carry a particular load that is desirable for vSAN. This is especially important when there is a complex network configuration that may involve a number of hops or routes when vSAN is deployed over L3. Finally, there is the “storage performance test,” which allows administrators to run various storage loads on their vSAN cluster, and examine the resulting bandwidth, throughput and latency. This is also a very good test for proof-of-concepts or pre-production deployments where administrators can run a burn-in test on their vSAN storage and discover any unstable components, drivers or firmware on their system. A good recommendation would be to run a storage performance burn-in test overnight as part of any deployment.

Performance Service

One final feature that can be considered part of the health check is the performance service, which was introduced in vSAN 6.2. Since the initial release of vSAN, an area that was identified as needing much improvement was the area of monitoring vSAN performance from the vSphere Web Client. While some information was available in the vSphere Web Client, such as per-VM performance metrics, there was little information regarding the performance of the vSAN cluster at an overall cluster basis, a per host basis, a per disk group basis or even a per device basis. This information was only attainable via the vSAN observer tool, and was not integrated with vSphere. Nor could the vSAN observer provide any historic data; it only ran in real-time mode. With the release of the performance service, metrics such as IOPS, latency and throughput (and many others) are now available in the vSphere Web Client at a glance.

The performance service is initially disabled. Administrators will need to enable it via the Web Client. A nice feature of the performance service is that it does not put any additional load on vCenter Server for maintaining metrics. Instead, all metrics are saved on a special VM home object on the vSAN datastore (the statistics database) that is created when the performance service is enabled. This means you can view historic data as well as current system status. The metrics displayed in the UI are calculated as an average performance over a 5-minute interval (roll up). Since the statistics are stored in a VM home namespace object, the database may use up to a maximum of 255 GB of capacity. Figure 7.3 shows the policy for the statistics database using the vSAN default policy once the performance service is enabled.

Figure 7.3 - Performance service enabled

Note that the health check also includes a number of tests to ensure that the performance service is functioning normally. These tests are only visible in vSAN 6.2.

Now that we have provided an overview of the health check and associated services, let’s turn our attention to some of the more common management tasks an administrator might be faced with when managing vSAN.

Host Management

VMware vSAN is a scale-out and scale-up storage architecture. This means that it is possible to seamlessly add extra storage resources to your vSAN cluster. These storage resources can be magnetic disks or flash devices for additional capacity, complete disk groups including both cache and capacity devices, but could also be additional hosts containing storage capacity. Those who have been managing vSphere environments for a while will not be surprised that vSAN is extremely simple; adding storage capacity can truly be as simple as adding a new host to a cluster. Let’s look at some of these tasks more in depth.

Adding Hosts to the Cluster

Adding hosts to the vSAN cluster is quite straightforward. Of course, you must ensure that the host meets vSAN requirements or recommendations such as a 1 Gb dedicated network interface card (NIC) port (10 GbE being recommended) and at least one cache tier device and one or more capacity tier devices if the host is to provide additional storage capacity. Also, pre-configuration steps such as a VMkernel port for vSAN communication should be considered, although these can be done after the host is added to the cluster. After the host has successfully joined the cluster, you should observe the size of the vSAN datastore grow according to the size of the additional capacity devices in the new host. Remember that the flash tier device does not add anything to the capacity of the vSAN datastore. Just for completeness’ sake, these are the steps required to add a host to a vSAN cluster using the vSphere Web Client:

  1. Right-click the cluster object and click Add Host.

  2. Fill in the IP address or host name of the server, as shown in Figure 7.4.

  3. Fill in the user account (root typically) and the password.

  4. Accept the SHA1 thumbprint option.

  5. Click Next on the host summary screen.

  6. Select the license to be used.

  7. Enable lockdown mode if needed and click Next.

  8. Click Next in the resource pool section.

  9. Click Finish to add the host to the cluster.

Figure 7.4 - Adding a host to the cluster

That is it; well, if you have your vSAN cluster configured to automatic mode, of course. If you do not have it configured to automatic mode, you will need an additional step to create a disk group manually. You will learn how to do that later in this chapter in the disk management section.

Removing Hosts from the Cluster

Should you want to remove a host from a cluster, you must first ensure that the host is placed into maintenance mode, which is discussed in further detail in the next section. After the host has been successfully placed into maintenance mode, you may safely remove it from the vSAN cluster. To remove a host from a cluster using the vSphere web client, follow these steps:

  1. Right-click the host and click Enter Maintenance Mode and select the appropriate vSAN migration option from the screen in Figure 7.5, and then click OK. If the plan is to truly remove this host from the cluster, then a full data migration is the recommended maintenance mode option.
  2. Now all the virtual machines (VMs) will be migrated (vMotion) to other hosts. If DRS is enabled on the cluster, this will happen automatically. If DRS is not enabled on the cluster, the administrator will have to manually migrate VMs from the host entering maintenance mode for the operation to complete successfully.
  3. When migrations are completed, depending on the selected vSAN migration option, vSAN components may also be copied to other hosts.
  4. When maintenance mode has completed, right-click the host again and select the Move To option to move the host out of the cluster.
  5. If you wish to remove the host from vCenter Server completely, right-click the host once again, and select Remove from Inventory. This might be located under All vCenter Actions in earlier versions of vCenter Server.
  6. Read the text presented twice, and click Yes when you understand the potential impact.

Figure 7.5 - Enter maintenance mode

ESXCLI vSAN Cluster Commands

There are no specific host commands for vSAN. There is, however, a namespace in ESXCLI for vSAN. Using these command-line interface (CLI) commands, you can enable a host to join or leave a cluster. The basic commands that are part of esxcli vsan cluster are shown in Example 7.1.

Example 7.1 esxcli vsan cluster Command Options

~ # esxcli vsan cluster
Usage: esxcli vsan cluster {cmd} [cmd options]
Available Commands
 get      Get the information of the vSAN cluster that this host is joined to.
 join     Join the host to a given vSAN cluster.
 leave    Leave the vSAN cluster the host is currently joined to.
 restore  Restore the persisted vSAN cluster configuration.
~ #

One command that we have used regularly during troubleshooting exercises is the get command. The get command allows you to get cluster configuration information on the command line, which can be used to compare hosts against each other; a sample is provided in Example 7.2.

Example 7.2 Using the get Command to get Cluster Configuration Information on the Command Line

~ # esxcli vsan cluster get
 Cluster Information
 Enabled: true
 Current Local Time: 2013-03-18T12:09:11Z
 Local Node UUID: 511b62c3-96e6-434e-6839-1cc1de253de4
 Local Node State: MASTER
 Local Node Health State: HEALTHY
 Sub-Cluster Master UUID: 511b62c3-96e6-434e-6839-1cc1de253de4
 Sub-Cluster Backup UUID: 511cc68b-352a-5cae-cf67-1cc1de252264
 Sub-Cluster UUID: 523845c8-73c9-5d99-0393-9ef20a328714
 Sub-Cluster Membership Entry Revision: 10
 Sub-Cluster Member UUIDs: 511b62c3-96e6-434e-6839-1cc1de253de4,
 Sub-Cluster Membership UUID: 56092451-245f-9c0c-29f6-1cc1de253de4
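
When comparing hosts against each other, it can help to turn this output into key/value pairs rather than reading it by eye. The following Python sketch is illustrative only. It assumes the output of esxcli vsan cluster get has been captured as text in the "Key: value" layout of Example 7.2; the function names and the selection of fields to compare are our own assumptions and are not part of ESXCLI.

```python
# Abbreviated copy of the Example 7.2 output, used as sample input.
SAMPLE = """\
 Cluster Information
 Enabled: true
 Local Node UUID: 511b62c3-96e6-434e-6839-1cc1de253de4
 Local Node State: MASTER
 Local Node Health State: HEALTHY
 Sub-Cluster Master UUID: 511b62c3-96e6-434e-6839-1cc1de253de4
 Sub-Cluster Backup UUID: 511cc68b-352a-5cae-cf67-1cc1de252264
 Sub-Cluster UUID: 523845c8-73c9-5d99-0393-9ef20a328714
 Sub-Cluster Membership Entry Revision: 10
"""

# Fields we would normally expect to match on every host of one cluster
# (an assumed, non-exhaustive selection).
SHARED_KEYS = ("Sub-Cluster UUID", "Sub-Cluster Master UUID",
               "Sub-Cluster Backup UUID")


def parse_cluster_get(text):
    """Turn 'Key: value' lines into a dict, skipping heading lines."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # lines such as "Cluster Information" have no separator
            info[key] = value.strip()
    return info


def differing_keys(host_a, host_b, keys=SHARED_KEYS):
    """Return the shared keys whose values differ between two hosts."""
    return [k for k in keys if host_a.get(k) != host_b.get(k)]


if __name__ == "__main__":
    host = parse_cluster_get(SAMPLE)
    print(host["Local Node State"], host["Sub-Cluster UUID"])
```

Feeding the parsed output of two hosts to differing_keys gives a quick indication of whether they agree on the cluster they belong to; an empty list means the selected fields match.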

Maintenance Mode

The previous section briefly touched on maintenance mode when removing an ESXi host from a vSAN cluster. With vSAN, maintenance mode includes new functionality that we will elaborate on here. In the past, when an ESXi host was placed in maintenance mode, it was all about migrating VMs from that ESXi host; however, when you implement vSAN, maintenance mode provides you with the option to migrate data as well. The vSAN maintenance mode options relate to data evacuation, as follows:

  • Ensure Accessibility: This option evacuates enough data from the host entering maintenance mode to ensure that all VM storage objects are accessible after the host goes down. This is not full data evacuation. Instead, vSAN examines the storage objects that could end up without quorum or data availability when the host is placed into maintenance mode and makes enough copies of the object available to alleviate those issues. vSAN (or to be more precise, cluster level object manager [CLOM]) will have to successfully reconfigure all objects that would become inaccessible because of the lack of availability of those component(s) on that host. An example of when this could happen is when VMs are configured with “failures to tolerate” set to 0, or there is already a host with a failure in the cluster, or indeed another host is in maintenance mode. Ensure Accessibility is the default option of the maintenance mode workflow and the recommended option by VMware if the host is going to be in maintenance for a short period of time. If the maintenance time is expected to be reasonably long, administrators should decide if they want to fully evacuate the data from that host to avoid risk to their VMs and data availability. There is one subtle behavior difference to note between the original release of vSAN and later releases. In the first release, when a host was placed in maintenance mode, it continued to contribute storage to the vSAN datastore and components were still accessible. In later releases this behavior was changed. Now when a host is placed into maintenance mode, it no longer contributes storage to the vSAN datastore, and any components on the datastore are marked as ABSENT.
  • Full Data Migration: This option is a full data evacuation and essentially creates replacement copies for every piece of data residing on disks on the host being placed into maintenance mode. vSAN does not necessarily copy the data from the host entering maintenance mode, however; it can and will also leverage the hosts holding the replica copy of the object to avoid creating a bottleneck on the host entering maintenance mode. In other words, in an eight-host cluster, when a host is placed in maintenance mode using full data migration, then potentially all eight hosts will contribute to the re-creation of the impacted components. The host does not successfully enter maintenance mode until all affected objects are reconfigured and compliance is ensured, that is, until all of the component(s) have been placed on different hosts in the cluster. This is the option that VMware recommends when hosts are being removed from the cluster, or there is a longer-term maintenance operation planned.
  • No Data Migration: This option does nothing with the storage objects. If the host is powered off after entering maintenance mode, the situation is equivalent to the host crashing. It is also important to understand that if you have objects that have number of failures to tolerate set to 0, you could impact the availability of those objects by choosing this option. There are some other risks associated with this option. For example, if there is some other “unknown” issue or failure in the cluster or there is a maintenance mode operation in progress that the administrator is not aware of, this maintenance mode option can lead to VM or data unavailability. For this reason, VMware recommends this option only when there is a full cluster shutdown planned (or on the advice of VMware support staff).

Note that in the original release of vSAN, when a host entered maintenance mode, vSAN still operated on, accessed, and served data from that host. Only when the host was removed from the cluster or powered off did vSAN stop using the host (or, of course, when you had chosen a full data migration, once the data migration had completed and the “old” components had been removed). This behavior changed in vSAN 6.0. In vSAN 6.0 and later, when a host is placed into maintenance mode, it no longer contributes storage to the vSAN datastore, and any components that reside on the physical storage of that host are marked as absent.

For manual maintenance mode operations (outside of VMware Update Manager), Figure 7.6 shows which options an administrator can select from the UI when a host or hosts are placed into maintenance mode, with Ensure Accessibility being the default preselected data migration suggestion.

Figure 7.6 - Maintenance mode options

Default Maintenance Mode/Decommission Mode

One other important point is the default maintenance mode setting when a product like VMware Update Manager is being used. Before vSAN 6.1, there was no way to control the maintenance mode (decommission mode) option; it was always set to Ensure Accessibility. This was not always the option that customers wished to use. Since the release of vSAN 6.1, customers can control the default maintenance mode option through an advanced setting. The advanced setting is called vSAN.DefaultHostDecommissionMode, and allows administrators to set the default maintenance mode to an option other than Ensure Accessibility, as listed in Table 7.1.

Maintenance Mode Option    Description
ensureAccessibility        vSAN data reconfiguration should be performed to ensure storage object accessibility
evacuateAllData            vSAN data evacuation should be performed such that all storage object data is removed from the host
noAction                   No special action should take place regarding vSAN data

Table 7.1 vSAN.DefaultHostDecommissionMode Options
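
The recommendations given in this chapter for the three options can be condensed into a small decision helper. The Python sketch below is a rough illustration and not a VMware tool: the function name, its parameters, and the 60-minute threshold for "long" maintenance are assumptions made for the example. Only the three returned strings come from Table 7.1.

```python
def suggest_decommission_mode(host_count, removing_host=False,
                              full_cluster_shutdown=False,
                              expected_minutes=20, long_threshold_min=60):
    """Suggest a Table 7.1 value based on the guidance in this chapter.

    long_threshold_min is an assumed cut-off for "longer-term" maintenance.
    """
    if full_cluster_shutdown:
        # No Data Migration is only recommended for a full cluster shutdown.
        return "noAction"
    if host_count <= 3:
        # With three hosts there is nowhere to place a third component,
        # so a full evacuation cannot complete.
        return "ensureAccessibility"
    if removing_host or expected_minutes > long_threshold_min:
        # Host removal or long maintenance: evacuate everything.
        return "evacuateAllData"
    # Short maintenance: the default option.
    return "ensureAccessibility"


if __name__ == "__main__":
    print(suggest_decommission_mode(8, expected_minutes=20))
    print(suggest_decommission_mode(8, removing_host=True))
```

Treat the result as a starting point for discussion only; the right choice still depends on the level of risk your organization is willing to accept.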

Which option should you choose? It is best to draw a comparison to a regular storage environment first. When you do upgrades on a storage array, you typically do these upgrades in a rolling fashion, meaning that if you have two controllers, one will be upgraded while the other handles I/O. In this scenario, you are also at risk. The big difference is that as a virtualization administrator you have a bit more flexibility, and you expect certain features, such as vSphere High Availability (HA), to keep working. You need to ask yourself what level of risk you are willing to take, and what level of risk you can take.

From a vSAN perspective, when it comes to placing a host into maintenance mode, you will need to ask yourself the following questions:

  • Why am I placing my host in maintenance mode? Am I going to upgrade my hosts and expect them to be unavailable for just a brief period of time? Am I removing a host from the cluster altogether? This will play a big role in which maintenance mode data migration option you should use.
  • How many hosts do I have? When using three hosts, the only option you have is Ensure Accessibility because vSAN always needs three hosts to store objects (two replicas and one witness). Therefore, with a three-node cluster, you will have to accept some risk when using maintenance mode, and run with one copy of the data.
  • How long will the move take?
    • Is this an all-flash cluster or a hybrid cluster?
    • What types of disks have I used (SAS versus SATA)?
    • Do I have 10 GbE or 1 GbE?
    • How big is my cluster?
  • Do I want to move data from one host to another to maintain availability levels? Only stored components need to be moved, not the “raw capacity” of the host! That is, if 6 TB of capacity is used out of 8 TB, 6 TB will be moved.
  • Do I just want to ensure data accessibility and take the risk of potential downtime during maintenance? Only components of those objects at risk will be moved. For example, if only 500 GB out of the 6 TB used capacity is at risk, that 500 GB will be moved.
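
To get a feel for how long the move might take, a back-of-the-envelope calculation is often enough. The Python sketch below estimates the time to copy a given amount of data over a single network link. It is a simplification under stated assumptions: the function name and the efficiency factor (the fraction of line rate actually achieved) are hypothetical, disk speed is ignored, and in practice several hosts may contribute to the copy in parallel.

```python
def evacuation_hours(data_tb, link_gbit_per_s, efficiency=0.5):
    """Estimate hours needed to copy data_tb terabytes over one link.

    efficiency is an assumed fraction of line rate that is really usable.
    """
    data_bits = data_tb * 1e12 * 8                       # decimal TB to bits
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return data_bits / usable_bits_per_s / 3600


if __name__ == "__main__":
    # The 6 TB full evacuation and the 500 GB at-risk subset from the text.
    for label, tb in (("full data migration", 6.0),
                      ("ensure accessibility", 0.5)):
        for link in (10, 1):
            hours = evacuation_hours(tb, link)
            print(f"{label}: {tb} TB over {link} GbE = {hours:.1f} h")
```

Even with generous assumptions, the difference between moving only the at-risk components and moving everything, and between 1 GbE and 10 GbE, is an order of magnitude in each case.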

There is something to be said for each of the maintenance mode data migration options. When you select full data migration to maintain availability levels, your “maintenance window” will be stretched, as you could be copying terabytes over the network from host to host. It could potentially take hours to complete. If your ESXi upgrade including a host reboot takes about 20 minutes, is it acceptable to wait for hours for the data to be migrated? Or do you take the risk, inform your users about the potential downtime, and do the maintenance with a higher risk but complete it in minutes rather than hours? Of course, if the maintenance takes longer than 1 hour, components may begin to rebuild and resync on other nodes in the cluster, which will consume resources (60 minutes is when the clomd timeout expires, and absent components are automatically rebuilt). However, the main risk is that another failure occurs in the cluster during the maintenance window. Then you risk the availability of your VMs and your data. One way to overcome this is to use number of failures to tolerate = 2, which means that you can do maintenance on one node and still tolerate another host failing at the same time. The new erasure coding option, which allows customers to implement a RAID-6 configuration, can tolerate two failures without consuming as much capacity as a RAID-1 configuration.
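
The trade-off described above can be expressed in a few lines. The following Python sketch is a simplified model, not vSAN logic: the function and field names are hypothetical, it assumes the host was placed in maintenance mode without a full data migration, and it ignores the fact that tolerance is restored once a rebuild completes. The 60-minute default is the clomd timeout mentioned in the text.

```python
def maintenance_risk(failures_to_tolerate, maintenance_minutes,
                     repair_delay_min=60):
    """Summarize exposure while one host is in maintenance mode.

    Assumes no full data migration was performed for that host.
    """
    return {
        # Absent components start rebuilding once the timeout expires.
        "rebuild_expected": maintenance_minutes > repair_delay_min,
        # One failure "budget" is already consumed by the absent host.
        "failures_still_tolerated": max(failures_to_tolerate - 1, 0),
    }


if __name__ == "__main__":
    print(maintenance_risk(failures_to_tolerate=1, maintenance_minutes=20))
    print(maintenance_risk(failures_to_tolerate=2, maintenance_minutes=90))
```

With failures to tolerate set to 1, a 20-minute upgrade triggers no rebuild but leaves no room for a further failure; with a value of 2, one additional failure can still be absorbed.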

To be honest, it is impossible for us to give you advice on what the best approach is for your organization. We do feel strongly that for normal software or hardware maintenance tasks that only take a short period of time, it will be acceptable to use the Ensure Accessibility maintenance mode data migration option. You should still, however, discuss all approaches with your storage team and look at their procedures. What is the agreed service level agreement (SLA) with your business partners and what fits from an operational perspective?

Disk Management

One of the design goals for vSAN, as already mentioned, is the ability to scale out the storage capacity. This requires the ability to add new disks, replace disks with a larger capacity disk, or simply replace failed disks. This next section discusses the procedures involved in doing these tasks in a vSAN environment.

Adding a Disk Group

Chapter 2, “vSAN Prerequisites and Requirements for Deployment,” demonstrated how to add a disk group; however, for completeness, here are the steps again. This example shows how to create a disk group on all hosts simultaneously. However, administrators can also create disk groups on a host-by-host basis.

  1. Click your vSAN cluster in the left pane.

  2. Click the Manage tab on the right side.

  3. Click Settings and Disk Management.

  4. As shown in Figure 7.7, the available devices can be shown on a per device model/size basis or host basis. In this example, the Disk model/size view is shown.

  5. Click on the “Claim For” field, and select “Capacity tier” for all capacity devices and select “Cache tier” for all the cache devices, then click OK.

Now new disk groups are created; this literally takes seconds.

Figure 7.7 - vSAN disk management

Removing a Disk Group

Before you start with this task, you may want to evacuate the components that are currently in that disk group. In the initial release of vSAN, this could only be achieved by placing the ESXi host with the disk group that you want to remove in maintenance mode. In vSAN 6.0, a new method that allows administrators to evacuate disk groups without placing the host into maintenance mode was introduced.

Evacuating the VM components from a disk group is not a required step for deleting a disk group, but we believe that most administrators would like to move the VM components currently in this disk group to other disk groups in the cluster before deleting the disk group. If you skip this step and do not evacuate the data, you may be left with degraded components that are no longer highly available while vSAN reconfigures these components. And as highlighted many times now, if there is a failure while the objects are degraded, it may lead to data loss.

If you are planning on doing a full data evacuation of a disk group, you should validate first whether sufficient disk space is available within the cluster to do so.
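
A quick capacity check of this kind can be sketched as follows. This Python example is illustrative only and makes several assumptions: the function name and the headroom parameter are hypothetical, the free-space figures would have to be read from the UI or another tool, and component placement rules (for example, replicas of the same object being kept on different hosts) are ignored.

```python
def can_evacuate(group_used_gb, other_groups_free_gb, headroom=0.2):
    """Return True if the used data fits into the other disk groups.

    headroom is an assumed fraction of the remaining free space that
    should stay unused after the evacuation.
    """
    usable_gb = sum(other_groups_free_gb) * (1 - headroom)
    return group_used_gb <= usable_gb


if __name__ == "__main__":
    # 500 GB to move, 400 GB and 300 GB free on the remaining disk groups.
    print(can_evacuate(500, [400, 300]))
    print(can_evacuate(600, [400, 300]))
```

A positive result only shows that the raw numbers add up; vSAN itself decides whether the evacuation can actually be completed.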

When you complete this step, you should be able to remove the disk group, as shown in Figure 7.8. However, the icon to remove the disk group may not be visible in the disk groups view, depending on how vSAN has been configured.

Figure 7.8 - vSAN disk groups

When vSAN is configured initially, a decision is made about how disks are added to the vSAN datastore. This can be fully automated (“automatic” mode) or done manually (“manual” mode). By default, vSAN is configured in automatic mode, which means that if we remove a disk group, vSAN immediately claims the disks again, and as such the option to remove a disk group is not presented to the user. To remove a disk group, you will need to place the vSAN cluster in manual mode for the remove disk group icon to appear. This can be done through the vSAN cluster settings, as shown in Figure 7.9.

Figure 7.9 - Changing mode from Automatic to Manual in the initial release of vSAN

When the vSAN is placed in manual mode, return to the disk management view and you should see that the remove the disk group icon (it has the red X) is now visible when you select the disk group on the host that is in maintenance mode. You can now proceed with removing the disk group. If the disk group has already been evacuated, you should see the popup as shown in Figure 7.10, which shows that there is no data left to evacuate. In vSAN 6.0, if there is data still on the disks in the disk group, administrators now have the opportunity to evacuate the components in the disk groups rather than follow the maintenance mode procedure outlined earlier.

Figure 7.10 - Remove disk group from vSAN

Adding Disks to the Disk Group

If your vSAN was configured in automatic mode, adding disks to the disk group is not an issue. New or existing disks are automatically claimed by the vSAN cluster and are used to provide capacity to the vSAN datastore.

However, if your cluster was created in manual mode, you will need to add new disks to the disk groups to increase the capacity of the vSAN datastore. This can easily be done via the vSphere Web Client. Navigate to the vSAN cluster, select the Manage tab, and then the vSAN Disk Management section. Next, new disks can be “claimed” for a disk group. If your disks do not show up, be sure to do a rescan on your disk controller. Once again, the list of disks can be displayed by model/size or by host. In this case, the grouping has been changed to hosts. Most of the hosts in this cluster already have all of their disks claimed. However, host esxi-hp-12 still has an HDD (hard disk) available for selection. Select the host and then the disk, and in the Claim For column, choose whether the disk is for the cache tier or the capacity tier. If the disk is being added to an already existing disk group, or it is not a flash device, then it can only be added to the capacity tier, as shown in Figure 7.11. Select the disks that you want to add to the vSAN cluster and click OK.

Figure 7.11 - Claim disks

Removing Disks from the Disk Group

Just like removing disk groups discussed previously, disks can be removed from a disk group in the vSphere Web Client only when the cluster is placed in manual mode. If the cluster is in automatic mode, the vSAN cluster will simply reclaim the disk you’ve just removed. When the cluster is in manual mode, navigate to the Disk Management section of the vSAN cluster, select the disk group, and an icon to remove a disk becomes visible in the user interface (UI)—a disk with a red X—as highlighted in Figure 7.12. Note that this icon is not visible when the cluster is in automatic mode.

Figure 7.12 - Remove a disk from a disk group

Similar to how the administrator is prompted to evacuate a disk group when the delete disk group option is chosen, administrators are also prompted to evacuate individual disks when a disk is being removed from a disk group. This will migrate all of the components on the said disk to other disks in the disk group if there is sufficient space. In the example in Figure 7.13, the disk is already empty so there are 0 bytes to move, but if there were still components on this disk, administrators have the opportunity to migrate them before deleting the disk from the disk group.

There is one important note on removing an individual disk from a disk group. If deduplication and compression are enabled on the cluster, it is not possible to remove a single disk from a disk group. The reason for this is that the deduplicated and compressed data, along with the hash tables and metadata associated with deduplication and compression, are striped across all the capacity tier disks in the disk group. To replace a single disk in a disk group where deduplication and compression are enabled on the cluster, the entire disk group should be evacuated and then the disk may be replaced. Afterwards the disk group should be recreated and the cluster balanced via the health check UI.

Figure 7.13 - Evacuate data

Wiping a Disk

In some cases, other features or operating systems may have used magnetic disks and flash devices before vSAN is enabled. In those cases, vSAN will not be able to reuse the devices when the devices still contain partitions or even a file system. Note that this has been done intentionally to prevent the user from selecting the wrong disks. If you want to use a disk that has been previously used, you can wipe the disks manually.

There are four commonly used methods to wipe a disk before vSAN is used:

  • If it was previously in use by vSAN, removing it from the disk group via the vSphere Web Client will remove the partition table.
  • If using vSphere 6.0 U1 (and vSAN 6.1), the disk can be erased from the vSphere Web Client directly.
  • Using the command partedUtil, a disk partition management utility which is included with ESXi.
  • Booting the host with the gparted bootable ISO image.

The gparted procedure is straightforward. Download the ISO image and boot the ESXi host from it; it is then simply a matter of deleting all partitions on the appropriate disk and clicking Apply.

Warning: The tasks involved with wiping a disk are destructive, and it will be nearly impossible to retrieve any data after wiping the disk.

The partedUtil method included with ESXi is slightly more complex because it is a command-line utility. The following steps are required to wipe a disk using partedUtil. If you are not certain which device to wipe, make sure to double-check the device ID using esxcli storage core device list:

Step 1: Display the partition table

~ # partedUtil get /dev/disks/naa.500xxxxxx
15566 255 63 250069680
1 2048 6143 0 0
2 6144 250069646 0 0

Step 2: Display partition types

~ # partedUtil getptbl /dev/disks/naa.500xxxxxx
15566 255 63 250069680
1 2048 6143 381CFCCC728811E092EE000C2911D0B2 vsan 0
2 6144 250069646 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
~ #

Step 3: Delete the partitions

~ # partedUtil delete /dev/disks/naa.500xxxxxx 1
~ # partedUtil delete /dev/disks/naa.500xxxxxx 2
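
When several disks need wiping, the three steps above can be scripted. The following is a minimal Python sketch, not a VMware-provided tool. It parses getptbl output in the format shown in Step 2 and only builds the delete commands as strings, so nothing destructive is executed. The function names parse_getptbl and delete_commands are hypothetical, and the parser assumes that partition rows are the only lines with six whitespace-separated fields that start with a partition number.

```python
def parse_getptbl(output):
    """Parse `partedUtil getptbl` text into a list of partition dicts.

    Assumption: partition rows have six fields
    (number, start sector, end sector, type GUID, type name, attribute).
    Label, geometry, and shell prompt lines have fewer fields and are skipped.
    """
    partitions = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0].isdigit():
            partitions.append({
                "number": int(fields[0]),
                "start": int(fields[1]),
                "end": int(fields[2]),
                "guid": fields[3],
                "type": fields[4],
            })
    return partitions


def delete_commands(device, partitions):
    """Build (but do not run) one `partedUtil delete` command per partition."""
    return [
        "partedUtil delete {} {}".format(device, part["number"])
        for part in partitions
    ]
```

Reviewing the generated commands before pasting them into an ESXi shell keeps a human in the loop, which matters given how destructive the operation is.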

If you are looking for more guidance about the use of partedUtil, consult the VMware Knowledge Base (KB) article on the utility.

Blinking the LED on a Disk

In vSphere 6.0, you can blink the LEDs on the front of the disk drives. The ability to identify a drive for replacement is very important for vSAN, as clusters can contain tens or even hundreds of disk drives.

You’ll find the icons for turning on and off LEDs when you select a disk drive in the UI, as highlighted in Figure 7.14. Clicking on the “green” icon turns the LED on; clicking on the “grey” icon turns the LED off again.

Figure 7.14 - Blink a disk LED from the vSphere Web Client

ESXCLI vSAN Disk Commands

A number of disk-related activities can be performed from ESXCLI. First, you can get and set the manual or automated mode of the cluster. The remaining commands in ESXCLI for vSAN storage relate to disks and disk groups. You can add, remove, or list the disks in a disk group. The list includes both SSDs and magnetic disks. Example 7.3 shows the output of esxcli vsan storage list for two devices; for each device it indicates whether it is a flash device and to which disk group it belongs.

Example 7.3 Output of esxcli vsan storage list

~ # esxcli vsan storage list
   Device: naa.5000c5002bd7526f
   Display Name: naa.5000c5002bd7526f
   Is SSD: false
   vSAN UUID: 52db9f60-57b8-ad88-70eb-889f3c72b5e1
   vSAN Disk Group UUID: 521f9dda-efda-4718-e75d-aec63eb6fbd4
   vSAN Disk Group Name: naa.500253825000c296
   Host UUID: 519f364d-ef04-8d94-8ad4-1cc1de252264
   Cluster UUID: 520bff2a-badc-0cdd-7be7-70e5e5ae032f
   Used by this host: true
   In CMMDS: true
   Checksum: 11848694795517181960
   Checksum OK: true
   Device: naa.500253825000c296
   Display Name: naa.500253825000c296
   Is SSD: true
   vSAN UUID: 521f9dda-efda-4718-e75d-aec63eb6fbd4
   vSAN Disk Group UUID: 521f9dda-efda-4718-e75d-aec63eb6fbd4
   vSAN Disk Group Name: naa.500253825000c296
   Host UUID: 00000000-0000-0000-0000-000000000000
   Cluster UUID: 00000000-0000-0000-0000-000000000000
   Used by this host: true
   In CMMDS: true
   Checksum: 12800345249350977942
   Checksum OK: true
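
If you need to process this output in a script, for instance to see which devices belong to which disk group, the key and value layout is easy to parse. The sketch below is a minimal Python example that assumes the "Key: value" format shown in Example 7.3 and that every record begins with a Device line. The function names are hypothetical.

```python
def parse_storage_list(output):
    """Split `esxcli vsan storage list` text into one dict per device."""
    devices = []
    current = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # prompt lines and blank lines carry no "Key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Device":
            # Assumption: each device record starts with its "Device" line.
            current = {}
            devices.append(current)
        if current is not None:
            current[key] = value
    return devices


def group_by_disk_group(devices):
    """Map each disk group UUID to its flash and magnetic device names."""
    groups = {}
    for dev in devices:
        group = groups.setdefault(
            dev["vSAN Disk Group UUID"], {"flash": [], "magnetic": []}
        )
        tier = "flash" if dev["Is SSD"] == "true" else "magnetic"
        group[tier].append(dev["Device"])
    return groups
```

For the two devices in Example 7.3, this grouping places both under the single disk group UUID they share, one as flash and one as magnetic.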

As previously mentioned, it is also possible to remove a magnetic disk or a flash device from a disk group through the CLI; however, this should be done with absolute care and preferably through the UI, as shown on the previous pages.

Failure Scenarios

We have already discussed some of the failure scenarios in Chapter 5, “Architectural Details,” and explained the difference between absent components and degraded components. From an operational perspective, though, it is good to understand how a magnetic disk, SSD, or host failure impacts you. Before we discuss them, let’s first briefly recap the two different failure states, because they are fundamental to these operational considerations:

  • Absent: vSAN does not know what has happened to the component that is missing. A typical example of this is when a host has failed; vSAN cannot tell if it is a real failure or simply a reboot. When this happens, vSAN waits for 60 minutes by default before new replica components are created.
  • Degraded: vSAN knows what has happened to the component that is missing. A typical example of when this can occur is when an SSD or a magnetic disk has died. When this happens, vSAN instantly spawns new components to make all impacted objects compliant again with their selected policy.
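
The practical difference between the two states is when the rebuild starts. The following Python sketch only illustrates that rule; it is not how vSAN is implemented, and the names ComponentState and minutes_until_rebuild are hypothetical. The 60-minute default comes from the description above.

```python
from enum import Enum


class ComponentState(Enum):
    ABSENT = "absent"      # cause unknown, e.g. a host failed or is rebooting
    DEGRADED = "degraded"  # cause known, e.g. an SSD or magnetic disk died


# Default wait before absent components are rebuilt, as described in the text.
DEFAULT_REPAIR_DELAY_MINUTES = 60


def minutes_until_rebuild(state, repair_delay=DEFAULT_REPAIR_DELAY_MINUTES):
    """Return how long the re-creation of a component is deferred."""
    if state is ComponentState.DEGRADED:
        return 0  # new components are spawned immediately
    if state is ComponentState.ABSENT:
        return repair_delay  # wait in case the component comes back
    raise ValueError("unknown component state: {!r}".format(state))
```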

Now that you know what the different states are, let’s look again at the different types of failures, or at least the most common ones, and what their impact is.

Capacity Device Failure

A disk failure is probably the most common failure that can happen in any storage environment, and vSAN is no different. The reason for this is simple: moving parts. The question, of course, is this: How does vSAN handle disk failure? What if it is doing a write or read to or from that disk?

If a read error is returned from a storage component, be it a magnetic disk in the case of hybrid configurations or a flash device in the case of all-flash configurations, vSAN checks to see whether a replica component exists and reads from that instead. Every object by default is created with number of failures to tolerate set to 1, which means that there are always two identical copies of your object available. There are two separate scenarios when it comes to reading data. The first one is where the problem is recoverable, and the second one is an irrecoverable situation. When the issue is recoverable, the I/O error is reported to the object owner. A component re-creation takes place, and when that is completed, the errored component is deleted. However, if for whatever reason, no replica component exists (an unlikely scenario and something an administrator would have had to create a policy specifically for), vSAN will report an I/O error to the VM.

Write failures are also propagated up to the object owner. The components are marked as degraded and a component re-creation on different disks in the vSAN cluster is initiated. When the component re-creation is completed, the cluster directory (cluster monitoring, membership, and directory service [CMMDS]) is updated. Note that the flash device (which has no error) continues to service reads for the components on all the other capacity devices in the disk group.

In the initial vSAN release, the vSphere Web Client did not provide an indication of how much data needed to be synced when a component or components were being created as a result of a failure; however, the very useful vsan.resync_dashboard Ruby vSphere console (RVC) command does allow you to verify this, as shown in Figure 7.15:

Figure 7.15 - Using vsan.resync_dashboard for verification

However, since vSAN 6.0, the vSphere Web Client provides the ability to monitor how much data is being resynced in the event of a failure. You can find this information by selecting the vSAN cluster object in the vCenter Server inventory, then selecting Monitor, vSAN, and then “resyncing components.” It reports the number of resyncing components, the bytes left to resync, and the estimated time for the resyncing to complete.

Cache Device Failure

What about when the cache device becomes inaccessible? When a cache device becomes inaccessible, all the capacity devices backed by that cache device in the same disk group are also made inaccessible. A cache device failure is the same as a failure of all the capacity devices bound to the cache device. In essence, when a cache device fails, the whole disk group is considered to be degraded. If there is spare capacity in the vSAN cluster, it tries to find another host or disk and starts reconfiguring the storage objects.

Therefore, from an operational and architectural perspective, depending on the type of hosts used, it could be beneficial to create multiple smaller disk groups versus a single large disk group because a disk group should be considered to be a failure domain, as shown in Figure 7.16.

Figure 7.16 - vSAN disk groups
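
To make the failure domain argument concrete, the following Python sketch compares how much capacity becomes inaccessible when a cache device fails in one large disk group versus one of two smaller disk groups. The device names, disk sizes, and the function capacity_lost_gb are hypothetical and chosen only for illustration.

```python
def capacity_lost_gb(disk_groups, failed_cache_device):
    """Return the capacity (GB) that becomes inaccessible when a cache device fails.

    Every capacity device in the same disk group as the failed cache device
    is treated as failed, as described in the text.
    """
    for group in disk_groups:
        if group["cache"] == failed_cache_device:
            return sum(group["capacity_gb"])
    return 0


# One host, six 1000 GB capacity devices, laid out in two different ways.
one_large_group = [
    {"cache": "ssd-0", "capacity_gb": [1000] * 6},
]
two_small_groups = [
    {"cache": "ssd-0", "capacity_gb": [1000] * 3},
    {"cache": "ssd-1", "capacity_gb": [1000] * 3},
]
```

With these numbers, losing ssd-0 takes all six capacity devices offline in the first layout but only three in the second, so less data has to be rebuilt elsewhere in the cluster.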

Host Failure

Assuming vSAN VM storage policies have been created with the number of failures to tolerate at least set to 1, a host failure in a vSAN cluster is similar to a host failure in a cluster that has a regular storage device attached. The main difference, of course, being that the vSAN host that has failed contains components of objects that will be out of sync when the host returns. Fortunately, vSAN has a mechanism that syncs all the components as soon as they return.

In the case of a host failure, after 60 minutes vSAN will start re-creating components because the likelihood of the host returning within a reasonable amount of time is slim. When the reconstruction of the storage objects is completed, the cluster directory (CMMDS) is once again updated. In fact, it is updated at each step of the process: failure detection, start of resync, resync progress, and rebuild completion.

If the host that originally failed recovers and rejoins the cluster, the object reconstruction status is checked. If object reconstruction has completed on another node or nodes, no action is taken. If object resynchronization is still in progress, the components of the originally failed host are also resynched, just in case there is an issue with the new object synchronization. When the synchronization of all objects is complete, the components of the original host are discarded, and the more recent copies are utilized. Otherwise, if the new components failed to resync for any reason, the original components on the original host are used.

You are probably wondering by now how this resynchronization of vSAN components actually works. vSAN maintains a bitmap of changed blocks in the event of components of an object being unable to synchronize due to a failure on a host, network, or disk. This allows updates to vSAN objects composed of two or more components to be reconciled after a failure. Let’s use an example to explain this. If a host with replica A of object X has been partitioned from the rest of the cluster, the surviving components of X have quorum and data availability, so they continue functioning and serving writes and reads. While A is “absent,” all writes performed to X are persistently tracked in a bitmap by vSAN; that is, the bitmap tracks the regions that are still out of sync. If the partitioned host with replica A comes back and vSAN decides to reintegrate it with the remaining components of object X, the bitmap is used to resynchronize component A.
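
The idea of tracking changed regions can be sketched in a few lines of Python. This is a conceptual illustration only, not vSAN’s actual data structures: the class MirroredObject, the tiny REGION_SIZE, and the in-memory byte arrays are all assumptions made for the example.

```python
REGION_SIZE = 4  # bytes per tracked region; hypothetical and tiny for illustration


class MirroredObject:
    """Object X with a surviving replica and a replica A that may go absent."""

    def __init__(self, size):
        self.surviving = bytearray(size)
        self.replica_a = bytearray(size)
        self.a_present = True
        regions = (size + REGION_SIZE - 1) // REGION_SIZE
        self.dirty = [False] * regions  # the bitmap: one flag per region

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > len(self.surviving):
            raise ValueError("write outside of object")
        if not data:
            return
        self.surviving[offset:offset + len(data)] = data
        if self.a_present:
            self.replica_a[offset:offset + len(data)] = data
            return
        # Replica A is absent: remember which regions it has missed.
        first = offset // REGION_SIZE
        last = (offset + len(data) - 1) // REGION_SIZE
        for region in range(first, last + 1):
            self.dirty[region] = True

    def resync(self):
        """Bring replica A back in sync by copying only dirty regions."""
        copied = 0
        for region, is_dirty in enumerate(self.dirty):
            if not is_dirty:
                continue
            start = region * REGION_SIZE
            end = min(start + REGION_SIZE, len(self.surviving))
            self.replica_a[start:end] = self.surviving[start:end]
            self.dirty[region] = False
            copied += end - start
        self.a_present = True
        return copied
```

The benefit is that the amount of data copied when replica A returns depends on how much was written while it was away, not on the size of the object.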

When a host has failed, all VMs that were running on the host at the time of the failure will be restarted by vSphere HA. vSphere HA can restart the VM on any available host in the cluster whether or not it is hosting vSAN components, as shown in Figure 7.17.

In the event of an isolation of a host, vSphere HA can and will also restart the impacted VMs. As this is a slightly more complex scenario, let’s take a look at it in more depth.

Figure 7.17 - vSAN 1 host failed, HA restart

Network Partition

A network partition could occur when there is a network failure. In other words, some hosts can end up on one side of the cluster, and the remaining hosts on another side. vSAN will surface warnings related to network misconfiguration in the event of a partition.

After explaining the host and disk failure scenarios in the previous sections, it is now time to describe how isolations and partitions are handled in a vSAN cluster. Let’s look at a typical scenario first and explain what happens during a network partition based on this scenario.

In the scenario depicted in Figure 7.18, vSAN is running a single VM on ESXi-01. This VM has been provisioned using a VM storage policy that has number of failures to tolerate set to 1.

Figure 7.18 - vSAN I/O flow: Failures to tolerate = 1

Because vSAN has the capability to run VMs on hosts that are not holding any active storage components of that VM, this question arises: What happens in the case where the network is isolated? As you can imagine, the vSAN network plays a big role here, made even bigger when you realize that it is also used by HA for network heartbeating. Note that the vSphere HA network is automatically reconfigured by vSAN to ensure that the correct network is used for handling these scenarios. Should this situation occur, the following steps describe how vSphere HA and vSAN will react to an isolation event:

  1. HA will detect there are no network heartbeats received from esxi-01.

  2. HA master will try to ping the slave esxi-01.

  3. HA will declare the slave esxi-01 is unavailable.

  4. VM will be restarted on one of the other hosts (esxi-03, in this case, as shown in Figure 7.19).

Figure 7.19 - vSAN partition with one host isolated: HA restart

Now this question arises: What if something has gone horribly wrong in my network and esxi-01 and esxi-02 end up as part of the same partition? What happens then? Well, that is where the witness comes into play. Refer to Figure 7.20 as that will make it a bit easier to understand.

Figure 7.20 - vSAN partition with multiple hosts isolated: HA restart

Now this scenario is slightly more complex. There are two partitions, one of which is running the VM with its virtual machine disk (VMDK), and the other partition has a VMDK replica and a witness. Guess what happens? Right, vSAN uses the witness to see which partition has quorum, and based on that result, one of the two partitions will win. In this case, partition 2 has more than 50% of the components of this object and therefore is the winner. This means that the VM will be restarted on either esxi-03 or esxi-04 by vSphere HA. Note that the VM in partition 1 will be powered off only if you have configured the isolation response to do so.

Tip: We would like to stress that this is highly recommended! (Isolation response > power off.)

But what if esxi-01 and esxi-04 were isolated, what would happen then? Figure 7.21 shows what it would look like.

Figure 7.21 - vSAN 2 hosts isolated: HA restart

Remember the rule we discussed earlier?

The winner is declared based on the percentage of components available or percentage of votes available within that partition.

If the partition has access to more than 50% of the components or votes (of an object), it has won. For each object, there can be at most one winning partition. This means that when esxi-01 and esxi-04 are isolated, either esxi-02 or esxi-03 can restart the VM because 66% of the components of the object reside within this part of the cluster.
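
The rule can be expressed as a small Python function. This is a sketch of the majority rule only: it assumes each component carries exactly one vote, and the component placement (replicas on esxi-02 and esxi-03, witness on esxi-04) is inferred from the scenarios described above. The function name winning_partition is hypothetical.

```python
def winning_partition(component_hosts, partitions):
    """Return the index of the partition that owns the object, or None.

    component_hosts maps each component of one object to the host storing it.
    partitions is a list of sets of host names that can still reach each other.
    A partition wins only with strictly more than 50% of the votes.
    """
    total_votes = len(component_hosts)  # assumption: one vote per component
    for index, hosts in enumerate(partitions):
        votes = sum(1 for host in component_hosts.values() if host in hosts)
        if votes * 2 > total_votes:
            return index
    return None


# Placement inferred from the figures: two replicas and a witness.
components = {
    "replica-1": "esxi-02",
    "replica-2": "esxi-03",
    "witness": "esxi-04",
}
```

Because a winner needs strictly more than half of the votes, two partitions can never win at the same time, which is what guarantees at most one winning partition per object.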

To prevent these scenarios from occurring, it is most definitely recommended to ensure the vSAN network is made highly available through NIC teaming and redundant network switches, as discussed in Chapter 3, “vSAN Installation and Configuration.”

If vCenter is unavailable for whatever reason and you would like to retrieve information about the vSAN network, you can do so through the CLI of ESXi. From the CLI, an administrator can examine or remove the vSAN network configuration. In Example 7.4, you can see the VMkernel network interface used for cluster communication, and also the IP protocol used.

Example 7.4 VMkernel Network Interface and IP Protocol in use

~ # esxcli vsan network list
 VmkNic Name: vmk2
 IP Protocol: IPv4
 Interface UUID: 06419f51-ec79-0b57-5b3e-1cc1de252264
 Agent Group Multicast Address:
 Agent Group Multicast Port: 23451
 Master Group Multicast Address:
 Master Group Multicast Port: 12345
 Multicast TTL: 5
~ #

Disk Full Scenario

Another issue that can occur is a disk full scenario. You might ask, “What happens when the vSAN datastore gets full?” To answer that question, you should first ask the question, “What happens when an individual magnetic disk fills up?” because this will occur before the vSAN datastore fills up.

Before explaining how vSAN reacts to a scenario where a disk is full, it is worth knowing that vSAN will try to prevent this scenario from happening. vSAN balances capacity across the cluster and can and will move components around, or even break up components, when this can prevent a disk full scenario. Of course, the success of this action depends entirely on the rate at which the VM claims and fills new blocks compared with the rate at which vSAN can relocate existing components. Simply put, the laws of physics apply here.

In the event of a disk’s reaching full capacity, vSAN pauses (technically called stun) the VMs that are trying to write data and require additional new disk space for these writes; those that do not need additional disk space continue to run as normal. Note that vSAN-based VMs are deployed thin by default and that this only applies when new blocks need to be allocated to this thin-provisioned disk. When this occurs, the error message shown in Figure 7.22 appears on the VM’s summary screen.

Figure 7.22 - No more space message

This is identical to the behavior observed on Virtual Machine File System (VMFS) when the datastore reaches capacity. When additional disk capacity is made available on the vSAN datastore, the stunned VMs may be resumed via the vSphere Web Client. Administrators should be able to see how much capacity is consumed on a per-disk basis via the Monitor > vSAN > Physical Disks view, as shown in Figure 7.23.

Figure 7.23 - Monitoring physical disks

Thin Provisioning Considerations

By default, all VMs provisioned to a vSAN datastore are thin provisioned. The huge advantage of this is that VMs are not taking up any unused disk capacity. It is not uncommon in datacenter environments to see 40% to 60% of unused capacity within the VM. You can imagine that if a VM were thick provisioned, this would drive up the cost, but also make vSAN less flexible in terms of placement of components.
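
A quick calculation shows why this matters. The sketch below is a simplified Python example that assumes mirroring with number of failures to tolerate set to 1 (two full copies) and ignores witness components and metadata overhead. The function name raw_capacity_gb is hypothetical.

```python
def raw_capacity_gb(provisioned_gb, used_fraction, failures_to_tolerate=1):
    """Return (thick, thin) raw capacity consumed on the vSAN datastore.

    Simplification: each replica is a full copy, and witness and metadata
    overhead are ignored.
    """
    copies = failures_to_tolerate + 1
    thick = provisioned_gb * copies
    thin = provisioned_gb * used_fraction * copies
    return thick, thin
```

For a 100 GB virtual disk of which half is actually in use, this gives 200 GB consumed when thick provisioned versus 100 GB when thin provisioned.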

Of course, there is an operational aspect to thin provisioning. There is always a chance of filling up a vSAN datastore when you are severely overcommitted and many VMs are claiming new disk capacity. This is no different from an environment where network file system (NFS) is used, or VMFS with thin provisioned VMs. The Web Client interface fortunately has many places where capacity can be checked, of which the summary tab shown in Figure 7.24 is an example.

Figure 7.24 - Capacity of vSAN datastore

We also have a new set of capacity views introduced in vSAN 6.2, and this makes it very easy to monitor how much space virtual machine objects are consuming.

When certain capacity usage thresholds are reached, vCenter Server will raise an alarm to ensure that the administrator is aware of the potential problem that may arise if no action is taken. By default, a warning alarm (shown with an exclamation mark) is triggered when the 75% full threshold is exceeded, and a critical alarm is raised when 85% is reached, as demonstrated in Figure 7.25. (Note that this issue will also raise an alarm in the health check.)

Figure 7.25 - Datastore usage warning
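
If you collect capacity figures in your own monitoring scripts, the same default thresholds can be applied there. The following Python sketch mirrors the 75% and 85% defaults described above; the function name usage_severity and the exact boundary handling (warning above 75%, critical at 85% or more) are our interpretation, not vCenter Server code.

```python
WARNING_THRESHOLD = 75.0   # percent used; a warning is raised above this value
CRITICAL_THRESHOLD = 85.0  # percent used; critical at or above this value


def usage_severity(used_gb, capacity_gb):
    """Classify datastore usage as 'normal', 'warning', or 'critical'."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    percent_used = 100.0 * used_gb / capacity_gb
    if percent_used >= CRITICAL_THRESHOLD:
        return "critical"
    if percent_used > WARNING_THRESHOLD:
        return "warning"
    return "normal"
```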

vCenter Management

vCenter Server is an important part of most vSphere deployments because it is the main tool used for managing and monitoring the virtual infrastructure. In the past, new features introduced to vSphere often had a dependency on vCenter Server to be available, like for instance, vSphere Distributed Resource Scheduler (DRS). If vCenter Server was unavailable, that service would also be temporarily unavailable; in the case of vSphere DRS, this meant that no load balancing would occur during this time.

Fortunately, vSAN does not rely on vCenter Server in any shape or form, not even to make configuration changes or to create a new vSAN cluster. Even if vCenter Server goes down, vSAN continues to function, and VMs are not impacted whatsoever when it comes to vSAN functionality. If needed, all management tasks can be done through ESXCLI (or RVC for that matter); and in case you are wondering, yes, this is fully supported by VMware.

You might wonder at this point why VMware decided to align the vSAN cluster construct with the vSphere HA and DRS construct, especially when there is no direct dependency on vCenter Server and no direct relationship. There are several reasons for this, so let’s briefly explain those before looking at a vCenter Server failure scenario.

The main reason for aligning the vSAN cluster construct with the vSphere HA and DRS cluster construct is user experience. Today, when vSAN is configured/enabled, it takes a single click in the cluster properties section of the vSphere Web Client. This is primarily achieved because a compute cluster already is a logical grouping of ESXi hosts.

This not only allows for ease of deployment, but also simplifies upgrade workflows and other maintenance tasks that are typically done within the boundaries of a cluster. On top of that, capacity planning and sizing for compute is done at cluster granularity; by aligning these constructs, storage can be sized accordingly.

Last but not least: availability. vSphere HA is performed at cluster level, and it is only natural to deal with the new per-VM accessibility consideration within the cluster because vSphere HA at the time of writing does not allow you to fail over VMs between clusters. In other words, life is much easier when vSphere HA, DRS, and vSAN all share the same logical boundary and grouping.

vCenter Server Failure Scenario

What if you lose the vCenter Server? What will happen to vSAN, and how do you rebuild this environment? Even though vSAN is not dependent on vCenter Server, other components are. If, for instance, vCenter Server fails and a new instance needs to be created from scratch, what is the impact on your vSAN environment?

After you rebuild a new vCenter, you need to redefine a vSAN-enabled cluster and add the hosts back to the cluster. Until you complete this last step, you will be receiving a “configuration issue” warning because the vSAN cluster will not have a matching vSphere cluster (with matching membership) in the vCenter inventory.

One additional consideration, however, is that the loss of the vCenter Server will also mean the loss of the VM storage policies that the administrator has created. Storage policy-based management (SPBM) will not know about the previous VM storage policies and the VMs to which they were attached. vSAN, however, will still know exactly what the administrator had asked for and keep enforcing it. Today, there is no way in the UI to export existing policies, but an application programming interface (API) for VM storage policies has been exposed.

One important thing to note about the VM storage policy API is that it is exposed as a separate API endpoint on vCenter Server and it will not be accessible through the normal vSphere API. To consume this API, you must connect to the SPBM server, which requires an authenticated vCenter Server session. This API can be leveraged to export and import these policies. You can find an example of how to retrieve information about current VM storage policies in an article by William Lam. Leveraging this example and the public SPBM APIs, it is possible to develop export and import scripts for your VM storage policies.

Running vCenter Server on vSAN

A common support question relates to whether VMware supports running the vCenter Server that is managing vSAN inside the vSAN cluster itself. The concern would be a failure scenario where access to the vSAN datastore is lost and thus VMs, including vCenter Server, can no longer run. The major concern here is that no vCenter Server (and thus no tools such as RVC) is available to troubleshoot any issues experienced in the vSAN environment. Fortunately, vSAN can be fully managed via ESXCLI commands on the ESXi hosts. So, to answer the initial question: yes, VMware supports customers hosting their vCenter Server on vSAN, but in the rare event where the vCenter Server is not online and you need to manage or troubleshoot issues with vSAN, the user experience will not be as good. This is a decision that should be given some careful consideration.

Bootstrapping vCenter Server

If you can run vCenter Server on vSAN, how do you get it up and running in a greenfield deployment? Typically in greenfield deployments, no external storage is available, so vSAN needs to be available before vCenter Server can be deployed.

William Lam has described a procedure that allows you to do exactly that. For the full procedure and many more articles on the topic of automation and vSAN, check out William Lam’s website; he kindly gave us permission to leverage his content. For your convenience, we have written a short summary of the steps required to “bootstrap” vCenter Server on a single-server vSAN datastore:

Install ESXi onto your physical hosts. Technically, one host is needed to begin the process, but you will probably want to have two additional hosts ready unless you do not care about your vCenter Server being able to recover if there are any hardware issues.

You must modify the default vSAN storage policy on the ESXi host in which you plan to provision your vCenter Server. You must run the following two ESXCLI commands to enable “force provisioning”:

esxcli vsan policy setdefault -c vdisk -p \
  "((\"hostFailuresToTolerate\" i1) (\"forceProvisioning\" i1))"

esxcli vsan policy setdefault -c vmnamespace -p \
  "((\"hostFailuresToTolerate\" i1) (\"forceProvisioning\" i1))"

Confirm you have the correct vSAN default policy by running the following ESXCLI command:

~ # esxcli vsan policy getdefault
 Policy Class - Policy Value
 cluster - (("hostFailuresToTolerate" i1))
 vdisk - (("hostFailuresToTolerate" i1) ("forceProvisioning" i1))
 vmnamespace - (("hostFailuresToTolerate" i1) ("forceProvisioning" i1))
 vmswap - (("hostFailuresToTolerate" i1) ("forceProvisioning" i1))
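
If you are scripting the bootstrap, this check can be automated. The Python sketch below parses getdefault output as shown above and confirms that force provisioning is set for the vdisk and vmnamespace classes. It is a minimal example under the assumption that each policy line starts with the class name and that rules have the ("name" iN) form; the function names are hypothetical. Both straight and typographic quotes are accepted, in case the output was copied from a formatted document.

```python
import re

# Matches one rule such as ("forceProvisioning" i1).
_RULE = re.compile(r'\(\s*["“”](\w+)["“”]\s+i(\d+)\s*\)')


def parse_default_policies(output):
    """Return {policy class: {rule name: integer value}} from getdefault text."""
    policies = {}
    for line in output.splitlines():
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue
        policy_class, rest = fields
        rules = _RULE.findall(rest)
        if rules:  # header, separator, and prompt lines contain no rules
            policies[policy_class] = {name: int(value) for name, value in rules}
    return policies


def force_provisioning_enabled(policies, classes=("vdisk", "vmnamespace")):
    """True when forceProvisioning is 1 for every listed policy class."""
    return all(
        policies.get(cls, {}).get("forceProvisioning") == 1 for cls in classes
    )
```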

You must identify the disks that you will be using on the first ESXi host to contribute to the vSAN datastore. You can do so by running the following ESXCLI command:

esxcli storage core device list

To get specific details on a particular device such as identifying whether it is an SSD or regular magnetic disk, you can specify the -d option and the device name:

esxcli storage core device list -d <disk identifier>

After you have identified the disks you will be using, make a note of the disk names as they will be needed in the upcoming steps. In this example, we have only a single SSD and single magnetic disk.

Before we can create our vSAN datastore, we first need to create a vSAN cluster using the following ESXCLI command. Note that as of vSAN 6.0 it is no longer necessary to generate and specify the UUID using the -u option; instead, you can use the “new” option:

esxcli vsan cluster new

After the vSAN cluster has been created, you can retrieve information about the vSAN cluster by running the following ESXCLI command:

esxcli vsan cluster get

Next we need to add the disks from our ESXi host to create our single-node vSAN datastore. To do so, we need the disk device names from our earlier step for both SSD and HDDs and to run the following ESXCLI command:

esxcli vsan storage add -d <HDD-DISK-ID> -s <SSD-DISK-ID>

The -d option specifies regular magnetic disks, and the -s option specifies an SSD. If you have more than one magnetic disk, you will need to specify multiple -d entries. We also want to point out that in vSAN 6.0/6.1, it is not possible to create an all-flash vSAN datastore without adding the vSAN license first, but in 6.2 this should be possible. Now that you have added the disks to the vSAN datastore, you can verify which disks are contributing to the vSAN datastore by running the following ESXCLI command:

esxcli vsan storage list
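
When the add step is driven from a script, the repeated -d entries are easy to get wrong. The following Python sketch builds the argument list for the esxcli vsan storage add command described earlier, with one -s entry and one -d entry per magnetic disk. It only constructs the command and does not run it; the function name and the device identifiers in the usage note are hypothetical placeholders.

```python
def storage_add_command(ssd_id, hdd_ids):
    """Build the argument list for `esxcli vsan storage add`.

    One -s entry names the SSD; each magnetic disk gets its own -d entry.
    """
    if not ssd_id:
        raise ValueError("an SSD identifier is required")
    if not hdd_ids:
        raise ValueError("at least one magnetic disk is required")
    args = ["esxcli", "vsan", "storage", "add", "-s", ssd_id]
    for hdd_id in hdd_ids:
        args += ["-d", hdd_id]
    return args
```

Calling " ".join(storage_add_command("naa.ssd0", ["naa.hdd0", "naa.hdd1"])) yields a single command line with two -d entries that can be reviewed before it is executed on the host.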

One additional step that can save time later is to enable the vSAN traffic type on the first ESXi host using ESXCLI; you can also do this for the other two hosts in advance. This step does not necessarily have to be done now because it can be done later using the vSphere Web Client when the vCenter Server is available. You will need to either create or select an existing VMkernel interface to enable the vSAN traffic type, and you can do so by running the following ESXCLI command:

esxcli vsan network ipv4 add -i <VMkernel-Interface>

At this point, you now have a valid vSAN datastore for your single ESXi host. You can verify this by logging in to the vSphere C# client, and you should see the vSAN datastore mounted to your ESXi host. You can now deploy the vCenter Server appliance OVA/OVF onto the vSAN datastore and power on the VM.

Once vCenter is deployed, you can create a cluster, enable vSAN on the cluster, and add the bootstrapped host as the first host of the cluster.

You can now reset the policies back to defaults.

You should add the remaining hosts to the cluster as soon as possible. Also, you need to create a new VM storage policy, and it is recommended to attach this policy to the vCenter Server VM and ensure that the vCenter Server VM becomes compliant with this new policy.


As demonstrated throughout the chapter, vSAN is easy to scale out and up. Even when configured in manual mode, adding new hosts or new disks is still only a matter of a few clicks. For those who prefer the command line, ESXCLI is a great alternative to the vSphere Web Client. For those who prefer PowerShell, VMware has a wide variety of PowerCLI cmdlets available.
