vSAN TRIM/UNMAP Functionality

With my new role, I am once again drinking from the firehose. As a result, I have been heads-down on all things vSAN, including a course on the new features included in the vSphere 6.7 U1 release.

The vSAN release included with vSphere 6.7 U1 adds support for the ATA TRIM and SCSI UNMAP commands issued by a virtual machine guest OS, which allows unused disk space to be reclaimed within a vSAN datastore. Previous versions of vSAN disregarded these commands. With the added support in vSphere 6.7 U1, disk space that a guest OS marks as unused can now be reclaimed.

There are a few requirements for leveraging the new functionality.

  1. VMs must be thin-provisioned.
  2. Windows OS VMs must be at least HW version 9.
  3. Linux OS VMs must be at least HW version 13.
  4. The vSAN datastore must be On-Disk format version 7.
  5. VMs must be power cycled after TRIM/UNMAP is enabled on the cluster or ESXi host. A guest OS reboot alone is not sufficient.
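
As a quick sanity check for the on-disk format requirement above, the version of the disks already claimed by vSAN can be viewed on each ESXi host with esxcli. This is just a sketch; the exact field name in the output may vary slightly between releases.

$ esxcli vsan storage list | grep -i "format version"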

When enabled, the TRIM/UNMAP functionality will begin to reclaim space for objects in each vSAN disk group. If the vSAN cluster is leveraging deduplication, the additional behind-the-scenes work can potentially impact performance, as more operations are occurring transparently to the consumer. The latency of TRIM/UNMAP commands can be examined on each ESXi host with the vsanTraceReader utility located at /usr/lib/vmware/vsan/bin/.

To enable the TRIM/UNMAP functionality on a vSAN cluster or ESXi host, the following commands should be executed.

ESXi Host

To enable the functionality:
$ esxcfg-advcfg -s 1 /VSAN/GuestUnmap

To verify the functionality:
$ esxcfg-advcfg -g /VSAN/GuestUnmap

To disable the functionality:
$ esxcfg-advcfg -s 0 /VSAN/GuestUnmap
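
If RVC is not handy, the same host-level setting can also be pushed to multiple hosts with a simple SSH loop. A rough sketch is below; the hostnames are placeholders and SSH access to each host is assumed.

$ for host in esx01 esx02 esx03; do ssh root@$host "esxcfg-advcfg -s 1 /VSAN/GuestUnmap"; done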

vSAN Cluster

Using the Ruby vSphere Console (RVC), to enable the functionality:
> vsan.unmap_support -e ~CLUSTER_NAME

To verify the functionality:
> vsan.unmap_support ~CLUSTER_NAME

To disable the functionality:
> vsan.unmap_support -d ~CLUSTER_NAME

Once the functionality is enabled on all hosts within the vSAN cluster and the VMs have all been power cycled, there is one more thing to consider. There are two methods for the TRIM/UNMAP functionality to actually reclaim the unused space reported by the guest OS: Passive and Active.

Passive

  • For Microsoft Windows Server 2012 and later operating systems, it is enabled by default and reclaim operations are performed automatically.
  • For Linux operating systems, it is not enabled by default; the filesystem must be mounted with the discard option (see the example below).
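
As an example of the Linux case, the filesystem can be mounted with the discard option on the command line, or persistently via an /etc/fstab entry. The device, mount point, and filesystem type below are placeholders.

$ mount -o discard /dev/sdb1 /data

/dev/sdb1  /data  ext4  defaults,discard  0 0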

Active

  • For a Microsoft Windows Server operating system, the Optimize Drives utility must be leveraged.
  • For a Linux operating system, the fstrim command must be leveraged (see the example below).
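
For the Linux case, an on-demand reclaim can be triggered with fstrim against a mounted filesystem. The mount point below is a placeholder; the -v flag simply reports how much space was trimmed.

$ fstrim -v /data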

Enjoy!

Claim vSAN Capacity Disks for VCF 3.0

The latest release of VMware Cloud Foundation (VCF 3.0) removed the host imaging functionality. As part of the laundry list of prerequisites for preparing an environment for VCF, one necessary step in an All-Flash vSAN environment is to mark the appropriate capacity disks.

During a POC deployment of VCF 3.0 last week, this prerequisite became evident and required a quick solution for marking the disks without having to glean all of the information manually. The following method is a quick way to identify which disks should be used for capacity and to correctly allocate them for vSAN to claim during the VCF deployment workflows for either the Management or Workload Domain.

On the first ESXi node, we need to execute the following command to determine the capacity disk size. This command can be omitted on all remaining ESXi nodes as you prep them for VCF.

$ esxcli storage core device list
naa.58ce38ee20455a75
   Display Name: Local TOSHIBA Disk (naa.58ce38ee20455a75)
   Has Settable Display Name: true
   Size: 3662830
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.58ce38ee20455a75
   Vendor: TOSHIBA
   Model: PX05SRB384Y
   Revision: AS0C
   SCSI Level: 6
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters:
   VAAI Status: unknown
   Other UIDs: vml.020000000058ce38ee20455a75505830355352
   Is Shared Clusterwide: false
   Is Local SAS Device: true
   Is SAS: true
   Is USB: false
   Is Boot USB Device: false
   Is Boot Device: false
   Device Max Queue Depth: 254
   No of outstanding IOs with competing worlds: 32
   Drive Type: physical
   RAID Level: NA
   Number of Physical Drives: 1
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false

The above output is an example of a vSAN SSD capacity disk. The only piece of information we need to automate the rest of the work is the size of the disk. Once you know the size, substitute the value into the first grep command and execute the following CLI script on each node.

$ esxcli storage core device list | grep -B 3 -e "Size: 3662830" | grep ^naa > /tmp/capacitydisks; for i in `cat /tmp/capacitydisks`; do esxcli vsan storage tag add -d $i -t capacityFlash;  vdq -q -d $i; done

As each disk is marked as eligible for vSAN, the script will output that information for the user.
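
If a disk is ever tagged by mistake, the tag can be removed in the same manner before vSAN claims the disk. The device name below is the example device from earlier and should be substituted accordingly.

$ esxcli vsan storage tag remove -d naa.58ce38ee20455a75 -t capacityFlash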

That’s it!

If you’d like to read more about the VCF 3.0 release, please check out the DataReload blog post.