IOPS Limit for Virtual SAN Objects
Virtual SAN version 6.2 introduced the ability to limit the number of IOPS a virtual machine object can consume. Virtual SAN normalizes the IO size, for both read and write IO, to 32 KB blocks when it performs the calculation. The limit is specified as part of the storage policy and is applied to each virtual machine object.
The following table demonstrates how actual IOPS are calculated on a Virtual SAN virtual machine object.
| VM IO Size | VSAN IO Size | VSAN IOPS Limit | Actual IOPS Limit |
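The 32 KB normalization can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes that an IO is rounded up to the next whole 32 KB unit, so that anything up to 32 KB counts as one IO and a 64 KB IO counts as two.

```python
import math

VSAN_BLOCK_KB = 32  # normalization unit described in the text


def normalized_io_count(vm_io_size_kb: float) -> int:
    """How many normalized IOs one VM IO counts as (assumed: rounded up, minimum 1)."""
    return max(1, math.ceil(vm_io_size_kb / VSAN_BLOCK_KB))


def actual_iops_limit(policy_iops_limit: int, vm_io_size_kb: float) -> float:
    """IOPS the VM effectively gets at its own IO size under the policy limit."""
    return policy_iops_limit / normalized_io_count(vm_io_size_kb)


# Example with a hypothetical policy limit of 1000 IOPS
for size_kb in (4, 32, 64, 128):
    print(size_kb, normalized_io_count(size_kb), actual_iops_limit(1000, size_kb))
```

Under these assumptions, a VM issuing large IOs hits the policy limit well before it reaches the configured number of IOPS at its own IO size.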
Once a virtual machine reaches the IOPS limit specified in the storage policy, any remaining IO is delayed until the next one-second window. The current implementation creates a spiky IO profile for the virtual machine objects, which is far from ideal.
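To see why the profile becomes spiky, here is a toy model of a per-second budget. It is not Virtual SAN's actual scheduler; the function name, the 100 ms tick and the demand numbers are assumptions chosen purely for illustration.

```python
def simulate_window_throttle(demand_per_tick, iops_limit, ticks_per_second=10):
    """Return the IOs completed in each tick under a per-second IO budget.

    Toy model: the budget resets at the start of every one-second window,
    and IO that exceeds the budget waits in a backlog for the next window.
    """
    completed = []
    backlog = 0
    budget = iops_limit
    for tick, demand in enumerate(demand_per_tick):
        if tick % ticks_per_second == 0:
            budget = iops_limit  # a new one-second window begins
        pending = backlog + demand
        served = min(pending, budget)
        budget -= served
        backlog = pending - served
        completed.append(served)
    return completed


# A VM steadily asking for 200 IOs every 100 ms (2000 IOPS) against a 1000 IOPS limit
profile = simulate_window_throttle([200] * 20, iops_limit=1000)
print(profile)
```

In this model the VM runs at full speed for the first half of the first second and then stalls completely. In the next window the accumulated backlog consumes the whole budget in a single burst, which is the burst-then-stall pattern described above.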
SIOC or Virtual SAN Limits?
A key differentiator between the Virtual SAN IOPS limit and Storage I/O Control is that the IOPS limit specified in the Virtual SAN storage policy is applied regardless of any congestion that may or may not exist on the datastore. As a result, a virtual machine object can be throttled even though the Virtual SAN datastore has plenty of unused IOPS.
As mentioned in the previous post, SIOC provides the ability to assign resource shares, a limit, or both to a virtual machine. The current Virtual SAN feature is only a limit, meaning it is enforced regardless of the overall IO resource consumption on the Virtual SAN datastore. As such, it is not a feature I have seen regularly implemented in a hyper-converged infrastructure (HCI) architecture. It would be nice to see the full SIOC functionality added to Virtual SAN in the future.
I have been heavily involved in designing our next-generation, large-scale hyper-converged (HCI) private cloud architecture at work over the past couple of months. As part of that design, we needed a way to easily calculate resources available and cluster sizes using VMware Virtual SAN. When determining the resources available and the effects of the new Virtual SAN 6.2 features, the calculations became rather complex pretty quickly. A spreadsheet was born.
The spreadsheet allows a user to input the characteristics of their HCI nodes, and from there the spreadsheet will calculate resources available per node and per cluster size (4 nodes – 64 nodes). The key assistance the spreadsheet provides is the ability to specify a VM unit that can be used to determine how many units per server are necessary to fulfill an architecture's requirements. The VM unit should be based on the workload (known or expected) that will operate within the architecture.
The spreadsheet also allows the user to input the VSAN FTT policies, the VSAN deduplication efficiency factor and memory overcommitment factors, all in an effort to help the user determine which cluster sizes should be used and how different server configurations affect the calculations.
A few key cells that should be modified by the user initially:
- B2-B5 – HCI node CPU characteristics
- B10 – HCI node Memory characteristic
- B15-16,B18-19 – HCI node VSAN disk configuration
- B22-28 – Expected/desired VSAN and cluster efficiencies. A value of 1.0 for any efficiency factor is the baseline.
From there, the remaining cells will be updated and provide an HCI node summary box (highlighted in yellow) and the cluster node sizes. The user can then see what the different configurations will yield with a VSAN RAID-1, VSAN RAID-5 and VSAN RAID-6 configuration based on the values entered in the spreadsheet.
The spreadsheet takes into consideration the number of VSAN disk groups, the ESXi system overhead for memory and CPU, and the overhead VSAN 6.2 introduces as well.
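For readers who prefer code to cells, the kind of calculation involved can be roughly sketched in Python. This is not the spreadsheet's actual formula set: the class, the function names and every number below are hypothetical. The only facts it relies on are the standard VSAN 6.2 layouts, where RAID-1 with FTT=1 keeps two full copies, RAID-5 uses 3 data + 1 parity components, and RAID-6 uses 4 data + 2 parity components.

```python
from dataclasses import dataclass

# (data parts, total parts) per protection scheme
PROTECTION_LAYOUT = {"RAID-1": (1, 2), "RAID-5": (3, 4), "RAID-6": (4, 6)}


@dataclass
class Node:
    raw_capacity_tb: float          # capacity-tier disks per node
    memory_gb: float                # installed memory per node
    esxi_memory_overhead_gb: float  # hypothetical ESXi system overhead
    vsan_memory_overhead_gb: float  # hypothetical VSAN overhead


def usable_capacity_tb(node, nodes, scheme, dedup_factor=1.0):
    """Cluster capacity after protection overhead; a dedup factor of 1.0 is the baseline."""
    data_parts, total_parts = PROTECTION_LAYOUT[scheme]
    return node.raw_capacity_tb * nodes * data_parts / total_parts * dedup_factor


def vm_units_per_node(node, vm_unit_memory_gb, overcommit=1.0):
    """Whole VM units that fit in a node's memory after overheads are subtracted."""
    available = (node.memory_gb
                 - node.esxi_memory_overhead_gb
                 - node.vsan_memory_overhead_gb)
    return int(available * overcommit // vm_unit_memory_gb)


# Hypothetical node: 8 TB raw, 512 GB RAM, 16 GB ESXi and 32 GB VSAN overhead
node = Node(8.0, 512, 16, 32)
print(usable_capacity_tb(node, 4, "RAID-1"))   # 4-node cluster
print(usable_capacity_tb(node, 6, "RAID-6"))   # RAID-6 needs at least 6 nodes
print(vm_units_per_node(node, vm_unit_memory_gb=8))
```

The sketch only covers capacity and memory. The spreadsheet also accounts for CPU and disk group configuration, which would follow the same pattern of subtracting overhead and then applying the efficiency factors.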
All in all, this has proven to be a good tool for our team as we’ve been working on our new HCI design, and hopefully it will be a useful tool for others as well.
The spreadsheet can be downloaded here.