Notes from the Field – Tips for Using Storage Policy Based Management with vSAN

One of my favorite aspects of VMware vSAN is Storage Policy Based Management (SPBM for short) and how it lets you define the performance and resiliency characteristics you want applied to a virtual machine. The days of working with LUNs and/or volumes can be left behind; what used to be done in hardware is now accomplished in software with a couple of mouse clicks. But not everyone is taking advantage of these options. In meetings and in working with customers, I see that many are not leveraging this capability to its fullest potential, or don’t fully understand it. They are missing out! With that in mind, I thought I would provide a few tips for using SPBM alongside vSAN deployments.

My Tips for Using SPBM with vSAN

Don’t modify the vSAN Default Storage Policy – vSphere/vSAN provides a default policy out of the box as a starting point for virtual machines, and while it can be edited, I recommend leaving it alone. If you find that changes need to be made, either clone the default policy or create a new one from scratch. It might be my old-school days of managing Active Directory, but keeping the default policy intact provides a “known good state” that can be rolled back to if issues arise from a user-created policy. That brings me to….

Don’t only use the vSAN Default Storage Policy – The out-of-the-box policy is configured with a Failures to Tolerate of one (FTT=1) and a Fault Tolerance Method of Mirroring (FTM=Mirroring), and that may be suitable for most workloads, but there are other possible configurations. If you have deployed an all-flash vSAN configuration, that opens the door to erasure coding (RAID5/RAID6) policies for space savings (RAID5) or additional data resiliency (RAID6). Perhaps there are database workloads that require a higher level of performance (Mirroring) with a higher Failures to Tolerate (three is the maximum setting). Takeaway: create additional policies to meet virtual machine workload and resiliency requirements. These additional policies can be applied in real time, without requiring downtime or an outage for the virtual machine(s).
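For anyone who prefers to script these out, here is a minimal PowerCLI sketch of building an extra policy rather than touching the default one. The policy name and the exact capability strings are placeholders (and can vary by vSAN version), not something taken from this post; check Get-SpbmCapability output for what your environment actually exposes.

```powershell
# A minimal sketch (not an official procedure): create an additional vSAN policy with
# PowerCLI instead of editing the default one. Assumes the VMware.PowerCLI module and an
# existing Connect-VIServer session. Capability names/values and the policy name are
# illustrative - list what your version exposes with: Get-SpbmCapability -Name "VSAN.*"

# FTT=1 with RAID5 erasure coding (all-flash clusters only) for capacity savings
$ftt = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.hostFailuresToTolerate") -Value 1
$ec  = New-SpbmRule -Capability (Get-SpbmCapability -Name "VSAN.replicaPreference") `
                    -Value "RAID-5/6 (Erasure Coding) - Capacity"

New-SpbmStoragePolicy -Name "vSAN-R5-FTT1" `
                      -Description "All-flash erasure coding, FTT=1" `
                      -AnyOfRuleSets (New-SpbmRuleSet -AllOfRules $ftt, $ec)
```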

Figure 1 – Fine Tune Your Policies

Get a baseline of your workloads – If you are making changes to the policy assigned to a virtual machine, having baseline data from before and after the change can help troubleshoot potential issues. The performance graphs in vCenter, or a monitoring tool like vRealize Operations Manager (vROps), can provide insight into whether the changes had an impact on the virtual machine (positive or negative). Consider a scenario of moving a higher-performance workload from a Mirroring policy (best for performance) to a RAID5 policy to take advantage of the storage capacity savings. Having this data might save some time and headaches if issues arise.
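If you want a quick, repeatable way to capture that baseline, a small PowerCLI sketch like the one below can do it (vROps obviously goes much deeper). The VM name, counters, and export path here are placeholders, and which counters are available depends on your vCenter statistics level.

```powershell
# A simple sketch of capturing a baseline with PowerCLI's Get-Stat before (and again
# after) a policy change. VM name, counters and export path are placeholders.
$vm    = Get-VM -Name "SQL01"
$stats = Get-Stat -Entity $vm -Stat "disk.usage.average","disk.maxTotalLatency.latest" `
                  -Start (Get-Date).AddDays(-7) -Finish (Get-Date)

# Save the raw samples so they can be compared with a capture taken after the change
$stats | Select-Object Timestamp, MetricId, Value, Unit |
    Export-Csv -Path "C:\Temp\SQL01-baseline.csv" -NoTypeInformation
```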

Understand the additional SPBM policy options – When creating new policies there are additional options that can be chosen beyond Failures to Tolerate or Fault Tolerance Method. Knowing what the impact of these additional options is, and when to best utilize them, is important. For instance, the Number of Disk Stripes per Object setting might be useful when working with Big Data platforms, but for “general purpose” workloads it is usually best to leave this option at its default setting of one. The IOPS Limit option is another setting I have seen cause issues in environments; setting this threshold too low can restrict the performance of a virtual machine.
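A quick way to see every option SPBM exposes for vSAN (and the values each one accepts) before putting it in a policy is to query the capability catalog with PowerCLI. A rough sketch follows; the property names on the returned objects may differ slightly by PowerCLI version.

```powershell
# Illustrative only: dump the vSAN capabilities exposed through SPBM so you can review
# each advanced option before adding it to a policy.
Get-SpbmCapability -Name "VSAN.*" |
    Sort-Object Name |
    Select-Object Name, FriendlyName, ValueType |
    Format-Table -AutoSize

# Two of the options mentioned above (the defaults are usually the right choice):
#   VSAN.stripeWidth - Number of Disk Stripes per Object (default 1)
#   VSAN.iopsLimit   - IOPS limit for the object (0 = unlimited; too low throttles the VM)
```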

Figure 2 – If using advanced options, read and understand the implications

Policies can be applied at several levels – Policies can be applied at various levels within a vSAN-enabled cluster. At the highest level, you can apply a policy to the vSAN datastore, and all virtual machines created without a specific policy assigned will receive this “default” policy. Next down the list is assigning a policy to a given virtual machine, either during VM creation or after the fact, to meet performance or resiliency (or both) requirements. Finally, and at the most granular level, is the ability to assign a policy at the VMDK level. This can come in handy when working with a database server: apply a Mirroring policy to the data drives for optimal performance, while applying a RAID5 policy to the drives holding backup data and maintenance jobs for optimal space savings.
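For the per-VMDK case, here is a hedged PowerCLI sketch of what that assignment can look like. The VM, hard disk, and policy names are placeholders; substitute whatever exists in your environment.

```powershell
# A sketch of per-VMDK policy assignment with PowerCLI (names are placeholders).
$vm     = Get-VM -Name "DB01"
$mirror = Get-SpbmStoragePolicy -Name "vSAN-Mirror-FTT1"
$raid5  = Get-SpbmStoragePolicy -Name "vSAN-R5-FTT1"

# Data drive gets the mirroring policy for performance...
Get-HardDisk -VM $vm -Name "Hard disk 2" |
    Get-SpbmEntityConfiguration |
    Set-SpbmEntityConfiguration -StoragePolicy $mirror

# ...while the backup/maintenance drive gets RAID5 for space savings
Get-HardDisk -VM $vm -Name "Hard disk 3" |
    Get-SpbmEntityConfiguration |
    Set-SpbmEntityConfiguration -StoragePolicy $raid5
```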

Figure 3 – Specifying Different Policies per Object

Back them up – Accidents happen and sometimes things get deleted, and backups can save the day! While SPBM policies are included in a backup of the vCenter Server (either VCSA or Windows-based), that’s a lot of work to recover an SPBM policy or two. Thankfully, code exists (thanks to Jase McCarty) to back up the policies via PowerCLI and restore them as needed. Grab the code HERE.
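If all you need is a quick copy of a policy or two, the export/import cmdlets in PowerCLI can also help. This is just a rough sketch, not the script linked above; the paths, names, and parameters shown are assumptions to verify against your PowerCLI version.

```powershell
# Rough sketch: export each vSAN policy to a file, and re-import one if it gets deleted.
Get-SpbmStoragePolicy -Namespace "VSAN" | ForEach-Object {
    Export-SpbmStoragePolicy -StoragePolicy $_ -FilePath ("C:\Backup\" + $_.Name + ".xml")
}

# Restore a single policy from its export
Import-SpbmStoragePolicy -FilePath "C:\Backup\vSAN-R5-FTT1.xml" -Name "vSAN-R5-FTT1"
```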

Thanks for reading!

-Jason

Thinking About Storage Protocols for VSAN

While Software Defined Storage products such as VMware vSAN have moved the industry toward thinking less and less about traditional monolithic or dual-controller storage systems in the data center, and more toward a “commodity”-based x86 architecture, the underlying hardware still matters (just not as much). Specifically, when it comes to vSAN, two key areas to think about for the physical storage devices are the HBAs and the storage protocols of the media devices. I am going to touch on the latter of those two items, as one question that comes up in customer conversations is “Should I use NVMe, SAS, or SATA in my vSAN hosts?”. Hopefully this post will provide some insight to assist others with the same question.

Notes from the Field – vSAN Virtual Network Design

Virtual networking always plays a significant role in any VMware vSphere design, and even more so if you are leveraging IP-based storage like NAS or iSCSI. If you are using VMware’s vSAN product, I think it “turns the dial to 11,” as the internode communication becomes that much more important versus host-to-target communication. A few months back (based on the date of this post), VMware released an updated vSAN Network Design document that I strongly encourage everyone to read if you are looking to run, or are already running, vSAN. For this post, however, I am going to dive into what I have used in the field for customer deployments around NIC teaming and redundancy, as well as Network IO Control (NIOC), on the vSphere Distributed Switch (vDS).

Example Scenario

To start, let’s put together a sample scenario to create context around the “how” and “why”. As suggested in the vSAN Network Design document, all the customer designs I have been involved with have incorporated a single pair of ten gigabit Ethernet (10GbE) interfaces for the host-uplink connectivity to a Top of Rack (ToR) or core switch, using either TwinAx or 10GBASE-T for the physical layer. This is accomplished using a pair of dual-port Intel X520- or X540-based cards, and allows for future growth if network needs arise down the road. The uplink ports are configured as trunk ports (if using Cisco) or tagged ports (if using Brocade/Dell/HP/etc.), and the required VLANs for the environment are passed down to the hosts. On the virtual side, a single vDS is created, and each of the hosts in the vSAN cluster is added to the vDS. The required port groups are created and configured with the relevant VLAN ID, NIC Teaming, and Failover policy (more to come later on this). The following figure provides a visual representation of the layout, and a rough PowerCLI sketch of the same setup follows below.

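Purely as an illustration (and not necessarily the teaming design discussed later in the post), here is a PowerCLI sketch of that single-vDS layout with VLAN-tagged port groups and an explicit active/standby uplink order per traffic type. The switch, port group, VLAN, and uplink names are all placeholders.

```powershell
# Illustrative sketch: VLAN-tagged port groups on a single vDS with explicit failover order.
$vds = Get-VDSwitch -Name "vDS-Prod"

# vSAN traffic pinned to one 10GbE uplink, with the second uplink as standby
New-VDPortgroup -VDSwitch $vds -Name "vSAN" -VlanId 40
Get-VDPortgroup -Name "vSAN" -VDSwitch $vds |
    Get-VDUplinkTeamingPolicy |
    Set-VDUplinkTeamingPolicy -LoadBalancingPolicy ExplicitFailover `
                              -ActiveUplinkPort "dvUplink1" -StandbyUplinkPort "dvUplink2"

# Other traffic (e.g. vMotion) reversed so both uplinks carry load under normal conditions
New-VDPortgroup -VDSwitch $vds -Name "vMotion" -VlanId 30
Get-VDPortgroup -Name "vMotion" -VDSwitch $vds |
    Get-VDUplinkTeamingPolicy |
    Set-VDUplinkTeamingPolicy -LoadBalancingPolicy ExplicitFailover `
                              -ActiveUplinkPort "dvUplink2" -StandbyUplinkPort "dvUplink1"
```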

Migrating From a 2-Node to a 3-Node vSAN Cluster

A few months back I put together a post outlining the deployment of a 2-node vSAN cluster (located HERE). Just as in a customer scenario, a 2-node cluster may simply not provide enough resources, and there is a need to expand. My lab has proven to fall into that category: my need for additional compute and storage resources at my Secondary/DR site has grown, and a third host is being added. This post will step through the straightforward process of “breaking” the 2-node configuration and expanding to a standard 3-node cluster.
