Notes from the Field – vSAN Virtual Network Design

Virtual networking always plays a significant role in any VMware vSphere design, and even more so when you are leveraging IP-based storage such as NAS or iSCSI. If you are using VMware’s vSAN product, I think it “turns the dial to 11,” as internode communication becomes that much more important than host-to-target communication. A few months back (based on the date of this post), VMware released an updated vSAN Network Design document that I strongly encourage everyone to read if you are looking to run, or are already running, vSAN. For this post, however, I am going to dive into what I have used in the field for customer deployments around NIC teaming and redundancy, as well as Network I/O Control (NIOC) on the vSphere Distributed Switch (vDS).

Example Scenario

To start, let’s put together a sample scenario to provide context around the “how” and the “why.” As suggested in the vSAN Network Design document, all of the customer designs I have been involved with have used a single pair of 10 gigabit Ethernet (10GbE) interfaces for host uplink connectivity to a top-of-rack (ToR) or core switch, with either Twinax or 10GBASE-T at the physical layer. This is accomplished with a pair of dual-port Intel X520- or X540-based cards, which also leaves room for growth if network needs arise down the road. The uplink ports are configured as trunk ports (if using Cisco) or tagged ports (if using Brocade/Dell/HP/etc.), and the required VLANs for the environment are passed down to the hosts. On the virtual side, a single vDS is created, and each host in the vSAN cluster is added to it. The required port groups are then created and configured with the relevant VLAN ID and a NIC teaming and failover policy (more to come later here). The following figure provides a visual representation:

Figure 1 – Logical vDS Design
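If you would rather script the port group creation than click through the Web Client, a rough pyVmomi sketch is shown below. Treat it as a minimal sketch rather than a finished tool: it assumes you already have a connected pyVmomi session and a reference to the vDS object, and the helper name and port group values are just placeholders based on this scenario.

# Minimal pyVmomi sketch: create a VLAN-tagged port group on an existing vDS.
# Assumes `dvs` is an already-retrieved vim.dvs.VmwareDistributedVirtualSwitch
# object from a connected pyVmomi session; names and VLAN IDs are placeholders.
from pyVmomi import vim

def add_portgroup(dvs, name, vlan_id, num_ports=128):
    # Tag the port group with a single VLAN ID (matching the trunk/tagged uplinks)
    vlan_spec = vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec(
        vlanId=vlan_id, inherited=False
    )
    port_config = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
        vlan=vlan_spec
    )
    pg_spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        name=name,
        type="earlyBinding",        # static binding, the usual default
        numPorts=num_ports,
        defaultPortConfig=port_config,
    )
    # Returns a task; wait on it with your usual task-wait helper
    return dvs.AddDVPortgroup_Task(spec=[pg_spec])

# Example usage for the scenario's port groups:
# add_portgroup(dvs, "MGMT-100", 100)
# add_portgroup(dvs, "VSAN-110", 110)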

NIC Teaming – Keep it Simple Stupid

I like simple things, especially in my infrastructure. Simple things work, and when they don’t, they are easier to troubleshoot. My vDS layout for vSAN is no different, and it eases the burden on you by avoiding the need to engage your networking team for higher-level configuration such as Link Aggregation Control Protocol (LACP). So how do we accomplish this? Easy: on each vDS port group, set Load Balancing to Use explicit failover order. Set the Active Uplink for all the non-vSAN port groups to vmnic0 (based on the figure above) and the Active Uplink for the vSAN port group to vmnic1. Next, set the Standby Uplink for the non-vSAN port groups to vmnic1, and the Standby Uplink for the vSAN port group to vmnic0. The table below provides an example based on our scenario.

Table 1 – Active/Standby Settings

Portgroup     Load Balancing          Active Uplink   Standby Uplink
MGMT-100      Use Explicit Failover   vmnic0          vmnic1
VSAN-110      Use Explicit Failover   vmnic1          vmnic0
VMOTION-120   Use Explicit Failover   vmnic0          vmnic1
SERVERS-200   Use Explicit Failover   vmnic0          vmnic1

For the remaining port group settings, accept the defaults: Link status only for Network Failure Detection, and Yes for both Notify Switches and Failback.

Figure 2 – Teaming and Failover Settings

With this configuration, you will provide vSAN with the full use of one 10GbE link (with redundancy) and allow other traffic in your environment to “share” a different 10GbE link (with redundancy).
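If you want to push the teaming settings from Table 1 out with a script instead of editing each port group by hand, something along the lines of the pyVmomi sketch below should get you close. It is a hedged example, not a tested tool: the helper name is mine, it assumes pg is an already-retrieved distributed port group object, and keep in mind that at the vDS layer the active/standby order references the vDS uplink port names (for example “Uplink 1” and “Uplink 2”), which map to vmnic0/vmnic1 on each host, rather than the vmnic names themselves.

# Minimal pyVmomi sketch: set "Use explicit failover order" plus an
# active/standby uplink order on a vDS port group. Assumes `pg` is an
# already-retrieved vim.dvs.DistributedVirtualPortgroup object.
from pyVmomi import vim

def set_explicit_failover(pg, active_uplinks, standby_uplinks):
    teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
        # "failover_explicit" is the API value for "Use explicit failover order"
        policy=vim.StringPolicy(value="failover_explicit"),
        uplinkPortOrder=vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortOrderPolicy(
            activeUplinkPort=active_uplinks,
            standbyUplinkPort=standby_uplinks,
        ),
        notifySwitches=vim.BoolPolicy(value=True),   # Notify Switches: Yes
        rollingOrder=vim.BoolPolicy(value=False),    # no rolling order, i.e. Failback: Yes
    )
    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        configVersion=pg.config.configVersion,
        defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
            uplinkTeamingPolicy=teaming,
        ),
    )
    return pg.ReconfigureDVPortgroup_Task(spec=spec)

# Example usage mirroring Table 1 (uplink names are the vDS uplink ports,
# which map to vmnic0/vmnic1 on each host):
# set_explicit_failover(vsan_pg, ["Uplink 2"], ["Uplink 1"])
# set_explicit_failover(mgmt_pg, ["Uplink 1"], ["Uplink 2"])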

Network I/O Control or “Trust in the Software”

Now that we have the vDS port group layout taken care of, it is time to tackle Network I/O Control (NIOC). NIOC provides the ability to properly balance network traffic on shared 10GbE interfaces; it does so by using Network Resource Pools (NRPs) and shares to determine the bandwidth provided to the different traffic types leaving the vDS. Each NRP is assigned a physical adapter share value that determines the total bandwidth guaranteed to that traffic type. These guarantees only apply when a physical adapter is saturated, and they ensure a minimum floor of network bandwidth for each NRP. NRPs apply per physical uplink, not to the aggregate of all uplinks assigned to the vSphere Distributed Switch, and only the traffic types actively using an uplink at that moment are counted when calculating the usable bandwidth.

Note – The above is only a brief description of how NIOC functions in a vSphere environment. For a deeper dive, see Frank Denneman’s blog and his post A Primer on Network IO Control.
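To make the shares math concrete, here is a small plain-Python illustration (no vSphere APIs involved) of how share values translate into a bandwidth floor on a saturated uplink. The key point it demonstrates is that only the traffic types actually hitting that uplink at the moment count toward the split; the function name and traffic-type labels are just examples.

# Illustration only: NIOC shares divide a saturated uplink among the traffic
# types that are actually pushing traffic on it at that moment.
def bandwidth_floor(shares, active, link_mbps=10_000):
    """Return the guaranteed floor (in Mbps) for each active traffic type."""
    total = sum(shares[t] for t in active)
    return {t: round(link_mbps * shares[t] / total) for t in active}

# Default share values from Table 2 (subset relevant to our scenario)
default_shares = {"management": 50, "vmotion": 50, "virtual_machine": 100, "vsan": 50}

# If only virtual machine and vSAN traffic are saturating the uplink,
# they split it 100:50 between them:
print(bandwidth_floor(default_shares, ["virtual_machine", "vsan"]))
# {'virtual_machine': 6667, 'vsan': 3333}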

Just having NIOC enabled only gets us partway there. As mentioned above, we want to use NRPs to assign share values that come into play when traffic is constrained. By default, VMware preconfigures values for the standard traffic types you see in a vSphere environment (see Table 2 – Default NRP Shares). In most deployments these defaults are accepted as-is, and no further tweaking or tuning needs to be considered.

Table 2 – Default NRP Shares

Traffic Type                       Shares   Shares Value
Fault Tolerance (FT) Traffic       Normal   50
Management Traffic                 Normal   50
NFS Traffic                        Normal   50
Virtual Machine Traffic            High     100
iSCSI Traffic                      Normal   50
vMotion Traffic                    Normal   50
vSAN Traffic                       Normal   50
vSphere Data Protection Traffic    Normal   50
vSphere Replication (VR) Traffic   Normal   50

When adding vSAN into the mix, this stance changes a bit. Per VMware’s own vSAN Networking Design white paper, “vSAN should always have the highest priority compared to any other protocol”. So, with that in mind—and taking our example scenario into consideration—we only need to account for Management, vMotion, virtual machine, and vSAN traffic going across the wire. The new NRP Shares would be configured to look like the following:

Table 3 – vSAN Enabled NRP Shares

Traffic Type              Shares   Shares Value
Management Traffic        Low      25
Virtual Machine Traffic   Normal   50
vMotion Traffic           Low      25
vSAN Traffic              High     100

And now some math… 😊

With the NRP shares in place, what does the worst case look like if a host had to fail over to a single 10GbE adapter with our NIOC configuration? Table 4 provides the answer.

Table 4 – NIOC Minimums

NRP (Shares)                   Share of Total   Minimum Bandwidth
Management Traffic (25)        25/200 (12.5%)   1,250 Mbps
Virtual Machine Traffic (50)   50/200 (25%)     2,500 Mbps
vSAN Traffic (100)             100/200 (50%)    5,000 Mbps
vMotion Traffic (25)           25/200 (12.5%)   1,250 Mbps
Total                          200/200 (100%)   10,000 Mbps
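If you want to sanity-check the numbers in Table 4 yourself, the arithmetic only takes a few lines of Python:

# Worst case: all four traffic types contending on a single saturated 10GbE
# uplink, using the vSAN-enabled share values from Table 3.
link_mbps = 10_000
shares = {"Management": 25, "Virtual Machine": 50, "vSAN": 100, "vMotion": 25}

total = sum(shares.values())  # 200
for traffic, value in shares.items():
    print(f"{traffic}: {value}/{total} = {value / total:.1%} "
          f"-> {link_mbps * value // total:,} Mbps")
# Management: 25/200 = 12.5% -> 1,250 Mbps
# Virtual Machine: 50/200 = 25.0% -> 2,500 Mbps
# vSAN: 100/200 = 50.0% -> 5,000 Mbps
# vMotion: 25/200 = 12.5% -> 1,250 Mbps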

Wrapping Up

Keeping things simple tends to make one’s life easier, especially when dealing with IT infrastructure. What has been described above is a good foundation for a vSAN deployment as it relates to the virtual networking layer. Depending on the size of your environment, this configuration could be more than adequate, but this isn’t meant to be a one-size-fits-all post. Make sure to read through and understand the vSAN Network Design white paper, especially if you plan to leverage multiple vSAN VMkernel ports or wish to use LACP in your environment. Those options bring a higher level of configuration and complexity, but your design requirements and justifications might validate the need for them.

Thanks for reading!

-Jason
