Book Review–Networking for VMware Administrators


When I first heard that Chris Wahl (blog / twitter) and Steven Pantol (twitter) were working on a book focused on networking topics for VMware administrators, I knew it was going to be a must-have for the tech library, and it did not disappoint. As an IT veteran of 15+ years, my focus has always been on the system administration/storage side of the house. I did not really get active in networking until I started working with VMware back in the VI 3.x days, and even then my main networking concern was that the networking team provided the correct VLANs on my VMNIC uplinks.

That has changed over the last few years as I have moved away from day-to-day administration of virtual environments and toward an architectural role. Along the way I have had to pick up networking skills from various resources, but never from anything compiled into a single book. Chris and Steve do a fantastic job of building up the basics of physical networking and then taking you into the advanced features of virtual networking in a vSphere environment.

The book is broken into four parts to help you across your networking journey:

  1. Physical Networking 101 – This section consists of six chapters, starting with Ethernet basics and Layer 2/Layer 3 networking concepts and finishing with a discussion of converged infrastructure solutions from Cisco, Nutanix, and others.
  2. Virtual Networking – Section two is the “meat” of the book. Seven chapters break down everything you need to know about configuring and designing virtual networking in your VMware vSphere environment. With full breakdowns of the vSphere Standard Switch, the vSphere Distributed Switch, and the Cisco Nexus 1000V, this section alone is worth the price of admission.
  3. You Got Your Storage in My Networking: IP Storage – Four chapters covering the design and implementation of IP-based storage, split evenly between iSCSI and NFS best practices.
  4. Other Design Scenarios – The last two chapters provide additional design scenarios and information. Chapter 18 walks through four different network adapter configurations (2-, 4-, 6-, and 8-NIC servers), with and without IP-based storage, and Chapter 19 covers multi-NIC vMotion architectures.

While the subjects covered are complex and detailed, both authors have done an excellent job creating content that is easy to read and retain. With the addition of the design examples, you are sure to walk away from this book with the knowledge to implement even the most advanced vSphere networking features.

Happy Reading!

-Jason

Notes from the Field: VSAN Design–Networking

A few weeks back I published a post related to a VMware VSAN design I was working on for a customer (located here). That post focused mostly on the key area that VSAN addresses: storage. While the storage piece is where VSAN shines and has the most moving parts to understand from a design and implementation perspective, you can't forget about the network. With the scale-out nature of VMware VSAN, the network connectivity between hosts that carries replica VM storage traffic becomes increasingly important.

As this post and the previous post are based on a customer's design leveraging VSAN in a net new infrastructure, we are implementing 10Gb Ethernet connectivity for the ESXi hosts. Two factors played into this decision. First, 10Gb Ethernet pricing has come down over the last few years, allowing for a greater adoption rate. Second, as we are deploying VSAN, VMware recommends using 10GbE to provide the network throughput/bandwidth needed to handle the storage traffic.

Since we are "building" our own VSAN nodes, as mentioned in the storage post, it was off to the I/O section of the VMware HCL to verify supported 10Gb Ethernet NICs for our Dell R720 servers. We will be using copper-based 10GBase-T switches for connectivity, so the servers will be configured with redundant Dell OEM Intel Ethernet 10G 2P X540-t adapters. For the initial deployment we will be using one port from each card to provide redundancy and availability.

Someone Mixed VDS with My VSAN

While VSAN brings along some cool technology related to storage, one piece that is overlooked (or hasn't received enough attention, in my opinion) is that when licensing VSAN, VMware bundles in the ability to utilize the Virtual Distributed Switch (VDS). This feature is normally reserved for deployments involving VMware's Cadillac licensing tier, Enterprise Plus. Leveraging the VDS along with Network I/O Control (NIOC), a feature that is only available on the VDS, allows for a streamlined installation/configuration of the vSphere environment. Additionally, deploying the VDS in a 10GbE VSAN environment is what VMware prefers. The quote below is taken from page 7 of the VMware Virtual SAN Design and Sizing Guide:

“Virtual SAN provides support for both vSphere standard switch and VMware vSphere Distributed Switch™, with either 1GbE or 10GbE network uplinks. Although both vSphere switch types and network speeds work with Virtual SAN, VMware recommends the use of the vSphere Distributed Switch with 10GbE network uplinks.”

If you are not familiar with the VDS or NIOC, Frank Denneman has a great primer post on the feature and functionality, which can be viewed here. Also, though a bit dated, VMware has an excellent whitepaper on VDS design and implementation; VMware vSphere Distributed Switch Best Practices is available here. The diagram below provides an overview of how the hosts will be configured and will communicate at both the physical layer and the VDS/port group layer.

[Diagram: host connectivity at the physical layer and the VDS/port group layer]

For the sake of simplicity, the diagram above shows only the five port groups that will need to be created on the VDS for our deployment. The traffic type and VDS teaming policy for each port group are listed in the table below:

| Traffic Type | Port Group | Teaming Option | Active Uplink | Standby Uplink |
| --- | --- | --- | --- | --- |
| Management | Mgmt | LBT | vmnic0/vmnic2 | N/A |
| vMotion | vMotion-1 | Explicit Failover | vmnic0 | vmnic2 |
| vMotion | vMotion-2 | Explicit Failover | vmnic2 | vmnic0 |
| VSAN | VSAN | Explicit Failover | vmnic0 | vmnic2 |
| Virtual Machine | Virtual Machine | LBT | vmnic0/vmnic2 | N/A |
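
For those who would rather script the port group build than click through the Web Client, the table above drops fairly neatly into pyVmomi. The snippet below is a rough sketch, not the exact code used in this deployment: it assumes you already have a connected vCenter session and a reference to the VDS, and that the dvUplinks are named "Uplink 1" and "Uplink 2" (adjust to match your switch).

```python
# Sketch: create the port groups from the table above with pyVmomi.
# "dvs" is assumed to be an existing vim.dvs.VmwareDistributedVirtualSwitch
# object retrieved from a connected vCenter session; uplink names are assumptions.
from pyVmomi import vim

PORTGROUPS = [
    # (name, teaming policy, active uplinks, standby uplinks)
    ("Mgmt",            "loadbalance_loadbased", ["Uplink 1", "Uplink 2"], []),
    ("vMotion-1",       "failover_explicit",     ["Uplink 1"], ["Uplink 2"]),
    ("vMotion-2",       "failover_explicit",     ["Uplink 2"], ["Uplink 1"]),
    ("VSAN",            "failover_explicit",     ["Uplink 1"], ["Uplink 2"]),
    ("Virtual Machine", "loadbalance_loadbased", ["Uplink 1", "Uplink 2"], []),
]

def build_pg_spec(name, policy, active, standby):
    """Build a DVPortgroup ConfigSpec carrying the requested teaming policy."""
    teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy()
    teaming.policy = vim.StringPolicy(value=policy)   # LBT or explicit failover
    order = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortOrderPolicy()
    order.activeUplinkPort = active
    order.standbyUplinkPort = standby
    teaming.uplinkPortOrder = order

    port_cfg = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy()
    port_cfg.uplinkTeamingPolicy = teaming

    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec()
    spec.name = name
    spec.type = "earlyBinding"
    spec.numPorts = 16
    spec.defaultPortConfig = port_cfg
    return spec

def create_portgroups(dvs):
    """Submit one AddDVPortgroup_Task covering every row of the table."""
    specs = [build_pg_spec(*row) for row in PORTGROUPS]
    return dvs.AddDVPortgroup_Task(specs)
```

Treat it as a starting point; VLAN IDs, port counts, and the uplink names all need to be filled in for your own environment.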

 

Virtual SAN Networking Requirements and Best Practices

VMware has published guidelines covering VSAN network requirements and deployment best practices. Below is the listing from the VMware vSphere 5.5 Documentation Center, located here.

  • Virtual SAN does not support IPv6.
  • Virtual SAN requires a private 1Gb network. As a best practice, use a 10Gb network.
  • On each host, dedicate at minimum a single physical 1Gb Ethernet NIC to Virtual SAN. You can also provision one additional physical NIC as a failover NIC.
  • You can use vSphere standard switches on each host, or you can configure your environment with a vSphere Distributed Switch.
  • For each network that you use for Virtual SAN, configure a VMkernel port group with the Virtual SAN port property activated.
  • Use the same Virtual SAN Network label for each port group and ensure that the labels are consistent across all hosts.
  • Use Jumbo Frames for best performance.
  • Virtual SAN supports IP-hash load balancing, but cannot guarantee improvement in performance for all configurations. You can benefit from IP-hash when Virtual SAN is among its many consumers. In this case, IP-hash performs the load balancing. However, if Virtual SAN is the only consumer, you might not notice changes. This specifically applies to 1G environments. For example, if you use four 1G physical adapters with IP-hash for Virtual SAN, you might not be able to use more than 1G. This also applies to all NIC teaming policies that we currently support. For more information on NIC teaming, see the Networking Policies section of the vSphere Networking Guide.
  • Virtual SAN does not support multiple VMkernel adapters on the same subnet for load balancing. Multiple VMkernel adapters on different networks, such as VLAN or separate physical fabric, are supported.
  • You should connect all hosts participating in Virtual SAN to a single L2 network, which has multicast (IGMP snooping) enabled. If the hosts participating in Virtual SAN span across multiple switches or even across L3 boundaries, you must ensure that your network is configured correctly to enable multicast connectivity. You can change multicast addresses from the defaults if your network environment requires, or if you are running multiple Virtual SAN clusters on the same L2 network.
I hope this post, as well as the original post, is helpful in designing and implementing your VSAN environment.

-Jason

Notes From the Field: VSAN Design

With the official release of VMware VSAN a bit over a month ago on March 11th, when ESXi 5.5 U1 dropped, I am having more conversations with customers around the product and designing solutions. While with some customers it has been more of an inquisitive peek at the technology, I have had the chance to work on a few designs (OK, two) with customers looking to deploy VSAN over a "traditional" storage array for their storage needs.

For both configurations we went with a "roll your own" solution over the configurations available via the Ready Node program. For that reason I leaned heavily on three key resources for the builds:

*Dell documentation is listed because the server/compute/storage components are based on Dell platforms.

I am not going to provide a deep-dive review of VSAN, as there are plenty of resources available on the internet/blogs, as well as the documentation listed above, that will provide the needed details. What I will give is a quick breakdown of the storage requirements laid out by VMware for a VSAN deployment.
| Artifact | Minimum | Maximum |
| --- | --- | --- |
| Disk Groups | One per host | Five per host |
| Flash Devices (SAS, SATA, PCIe SSD) | One per disk group | One per disk group |
| Magnetic Disk Devices | One HDD per disk group | Seven HDDs per disk group |
| Disk Formatting Overhead | 750MB per HDD | 750MB per HDD |

*Table from page 3 of the "VMware Virtual SAN Design and Sizing Guide"

For our specific use case and customer requirements, we will be deploying a three-node cluster (the minimum for VSAN) with the default settings of Number of Failures to Tolerate set to 1 and Number of Disk Stripes per Object set to 1. We are aiming for around twelve usable terabytes of space to start.

  • Number of Failures to Tolerate – This setting controls the number of replica copies of the virtual machine's VMDK(s) created across the cluster. With the default value of 1, two replicas of the VMDK(s) will be created. As you increase this value, you provide additional redundancy for the virtual machine at the cost of additional storage for the replica copies. The maximum value is 3 (a quick example follows this list).
  • Number of Disk Stripes per Object – The number of HDDs across which each replica of a virtual machine object is striped. A value higher than 1 might result in better performance, but also results in higher use of system resources.
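
To put Number of Failures to Tolerate in concrete terms, here is a quick back-of-the-napkin sketch (not part of the original design work): raw capacity consumed is roughly the VMDK size multiplied by FTT + 1, ignoring the small witness components.

```python
# Rough illustration: raw VSAN capacity consumed by a VMDK is approximately
# its size multiplied by (FTT + 1), ignoring the tiny witness components.
def raw_capacity_gb(vmdk_gb: float, ftt: int = 1) -> float:
    return vmdk_gb * (ftt + 1)

print(raw_capacity_gb(100, ftt=1))  # a 100GB VMDK consumes ~200GB of raw capacity
print(raw_capacity_gb(100, ftt=2))  # ~300GB with FTT=2
```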


The Build

As mentioned above, I will be leveraging servers from Dell for this configuration. To meet the minimum requirements defined by the customer, we went with Dell R720s as the host servers, with the capability to hold 16 x 2.5-inch drives in a single chassis. The 16-drive chassis gives us the ability to create at least two fully populated VSAN disk groups (7+1) per host for future growth/expansion (one now, one down the road). To make room for the use of all 16 slots, we will be leveraging redundant SD cards for the ESXi installation (note – remember to redirect the scratch partition!). Again, since we are building our own solution, I checked and rechecked the VMware VSAN compatibility guide for I/O devices (controllers/HDD/SSD) as well as the ESXi compatibility guide for supported servers and components.
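
Since the scratch partition note above trips people up, here is one way to redirect it programmatically with pyVmomi. This is a hedged sketch rather than the procedure we actually used (we set it through the client); the datastore path is a placeholder, and the change only takes effect after a host reboot.

```python
# Sketch: point ScratchConfig.ConfiguredScratchLocation at persistent storage
# when ESXi is installed on SD cards. "host" is assumed to be a connected
# vim.HostSystem object; the path below is a placeholder for your environment.
from pyVmomi import vim

def redirect_scratch(host, scratch_path="/vmfs/volumes/datastore1/.locker-esx01"):
    opt_mgr = host.configManager.advancedOption
    opt = vim.option.OptionValue(key="ScratchConfig.ConfiguredScratchLocation",
                                 value=scratch_path)
    opt_mgr.UpdateOptions(changedValue=[opt])  # takes effect after a host reboot
```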


For the actual drive configuration, I took to the VSAN HCL to verify the supported drives from Dell. As stated above, each disk group needs to have one flash device and one magnetic device. To meet the overall storage requirement (calculation below) of twelve usable terabytes, the first disk group will be made up of 7 x 1.2TB 10K SAS drives. The flash device used for read/write buffering will be the 400GB SATA Value SSD. Connectivity for all the drives will be provided by an LSI SAS 9207-8i controller, chosen because it allows true pass-through (JBOD) mode to present the drives to VMware for the VSAN datastore creation.

Some might ask why we decided to go with 10K SAS drives rather than going "cheap and deep" with NL-SAS. The largest 2.5-inch NL-SAS drive offering from Dell is 1.0TB, while the largest 2.5-inch SAS drive comes in at 1.2TB for a 10K spindle. Going with the 10K drives provided two design advantages: additional capacity per disk and additional IOPS for when I/O needs to come from spinning disk.

Now for the capacity. The VMware documentation breaks down the math needed for sizing calculations around capacity, objects, components, etc. What I am going to show below is how the chosen configuration gets us to the target number for the customer. On page 9 of the VMware Virtual SAN Design and Sizing Guide, the following formula is provided for cluster capacity:

Formula: Host x NumDskGrpPerHst x NumDskPerDskGrp x SzHDD = y

My Configuration: 3 x 1 x 7 x 1.2TB = 25.2TB

But that is only one step in the process. After calculating the cluster capacity, I need to get to the number I really care about: usable capacity. Again, from page 10 of the VMware documentation:

Formula: (DiskCapacity – DskGrp x DskPerDskGrp x Hst x VSANoverhead) / (ftt + 1)

My Configuration: with 25.2TB of cluster capacity (~25,804GB), (25,804 – (1 x 7 x 3 x 1)) / (ftt + 1) = 25,783 / 2 = 12,891GB, or roughly 12.8TB
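
If you want to sanity-check the math, or rerun it for a different drive size or host count, the two formulas drop into a few lines of Python. This is just the arithmetic above with the TB-to-GB conversion made explicit; the 1GB-per-HDD overhead value follows the figure used in the formula, so adjust the inputs for your own design.

```python
# Reproduces the cluster and usable capacity math above (all figures in GB).
hosts = 3                  # nodes in the cluster
disk_groups_per_host = 1   # one disk group per host to start
disks_per_group = 7        # 7 x 1.2TB 10K SAS drives per disk group
hdd_size_gb = 1.2 * 1024   # ~1,228.8GB per drive
overhead_per_hdd_gb = 1    # VSAN on-disk formatting overhead used in the formula
ftt = 1                    # Number of Failures to Tolerate

cluster_gb = hosts * disk_groups_per_host * disks_per_group * hdd_size_gb
overhead_gb = hosts * disk_groups_per_host * disks_per_group * overhead_per_hdd_gb
usable_gb = (cluster_gb - overhead_gb) / (ftt + 1)

print(f"Cluster capacity: {cluster_gb:,.0f}GB")  # ~25,800GB, the 25.2TB figure above
print(f"Usable capacity:  {usable_gb:,.0f}GB")   # ~12,892GB, the roughly 12.8TB target
```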

Now a word on flash devices. One thing to note: in your VSAN calculations the flash devices DO NOT participate in the capacity math. Since they are only used for read/write buffering, they do not contribute to the overall storage pool. Also, you may be wondering how/why I chose the 400GB SSDs for my flash tier. As stated on page 7 of the VMware documentation:

The general recommendation for sizing flash capacity for Virtual SAN is to use 10 percent of the anticipated consumed storage capacity before the number of failures to tolerate is considered.

By that statement I have oversized my flash tier, as initially the customer will be using only a percentage of the twelve terabytes of capacity. But I like to play things a little safer and sized the flash tier at ten percent of the usable capacity (VMware's original sizing guideline), since the difference in pricing between a 200GB and a 400GB SSD is nominal. In addition, we have now sized for future utilization of the usable capacity, in line with VMware's statement above.
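
For completeness, the flash math works out the same way. The quick sketch below applies the 10 percent guideline quoted above against the usable capacity (my more conservative approach) rather than against anticipated consumed capacity; it is arithmetic only, not a VMware tool.

```python
# Flash tier sizing per the 10% guideline quoted above, applied here against
# usable capacity (the more conservative approach described in the post).
usable_gb = 12892                   # usable capacity from the earlier calculation
flash_needed_gb = usable_gb * 0.10  # ~1,289GB of flash across the cluster

hosts = 3
flash_per_host_gb = flash_needed_gb / hosts   # ~430GB per host
print(f"Flash per host: {flash_per_host_gb:.0f}GB")  # why a 400GB SSD is in the ballpark
```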

Become a World Traveler on Veeam’s Dime

Veeam, one of the most well-known backup software suites for your virtual environment (whether VMware or Microsoft), is on pace for its 100,000th customer! In celebration of this milestone, Veeam is starting an interactive contest giving you the chance to win awesome prizes from Google (Glass), Apple (iPad), and Microsoft (Surface). The grand prize winner will win a trip around the world! Not a bad deal.

The contest will begin this week, so be sure to visit the registration page (link below) to make sure you don’t miss out on the action.

To participate, visitors need to register and predict the location of Veeam’s 100,000th customer on the interactive map. The closer you are to the right spot, the better chance you have to win the trip around the world and other prizes.

Guess the location here: http://world.veeam.com/

For more information: http://world.veeam.com/veeam_tc_2014.pdf

Best of luck!

-Jason

Notes From the Field – vSphere 5.5 with Virtual Distributed Switch

With vSphere 5.5 having been generally available (GA) for almost six months, I am starting to work with more customers who are looking into upgrading their existing environments (4.x and up) or who are interested in rolling out 5.5 as a clean install for new/refreshed deployments. With vSphere 5.5, VMware brings some exciting new enhancements and improvements <cough> SSO <cough> to the table. I can say, with a few upgrades/deployments under my belt, that the upgrades have been mostly pain free (thank you, vCenter 5.5b) and the net new installs pretty much a breeze.

That changed a few weeks back when I was working with a customer on a new vSphere 5.5 deployment on fresh hardware. After working through the standard/best-practices documentation, I was able to get the new environment up and humming along quite easily. Feeling confident in the deployment (which had not yet rolled into production), I left the customer site. The next day I received an email from the customer: they were being flooded with email alerts from each of their hosts, roughly every 30 to 40 minutes, stating that network redundancy was lost as the 10Gb uplinks were reporting a loss of connectivity to the upstream switch.

We began poring over the Cisco switch configurations to make sure there wasn't an error or typo. Next was a review of the implementation of the vDS; nothing jumped out. I checked documentation from both Cisco and VMware to make sure the networking team and the virtualization team were on the same page on the requirements. All was good. Then came the checking of cables, connections, right cable in the right port, etc. Everything checked out OK. Next up, drivers. I noticed on the VMware site a driver update later than the version bundled with the ESXi media. Again, no luck. Grabbing at one final lifeline, I reached out to the Twitters:

[Screenshot: tweet asking for help troubleshooting the uplink flapping issue]

Nothing. At that point we decided to place a call to VMware Technical Support. Once on the line with the technician, he noted that there was an internal KB article outlining this issue that had not yet been published. The "workaround" was to disable Network I/O Control (NIOC) while VMware works on resolving the issue. While it was an answer and a possible solution, I was less than excited, as we are carrying multiple traffic types (VM, vMotion, etc.) on these links and was worried about traffic congestion (you know, the whole reason you run NIOC).
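
For anyone who needs to apply the same workaround, NIOC can be toggled from the Web Client (vDS > Manage > Settings > Properties) or scripted. The pyVmomi call below is a sketch of the scripted equivalent, not something from the support case itself; test it before pointing it at production, and remember to re-enable NIOC once the fix ships.

```python
# Sketch of the workaround: disable Network I/O Control on the vDS via pyVmomi.
# "dvs" is assumed to be a vim.dvs.VmwareDistributedVirtualSwitch object
# retrieved from a connected vCenter session.
def set_nioc(dvs, enabled=False):
    # EnableNetworkResourceManagement toggles NIOC on the distributed switch.
    dvs.EnableNetworkResourceManagement(enable=enabled)

# set_nioc(dvs, enabled=False)  # workaround: turn NIOC off
# set_nioc(dvs, enabled=True)   # restore once the underlying issue is fixed
```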

Fast forward a few days, and I saw this tweet from Mark Snook (Twitter) about the external KB article outlining the issue I was seeing:

[Screenshot: Mark Snook's tweet referencing the now-published KB article]

The VMware KB article Mark is referencing is located here –> ESXi 5.5 Uplink Port Flaps when connected to a vSphere Distributed Switch (2065183)

While our TSR is still open with VMware in an effort to resolve the issue, I wanted to throw a post together so that if anyone else runs into this issue, maybe their Google search will bring them here. Also, as the KB article doesn't make mention of it, I would be curious to know whether this affects all versions of the vDS (4.x/5.x) when running on vSphere 5.5 or just the "native" 5.5 version of the vDS.

-Jason

Converting in-guest iSCSI volumes to native VMDKs

In this guest post, fellow Seattle VMUG member Pete Koehler (@vmpete) writes about options for transitioning away from in-guest iSCSI-attached volumes to native VMDKs.

Over the years I've posted about the benefits of using in-guest iSCSI volumes. The need stemmed from a time several years ago when my environment had very limited tools in the war chest but needed to take advantage of large volume sizes, get good application quiescing using VSS, and squeeze out every bit of performance with multi-pathing. As my buddy Jason Langer reminded me often, the thought of in-guest iSCSI volumes sounded a little, well… antiquated.

[Screenshot: the tweet in question]

Interestingly enough, I couldn't agree more. While they might have been the virtualization equivalent of wearing acid-washed jeans, in-guest volumes had served a purpose at one time. It also became painfully clear that in other ways they were an administrative headache that no longer fit in today's world. I wanted to change, but had limited opportunities to do so.

As times change, so do the options, and we are all served well by constantly re-evaluating design choices. So please welcome 2014: vSphere 5.5 has broken down the 2TB VMDK barrier, the best of the backup products leverage VMware's APIs to see the data, and if you want to take advantage of one of the great host acceleration solutions out there, VMware needs to be aware of the volumes. Add that up, and in-guest iSCSI wasn't going to cut it. This was no surprise, and the itch to convert all of my guest-attached volumes had been around for at least a couple of years. In fact, many had been migrated a while ago. But I've noticed quite a few questions coming my way on how I made the transition. Apparently I wasn't the only one sporting acid-washed jeans.

That was the rationale for the change. Now for how to go about making it. Generally, the most viable options are:

  • Conversion using VMware Converter
  • Conversion by changing the connection to an RDM, then converting to a VMDK via Storage vMotion
  • Transitioning data inside the guest to a pristine VMDK using rsync (Linux VMs)

Which option you choose may depend somewhat on your environment. Honestly, I never thought of the RDM/Storage vMotion method until Jason suggested it over dinner recently. Chalk one up to sharing ideas with your peers, which was the basis for this post. Below I outline the steps taken for each of the three methods listed.

Regardless of the method chosen, you will be best served by taking the additional steps necessary to remove the artifacts of the old connection method: removing the NICs used for the iSCSI connections as well as any integration tools (like the Dell EqualLogic Host Integration Toolkit), and finally removing the old guest volumes from the storage array once they are no longer in use. Also remember to take any precautions necessary before the transitions, such as backing up the data before making changes to the VM.

 

Conversion using VMware Converter
This method uses the VMware Converter tool installed in the VM to convert the in-guest volumes to native VMDK files. It is very predictable and safe, but depending on the size of the volumes, it might require a sizable maintenance window while you convert them.

1. Install VMware Converter inside of guest.

2. Make note of all services that touch the guest volumes, shut them off, and temporarily set them to "Disabled".

3. Launch Converter and click "Convert Machine" > "This local machine". Select a destination type of VMware Infrastructure VM and change the name to "[sameVMname]-guest". Complete by selecting the appropriate VM folder and destination location. You may select only the guest volumes necessary, as the other files it creates will be unnecessary.

4. Remove the newly created "[sameVMname]-guest" VM from inventory, and copy the VMDK file(s) from the old datastore to the new location if necessary.

5. Once complete, disconnect all in-guest iSCSI volumes, remove the Host Integration Toolkit, disable the iSCSI NICs inside the VM, and power down.

6. Edit the VM properties to disable or remove the iSCSI NICs.

7. Attach the newly created VMDKs to the VM, ideally choosing a device node other than 0:# to improve performance (e.g., the new VMDK might be on 1:0 and an additional VMDK on 2:0, etc.).

8. Power on to verify all is running, and new drives are mapped to the correct drive letters or mount points.

9. Re-enable all services set to disabled earlier to their original settings, clear the event logs, and reboot.

10. Verify access and services are now running correctly.

This method was most commonly used on SQL servers that had smaller volume sizes. As the guest volumes grew, so did the maintenance window.

 

Conversion by changing the connection to an RDM, then converting to a VMDK via Storage vMotion
This method first changes the connection method of the in-guest volume to an RDM, then converts it to a native VMDK via a Storage vMotion. It is also very predictable and safe, and it offers the additional benefit that the maintenance window needed for conversion is NOT based on the size of the volume. Its maintenance window is only the time during which you briefly power down the VM.

1. Make note of all services that touch the guest volumes, shut them off, and temporarily set them to "Disabled".

2. Once complete, disconnect all in-guest iSCSI volumes, remove the Host Integration Toolkit, disable the iSCSI NICs inside the VM, and power down.

3. On the storage system, present the iSCSI disk to all ESXi hosts.

4. Rescan the hosts so they see the disk.

5. Add an RDM (virtual mode) disk to the VM, pointing it to the newly mounted iSCSI disk.

6. Power on the VM and verify the RDM mounted and the apps and/or data are present.

7. Re-enable all services set to disabled earlier to their original settings.

8. Storage vMotion the VM, making sure you go into the "Advanced" settings.

9. Move the C: VMDK to a LUN and move the RDM to a VMFS LUN (then change the disk format on the RDM disk from "Same" to Thick, Thin, or Thick Eager Zeroed). Once the Storage vMotion is complete, the RDM should be migrated to a VMDK.

10. Unmount the previously mounted iSCSI volume from the ESXi hosts and verify access and services are now running correctly.

The nice thing about this method is that the VM is up and in production while the Storage vMotion happens. It also catches all of the changes made during the move.

 

Transition data inside the guest to a pristine VMDK using rsync
This method is for Linux VMs: create a pristine VMDK, then transfer the data inside the guest via rsync. The process can take some time to seed the new volume, but it is essentially a background process for the VM. The actual cutover is typically just a change to /etc/fstab and a restart. It can use additional resources, but in certain circumstances it may be a good fit.

1. Create the desired VMDKs for the VM, ideally choosing a device node other than 0:# to improve performance (e.g., the new VMDK might be on 1:0 and an additional VMDK on 2:0).

2. Inside the guest, create the new partition using parted or gparted, then format it using mkfs.

3. Create the device mount locations, and then add entries in /etc/fstab.

4. Restart and validate that the volume mounts properly.

5. Begin the rsync process from the old location to the new location. The syntax will look something like: rsync -av --delete --bwlimit=7500 root@[systemname]:/oldpath/todata /newpath/todata/

6. Once complete, redirect any symbolic links to the new location and adjust the mount points in /etc/fstab.

7. Restart to test and validate. Verify access and services are now running correctly.

8. Remove connections to the old guest volumes, and clean up the VM by disabling or removing the iSCSI-based NICs, etc.

This method allowed for some restructuring of data on some extremely large volumes, something the development team wanted to do anyway. It also allowed IT to delegate the rsync processes to the teams handling the data change, so that the actual cutover could fit into their schedules.

 

The results
While I'm not completely finished with the conversion (a few more multi-terabyte volumes to go), the process of simplifying the environment has been very rewarding. Seeing these large or I/O-sensitive data volumes take advantage of I/O acceleration has been great. Simplifying the protection of our mission-critical VMs was even more rewarding.

- Pete