With vSphere 5.5 having been generally available (GA) for almost six months, I am starting to work with more customers who are looking into upgrading their existing environments (4.x and up) or who are interested in rolling out 5.5 as a clean install for new/refreshed deployments. With vSphere 5.5, VMware brings some exciting new enhancements and improvements <cough> SSO <cough> to the table. With a few upgrades/deployments under my belt, I can say the upgrades have been mostly pain-free (thank you, vCenter 5.5b) and the net new installs pretty much a breeze.
That changed a few weeks back when I was working with a customer on a new vSphere 5.5 deployment on fresh hardware. After working through the standard/best practices documentation, I was able to get the new environment up and humming along quite easily. Feeling confident in the deployment (which had not yet been rolled into production), I left the customer site. The next day I received an email from the customer. They were being flooded with email alerts from each of their hosts roughly every 30 to 40 minutes stating that network redundancy was lost, as the 10Gb uplinks were reporting a loss of connectivity to the upstream switch.
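As an aside, if you want to confirm that alerts like these really arrive on a regular cadence rather than at random, a few lines of Python can compute the gap between consecutive alerts per host. This is only a minimal sketch: the host names, the timestamps, and the `flap_intervals` helper are made up for illustration, and it assumes you have already pulled (host, timestamp) pairs out of the alert emails yourself.

```python
from collections import defaultdict
from datetime import datetime


def flap_intervals(alerts):
    """Return {host: [minutes between consecutive alerts]}.

    `alerts` is an iterable of (host, ISO-8601 timestamp) pairs.
    Hypothetical helper for illustration, not part of any VMware tooling.
    """
    by_host = defaultdict(list)
    for host, stamp in alerts:
        by_host[host].append(datetime.fromisoformat(stamp))

    intervals = {}
    for host, times in by_host.items():
        times.sort()
        # Pair each alert with the next one and measure the gap in minutes.
        intervals[host] = [
            (later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])
        ]
    return intervals


# Made-up sample data, not the customer's actual alerts.
sample = [
    ("esx01", "2014-03-10T08:00:00"),
    ("esx01", "2014-03-10T08:35:00"),
    ("esx01", "2014-03-10T09:07:00"),
    ("esx02", "2014-03-10T08:05:00"),
    ("esx02", "2014-03-10T08:44:00"),
]

for host, gaps in sorted(flap_intervals(sample).items()):
    print(host, gaps)
```

Gaps that cluster tightly across every host point toward something systematic rather than a single bad cable or port.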
We began poring over the Cisco switch configurations to make sure there wasn’t an error or typo. Next was a review of the vDS implementation, with nothing jumping out. I checked documentation from both Cisco and VMware to make sure the networking and virtualization teams were on the same page for the requirements. All was good. Then came the checking of the cables and connections: right cable in the right port, etc. Everything checked out OK. Next up, drivers. I noticed a driver update on the VMware site that was later than the version bundled with the ESXi media. We tried it, and again, no luck. Grabbing at one last lifeline, I reached out to the Twitters:
Nothing. At that point we decided to place a call to VMware Technical Support. Once on the line, the technician noted that there was an internal KB article outlining this issue that had not yet been published. The “workaround” was to disable Network I/O Control (NIOC) while VMware continued working on resolving the issue. While it was an answer and a possible solution, I was less than excited, as we were carrying multiple traffic types (VM, vMotion, etc.) on these links and I was worried about traffic congestion (you know, the whole reason you run NIOC).
Fast forward a few days and I saw this tweet from Mark Snook (Twitter) about the external KB article outlining the issue I was seeing:
The VMware KB article Mark is referencing is: ESXi 5.5 Uplink Port Flaps when connected to a vSphere Distributed Switch (2065183)
While our TSR is still open with VMware in an effort to resolve the issue, I wanted to throw a post together so that if anyone else runs into this issue, maybe their Google search will bring them here. Also, as the KB article doesn’t make mention of it, I would be curious to know whether this affects all versions of the vDS (4.x/5.x) when running on vSphere 5.5 or just the “native” 5.5 version of the vDS.