For this objective I used the following resources:
- vSphere Troubleshooting 6.0
- SAN System Design and Deployment Guide
- VMware Information Guide – VMware Virtual Networking Concepts
- VMware KB Article 1003893 – Troubleshooting Virtual Machine Network Connection Issues
- VMware KB Article 1001938 – Host Requirements for Link Aggregation for ESXi and ESX
- VMware KB Article 1004048 – Sample Configuration of EtherChannel/Link Aggregation Control Protocol with ESXi/ESX and Cisco/HP Switches
- VMware KB Article 1005577 – What is Beacon Probing
- VMware KB Article 1008205 – Using ESXTOP to Identify Storage Performance Issues for ESX/ESXi
Objective 7.2 Troubleshoot vSphere Storage & Network
Verify Network Configuration
Refer to each objective under Section Two. Focus on the core concepts and configuration of both vNetwork Standard Switches and vNetwork Distributed Switches:
- Port/dvPort Groups
- Load Balancing and Failover Policies
- VLAN Settings
- Security Policies
- Traffic Shaping Policies
For additional information, read the VMware Information Guide “VMware Virtual Networking Concepts”. This document is based on VI3 but still does a good job of covering the core functions of a standard vSwitch.
Verify a Given Virtual Machine is Configured with the Correct Network Resources
Instead of duplicating work, refer to VMware KB 1003893, “Troubleshooting Virtual Machine Network Connection Issues”. There is more than enough information listed there.
Troubleshoot Physical Network Adapter Configuration Issues
This is pretty straightforward, as there is not a lot of configuration done at the physical network layer. Be sure that the physical NICs assigned to a virtual switch (vSwitch or dvSwitch) are configured identically (speed, VLANs, etc.) on the physical switch. If you use IP Hash as your load balancing method, make sure link aggregation has been enabled on the physical switch side. Refer to VMware KB 1001938 and VMware KB 1004048 for further details and examples. If you use beacon probing for network failover detection, it is standard practice to use at least three uplink adapters. See VMware KB 1005577 for further details.
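The checks above can be sketched in a few lines of Python. This is only an illustration: the `uplinks` inventory, its field names, and both helper functions are hypothetical, and in practice the values would be gathered by hand from the host and the physical switch.

```python
from collections import Counter

# Hypothetical inventory: uplink name -> settings seen on the matching physical switch port.
uplinks = {
    "vmnic0": {"speed_mbps": 10000, "duplex": "full", "vlans": (10, 20, 30)},
    "vmnic1": {"speed_mbps": 10000, "duplex": "full", "vlans": (10, 20, 30)},
    "vmnic2": {"speed_mbps": 1000, "duplex": "full", "vlans": (10, 20)},
}

def find_mismatched_uplinks(uplinks):
    """Return the uplinks whose settings differ from the most common configuration."""
    if not uplinks:
        return []
    # Turn each settings dict into a hashable signature so it can be counted.
    signatures = {name: tuple(sorted(cfg.items())) for name, cfg in uplinks.items()}
    baseline, _ = Counter(signatures.values()).most_common(1)[0]
    return sorted(name for name, sig in signatures.items() if sig != baseline)

def beacon_probing_ok(uplink_count):
    """Beacon probing is normally used with at least three uplinks."""
    return uplink_count >= 3

print(find_mismatched_uplinks(uplinks))  # ['vmnic2']
print(beacon_probing_ok(len(uplinks)))   # True
```

Treating the most common configuration as the baseline is a design choice made for this sketch; with only two uplinks that disagree, it cannot tell you which one is wrong.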
Troubleshoot Virtual Switch and Port Group Configuration Issues
One key point to remember when setting up Port Groups or dvPort Groups: spelling counts, and so does upper/lower case! If a Port Group is named Test on one host and test on a second host, vMotion will fail. The same holds true for Security Policies: if a vSwitch on one host is set to Accept Promiscuous Mode and the corresponding vSwitch on the other host is set to Reject, vMotion will again fail. Also, refer to the objectives under Section Two to be sure your switches are configured correctly.
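To make the case-sensitivity point concrete, here is a minimal Python sketch that flags Port Group names differing only by case across hosts. The host names, the `port_groups_by_host` data, and the function name are all made up for illustration; nothing here queries a real host.

```python
from collections import defaultdict

# Hypothetical data: host name -> Port Group names defined on that host.
port_groups_by_host = {
    "esx01": ["Management", "vMotion", "Test"],
    "esx02": ["Management", "vMotion", "test"],
}

def find_case_mismatches(port_groups_by_host):
    """Return names that match case-insensitively but differ in exact spelling."""
    variants = defaultdict(set)
    for names in port_groups_by_host.values():
        for name in names:
            variants[name.lower()].add(name)
    # Keep only the names that exist in more than one exact spelling.
    return {key: sorted(found) for key, found in variants.items() if len(found) > 1}

print(find_case_mismatches(port_groups_by_host))  # {'test': ['Test', 'test']}
```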
Troubleshoot Common Network Issues
Using the above notes as well as the linked VMware KB articles, you should be able to isolate an issue to one of four areas:
- Virtual Machine
- ESX/ESXi Host Networking (uplinks)
- vSwitch or dvSwitch Configuration
- Physical Switch Configuration
Troubleshoot VMFS Metadata Consistency
Use vSphere On-disk Metadata Analyzer (VOMA) to identify and fix incidents of metadata corruption that affect file systems or underlying logical volumes. VOMA is run from the CLI of an ESXi host and can check and fix metadata inconsistency issues for a VMFS datastore or a virtual flash resource. The following example was pulled from the vSphere Troubleshooting documentation:
- Obtain the name and partition number of the device that backs the VMFS datastore that you need to check
- # esxcli storage vmfs extent list
- Run VOMA to check for VMFS errors. Provide the absolute path to the device partition that backs the VMFS datastore, and provide a partition number with the device name:
- # voma -m vmfs -f check -d /vmfs/devices/disks/naa.600508e000000000b367477b3be3d703:3
- The output lists possible errors
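If you find yourself assembling this command often, the device path format can be captured in a small helper. The Python sketch below only builds the command string from the steps above; the function name is hypothetical, and the command itself still has to be run on the ESXi host.

```python
def build_voma_check_command(device, partition):
    """Build the VOMA check command for the partition backing a VMFS datastore."""
    if partition < 1:
        raise ValueError("partition number must be 1 or greater")
    # The device name and partition number are joined with a colon, as in the example above.
    return f"voma -m vmfs -f check -d /vmfs/devices/disks/{device}:{partition}"

print(build_voma_check_command("naa.600508e000000000b367477b3be3d703", 3))
```

The device name and partition number come from the output of the extent list command in the first step.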
For the full rundown of VOMA command options, review the table on page 66 of the vSphere Troubleshooting documentation.
Verify Storage Configuration
Refer to the vSphere Storage guide and the SAN System Design and Deployment Guide (not specific to vSphere 6, but worth a read) by VMware. These cover many of the areas you need to understand when working with an FC/iSCSI SAN environment with vSphere. A good understanding of the hardware you are using on the back end (storage arrays, FC switches, networking, etc.) and the vendors' “vSphere Best Practices” documents will also help with proper configuration.
Identify Storage I/O Constraints
By “storage constraints” I assume they are hinting at I/O throughput or I/O latency issues. I find the quickest and easiest way to check this is with esxtop/resxtop. VMware KB 1008205 and Duncan Epping's esxtop blog post cover this in more detail.
Metrics to be aware of:
|Metric|Threshold (ms)|Description|
|DAVG|25|The average response time in milliseconds per command sent to the device|
|GAVG|25|The response time as perceived by the guest operating system. This number is calculated with the formula: DAVG + KAVG = GAVG|
|KAVG|2|The amount of time the command spends in the VMkernel|
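As a quick worked example of the formula, the Python sketch below derives GAVG from DAVG and KAVG and reports which metrics exceed the values in the table. It assumes the numbers in the table are thresholds in milliseconds and that "exceeded" means strictly greater than; the function name and sample readings are made up.

```python
# Assumed thresholds in milliseconds, taken from the table above.
THRESHOLDS_MS = {"DAVG": 25, "KAVG": 2, "GAVG": 25}

def check_latency(davg_ms, kavg_ms):
    """Return (GAVG, list of metrics above their threshold) for one esxtop reading."""
    gavg_ms = davg_ms + kavg_ms  # GAVG = DAVG + KAVG
    observed = {"DAVG": davg_ms, "KAVG": kavg_ms, "GAVG": gavg_ms}
    exceeded = [metric for metric, value in observed.items() if value > THRESHOLDS_MS[metric]]
    return gavg_ms, exceeded

print(check_latency(30.0, 1.0))  # (31.0, ['DAVG', 'GAVG'])
```

A high DAVG with a low KAVG, as in this sample, points at the device or array side rather than the VMkernel.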
The following diagram (provided by VMware) provides a visual representation of the metrics listed above:
Monitor/Troubleshoot Storage Distributed Resource Scheduler (SDRS)
Refer to Section 6, Troubleshooting Resource Management, in the vSphere Troubleshooting 6.0 documentation (pages 47 through 55).
Troubleshoot Common Storage Issues
Refer to Section 7, Troubleshooting Storage, in the vSphere Troubleshooting 6.0 documentation (pages 55 through 72). The section covers several storage-related issues that you may run into.