VCP 6–Objective 7.2 Troubleshoot vSphere Storage & Network Issues

For this objective I used the following resources:

Objective 7.2 Troubleshoot vSphere Storage & Network

Knowledge

Verify Network Configuration

Refer to each objective under Section Two. Focus on the core concepts and configuration of both vNetwork Standard Switches and vNetwork Distributed Switches:

  • Port/dvPort Groups
  • Load Balancing and Failover Policies
  • VLAN Settings
  • Security Policies
  • Traffic Shaping Policies

For additional information, read the VMware Information Guide “VMware Virtual Networking Concepts”. This document is based on VI3, but it still does a good job of covering the core functions of a standard vSwitch.

Verify a Given Virtual Machine is Configured with the Correct Network Resources

Instead of duplicating work, refer to VMware KB 1003893, “Troubleshooting Virtual Machine Network Connection Issues”. There is more than enough information there.

Troubleshoot Physical Network Adapter Configuration Issues

This is fairly straightforward, as there is not much configuration done at the physical network layer. Make sure that the physical NICs assigned to a virtual switch (vSwitch or dvSwitch) are configured identically (speed, VLANs, etc.) on the physical switch. If you use IP Hash as the load balancing method, make sure link aggregation (EtherChannel) has been enabled on the physical switch side. Refer to VMware KB 1001938 and VMware KB 1004048 for further details and examples. If you use beacon probing for network failure detection, it is standard practice to use a minimum of three uplink adapters. See VMware KB 1005577 for further details.
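The checks in the paragraph above can be expressed as a small script. This is a minimal Python sketch, assuming you have already collected each uplink's switch-side settings into the hypothetical `Uplink` record below (the field names and the `"ip_hash"` / `"beacon_probing"` labels are illustrative, not a VMware API). It flags mismatched speeds or VLANs, IP Hash without link aggregation, and beacon probing with fewer than three uplinks.

```python
from dataclasses import dataclass


# Hypothetical model of one physical uplink as configured on the physical switch.
# Field names are illustrative only; they do not come from any VMware API.
@dataclass(frozen=True)
class Uplink:
    name: str
    speed_mbps: int
    vlans: frozenset     # VLAN IDs trunked to this switch port
    port_channel: bool   # True if the switch port is in a link aggregation group


def check_uplinks(uplinks, load_balancing, failure_detection):
    """Return human-readable problems for the uplinks of one virtual switch."""
    problems = []
    # All uplinks of the same virtual switch should be configured identically.
    speeds = {u.speed_mbps for u in uplinks}
    if len(speeds) > 1:
        problems.append(f"speed mismatch: {sorted(speeds)}")
    if len({u.vlans for u in uplinks}) > 1:
        problems.append("VLAN trunk mismatch between uplinks")
    # IP Hash only works if the physical switch aggregates the ports.
    if load_balancing == "ip_hash":
        missing = [u.name for u in uplinks if not u.port_channel]
        if missing:
            problems.append(f"IP Hash without link aggregation on: {missing}")
    # With only two uplinks, a missed beacon cannot be pinned on one NIC.
    if failure_detection == "beacon_probing" and len(uplinks) < 3:
        problems.append(f"beacon probing with only {len(uplinks)} uplink(s); use 3 or more")
    return problems


if __name__ == "__main__":
    uplinks = [
        Uplink("vmnic0", 10000, frozenset({10, 20}), True),
        Uplink("vmnic1", 1000, frozenset({10}), False),
    ]
    for problem in check_uplinks(uplinks, "ip_hash", "beacon_probing"):
        print(problem)
```

Comparing sets keeps the check independent of uplink order. In practice the input would come from the physical switch configuration and the vSwitch teaming policy rather than being typed in by hand.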

Troubleshoot Virtual Switch and Port Group Configuration Issues

One key point to remember when setting up Port Groups or dvPort Groups: spelling counts, and so does upper/lower case! If a Port Group is named Test on one host and test on a second host, vMotion will fail. The same holds true for Security Policies: if a vSwitch on one host is set to Accept Promiscuous Mode and the corresponding vSwitch on the other host is set to Reject, vMotion will again fail. Also, refer to the objectives under Section Two to make sure your switches are configured correctly.
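To illustrate why exact names and matching policies matter, here is a minimal Python sketch that compares a hypothetical per-host inventory (port group name mapped to its security policy settings) between a source and a destination host. The dictionaries and key names are assumptions for illustration only; vCenter performs its own compatibility checks, and this is not how it implements them.

```python
def compare_hosts(src, dst):
    """List port group differences that, per the notes above, break vMotion.

    src and dst are hypothetical inventories: {port group name: {policy: value}}.
    """
    problems = []
    for pg, policy in src.items():
        if pg not in dst:
            # Names are compared exactly: "Test" and "test" are different port groups.
            close = [name for name in dst if name.lower() == pg.lower()]
            hint = f" (case differs: {close[0]!r})" if close else ""
            problems.append(f"port group {pg!r} missing on destination{hint}")
            continue
        # Same port group name: compare each security policy setting.
        for key, value in policy.items():
            other = dst[pg].get(key)
            if other != value:
                problems.append(f"{pg!r}: {key} is {value} on source, {other} on destination")
    return problems


if __name__ == "__main__":
    source = {"Test": {"promiscuous": "Accept"}, "Prod": {"promiscuous": "Reject"}}
    destination = {"test": {"promiscuous": "Accept"}, "Prod": {"promiscuous": "Accept"}}
    for problem in compare_hosts(source, destination):
        print(problem)
```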

Troubleshoot Common Network Issues

Using the notes above and the linked VMware KB articles, you should be able to isolate an issue to one of four areas:

  • Virtual Machine
  • ESX/ESXi Host Networking (uplinks)
  • vSwitch or dvSwitch Configuration
  • Physical Switch Configuration

Troubleshoot VMFS Metadata Consistency

Use the vSphere On-disk Metadata Analyser (VOMA) to identify and fix incidents of metadata corruption that affect file systems or underlying logical volumes. VOMA is run from the CLI of an ESXi host and can be used to check and fix metadata inconsistency issues for a VMFS datastore or a virtual flash resource. The following example was taken from the vSphere Troubleshooting documentation:

  • Obtain the name and partition number of the device that backs the VMFS datastore that you need to check
    • # esxcli storage vmfs extent list
  • Run VOMA to check for VMFS errors. Provide the absolute path to the device partition that backs the VMFS datastore, appending the partition number to the device name:
    • # voma -m vmfs -f check -d /vmfs/devices/disks/naa.600508e000000000b367477b3be3d703:3
  • The output lists possible errors
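The two steps above can be chained in a short script. This is a minimal Python sketch, assuming the extent list output uses the columns Volume Name, VMFS UUID, Extent Number, Device Name and Partition; the sample output below is made up, so verify the layout against your own host before relying on it. The helper names are hypothetical, and the script only builds the VOMA check command string; it does not run VOMA.

```python
# Illustrative (made-up) output of `esxcli storage vmfs extent list`.
SAMPLE = """\
Volume Name  VMFS UUID                            Extent Number  Device Name                           Partition
-----------  -----------------------------------  -------------  ------------------------------------  ---------
datastore1   5565a2a3-0c1b2d3e-4f5a-001122334455              0  naa.600508e000000000b367477b3be3d703          3
"""


def parse_extents(text):
    """Return (volume, device, partition) tuples, one per extent row."""
    rows = []
    for line in text.splitlines()[2:]:  # skip the header and the dashed separator
        fields = line.split()
        if len(fields) < 5:
            continue
        # Parse from the right so volume names that contain spaces still work
        # (runs of spaces inside a name collapse to one space).
        volume = " ".join(fields[:-4])
        rows.append((volume, fields[-2], int(fields[-1])))
    return rows


def voma_check_command(device, partition):
    """Build the VOMA check command for one device partition."""
    return f"voma -m vmfs -f check -d /vmfs/devices/disks/{device}:{partition}"


if __name__ == "__main__":
    for volume, device, partition in parse_extents(SAMPLE):
        print(volume, "->", voma_check_command(device, partition))
```

A datastore with several extents produces several rows, which is why the parser returns a list rather than a dictionary keyed by volume name.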

For the full rundown of VOMA command options, review the table on page 66 of the vSphere Troubleshooting documentation.

Verify Storage Configuration

Refer to the vSphere Storage guide and VMware's SAN System Design and Deployment Guide (not specific to vSphere 6, but worth a read). These cover many of the areas you need to know when working with an FC/iSCSI SAN environment in vSphere. A good understanding of the hardware you are using on the back end (storage arrays, FC switches, networking, etc.) and of the vendors' “vSphere Best Practices” documents will also help with proper configuration.

Identify Storage I/O Constraints

Since the objective mentions “storage I/O constraints”, I assume it is hinting at I/O throughput or I/O latency issues. I find the quickest and easiest way of measuring and checking these is esxtop/resxtop. VMware KB 1008205 and Duncan Epping's esxtop blog post cover this in more detail.

Metrics to be aware of:

Disk Metric   Threshold (ms)   Description
DAVG          25               Average response time, in milliseconds, per command sent to the device
GAVG          25               Response time as perceived by the guest operating system, calculated as GAVG = DAVG + KAVG
KAVG          2                Amount of time, in milliseconds, the command spends in the VMkernel


The following diagram (provided by VMware) provides a visual representation of the table above:

[Diagram: Horizon_6_Storage_ESXi]
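As a quick illustration of the relationship GAVG = DAVG + KAVG and the thresholds in the table, the following minimal Python sketch derives GAVG from the two measured values and reports which metrics exceed their rule-of-thumb limit. The function name and threshold dictionary are illustrative; the inputs would be values you read from esxtop/resxtop.

```python
# Rule-of-thumb thresholds in milliseconds, taken from the table above.
THRESHOLDS_MS = {"DAVG": 25.0, "KAVG": 2.0, "GAVG": 25.0}


def assess_latency(davg_ms, kavg_ms):
    """Derive GAVG = DAVG + KAVG and list the metrics above their threshold."""
    gavg_ms = davg_ms + kavg_ms
    metrics = {"DAVG": davg_ms, "KAVG": kavg_ms, "GAVG": gavg_ms}
    over = [name for name, value in metrics.items() if value > THRESHOLDS_MS[name]]
    return gavg_ms, over


if __name__ == "__main__":
    # Device-side latency: DAVG (and therefore GAVG) over the limit.
    print(assess_latency(30.0, 0.5))
    # Kernel-side latency: only KAVG over the limit.
    print(assess_latency(5.0, 4.0))
```

A high DAVG with a low KAVG typically points toward the storage device or fabric, while a high KAVG typically points toward queuing inside the VMkernel.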

Monitor/Troubleshoot Storage Distributed Resource Scheduler (SDRS)

Refer to Section 6, “Troubleshooting Resource Management”, in the vSphere Troubleshooting 6.0 documentation (pages 47 through 55).

Troubleshoot Common Storage Issues

Refer to Section 7, “Troubleshooting Storage”, in the vSphere Troubleshooting 6.0 documentation (pages 55 through 72). The section covers several storage-related issues that you may run into.
