The Unofficial Official VCP6-DCV Study Guide

vmworld2015

We are just a few short days away from one of my favorite weeks of the year, VMworld! Just like a few years back, Josh Coen (blog / twitter) and I have teamed up with our good friends at Veeam Software (website / twitter) to release the latest version of “The Unofficial Official VCP6-DCV Study Guide”. With the short turnaround time Josh and I had to complete the study guide, we have our fingers crossed that hard copies will be available next week (watch Twitter for updates) at the Veeam booth. For those who can’t wait, click on the cover below to download an electronic copy.

Hope to see you at VMworld, and happy studying!

-Jason

Cover

VCP6–Objective 7.5–Troubleshoot HA and DRS Configurations and Fault Tolerance

For this objective I used the following resources:

  • vSphere Availability documentation
  • vSphere Resource Management documentation
  • vCenter Server and Host Management documentation
  • vSphere Troubleshooting documentation

Objective 7.5 – Troubleshoot HA and DRS Configurations and Fault Tolerance

Knowledge

Identify HA/DRS and vMotion Requirements

HA Requirements

  • All hosts must be licensed for vSphere HA
  • You need at least two hosts in the cluster
  • All hosts need to be configured with static IP addresses. If you are using DHCP, you must ensure that the address for each host persists across reboots
  • There should be at least one management network in common among all hosts, and the best practice is to have at least two. Management networks differ depending on the version of host you are using.
  • To ensure that any virtual machine can run on any host in the cluster, all hosts should have access to the same virtual machine networks and datastores
  • For VM Monitoring to work, VMware Tools must be installed
  • vSphere HA supports both IPv4 and IPv6. A cluster that mixes the use of both protocol versions, however, is more likely to result in a network partition

For further information see page 32 of the vSphere Availability documentation

DRS Requirements

  • Shared Storage
    • Storage can be either SAN or NAS
  • Shared VMFS volumes
    • Place the disks of all virtual machines on VMFS volumes that are accessible by all hosts
    • Set access mode for the shared VMFS to public
    • Ensure the VMFS volumes on source and destination host use volume names, and all virtual machines use those volume names for specifying the virtual disks
  • Processor Compatibility – Processors of both the source and destination host must be from the same vendor (AMD or Intel) and be of the same processor family. This requirement exists primarily for vMotion, which allows a VM to continue executing as it moves from one host to the other. vCenter provides advanced features to help ensure that processor compatibility requirements are met:
    • Enhanced vMotion Compatibility (EVC) – You can use EVC to help ensure vMotion compatibility for the hosts in a cluster. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. This prevents migration with vMotion from failing due to incompatible CPUs.
    • CPU Compatibility Masks – vCenter Server compares the CPU features available to a virtual machine with the CPU features of the destination host to determine whether to allow or disallow migrations with vMotion. By applying CPU compatibility mask to individual virtual machines, you can hide certain CPU features from the virtual machine and potentially prevent migrations with vMotion from failing due to incompatible CPUs.
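
The mask idea above can be illustrated with simple bit operations. This is a conceptual sketch only: the feature names and bit positions are made up for illustration, while real compatibility masks operate on CPUID register bits.

```python
# Conceptual illustration of a CPU compatibility mask: hide a feature bit
# from the VM so the vMotion compatibility check can pass. Feature bit
# positions here are hypothetical; real masks act on CPUID registers.
FEATURES = {"sse4_2": 1 << 0, "aes": 1 << 1, "avx": 1 << 2}

# Source host exposes all three features
host_cpu = FEATURES["sse4_2"] | FEATURES["aes"] | FEATURES["avx"]

# Mask that hides AVX from the virtual machine
mask = ~FEATURES["avx"]
visible_to_vm = host_cpu & mask

# A destination host without AVX now looks compatible, because the VM
# was never shown the AVX feature in the first place
dest_cpu = FEATURES["sse4_2"] | FEATURES["aes"]
compatible = (visible_to_vm & ~dest_cpu) == 0
print(compatible)
```

The key point is that the mask changes what the VM *sees*, not what the host *has*, which is why hiding a feature can prevent a vMotion failure.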

For further information see pages 63 thru 64 of the vSphere Resource Management documentation

vMotion Requirements

  • The virtual machine configuration file for ESXi hosts must reside on a VMware Virtual Machine File System (VMFS)
  • vMotion does not support raw disks or migration of applications clustered using Microsoft Cluster Service (MSCS)
  • vMotion requires a private Gigabit Ethernet (minimum) migration network between all of the vMotion enabled managed hosts. When vMotion is enabled on a managed host, configure a unique network identity object for the managed host and connect it to the private migration network
  • You cannot use migration with vMotion to migrate a virtual machine that uses a virtual device backed by a device that is not accessible on the destination host
  • You cannot use migration with vMotion to migrate a virtual machine that uses a virtual device backed by a device on the client computer

For further information see page 56 of the vSphere Resource Management documentation and pages 123 thru 124 of the vCenter Server and Host Management documentation

Verify vMotion/Storage vMotion Configuration

See the above sections for DRS and vMotion requirements. Key areas of focus will be proper networking (VMkernel interface for vMotion), CPU compatibility, and shared storage access across all hosts.

Verify HA Network Configuration

  • On legacy ESX hosts in the cluster, vSphere HA communications travel over all networks that are designated as service console networks. VMkernel networks are not used by these hosts for vSphere HA communications
  • On ESXi hosts in the cluster, vSphere HA communications, by default, travel over VMkernel networks, except those marked for use with vMotion. If there is only one VMkernel network, vSphere HA shares it with vMotion, if necessary. With ESXi 4.x and later, you must also explicitly enable the Management Network checkbox for vSphere HA to use this network

For further information see page 40 of the vSphere Availability documentation

Verify HA/DRS Cluster Configuration

Configuration issues and other errors can occur for your cluster or its hosts that adversely affect the proper operation of vSphere HA. You can monitor these errors by looking at the Cluster Operational Status and Configuration Issues screens, which are accessible in the vSphere Client from the vSphere HA section of the cluster’s Summary tab.

For further information see page 30 of the vSphere Availability documentation

Troubleshoot HA Capacity Issues

To troubleshoot HA capacity issues first be familiar with the three Admission Control Policies:

  • Host failures the cluster tolerates (default) – You can configure vSphere HA to tolerate a specified number of host failures. Uses a “slot” size to display cluster capacity
  • Percentage of cluster resources reserved as failover spare capacity – You can configure vSphere HA to perform admission control by reserving a specific percentage of cluster CPU and memory resources for recovery from host failure
  • Specify failover hosts – You can configure vSphere HA to designate specific hosts as the failover hosts

Things to look out for when troubleshooting HA issues:

  • Failed or disconnected hosts
  • Oversized VMs with high CPU/memory reservations. These will affect slot sizes
  • Lack of capacity/resources if you are using “Specify Failover Hosts”, i.e. not enough hosts set as failover hosts

See Section 5 – Troubleshooting Availability in the vSphere Troubleshooting documentation, which outlines common failover scenarios for each of the three Admission Control Policies. For further reading on the three admission control policies see pages 22 thru 28 of the vSphere Availability documentation.
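
The slot-based math behind the first policy can be sketched roughly as follows. All numbers here are hypothetical; in a real cluster vCenter derives slot sizes from actual reservations (and applies defaults when no reservation is set).

```python
# Rough sketch of "Host failures the cluster tolerates" slot accounting.
# Reservations and capacities below are invented for illustration.
vm_reservations = [  # (CPU MHz, memory MB) reservation per powered-on VM
    (500, 1024), (2000, 4096), (256, 512),
]
host_capacity = [  # (CPU MHz, memory MB) per host
    (10000, 16384), (10000, 16384), (10000, 16384),
]
tolerated_failures = 1

# Slot size = the largest CPU and memory reservation among powered-on VMs.
# This is why one oversized VM with a big reservation inflates slot size
# and shrinks apparent cluster capacity.
slot_cpu = max(cpu for cpu, _ in vm_reservations)
slot_mem = max(mem for _, mem in vm_reservations)

# Slots per host = the lesser of the CPU slots and memory slots it can hold
slots_per_host = [
    min(cpu // slot_cpu, mem // slot_mem) for cpu, mem in host_capacity
]

# Worst case: assume the hosts with the most slots fail, then check whether
# the survivors can still hold every powered-on VM (one slot each)
surviving = sorted(slots_per_host, reverse=True)[tolerated_failures:]
can_tolerate = sum(surviving) >= len(vm_reservations)
print(slot_cpu, slot_mem, can_tolerate)
```

Running the numbers this way makes the troubleshooting list above concrete: raise one VM's reservation to 8000 MHz and the slot count per host collapses.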

Troubleshoot HA Redundancy Issues

Like all other components in a vSphere design, you want to design redundancy for a cluster’s HA network traffic. You can go about this in one of two ways, or both. NIC teaming (two physical NICs, preferably connected to separate physical switches) is the most common method. This allows either of the two links to fail while communication on the network continues. The second option is the creation of a secondary management network. This second interface needs to be attached to a different virtual switch and placed on a different subnet than the primary network. This allows HA traffic to be communicated over both networks.

Interpret the DRS Resource Distribution Graph and Target/Current Host Load Deviation

The DRS Resource Distribution Chart displays both memory and CPU metrics for each host in the cluster. Each resource can be displayed either as a percentage or as a size in megabytes for memory or megahertz for CPU. In the chart, each box/section represents a VM running on that host and the resources it is currently consuming. The chart is accessed from the Summary tab at the cluster level under the VMware DRS section. Click the hyperlink for View Resource Distribution Chart.

The target/current host load deviation is a representation of the balance of resources across the hosts in your cluster. The DRS process runs every 5 minutes and analyzes resource metrics on each host across the cluster. Those metrics are plugged into an equation for each host:

(sum of VM entitlements) / (host capacity)

The standard deviation of these per-host values determines the “Current host load standard deviation”. If this number is higher than the “Target host load standard deviation”, your cluster is imbalanced and DRS will make recommendations on which VMs to migrate to re-balance the cluster.
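
That comparison can be sketched in a few lines. The entitlement and capacity numbers, and the target deviation value, are all hypothetical; DRS derives the real target from the cluster's migration threshold.

```python
from statistics import pstdev

# Hypothetical per-host figures (in MHz): total VM entitlements vs. capacity.
hosts = [
    {"entitlements": 12000, "capacity": 20000},
    {"entitlements": 4000,  "capacity": 20000},
    {"entitlements": 18000, "capacity": 20000},
]

# Per-host load = (sum of VM entitlements) / (host capacity)
loads = [h["entitlements"] / h["capacity"] for h in hosts]

# "Current host load standard deviation" across the cluster
current_deviation = pstdev(loads)

# DRS flags the cluster as imbalanced when the current deviation exceeds
# the target deviation (a hypothetical value here)
target_deviation = 0.2
imbalanced = current_deviation > target_deviation
print(round(current_deviation, 3), imbalanced)
```

With one host at 90% load and another at 20%, the deviation comfortably exceeds the target, which is exactly the situation where DRS starts proposing migrations.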

This is just my basic understanding of how DRS works. For a complete, down-in-the-weeds explanation I would recommend reading this post as well as this one from Duncan Epping @ Yellow-Bricks.com.

Troubleshoot DRS Load Imbalance Issues

DRS clusters become imbalanced/overcommitted for several reasons:

  • A cluster might become overcommitted if a host fails
  • A cluster becomes invalid if vCenter Server is unavailable and you power on virtual machines using a vSphere Client connected directly to a host
  • A cluster becomes invalid if the user reduces the reservation on a parent resource pool while a virtual machine is in the process of failing over
  • If changes are made to hosts or virtual machines using a vSphere Client connected to a host while vCenter Server is unavailable, those changes take effect. When vCenter Server becomes available again, you might find that clusters have turned red or yellow because cluster requirements are no longer met.

Troubleshoot vMotion/Storage vMotion Migration Issues

For vMotion refer to section above for DRS and vMotion requirements. Make sure all requirements are being met.

For Storage vMotion be aware of the following requirements and limitations

  • Virtual machine disks must be in persistent mode or be raw device mappings (RDMs). For virtual compatibility mode RDMs, you can migrate the mapping file or convert to thick-provisioned or thin-provisioned disks during migration as long as the destination is not an NFS datastore. If you convert the mapping file, a new virtual disk is created and the contents of the mapped LUN are copied to this disk. For physical compatibility mode RDMs, you can migrate the mapping file only.
  • Migration of virtual machines during VMware Tools installation is not supported
  • The host on which the virtual machine is running must have a license that includes Storage vMotion
  • The host on which the virtual machine is running must have access to both the source and target datastores

Interpret vMotion Resource Maps

vMotion resource maps provide a visual representation of hosts, datastores, and networks associated with the selected virtual machine.

vMotion resource maps also indicate which hosts in the virtual machine’s cluster or datacenter are compatible targets. To be compatible, a host must meet the following criteria:

  • Connect to all the same datastores as the virtual machine
  • Connect to all the same networks as the virtual machine
  • Have compatible software with the virtual machine
  • Have a compatible CPU with the virtual machine
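
The four criteria above amount to a set-membership check, which can be sketched as below. All names are hypothetical, and the software/CPU checks are reduced to simple flags for illustration.

```python
# Sketch of the compatibility test a vMotion resource map encodes: a host
# is a valid target when it connects to all of the VM's datastores and
# networks, and passes software/CPU checks (modeled as booleans here).
vm = {
    "datastores": {"ds-gold", "ds-iso"},   # hypothetical datastore names
    "networks": {"vm-net-10"},             # hypothetical port group
}
host = {
    "datastores": {"ds-gold", "ds-iso", "ds-scratch"},
    "networks": {"vm-net-10", "vmotion-net"},
    "software_ok": True,
    "cpu_ok": True,
}

compatible = (
    vm["datastores"] <= host["datastores"]  # sees all the VM's datastores
    and vm["networks"] <= host["networks"]  # sees all the VM's networks
    and host["software_ok"]
    and host["cpu_ok"]
)
print(compatible)
```

Remove "ds-iso" from the host's datastore set and the host drops out of the map as incompatible, which mirrors how the resource map visualizes missing storage connectivity.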

Identify the Root Cause of a DRS/HA Cluster or Migration Issue Based on Troubleshooting Information

Use information from the above topics to help isolate the issue based on HA/DRS requirements, as well as the pages from the reference documents listed.

Verify Fault Tolerance Configuration

Identify Fault Tolerance Requirements

When VMware Fault Tolerance was originally announced back in the ESXi/ESX 4.x days it received a lukewarm reception. While the concept of protecting tier 1 workloads with a synchronous/shadow VM was appealing, the restriction to single-vCPU virtual machines limited the use cases for the feature. In vSphere 6 VMware has lifted the vCPU limitation from 1 vCPU to up to 4 vCPUs (based on licensing). With this increase I would assume this feature will now be leveraged in more environments.

Beyond the increased multiprocessor support, there are other requirements/features you should know for the exam:

  • Physical CPUs must be compatible with vSphere vMotion or Enhanced vMotion Compatibility (EVC)
  • Physical CPUs must support hardware MMU virtualization (Intel EPT or AMD RVI)
  • Use a dedicated 10-Gbit network for FT logging
  • vSphere Standard and Enterprise allow up to 2 vCPUs for FT
  • vSphere Enterprise Plus allows up to 4 vCPUs for FT

While FT provides a higher level of availability, there are a few features that are NOT supported if a VM is protected via Fault Tolerance:

  • Virtual machine snapshots
  • Storage vMotion
  • Linked Clones
  • Virtual SAN (VSAN)
  • VM Component Protection (VMCP)
  • Virtual Volume datastores
  • Storage-based policy management
  • I/O filters

VCP-6–Objective 7.1 Troubleshoot vCenter Server, ESXi Hosts, and Virtual Machines

For this objective I used the following resources:

Objective 7.1 – Troubleshoot vCenter Server, ESXi Hosts, and Virtual Machines

Knowledge

Identify General ESXi Host Troubleshooting Guidelines

The vSphere Troubleshooting guide is the one stop shop for this section

Identify General vCenter Troubleshooting Guidelines

The vSphere Troubleshooting guide is the one stop shop for this section

Troubleshoot Common Installation Issues

Refer to Objective 1.3 and make sure your hosts meet the hardware requirements as well as the VMware HCL. If using AutoDeploy refer to pages 20 thru 26 of the vSphere Troubleshooting guide and also VMware KB 2000988 (Troubleshooting vSphere Auto Deploy).

Monitor ESXi System Health

With the release of ESXi back in the VI 3.5 days, VMware provided a new way to manage your hosts: the Common Information Model (CIM). CIM provides a standard framework for managing computing resources and presents this information via the vSphere Client. For further information read the VMware White Paper “The Architecture of VMware ESXi” as well as this VMware Support Insider blog post.

Locate and Analyze vCenter and ESXi Logs

ESXi Log Files and Locations

  • /var/log/auth.log – ESXi Shell authentication success and failure
  • /var/log/dhclient.log – DHCP client service, including discovery, address lease requests and renewals
  • /var/log/esxupdate.log – ESXi patch and update installation logs
  • /var/log/lacp.log – Link Aggregation Control Protocol logs
  • /var/log/hostd.log – Host management service logs, including virtual machine and host Task and Events, communication with the vSphere Client and vCenter Server vpxa agent, and SDK connections
  • /var/log/hostd-probe.log – Host management service responsiveness checker
  • /var/log/rhttpproxy.log – HTTP connections proxied on behalf of other ESXi host web services
  • /var/log/shell.log – ESXi Shell usage logs, including enable/disable and every command entered
  • /var/log/sysboot.log – Early VMkernel startup and module loading
  • /var/log/boot.gz – A compressed file that contains boot log information
  • /var/log/syslog.log – Management service initialization, watchdogs, scheduled tasks and DCUI use
  • /var/log/usb.log – USB device arbitration events, such as discovery and pass-through to virtual machines
  • /var/log/vobd.log – VMkernel Observation events
  • /var/log/vmkernel.log – Core VMkernel logs, including device discovery, storage and networking device and driver events, and virtual machine startup
  • /var/log/vmkwarning.log – A summary of Warning and Alert log messages excerpted from the VMkernel logs
  • /var/log/vmksummary.log – A summary of ESXi host startup and shutdown, and an hourly heartbeat with uptime, number of virtual machines running, and service resource consumption
  • /var/log/Xorg.log – Video acceleration
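
When digging through these files, it often helps to pre-filter by severity the way /var/log/vmkwarning.log does for the VMkernel logs. A small sketch (the sample log lines below are invented, not real ESXi output):

```python
# Pull warning/alert lines out of a vmkernel-style log, similar to what
# vmkwarning.log pre-filters. Sample lines are fabricated for illustration.
sample = """\
2015-08-20T10:00:01Z cpu2:32768)ScsiDeviceIO: device naa.600 online
2015-08-20T10:00:05Z cpu0:32770)WARNING: NMP: path vmhba1:C0:T0:L0 is down
2015-08-20T10:00:09Z cpu1:32771)ALERT: VMFS heartbeat timeout on volume ds1
"""

def filter_severity(text, keywords=("WARNING:", "ALERT:")):
    """Keep only lines containing one of the given severity markers."""
    return [line for line in text.splitlines()
            if any(k in line for k in keywords)]

for line in filter_severity(sample):
    print(line)
```

The same pattern works against a real file opened with `open("/var/log/vmkernel.log")` on a host, or against a log bundle extracted locally.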

 

vCenter Log Files and Locations

For vCenter Server running on Windows, the log files are located in C:\ProgramData\VMware\VMware VirtualCenter\Logs

For the vCenter Server virtual appliance, the log files are located in /var/log/vmware/vpx

  • vpxd.log – The main vCenter Server log, consisting of all vSphere Client and WebServices connections, internal tasks and events, and communication with the vCenter Server Agent (vpxa) on managed ESXi/ESX hosts
  • vpxd-profiler.log – Profiled metrics for operations performed in vCenter Server. Used by the VPX Operational Dashboard (VOD) accessible at https://VCHostnameOrIPAddress/vod/index.html
  • vpxd-alert.log – Non-fatal information logged about the vpxd process
  • cim-diag.log and vws.log – Common Information Model monitoring information, including communication between vCenter Server and managed hosts’ CIM interface
  • drmdump – Actions proposed and taken by VMware Distributed Resource Scheduler (DRS), grouped by the DRS-enabled cluster managed by vCenter Server. These logs are compressed
  • ls.log – Health reports for the Licensing Services extension, connectivity logs to vCenter Server
  • vimtool.log – Dump of strings used during the installation of vCenter Server with hashed information for DNS, username and output for JDBC creation
  • stats.log – Provides information about the historical performance data collection from the ESXi/ESX hosts
  • sms.log – Health reports for the Storage Monitoring Service extension, connectivity logs to vCenter Server, the vCenter Server database and the xDB for vCenter Inventory Service
  • eam.log – Health reports for the ESX Agent Monitor extension, connectivity logs to vCenter Server
  • catalina.<date>.log – Connectivity information and status of the VMware Web Management Services
  • jointool.log – Health status of the VMwareVCMSDS service and individual ADAM database objects, internal tasks and events, and replication logs between linked-mode vCenter Servers

 

Export Diagnostic Information

Covered in Objective 7.3 – Troubleshoot vSphere upgrades, located HERE. But for reference read VMware KB Article 653 – Collecting Diagnostic Information for VMware ESX/ESXi

Identify Common Command Line Interface (CLI) Commands

Here is a list of commands that I use on a daily basis:

  • esxtop – used for real-time performance monitoring and troubleshooting
  • vmkping – Works like the ping command but allows for sending traffic out a specific VMkernel interface
  • esxcli network namespace – Used for monitoring or configuring ESXi networking
  • esxcli storage namespace – Used for monitoring or configuring ESXi storage
  • vmkfstools – Allows for the management of VMFS volumes and virtual disks from the command line

Troubleshoot Common Virtual Machine Issues

Identify/Troubleshoot Virtual Machines in Various States (e.g., Orphaned, Unknown, etc.)

For these two sections refer to Section 2 of the vSphere Troubleshooting documentation. This section covers the following topics:

  • Troubleshooting Fault Tolerant Virtual Machines
  • Troubleshooting USB Passthrough Devices
  • Recover Orphaned Virtual Machines
  • Virtual Machine Does Not Power On After Cloning or Deploying From Template

Troubleshoot Virtual Machine Resource Contention Issues

Identify Virtual Machine Constraints

For these two sections review the following VMware KB articles:

Identify Fault Tolerant Network Latency Issues

Fault Tolerance requirements are covered in Objective 7.5 – Troubleshoot HA and DRS Configurations and Fault Tolerance. For the latency portion remember the following:

  • Use a dedicated 10-Gbit logging network for Fault Tolerance traffic
  • Use the vmkping command to verify low sub-millisecond network latency
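
A quick way to sanity-check the second point is to parse the round-trip times out of vmkping's output. The sample output below is fabricated and the exact format can vary between ESXi builds, so treat this as a sketch:

```python
import re

# Extract round-trip times from vmkping-style output and confirm the FT
# logging network stays sub-millisecond. Sample output is invented.
sample_output = """\
PING 10.0.0.2 (10.0.0.2): 56 data bytes
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=0.312 ms
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.287 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.301 ms
"""

times_ms = [float(m) for m in re.findall(r"time=([\d.]+) ms", sample_output)]
worst = max(times_ms)

# FT logging wants consistently sub-millisecond round trips, so judge by
# the worst sample rather than the average
print(worst, worst < 1.0)
```

On a live host you would feed this the captured output of `vmkping -I <ft-vmk-interface> <peer-ip>` instead of the sample string.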

Troubleshoot VMware Tools Installation Issues

Have a look at VMware KB Article 1003908 – Troubleshooting a Failed VMware Tools Installation in a Guest Operating System

Identify the Root Cause of a Storage Issue Based on Troubleshooting Information

The vSphere Troubleshooting document covers several issues that you may run into. See pages 45 thru 51.

Identify Common Virtual Machine Boot Disk Errors

Have a look at VMware KB Article 1003999 – Identifying Critical Guest OS Failures Within Virtual Machines.

VCP-6 Objective 7.3–Troubleshoot vSphere Upgrades

For this objective I used the following resources:

Objective 7.3 – Troubleshoot vSphere Upgrades

Knowledge

Identify vCenter Server and vCenter Server Appliance Upgrade Issues

For this section I am going to take the easy way out. Refer to Section 12 of the vSphere Troubleshooting documentation. This section covers the following topics:

  • Collecting Logs for Troubleshooting a vCenter Server Installation or Upgrade
  • Collect Logs to Troubleshoot ESXi Hosts
  • Errors and Warnings Returned by the Installation and Upgrade Precheck Script
  • Restore vCenter Server Services if Upgrade Fails
  • VMware Component Manager Error During Startup After vCenter Server Appliance Upgrade
  • Microsoft SQL Database Set to Unsupported Compatibility Mode Causes vCenter Server Installation or Upgrade to Fail

Create a Log Bundle

Locate/Analyze VMware Log Bundles

There are multiple ways to get at this information, but I will assume the exam is going to be geared more towards using the vSphere Web Client for this task.

Using vSphere Web Client

Have a look over at VMware KB Article 2032892, Collecting Diagnostic Information for ESX/ESXi Hosts and vCenter Server Using the vSphere Web Client.

      • Log into the vSphere Web Client with administrative privileges
      • Under Inventory Lists, select vCenter Servers
      • Click the vCenter Server that contains the ESX/ESXi hosts you wish to export logs from
      • Select the Monitor tab in the right hand navigation screen and choose System Logs
      • Click Export System Logs
      • Select the ESX/ESXi hosts you wish to export logs from
      • Optionally, select Include vCenter Server and vSphere Web Client Logs
      • Click Next
      • Select the type of Log Data to be exported
      • Optionally, select Gather Performance Data
      • When ready, click Generate Log Bundle
      • Once the log bundle is generated, click Download Log Bundle
      • Select a location and click Save

 

Log_Bundle

For additional diagnostic and log collection (either virtual appliance, ESXi hosts, or Windows vCenter) have a look at the following VMware KB articles:

Identify Alternative Methods to Upgrade ESXi Hosts in Event of Failure

For this section I am not really sure what VMware is after with the “Event of Failure” piece. I am going to tackle this from the perspective of outlining the supported methods of upgrading an ESXi host. My guess is this will give you the baseline knowledge you will need for the exam.

  • vSphere Update Manager – For me this is my favorite of the options. You should already have VUM installed in your environment so the only work that really needs to be done is importing the ESXi 6.0 ISO into the repository and creating an Upgrade baseline. Super easy.
  • Upgrade via ESXi Installer (ISO on USB/CD/DVD) – In a small enough environment you might just be able to create a boot image from the ESXi 6.0 ISO and place it on a CD/DVD/USB device and boot the ESXi host from it. This would be labeled as an “interactive” upgrade. You will have to provide some inputs to complete the upgrade
  • Perform Scripted Upgrade – I myself haven’t used nor seen a lot of scripted upgrades in the field. It is supported and could be a faster deployment method to multiple hosts over VUM.
  • vSphere Auto Deploy – Using Auto Deploy you can reprovision the host and reboot it with a new image profile. This profile would include the ESXi upgrade to 6.x. You will need to leverage vSphere Image Builder to build the package
  • esxcli – You can use the esxcli command-line utility to upgrade hosts to ESXi 6.x

Configure vCenter Logging Options

  • Log into the vSphere Web Client with administrative privileges
  • Under Resources, select vCenter Servers
  • Click the vCenter Server for which you want to update the logging level
  • Select the Settings tab in the right hand navigation screen and choose General
  • From the General tab click Edit
  • The Edit vCenter Server Settings dialog will be displayed. Select Logging Settings
  • Select the level of logging from the Logging Options dropdown.
  • Click OK when finished

Logging_Options

The available options are:

  • None (Disable Logging) – Turns off logging
  • Error (Errors Only) – Displays only error log entries
  • Warning (Errors and Warnings) – Displays warning and error log entries
  • Info (Normal Logging – Default) – Displays information, error, and warning log entries
  • Verbose (Verbose) – Displays information, error, warning, and verbose log entries
  • Trivia (Extended Verbose) – Displays information, error, warning, verbose, and trivia log entries