VAAI, Is This Thing On??

Late in the summer we purchased a new NetApp FAS2040 storage array to replace an outgoing disk-based backup target. Since we have run into a few delays moving our production vSphere environment to the unit, I thought I would test out some of the capabilities it has to offer, specifically VAAI. So last week I carved out a few FC LUNs and presented them to our ESXi 5 hosts to do some testing and playing before going “live”.

Just for some quick background: VAAI, or vStorage APIs for Array Integration, was first introduced with ESX/ESXi 4.1. It allows certain VM and storage operations that would normally run on the ESX host to be offloaded to the storage array. The benefit is faster completion of these tasks, plus compute resources saved on your ESX hosts. These are the offload features VAAI provides:

Atomic Test & Set – also known as Hardware Assisted Locking – allows for more granular locking of files on a VMFS volume than reserving the entire LUN

Cloning Blocks – also known as Full Copy/Extended Copy – allows the array to make a copy of data (think cloning and Storage vMotion)

Zero Blocks – also known as Block Zeroing – allows the array to zero out blocks (think creating new VMDKs)

After confirming on the VMware HCL that the array met all requirements, it was time to get down to business. After presenting the LUNs to a test host, I quickly provisioned a Windows VM on one LUN to use for both the Storage vMotion and the clone testing. While in vCenter, I also used the NetApp Virtual Storage Console to confirm that it detected the array and reported it as VAAI capable:

VAAI_Enabled

The next step was to fire up ESXTOP, switch to the disk device screen (u), and then press f to deselect/select the appropriate fields (I removed B, F, G, and I, and added O):

ESXTOP_Fields

This adds the VAAI Stats counters to the ESXTOP session. For a full rundown of each counter, refer to VMware Document 11812, Interpreting esxtop 4.1 statistics. For my testing I was only concerned with CLONE_RD, CLONE_WR, and CLONE_F (clone reads, clone writes, and failed clones).

After setting up ESXTOP, I switched back over to my VI Client and started a Storage vMotion of my test VM between the new LUNs. I then went back to ESXTOP expecting to see the counters climbing as the VM’s storage was moved. However, I got nothing:

VAAI_Not_Working

Hmm, what’s up with that? So I did some searching and came up with two very helpful items: first, NetApp Technical Report TR-3886 (Understanding and Using vStorage APIs for Array Integration and NetApp Storage), and second, VMware KB article 1021976 (vStorage APIs for Array Integration FAQ). Both documents provide excellent information on VAAI as well as a few things to check to make sure you have a supported configuration.

I also searched the NetApp forums to see if anyone else had seen a similar issue. I came up empty, so I posted my own question and also reached out to the Twitter-verse for additional help. I received a reply from a NetApp employee named Rodrigo Nascimento; below is a compilation of the steps outlined in the above documentation along with the things Rodrigo suggested checking.

First off was to check whether the new SCSI devices show VAAI as supported. You can do this either by checking the device in the Storage view of your ESX host or by running the following esxcli command:

esxcli storage core device list | egrep "Display Name:|VAAI Status:"

VAAI_Enabled

VAAI_Supported
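On a host with many devices, eyeballing the egrep output gets tedious. Below is a minimal sketch of a helper that pairs each device's VAAI Status with its Display Name so unsupported devices stand out. The function name vaai_report is hypothetical, and it assumes the indented "Key: Value" layout that esxcli storage core device list prints on ESXi 5.x, with Display Name appearing before VAAI Status in each device's block.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): pair each device's VAAI Status with
# its Display Name. Assumes the indented "Key: Value" layout printed by
# `esxcli storage core device list` on ESXi 5.x, where "Display Name:"
# comes before "VAAI Status:" within each device block.
vaai_report() {
  awk '
    /^[ \t]*Display Name: / { sub(/^[ \t]*Display Name: /, ""); name = $0 }
    /^[ \t]*VAAI Status: /  { sub(/^[ \t]*VAAI Status: /, ""); print $0 "\t" name }
  '
}

# Usage on the ESXi host (live output):
#   esxcli storage core device list | vaai_report
# Show only devices that do not report "supported":
#   esxcli storage core device list | vaai_report | grep -v '^supported'
```

Reading from stdin rather than calling esxcli directly means the same function works on a saved copy of the output, which is handy when comparing hosts.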

As you can see from the screenshots above, all seemed good from that standpoint. I jumped back to the VMware HCL to double-check the requirements and, paying a little closer attention this time, saw the following:

VMware_HCL

This got me thinking, so I double-checked the loaded modules on the ESX host, looking for the vmw_vaaip_netapp module:

esxcli system module list | grep vmw_vaaip_netapp

The results came up empty.

vaaip_netapp_empty
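The grep check above can also be turned into a pass/fail test, which is useful when repeating it across several hosts. This is a minimal sketch: the function name netapp_vaaip_loaded is hypothetical, and it assumes the tabular esxcli system module list layout on ESXi 5.x, with the module name in the first column and "Is Loaded" (true/false) in the second.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): exit 0 only if vmw_vaaip_netapp is
# listed and marked as loaded. Assumes the tabular output of
# `esxcli system module list` on ESXi 5.x: column 1 is the module name,
# column 2 is "Is Loaded" (true/false).
netapp_vaaip_loaded() {
  awk '$1 == "vmw_vaaip_netapp" && $2 == "true" { found = 1 }
       END { exit(found ? 0 : 1) }'
}

# Usage on the ESXi host:
#   if esxcli system module list | netapp_vaaip_loaded; then
#     echo "vmw_vaaip_netapp is loaded"
#   else
#     echo "vmw_vaaip_netapp is not loaded"
#   fi
```

Checking the second column as well as the name guards against a module that is present on the host but not actually loaded.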

Looking back, that’s not surprising: I had only just presented these LUNs to the host and had not been using NetApp storage beforehand. So I quickly put the host in maintenance mode and rebooted.

After the reboot, I connected back to the host and re-ran the “esxcli system module list | grep vmw_vaaip_netapp” command:

vaaip_netapp_populated

Bingo! The module is loaded and should be good to go. I fired up ESXTOP again and started my Storage vMotion test. Again, nothing.

VAAI_Not_Working

I went back to the previously mentioned documents as well as my post on the NetApp forum. At this point Rodrigo suggested verifying that VAAI is enabled on the ESX host. This is quickly done by checking that the following Advanced Settings are set to 1 (the default) under Configuration -> Software -> Advanced Settings:

DataMover.HardwareAcceleratedMove
DataMover.HardwareAcceleratedInit
VMFS3.HardwareAcceleratedLocking

From the console, this is done by running the following commands and verifying that Int Value is set to 1:

esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit
esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking
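Running the three commands and reading each result by hand is easy to get wrong, since each listing also contains a "Default Int Value" line. Here is a minimal sketch that checks all three options in one pass. The function names int_value and check_vaai_settings are hypothetical, and it assumes each listing contains an indented "Int Value: N" line on ESXi 5.x.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helpers): report the three VAAI advanced
# settings and flag any that is not 1. Assumes the output of
# `esxcli system settings advanced list -o <option>` contains an indented
# "Int Value: N" line; the "Default Int Value: N" line is not matched
# because the pattern only allows whitespace before "Int Value:".
VAAI_OPTIONS="/DataMover/HardwareAcceleratedMove /DataMover/HardwareAcceleratedInit /VMFS3/HardwareAcceleratedLocking"

# Extract the current Int Value from esxcli output on stdin.
int_value() {
  awk '/^[ \t]*Int Value: / { print $3 }'
}

# Print "<option> <value>" for each setting; return 1 if any value is not 1.
check_vaai_settings() {
  rc=0
  for opt in $VAAI_OPTIONS; do
    val=$(esxcli system settings advanced list -o "$opt" | int_value)
    echo "$opt $val"
    [ "$val" = "1" ] || rc=1
  done
  return "$rc"
}

# Usage on the ESXi host:
#   check_vaai_settings || echo "VAAI is disabled by at least one setting"
```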

Lo and behold, on our test ESX system these were set to 0, disabling VAAI. I set them back to 1 to enable VAAI at the host level and rebooted the system (not sure it was needed, but why not). After the reboot, I connected to the console again, fired up ESXTOP, and restarted the Storage vMotion test:

VAAI_Working

As you can see from the screenshot, success at last! The CLONE_RD and CLONE_WR counters, as well as the MBC_RD/s and MBC_WR/s metrics, are showing activity.
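If you would rather set the three options back to 1 from the console than through the vSphere Client, here is a minimal sketch using esxcli system settings advanced set with its -i (integer value) flag. The function name enable_vaai_settings is hypothetical; afterwards, re-run the list commands shown earlier to confirm that Int Value reads 1.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): set the three VAAI advanced settings
# back to 1 (the default) with `esxcli system settings advanced set -o <option> -i <int>`.
VAAI_OPTIONS="/DataMover/HardwareAcceleratedMove /DataMover/HardwareAcceleratedInit /VMFS3/HardwareAcceleratedLocking"

enable_vaai_settings() {
  for opt in $VAAI_OPTIONS; do
    # Stop at the first failure so a partial change is easy to spot.
    esxcli system settings advanced set -o "$opt" -i 1 || return 1
  done
}

# Usage on the ESXi host:
#   enable_vaai_settings && echo "VAAI settings set to 1"
```

Stopping at the first failed set keeps the host in a known state: every option before the failure is 1, and the rest are unchanged.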

Even though VAAI is (and should be) enabled by default on an ESX host, researching the issue and reading these documents, as well as other posts on the web, definitely gave me a better understanding of how it works.