Converting in-guest iSCSI volumes to native VMDKs

In this guest post, fellow Seattle VMUG member Pete Koehler (@vmpete) writes about options for transitioning away from in-guest iSCSI attached volumes to native VMDKs.

Over the years I’ve posted about the benefits of using in-guest iSCSI volumes. The need stemmed from a time several years ago when my environment had very limited tools in the war chest, but needed to take advantage of large volume sizes, get good application quiescing using VSS, and squeeze out every bit of performance with multipathing. As my buddy Jason Langer reminded me often, the thought of in-guest iSCSI volumes sounded a little, well… antiquated.

Interestingly enough, I couldn’t agree more. While they might have been the virtualization equivalent of wearing acid-washed jeans, they had served a purpose at one time. It also became painfully clear that in other ways, they were an administrative headache that no longer fit in today’s world. I wanted to change, but had limited opportunities to do so.

As times change, so do the options. We are all served well by doing a constant evaluation of design choices. So please welcome 2014. vSphere 5.5 has broken down the 2TB VMDK barrier. The best of the backup products leverage VMware’s APIs to see the data. And if you want to take advantage of one of the great host acceleration solutions out there, VMware needs to be aware of the volumes. Add that up, and in-guest iSCSI wasn’t going to cut it. This was no surprise, and the itch to convert all of my guest-attached volumes had been around for at least a couple of years. In fact, many had been migrated a while ago. But I’ve noticed quite a few questions have come my way on how I made the transition. Apparently I wasn’t the only one sporting acid-washed jeans.

That was the rationale for the change. Now for how to go about converting the volumes. Generally, the most viable options are:

  • Conversion using VMware Converter
  • Conversion by changing the connection to an RDM, then converting to a VMDK via Storage vMotion
  • Transitioning data inside the guest to a pristine VMDK using rsync (Linux VMs)

Which option you choose may depend somewhat on your environment. Honestly, I never thought of the RDM/Storage vMotion method until Jason suggested it over dinner recently. Chalk one up to sharing ideas with your peers, which was the basis for this post. Below I will outline the steps taken for each of the three methods listed.

Regardless of the method chosen, you will be best served by taking the additional steps necessary to remove the artifacts of the old connection method: remove the NICs used for the iSCSI connections, as well as any integration tools (like the Dell EqualLogic Host Integration Toolkit), and finally, remove the old guest volumes from the storage array once they are no longer in use. Also remember to take any precautions necessary before the transition, such as backing up the data and the VM before making any changes.
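On Windows guests that cleanup is mostly a matter of uninstalling the HIT kit and removing the extra vNICs. On Linux guests using open-iscsi, the equivalent teardown can be sketched roughly as follows; the target name shown is a placeholder, and any matching /etc/fstab entries should be removed first:

    # Log out of the old session and delete the recorded node entry so the
    # guest stops trying to reconnect to the old volume at boot.
    iscsiadm -m node -T iqn.2001-05.com.equallogic:example-old-volume -u
    iscsiadm -m node -T iqn.2001-05.com.equallogic:example-old-volume -o delete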

 

Conversion using VMware Converter
This method uses the VMware Converter tool installed in the VM, and converts the in-guest volumes to native VMDK files. It is very predictable and safe, but depending on the size of the volumes, it might require a sizable maintenance window while you convert them.

1. Install VMware Converter inside of guest.

2. Make note of all services that touch the guest volumes, shut them off, and temporarily set them to “Disabled”.

3. Launch Converter and click “Convert Machine” > “This local machine”. Select a destination type of VMware Infrastructure VM, and change the name to “[sameVMname]-guest”. Complete the wizard by selecting the appropriate folder and destination datastore. You may select only the guest volumes you need, as the other disks it would create are unnecessary.

4. Remove the newly created “[sameVMname]-guest” VM from inventory, and copy the VMDK file(s) from the old datastore to the new location if necessary (see the sketch after these steps).

5. Once complete, disconnect all in-guest iSCSI volumes, remove the Host Integration Toolkit, disable the iSCSI NICs inside the VM, and power down.

6. Edit the VM properties to disable or remove the iSCSI NICs.

7. Attach the newly created VMDKs to the VM, ideally choosing a SCSI device node other than 0:# to improve performance (e.g. the new VMDK might be on 1:0 and an additional VMDK on 2:0, etc.).

8. Power on to verify all is running, and new drives are mapped to the correct drive letters or mount points.

9. Re-enable all services that were set to “Disabled” earlier, returning them to their original settings, clear the event logs, and reboot.

10. Verify access and services are now running correctly.
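For step 4, if a converted VMDK lands on the wrong datastore, cloning it with vmkfstools from the ESXi shell keeps control of the provisioning format (a plain datastore-browser copy can inflate a thin disk). A minimal sketch, where the datastore paths and file names are placeholders only:

    # Clone the converted data disk to its final datastore as a thin-provisioned VMDK.
    vmkfstools -i /vmfs/volumes/old-datastore/sqlvm-guest/sqlvm-guest_1.vmdk \
               /vmfs/volumes/new-datastore/sqlvm/sqlvm_1.vmdk -d thin

The -d switch also accepts zeroedthick or eagerzeroedthick if you prefer a thick disk for the data volume.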

This method was most commonly used on SQL servers that had smaller volume sizes. As the guest volumes grew, so did the maintenance window.

 

Conversion by changing the connection to an RDM, then converting to a VMDK via Storage vMotion
This method first changes the connection method of the in-guest volume to an RDM, then converts it to a native VMDK via a Storage vMotion. It is also very predictable and safe, but offers the additional benefit that the maintenance window needed for conversion is NOT based on the size of the volume; the window is only the brief time in which you power down the VM.

1. Make note of all services that touch the guest volumes, shut them off, and temporarily set them to “Disabled”.

2. Once complete, disconnect all in-guest iSCSI volumes, remove the Host Integration Toolkit, disable the iSCSI NICs inside the VM, and power down.

3. On the storage system, present the iSCSI disk to all ESXi hosts.

4. Rescan the hosts so they see the disk (see the host-side sketch after these steps).

5. Add an RDM (virtual mode) disk to the VM, pointing it to the iSCSI disk newly presented to the hosts.

6. Power on the VM, and verify that the RDM mounted and that the apps and/or data are present.

7. Re-enable all services that were set to “Disabled” earlier, returning them to their original settings.

8. Storage vMotion the VM, making sure to go into the “Advanced” settings.

9. Move the C: VMDK to a LUN and move the RDM to a VMFS LUN, then change the disk format from “Same” to Thick, Thin, or Thick Eager Zeroed on the RDM disk. Once the Storage vMotion is complete, the RDM will have been migrated to a VMDK.

10. Unmount the previously presented iSCSI volume from the ESXi hosts, and verify that access and services are now running correctly.
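For steps 4 and 10, the host-side pieces can also be done from the ESXi shell rather than the vSphere client. A minimal sketch, assuming the software iSCSI adapter is vmhba33, the array’s discovery address is 10.10.10.10, and naa.xxx stands in for the old device ID; all three are placeholders:

    # Step 4: rescan so each host sees the newly presented disk.
    # Add the discovery address only if the array is not already configured on the host.
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=10.10.10.10:3260
    esxcli storage core adapter rescan --all
    esxcli storage core device list     # confirm the new device is visible

    # Step 10 (after the Storage vMotion): detach the old device from the host
    # before unpresenting it on the array.
    esxcli storage core device set --device=naa.xxxxxxxxxxxxxxxx --state=off

Repeat the rescan on each host in the cluster so the RDM mapping is valid everywhere the VM can run.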

The nice thing about this method is that the VM is up and in production while the Storage vMotion happens. It also catches all of the changes during the move.

 

Transitioning data inside the guest to a pristine VMDK using rsync
This method is for Linux VMs, whereby one creates a pristine VMDK, then transfers the data inside the guest via rsync. This process can take some time to seed the new volume, but it is essentially a background process for the VM. The actual cutover is typically just a change to /etc/fstab and a restart. It can use additional resources, but in certain circumstances may be a good fit.

1. Create the desired VMDKs for the VM, ideally choosing a SCSI device node other than 0:# to improve performance (e.g. the new VMDK might be on 1:0 and an additional VMDK on 2:0).

2. Inside the guest, create the new partition using parted or gparted, then format it using mkfs (see the sketch after this list).

3. Create the device mount locations, and then add entries in /etc/fstab.

4. Restart and validate that the volume is mounting properly.

5. Begin the rsync process from the old location to the new location. The syntax will look something like rsync -av --delete --bwlimit=7500 root@[systemname]:/oldpath/todata /newpath/todata/

6. Once complete, redirect any symbolic links to the new location, and adjust mount points in /etc/fstab.

7. Restart to test and validate. Verify access and services are now running correctly.

8. Remove connections to the old guest volumes, and clean up the VM by disabling or removing the iSCSI-based NICs, etc.
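For steps 2 through 6, the guest-side work looks roughly like the following sketch. The device name (/dev/sdb), filesystem (ext4), label, and paths are placeholders; the rsync example above pulls over SSH from another system, which works the same way:

    # Step 2: partition and format the new VMDK (shown here as /dev/sdb).
    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart primary ext4 0% 100%
    mkfs.ext4 -L newdata /dev/sdb1

    # Step 3: create the mount point, add it to /etc/fstab, and mount it.
    mkdir -p /newpath/todata
    echo 'LABEL=newdata  /newpath/todata  ext4  defaults  0 2' >> /etc/fstab
    mount /newpath/todata

    # Step 5: seed the new volume while the old one stays in production.
    # --bwlimit throttles the copy (KB/s) so it behaves as a background task.
    rsync -av --delete --bwlimit=7500 /oldpath/todata/ /newpath/todata/

    # Step 6 (at cutover): stop the services using the data, run one final
    # delta pass, then repoint /etc/fstab and any symlinks at the new location.
    rsync -av --delete /oldpath/todata/ /newpath/todata/

Because the bulk of the data is seeded ahead of time, the final pass at cutover usually takes only a few minutes.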

This method allowed for some restructuring of data on some extremely large volumes, something that the Development team wanted to do anyway. It allowed IT to delegate the rsync processes off to the teams handling the data change, so that the actual cutover could be fit into their schedules.

 

The results
While I’m not completely finished with the conversion (a few more multi-terabyte volumes to go), the process of simplifying the environment has been very rewarding. Seeing these large or I/O-sensitive data volumes take advantage of I/O acceleration has been great. Simplifying the protection of our mission-critical VMs was even more rewarding.

- Pete

 

Comments

  1. Ed Swindelles says:

    Thanks Pete. Do you have any experience doing such a conversion when the in-guest volume is involved in Microsoft Cluster Services? I have several Windows clusters (file servers, print servers, SQL servers, etc.) that use in-guest iSCSI with shared LUNs. So, for a conversion to RDMs, they would have to be physical mode RDMs. I tried this in a test environment and had a few problems. Have you ever tried it?

  2. Hi Ed,

    Thanks for reading. I do not have any experience with how it would react under MSCS scenarios. If I get a chance to experiment, I will certainly provide an update. Feel free to share your experiences as well if you make some headway on it.

    - Pete

  3. John Howell says:

    I have also traditionally used in-guest iSCSI connections ever since we got our first EqualLogic unit around 2007. I recently bought Veeam and have started using it for backups. I especially like that it allows me to back up the VM guest and recover it should the whole thing just crater. I am still in the process of transitioning to VMDK files, and several of my servers (SQL and file) are still using in-guest iSCSI connections. After recently rebuilding our Exchange 2010 servers and moving to VMDKs at the same time, I quickly learned that using EqualLogic’s HIT kit to snapshot the Exchange environment doesn’t work anymore. The Veeam backup for our two on-premise Exchange servers takes about 8 hours to back up around 4.4 TB to another EqualLogic 6100E unit; EqualLogic snapshots were instant. I have been very disappointed in the amount of time it takes to back up the Exchange VM guests that use the VMDK-with-Veeam method as compared to the old EqualLogic snapshot process.

    Also, I have not yet had to restore anyone’s mailbox or lost items, and am dreading the first time it happens due to fears of it taking hours under the VMDK/Veeam backup model. Previously, I could restore Exchange mailboxes using the EqualLogic HIT kit very quickly. All in all, I am afraid to move forward with migrating my SQL and other file servers to a VMDK model because Veeam seems to take too long to run backups. In regards to the file servers, I take hourly snapshots using the EqualLogic HIT kit and can quickly recover overwritten or deleted files, and am concerned about the amount of time it would take to recover using Veeam from a VMDK file.

    I don’t want to get stuck in the past using an antiquated iSCSI method of connecting my SAN volumes to my VMs, but it just seems that losing the ability to perform snapshots (VSS-aware snapshots, especially on Exchange and SQL) is a lot to give up. I suppose I could look at running a snapshot directly against the volumes that hold the VMDK files, but it could be a lot of work to try and mount those and present them to my VM guests; perhaps it is worth a look.

    Any comments or suggestions are welcome.

    -John

  4. Hi John,

    Thanks for your question. Yes, one of the challenges with the HIT kit was always the inextricable tie between the guest OS being used, the HIT version, and the firmware version on the array controllers. All too often one of these ingredients can cause problems, which really inhibits any sort of protection strategy using these tools. Unfortunately, I saw that happen myself more than once.

    You state that in your environment the HIT kit snaps the Exchange volumes to another EqualLogic unit and that the snaps are instant. Are you using replication to achieve that? Because if it is a local snap, it is limited to the arrays in your pool, and it is not isolated to a single array (it spans across all of them), so your protection target lives in the same failure domain as your production data. It’s good not to blend the two protection strategies together when comparing. SAN array snapshots like EqualLogic’s use a re-allocate-on-write approach, so there is no second copy of the data, and most certainly not one outside of the failure domain of the arrays. With replicas there is a second copy, but that takes transmission time and, you got it, more arrays.

    Also, because of the large 15MB block sizes used by the EqualLogic arrays, and hourly snaps occurring, I’m assuming you had an exorbitant amount of snapshot reserve for these volumes in order to protect for more than just hours at a time? My personal experience was exactly that. It was nearly impossible to meet an RPO because of the change rate versus the large block sizes. I’ve seen other environments with guest-attached volumes where the snapshot reserve was set to 500% or more just to protect data for more than 4 or 5 days.

    4.4TB is not a trivial amount to protect, no matter how you look at it. While you don’t give specifics about your Exchange environment, there is a design element that comes into play here. The protection target should be outside of the failure domain of the arrays, so when one looks at running a Veeam backup job, one really needs to factor in that consideration (otherwise it is just not a fair comparison). Large in-guest volumes can sometimes mask those types of matters. It might be interesting to explore how you could make changes to improve that. I know that in my current environment, I have two Exchange 2013 servers that Veeam protects to a separate storage target in 10 minutes for daily backups, and of course less time, courtesy of CBT, if that were run at a higher frequency.

    I would encourage everyone to test out their recovery scenarios in any situation, so rather than treating this as a deterrent, perhaps make it the first thing you do. It’s best to have that recovery run-book ready to go before it really counts, regardless of the solution used.

    With regards to VSS, that was one of the early selling points of in-guest protection. ASM/ME was quite good for its time at real application-layer quiescing. This was at a time in which VMware’s capabilities, the supported OSs, and the backup applications designed for a virtualized environment were very limited. I even posted about the ease of recovering a mailbox via this method at http://vmpete.com/2010/07/28/restoring-an-exchange-2007-mailbox-using-equallogics-asmme/ back in 2010. But all of that has changed. Veeam is very capable of making application-consistent backups that leverage VSS writers (Exchange, SQL, file). Again, I encourage you to experiment with this. The Veeam Explorer for Exchange gives you every bit of the granular recovery ability (and much more) that one had in a mailbox restore using ASM/ME.

    SAN array-based snapshotting can still have its place, but there are real caveats. In-guest volumes try to work around some of those caveats to provide value-add benefits, but in turn create many more operational challenges. Protection strategies grow exponentially more difficult when it comes to protecting in-guest volumes with something other than just array-based snapshotting or replication. I feel this benefit almost daily, as my own environment has become so much easier to manage and protect after the elimination of guest-attached volumes.

    Thanks for reading, and I hope this gives you some ideas. Remember, you can find me at http://vmpete.com

    - Pete
