VMware Recovery Explained

VMware is a computer virtualisation service that has made waves in the computing world. Since its introduction in 1999, VMware has evolved into a virtualisation system used at every scale, from individual users to multinational enterprises. Often the whole IT system or service of an organisation is staked on VMware's reliability and backup capabilities. When so much is at stake, it's important to have a recovery plan in place to deal with potential disaster.

As VMware recovery is a very complex procedure, it's easy to become bogged down in confusing acronyms and technical jargon that don't help you prepare your IT systems for data loss or failure. Let’s cut through the jargon and pinpoint the key terms you need to know about VMware recovery.

VMFS File System Corruption

VMFS stands for Virtual Machine File System, the method by which VMware organises and stores data. VMFS file systems are used by two popular VMware products: ESX Server and the flagship VMware Infrastructure.

VMFS is a very flexible, cluster-based file system that doesn't work like the file system on a standard hard drive. It can extend over many servers and can grow in size without reformatting or losing data.

In the unlikely event that a VMFS file system is corrupted, it's important to have data recovery done by experts. The file system is quite complex, and its cluster-based nature means recovery is often not as straightforward as simple HDD recovery.

VMDK Corruption

The VMDK file type may be more familiar than VMFS. The acronym stands for Virtual Machine Disk; a VMDK is essentially a separate software-based hard disk that sits on top of a physical one. The VMDK file type is used by VMware to store a virtual operating system, and contains all of the files, system settings and programs within that virtual layer.

VMDKs can be corrupted for a few reasons, ranging from software issues on the host computer to physical drive issues. Fortunately, VMware recovery services can often rebuild VMDK files, or retrieve data from within them.

RAID Failure

RAID (Redundant Array of Inexpensive Disks) failure can cause data loss whether or not the array sits within a virtualised environment. However, if your virtualised operating system is running off a RAID storage solution, then specialised VMware recovery is the way to go. RAID failure generally happens when the RAID controller has a problem with its redundancy patterns. A physical failure of one of the disks in a RAID array may also cause data loss.

Accidentally deleted files

Accidental deletion is not an uncommon problem on any computer system, but when a file has been accidentally deleted within a virtualised operating system, traditional data recovery techniques will not always work.

As virtualised systems use the VMDK file type to store all their data, it is sometimes possible to retrieve lost files from the file system without scanning the hard disk.

Our VMware data recovery expertise

It's always a good idea to review your data security and backup systems regularly to avoid the need for data recovery. However, if problems do occur within your VMware system, it's worthwhile contacting Ontrack data recovery. We have an official partnership with VMware and use the latest technology in our range of VMware recovery solutions.

Remote Data Recovery for VMware

Kroll Ontrack is the only data recovery company equipped to perform VMware recoveries remotely in a secure manner. When disaster strikes, a specialist engineer can start work on recovery within the hour, and our global team is available to work 24/7 on emergency cases if the data loss affects business operations or the company is facing great financial, operational or reputational losses.

So how does Remote Data Recovery for VMware work? The following are four real-life case studies of VMware data recovery for Singapore customers, ranging from IT hosting companies to mid-size businesses.

Missing VMDK in VMware on RAID 5

In this case the RAID hardware was restarted while a hardware vendor was troubleshooting an issue on the client’s RAID. This caused an ungraceful shutdown while the VMware hosts were still running, and when the RAID was brought back online, the VMDK for a critical Virtual Machine was missing. The client contacted Kroll Ontrack to start a Remote Data Recovery session.

Initial inspection showed that some areas of the VMFS file system had become corrupted, including the inode for the required VMDK file. Using proprietary tools, Kroll Ontrack engineers were able to recover and sequence the data fragments from the damaged VMDK file. Further investigation inside this file showed that some additional corruption existed inside the internal EXT (Linux) file system. Thanks to targeted repairs at the Guest File System level, the engineer was able to rebuild the internal volume and recover all available data.

Result: All critical data was recovered intact, with only some mild damage to the Directory Tree.

Accidental deletion of critical Virtual Machine

On a Friday afternoon the customer mistakenly deleted a critical Virtual Machine from a VMFS datastore. A Kroll Ontrack engineer was able to locate and sequence the fragments of data to rebuild the deleted Virtual Disk. Once rebuilt, the internal file system was inspected and found to contain no errors. The virtual disk was recovered as a bootable VMDK file, and its NTFS data was also extracted to external storage.

Result: A full recovery of data was achieved.

Failed RAID drives containing a VMware Datastore

This case required an in-lab data recovery because the eight-drive RAID 5 contained two mechanically failed drives. The RAID array contained a VMware Datastore hosting approximately 14 Virtual Machines. Prior to contacting Kroll Ontrack, the customer had attempted to rebuild the RAID with no success.

Once we received the drives in our Singapore cleanroom, a total of nine drives were imaged to Kroll Ontrack’s servers. During the process, three of these drives were found to contain I/O errors. The data was mapped and the RAID was rebuilt. At this stage some corruption was discovered as a result of an earlier incorrect rebuild. The engineer was able to identify the best configuration, including rebuilding a degraded drive from the Parity on remaining drives.

Next, some mild VMFS corruption was repaired to allow access to the Virtual Machines. In the following step, seven critical Virtual Disks were examined, several of which were found to contain light corruption. The engineers continued with File System error repairs where possible, and data was extracted from all seven critical Virtual Disks, including several SQL databases. Finally, bootable VMDK files were provided for the machines which were found to contain no structural damage.

Result: In spite of the complex nature of the case, this was a very successful recovery, with all but a handful of the customer’s files recovered successfully.
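
To illustrate the idea behind rebuilding a drive "from the Parity on remaining drives", here is a minimal Python sketch of RAID 5 style XOR parity. It is not Kroll Ontrack's tooling, which the case study describes only in general terms. The function names and the sample stripe are hypothetical, and a real array also involves stripe sizes, parity rotation and drive ordering that this sketch ignores.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks in a stripe must be the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_block(surviving_blocks):
    """Rebuild the single missing block of a stripe.

    surviving_blocks must hold every other block of the stripe,
    including the parity block. XOR parity can restore only ONE
    missing block per stripe.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks plus one parity block.
data_blocks = [b"VMFS", b"VMDK", b"RAID"]
parity_block = xor_blocks(data_blocks)

# Pretend the drive holding the second data block has failed.
rebuilt_block = rebuild_missing_block([data_blocks[0], data_blocks[2], parity_block])
```

Because XOR parity can restore only one missing block per stripe, an array with more than one failed drive generally needs the failed drives imaged first, as happened in the lab for this case.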

Power failure results in lost VM

This unlucky customer “lost” a critical VM after a power failure. The customer connected the Datastore which they believed had held the missing Virtual Machine to Kroll Ontrack's Remote Data Recovery server. However, initial inspection showed that the wrong Datastore was being examined. Our engineer was able to examine some log files contained on the incorrect Datastore to direct the customer to the correct LUN.

Once the correct Datastore was presented, it was discovered that the critical Virtual Disk contained several snapshots that had become disassociated after the VM was powered up without the snapshots in place. As a result, several weeks of data were missing from the Virtual Disk. Kroll Ontrack quickly located the orphaned snapshots and force-merged them to the base file. Next, some mild File System damage was repaired and all resulting data was extracted, including a SQL database.

Result: All critical data was recovered successfully and returned to a very relieved customer.
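
As a rough illustration of how orphaned snapshots like those in the last case might be spotted, the Python sketch below orders snapshot descriptors by their content IDs. It assumes the text-based VMDK descriptor format, in which each disk records a CID and a parentCID, and a base disk carries a parentCID of ffffffff. The file names and ID values are hypothetical. The sketch only reads metadata and does not merge anything, and it is not a description of the proprietary tools used in the case.

```python
import re

NO_PARENT = "ffffffff"  # parentCID value assumed to mark a base disk

FIELD_PATTERN = re.compile(
    r'\s*(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"]*)"?\s*$'
)


def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from descriptor text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip().lower()
    return fields


def order_chain(descriptors):
    """Return (chain, orphans) for a dict of file name -> descriptor text.

    chain lists the disks from the base upwards, following parentCID links.
    orphans lists disks whose parentCID matches nothing in the chain.
    """
    parsed = {name: parse_descriptor(text) for name, text in descriptors.items()}
    bases = [n for n, f in parsed.items() if f.get("parentCID") == NO_PARENT]
    if len(bases) != 1:
        raise ValueError("expected exactly one base disk, found %d" % len(bases))
    chain = [bases[0]]
    while True:
        current_cid = parsed[chain[-1]].get("CID")
        children = [
            n for n, f in parsed.items()
            if f.get("parentCID") == current_cid and n not in chain
        ]
        if not children:
            break
        if len(children) > 1:
            raise ValueError("ambiguous chain: several disks share one parent")
        chain.append(children[0])
    orphans = sorted(set(parsed) - set(chain))
    return chain, orphans


# Hypothetical descriptors for one base disk, two linked snapshots and a stray.
descriptors = {
    "vm.vmdk": "CID=aaaa1111\nparentCID=ffffffff\n",
    "vm-000001.vmdk": 'CID=bbbb2222\nparentCID=aaaa1111\nparentFileNameHint="vm.vmdk"\n',
    "vm-000002.vmdk": 'CID=cccc3333\nparentCID=bbbb2222\nparentFileNameHint="vm-000001.vmdk"\n',
    "vm-000003.vmdk": 'CID=dddd4444\nparentCID=12345678\nparentFileNameHint="vm-000002.vmdk"\n',
}
chain, orphans = order_chain(descriptors)
```

In this made-up example the third snapshot points at a parent ID that no longer exists, so it is reported as an orphan rather than as part of the chain. Deciding whether such a snapshot can safely be merged back is a job for a recovery engineer, not for a script like this.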