Sunday, 13 May 2012

Corrupted Windows 7 VDI Base Disks with XenDesktop 5.5 MCS


This is perhaps a good example of why not to use iSCSI/FC storage for a large pool of Windows 7 XenDesktops.

The Scenario

In this scenario of around 120 pooled desktops, to get around the issue with SCSI reservation conflicts/locks, the virtual desktops had been spread across three FC VMFS datastores.

Fig. 1: Citrix Desktop Studio → Configuration → Hosts

This configuration means that each datastore holds its own copy of the baseDisk, which makes up Hard disk 1, or SCSI (0:0), of every pooled desktop on that datastore.

An NFS storage environment is much more scalable: there is no need for multiple datastores to minimize the SCSI reservation issues, and hence no need for multiple copies of the baseDisk. With NFS there would be just one datastore and one baseDisk. A very rough rule of thumb with iSCSI/FC datastores is a limit of 20 VMDKs per datastore, whereas NFS has no comparable limit.
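As a rough worked example of that rule of thumb, the arithmetic for this scenario can be sketched as below. It treats each pooled desktop as a single VMDK, which is a simplifying assumption, and the function name is made up for illustration.

```python
import math

def datastores_needed(desktops, max_vmdks_per_datastore=20):
    """Datastores required if each desktop counts as one VMDK (simplification)."""
    return math.ceil(desktops / max_vmdks_per_datastore)

desktops = 120          # pool size from the scenario above
actual_datastores = 3   # FC VMFS datastores actually used

needed = datastores_needed(desktops)
per_datastore = desktops // actual_datastores

print(f"Rule of thumb suggests {needed} datastores")
print(f"Actual layout holds {per_datastore} desktops per datastore")
```

Under this simplification the rule of thumb would call for six datastores, so three datastores at 40 desktops each is already well past it.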

Also, XenDesktop – at least in version 5.5 – does not have the logic to load balance VDIs across the datastores. A new catalog of 15 VDIs across 3 datastores will indeed place 5 VDIs per datastore, but subsequently adding 1 more at a time to the catalog will put the VDI in the first datastore each time and not use the other two.
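To illustrate the balancing behaviour one might expect instead, here is a minimal sketch of a least-populated placement rule. This is not XenDesktop's logic or API; the datastore names and function names are hypothetical, and the counts are simply the 5/5/5 layout from the example above.

```python
def pick_datastore(counts):
    """Return the datastore holding the fewest VDIs (ties broken by name)."""
    return min(sorted(counts), key=lambda name: counts[name])

def place_vdis(counts, new_vdis):
    """Place new VDIs one at a time, always on the least-populated datastore."""
    counts = dict(counts)  # work on a copy, leave the caller's dict alone
    placements = []
    for _ in range(new_vdis):
        target = pick_datastore(counts)
        counts[target] += 1
        placements.append(target)
    return counts, placements

# Hypothetical names; 15 VDIs already spread evenly across 3 datastores.
initial = {"FC_DS1": 5, "FC_DS2": 5, "FC_DS3": 5}

# Add three more VDIs, one at a time.
final, placements = place_vdis(initial, 3)
print(placements)
print(final)
```

With this rule, three single additions land on three different datastores, rather than all on the first one as observed with version 5.5.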

The Problem

VDIs on one of the three datastores fail to boot, and Windows 7 keeps looping through the Startup Repair dialog.

Fig. 2: Windows 7 Startup Repair → Restart your computer to complete the repairs.

The Resolution

The problem seen above is caused by a corrupted baseDisk – why the corruption happened is another matter.

There are two resolutions:
1) (Recommended) Recreate the catalog from scratch – this recreates the baseDisk in every datastore.
2) Copy a known working baseDisk from another datastore, and replace the corrupted baseDisk VMDK with the known good one.
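For option 2, one way to confirm which copy is the odd one out before replacing anything is to compare checksums. The sketch below assumes that the baseDisk copies on each datastore are meant to be byte-identical, which is an assumption and not something confirmed here. The file names are placeholders, and the demo runs against small temporary files, not real VMDKs.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_odd_copies(paths):
    """Return the paths whose hash differs from the most common hash.

    Assumes the copies should be byte-identical and that the majority
    are intact; if every copy differs, the result is not meaningful.
    """
    hashes = {str(p): sha256_of(p) for p in paths}
    majority_hash, _ = Counter(hashes.values()).most_common(1)[0]
    return [p for p, h in hashes.items() if h != majority_hash]

# Demo with stand-in files (placeholder names, not real baseDisks).
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "ds1-baseDisk.vmdk": b"good image",
        "ds2-baseDisk.vmdk": b"good image",
        "ds3-baseDisk.vmdk": b"good imXge",  # simulated corruption
    }
    paths = []
    for name, data in contents.items():
        path = Path(tmp) / name
        path.write_bytes(data)
        paths.append(path)
    odd_names = [Path(p).name for p in find_odd_copies(paths)]

print(odd_names)
```

With three copies, a single mismatch points at the likely corrupted disk; the other two are candidates for the known good source.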

1 comment:

  1. Thanks for this. I'm having an issue with a XenDesktop pilot of only 2 desktops getting stuck in Startup Repair. They are running on our production EMC NFS export as well. I can cancel the Startup Repair option and they will boot to Windows.

    Any ideas what could be causing this?
