Friday, 9 November 2012

VMware SRM Error ‘VMX cannot be found’ and EMC CLARiiON AdminFractured Mirrors

The Problem

Running a test failover in VMware Site Recovery Manager revealed the following error:

Error - Virtual machine file '[snap-XXX] XXX/XXX.vmx' cannot be found on recovered datastore.

Analysis of the mounted snapshot at the DR site confirmed that the VMX file was indeed missing. The virtual machine in question had been moved into the datastore only a few days previously, so the conclusion was that replication was not running every 15 minutes as it should have been.

Logging into the Navisphere Web UI revealed that the Secondary Image of some of the Remote Mirrors had become AdminFractured.

Fig. 1: AdminFractured Secondary Image

Right-clicking one of the affected mirrors, selecting Properties, and navigating to the ‘Secondary Image’ tab revealed the following Last Image Error:

Unable to create protective snap session on secondary array. Snapview returned an error on an attempt to create a protective snap session due to lack of free cache LUN's on secondary array. As a result of the error the mirror will be admin fractured. Add more cache LUN's on the secondary array and retry the sync request. (0x7152863e)

Fig. 2: Administratively fractured Secondary Image 'Last Image Error'

The Culprit

The error above refers to free cache LUNs, which are found in the ‘Reserved LUN Pool’ (RLP) on the secondary array. Navigating to the Reserved LUN Pool in the Navisphere GUI, right-clicking it and selecting Properties revealed a misconfiguration.

Fig. 3: Reserved LUN Pool

Fig. 4: Reserved LUN Pool Properties

The recommended size for the reserved LUN pool is 20% of the combined size of all the LUNs (this is not a hard rule; if you have only a few LUNs and/or a low rate of change, you can configure it lower).

What had happened here is that replication, which was originally set up locally and had been working, failed for a long time because of an ISP failure. In order to re-replicate the data after the replication link was restored, additional storage was added to the reserved LUN pool. Unfortunately, that storage was added in one big chunk (one big LUN; notice the 917.149 GB LUN in the image above).

The reserved LUN pool needs to be made up of many smaller LUNs, so that when multiple LUNs are replicating, different mirrors can use different reserved LUNs. One big reserved LUN can only be used by one replicating datastore at a time.
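The effect of one oversized reserved LUN can be illustrated with a toy model. This is only a sketch of the behaviour described above, not the array's real allocation logic: the function name `allocate_reserved_luns` and the mirror and LUN labels are made up for illustration, and it assumes each syncing mirror needs at least one reserved LUN to itself.

```python
def allocate_reserved_luns(syncing_mirrors, free_reserved_luns):
    """Toy model: hand each syncing mirror one free reserved LUN.

    Returns (assigned, starved), where 'assigned' maps mirror -> reserved LUN
    and 'starved' lists the mirrors that found no free reserved LUN.
    The labels are hypothetical; a real array tracks far more state.
    """
    free = list(free_reserved_luns)
    assigned = {}
    starved = []
    for mirror in syncing_mirrors:
        if free:
            # A reserved LUN is used by one source at a time, whatever its size.
            assigned[mirror] = free.pop(0)
        else:
            # In the scenario above, this mirror would end up AdminFractured.
            starved.append(mirror)
    return assigned, starved


mirrors = ["mirror_a", "mirror_b", "mirror_c"]

# One big reserved LUN: only the first mirror gets served.
print(allocate_reserved_luns(mirrors, ["rlp_big_917GB"]))

# Several small reserved LUNs: every mirror gets served.
print(allocate_reserved_luns(mirrors, ["rlp_1", "rlp_2", "rlp_3", "rlp_4"]))
```

In the model, total capacity never enters the picture: what matters is the count of free reserved LUNs, which is why a single 917 GB LUN did not help.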

The recommended size for each reserved pool LUN is found as follows:

i. Calculate 20% of the combined size of all the LUNs
ii. Divide this value by the number of reserved LUNs you intend to create, which must be less than the ‘Maximum number of reserved LUNs’ for the array

You then create your pool of reserved LUNs of that size and number.
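The two steps above can be sketched as a small calculation. This is a rough helper written for illustration, not an EMC tool: the function `reserved_lun_plan`, its parameters and the example sizes are all assumptions, and the maximum number of reserved LUNs has to be looked up for your own array model.

```python
def reserved_lun_plan(source_lun_sizes_gb, reserved_lun_count,
                      max_reserved_luns, fraction=0.20):
    """Return (pool_size_gb, size_per_reserved_lun_gb).

    source_lun_sizes_gb: sizes of all the LUNs, in GB
    reserved_lun_count:  how many reserved LUNs you plan to create
    max_reserved_luns:   the array's documented maximum (look this up)
    fraction:            share of total capacity to reserve (20% rule of thumb)
    """
    if reserved_lun_count < 1:
        raise ValueError("need at least one reserved LUN")
    if reserved_lun_count >= max_reserved_luns:
        raise ValueError("reserved LUN count must be below the array maximum")

    # Step i: 20% of the size of all the LUNs added up
    pool_size_gb = sum(source_lun_sizes_gb) * fraction
    # Step ii: divide by the chosen number of reserved LUNs
    per_lun_gb = pool_size_gb / reserved_lun_count
    return pool_size_gb, per_lun_gb


# Hypothetical example: four LUNs, 20 reserved LUNs, array maximum of 50
pool_gb, per_lun_gb = reserved_lun_plan([500, 500, 1000, 2000], 20, 50)
print(f"Pool: {pool_gb:.1f} GB, made up of 20 LUNs of {per_lun_gb:.1f} GB each")
```

Choosing a count comfortably below the maximum leaves room to add more reserved LUNs later without rebuilding the pool.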

Fig. 5: Maximum number of reserved LUNs on CLARiiON AX/CX arrays

The Fix

The fix is straightforward: create more LUNs, then right-click the Reserved LUN Pool, select Configure, and add or remove LUNs as required. (Here we first removed the wrongly sized LUN from the RLP, deleted it, and then created the new LUNs in the same RAID pool.)

Fig. 6: Reserved LUN Pool – Configure

Finally, you want to re-synchronize the AdminFractured mirrors by right-clicking the Secondary Image and selecting ‘Synchronize…’

Fig. 7: Secondary Image – Synchronize…

And you can monitor the synchronization progress on the Remote Mirror Properties – Secondary Image tab.

Fig. 8: Remote Mirror Synchronization in Progress

THE END!

The Final Word

To help understand how the RLP works, watch the pool while the secondary image synchronizes: gradually, more and more Reserved LUN Pool LUNs get allocated.

Fig. 9: Three RLP LUNs Allocated

Fig. 10: Eight RLP LUNs Allocated

When synchronization is finished, RLP LUNs get un-allocated and return to ‘Free’ status.
