Running a test failover in VMware Site Recovery Manager produced the following error:
Error - Virtual machine file '[snap-XXX] XXX/XXX.vmx' cannot be found on recovered datastore.
Analysis of the mounted snapshot in the DR site confirmed that the VMX file was indeed not there. The virtual machine in question had only been moved into the datastore a few days previously, so the conclusion was that replication was not running every 15 minutes as it should have been.
Logging into the Navisphere Web UI revealed that the Secondary Image of some of the Remote Mirrors had become AdminFractured.
Fig. 1: AdminFractured Secondary Image
Right-clicking one of the affected mirrors, selecting Properties, and navigating to the ‘Secondary Image’ tab revealed the following Last Image Error:
Unable to create protective snap session on secondary array. Snapview returned an error on an attempt to create a protective snap session due to lack of free cache LUN's on secondary array. As a result of the error the mirror will be admin fractured. Add more cache LUN's on the secondary array and retry the sync request. (0x7152863e)
Fig. 2: Administratively fractured Secondary Image 'Last Image Error'
The error above refers to free cache LUNs, which are found in the ‘Reserved LUN Pool’ (RLP) on the secondary array. Navigating to the Reserved LUN Pool in the Navisphere GUI, then right-clicking it and selecting Properties, revealed a misconfiguration.
Fig. 3: Reserved LUN Pool
Fig. 4: Reserved LUN Pool Properties
The recommended size of the reserved LUN pool is 20% of the combined size of all the LUNs (this is not a hard requirement: if you have only a few LUNs and/or a low rate of change, you can configure it lower).
What had happened is that the replication, which had originally been set up locally and was working, failed for a long time because of an ISP failure. To re-replicate the data after the replication link was restored, additional storage was added to the reserved LUN pool. Unfortunately, that storage was added in one big chunk (one big LUN; notice the 917.149 GB LUN in the image above). The reserved LUN pool needs to be made up of many small LUNs, so that when multiple LUNs are replicating, different mirrors can use different reserved LUNs. A single big reserved LUN can only be used by one replicating datastore at a time.
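The point about one big reserved LUN can be illustrated with a deliberately simplified toy model. It assumes each replicating mirror needs at least one whole free reserved LUN and that a reserved LUN cannot be shared between mirrors. The function and the mirror and LUN names are hypothetical, and this is not how the array itself is implemented.

```python
def assign_reserved_luns(mirrors, free_reserved_luns):
    """Toy model: give each mirror one whole free reserved LUN.

    Returns (assigned, fractured), where 'assigned' maps a mirror to its
    reserved LUN and 'fractured' lists mirrors that found no free LUN.
    """
    free = list(free_reserved_luns)  # copy so the caller's list is untouched
    assigned, fractured = {}, []
    for mirror in mirrors:
        if free:
            assigned[mirror] = free.pop(0)
        else:
            # No free reserved LUN left: in this model the mirror fractures.
            fractured.append(mirror)
    return assigned, fractured


mirrors = ["mirror_a", "mirror_b", "mirror_c"]  # hypothetical mirror names

# One big reserved LUN: only the first mirror can use it.
one_big = assign_reserved_luns(mirrors, ["rlp_big"])

# Several small reserved LUNs: every mirror gets one, with spares left over.
many_small = assign_reserved_luns(mirrors, [f"rlp_{i}" for i in range(8)])

print("one big LUN    -> fractured:", one_big[1])
print("many small LUNs -> fractured:", many_small[1])
```

In the real pool a busy source LUN can take more than one reserved LUN over time, which is a further reason to have plenty of small ones available.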
The recommended size for each reserved pool LUN is found as follows:
i. Calculate 20% of the combined size of all the LUNs
ii. Divide this value by the number of reserved LUNs you intend to create, which must be less than the ‘Maximum number of reserved LUNs’ for the array
You then create a pool with that number of reserved LUNs, each of that size.
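The two steps above can be sketched as a small calculation. The function name, the example LUN sizes, and the array limit of 50 are made up for illustration; take the real limit from the table for your array model, and treat the 20% figure as the rule of thumb described above rather than a fixed requirement.

```python
import math


def reserved_lun_plan(source_lun_sizes_gb, max_reserved_luns,
                      reserved_lun_count, percent=20):
    """Return (total pool size in GB, size of each reserved LUN in GB).

    source_lun_sizes_gb: sizes of all the LUNs on the array, in GB
    max_reserved_luns:   the array's 'Maximum number of reserved LUNs'
    reserved_lun_count:  how many reserved LUNs you plan to create
    percent:             share of total capacity to reserve (rule of thumb)
    """
    if not 0 < reserved_lun_count < max_reserved_luns:
        raise ValueError(
            "reserved_lun_count must be positive and below the array maximum")
    # Step i: 20% of the size of all the LUNs added up.
    pool_gb = sum(source_lun_sizes_gb) * percent / 100
    # Step ii: divide by the chosen number of reserved LUNs, rounding up
    # so the pool is never smaller than the target.
    per_lun_gb = math.ceil(pool_gb / reserved_lun_count)
    return pool_gb, per_lun_gb


# Hypothetical example: ten 500 GB LUNs, an assumed array limit of 50
# reserved LUNs, and a plan to create 20 of them.
pool_gb, per_lun_gb = reserved_lun_plan(
    [500] * 10, max_reserved_luns=50, reserved_lun_count=20)
print(f"Pool: {pool_gb:.0f} GB as 20 reserved LUNs of {per_lun_gb} GB each")
```

Picking a count well below the maximum leaves room to add more reserved LUNs later without having to rebuild the pool.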
Fig. 5: Maximum number of reserved LUNs on CLARiiON AX/CX arrays
The fix is straightforward: create more LUNs, then right-click the Reserved LUN Pool, select Configure, and add or remove LUNs as required (here we first removed the wrongly sized RLP LUN from the pool, deleted it, and recreated the LUNs in the same RAID pool).
Fig. 6: Reserved LUN Pool – Configure
Finally, re-synchronize the AdminFractured mirrors by right-clicking the Secondary Image and selecting ‘Synchronize…’.
Fig. 7: Secondary Image – Synchronize…
You can monitor the synchronization progress on the ‘Secondary Image’ tab of the Remote Mirror Properties.
Fig. 8: Remote Mirror Synchronization in Progress
The Final Word
To better understand the RLP, watch how more and more Reserved LUN Pool LUNs are gradually allocated as the secondary image synchronizes.
Fig. 9: Three RLP LUNs Allocated
Fig. 10: Eight RLP LUNs Allocated
When synchronization is finished, the RLP LUNs are de-allocated and return to ‘Free’ status.