These notes were compiled whilst watching the SRM5 videos at http://vmwarelearning.com by Andrew Ellwood. The videos are well worth watching when you have the time; if you don’t, these notes will suffice as a primer. The videos cover vSphere Replication rather than array-based replication, but the information is still very useful when planning for SRM5, even if you will be using array-based replication.
SRM Video 1: SRM5 Concepts
Four-Step Disaster Recovery with SRM:
1. At the protected site, SRM shuts down the virtual machines, starting with the virtual machines designated with the lowest priority
2. At the recovery site, SRM prepares the datastore groups for failover of the protected virtual machines
3. To provide more resources for the virtual machines to be powered on at the recovery site, SRM suspends any virtual machines running at the recovery site that are designated as noncritical
4. SRM restarts virtual machines at the recovery site, starting with the virtual machines that are designated as the highest priority
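The ordering in steps 1 and 4 can be illustrated with a small sketch. This is only a model of the ordering rule, not SRM code: the `VM` class, the function names, and the priority scale (1 = highest, larger numbers = lower) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    priority: int  # assumed scale: 1 = highest priority, larger = lower


def shutdown_order(protected_vms):
    """Step 1: shut down the lowest-priority VMs first."""
    ordered = sorted(protected_vms, key=lambda vm: vm.priority, reverse=True)
    return [vm.name for vm in ordered]


def power_on_order(protected_vms):
    """Step 4: power on the highest-priority VMs first."""
    ordered = sorted(protected_vms, key=lambda vm: vm.priority)
    return [vm.name for vm in ordered]


# Hypothetical three-tier application.
vms = [VM("web01", 3), VM("db01", 1), VM("app01", 2)]
print(shutdown_order(vms))  # ['web01', 'app01', 'db01']
print(power_on_order(vms))  # ['db01', 'app01', 'web01']
```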
i. Planned Migration – synchronizes storage and cancels recovery if errors are encountered
Note: Planned Migration can also be used for migrating to a different IP subnet
ii. Disaster Recovery – attempts to synchronize storage but otherwise uses the most recent storage synchronization data. Recovery continues even if errors are encountered
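The practical difference between the two recovery types is how errors are handled. A minimal sketch of that behaviour follows, assuming a recovery is just a list of named steps; `run_recovery`, the mode strings, and the step names are hypothetical and do not correspond to any SRM API.

```python
def run_recovery(steps, mode):
    """Run (name, action) steps in order.

    mode 'planned_migration': cancel at the first error.
    mode 'disaster_recovery': record the error and carry on.
    """
    if mode not in ("planned_migration", "disaster_recovery"):
        raise ValueError(f"unknown mode: {mode}")
    completed, errors = [], []
    for name, action in steps:
        try:
            action()
            completed.append(name)
        except Exception as exc:
            errors.append((name, str(exc)))
            if mode == "planned_migration":
                return {"status": "cancelled", "completed": completed, "errors": errors}
    return {"status": "finished", "completed": completed, "errors": errors}


def failing_sync():
    # Simulates a storage sync that cannot reach the protected site.
    raise RuntimeError("protected site unreachable")


def power_on():
    pass


steps = [("sync_storage", failing_sync), ("power_on_vms", power_on)]
print(run_recovery(steps, "planned_migration")["status"])  # cancelled
print(run_recovery(steps, "disaster_recovery")["status"])  # finished
```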
SRM Features: Automated Failback
One-button re-protect and failback
Support for Various Disaster Recovery Topologies:
- Active-passive failover
- Active-active failover
- Bidirectional failover
- Shared recovery sites
SRM Video 2: Installing SRM5
VMware vCenter Server 5.0 must be installed at both the protected site and the recovery site. A single instance of the VMware vSphere Client with the VMware vCenter Site Recovery Manager Plug-In installed can manage both sites.
- Windows 64-bit O/S
- Database to hold SRM information (including VM configurations, protection groups, recovery plans)
- Network connectivity from SRM Server to vCenter Server
- Install SRM at both the protected and recovery site
- Can install on physical or virtual (works well to separate vCenter and SRM in a virtual environment)
- Install the SRAs (Storage Replication Adapters) on the SRM Server host
- If vCenter Server workload is high, consider installing SRM on a separate system
- Place SRM in your management cluster and network (if you have one)
Order of Installation:
1. Install vCenter at the protected site and the recovery site
2. Configure the vCenter Server inventory
3. Configure the SRM database
4. Install SRM Server on both sites
5. Install the SRM plug-in
6. Pair the sites
Configuring the vCenter Server inventory:
Hosts and Clusters
Fig. 1: Resource pools design for SRM
Fig. 2: VMs and Templates design for SRM
Even if you do not plan to use vSphere Replication, you can install vSphere Replication when prompted – this just makes the binaries and the VRMS and VRS OVFs available.
Fig. 3: Install vSphere Replication
Use fully qualified domain names (FQDNs) wherever possible, and lowercase for everything (avoids random issues with capitalization)
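One way to apply this consistently is to normalize every hostname before typing it into an installer or pairing dialog. The helper below is a hypothetical convenience, not part of SRM; the hostnames are made up.

```python
def normalize_fqdn(name):
    """Return the hostname in lowercase; reject short (unqualified) names."""
    cleaned = name.strip().rstrip(".").lower()
    if "." not in cleaned:
        raise ValueError(f"{name!r} is not a fully qualified domain name")
    return cleaned


print(normalize_fqdn("VC01.Corp.Example.COM"))  # vc01.corp.example.com
```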
You need an account with administrator-level access to vCenter Server
SRM can automatically generate a certificate, or you can use a purchased PKCS#12 certificate
Info to Register the VMware vCenter Site Recovery Manager Extension:
- Local Site Name
- Administrator E-mail (to receive notifications)
- Additional E-mail
- SRM_DB pre-created
- ODBC DSN Setup (can be done from the SRM Installer)
- Database user account
Install the VMware vCenter Site Recovery Manager plug-in
Note: Deploy to management servers in both sites
SRM Video 3: Site Pairing
Getting Started > 1. Connect the Sites > Configure Connection
Note: Alternative is – Summary > Configure Connection
Fig. 4: vSphere Client: Home > Solutions and Applications > Site Recovery – Sites
vCenter servers default port = 443
SRM servers default port = 8095
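Before pairing, it can help to confirm that these ports are reachable from the machine you are working on. The following is a rough sketch using only the Python standard library; the hostnames are placeholders, and the port numbers are the defaults from these notes, so adjust them if your installation differs.

```python
import socket

VCENTER_PORT = 443   # default noted above
SRM_PORT = 8095      # default noted above


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pairing_endpoints(vcenter_host, srm_host):
    """List the (host, port) pairs worth checking for one site."""
    return [(vcenter_host, VCENTER_PORT), (srm_host, SRM_PORT)]
```

For example, looping over `pairing_endpoints("vc-dr.example.com", "srm-dr.example.com")` and calling `port_is_open(host, port)` on each pair gives a quick reachability report for the remote site. A successful TCP connection only shows the port is open, not that the service behind it is healthy.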
SRM Video 4: Storage Replication – part 1 (vSphere Replication)
Note: Array-based replication and VR (vSphere Replication) are complementary solutions and can be used together.
With array-based replication, the LUN design has to be done in such a way that only the servers requiring replication are replicated. This could be done on a one server per LUN basis, or multiple replicated servers per LUN, with un-replicated LUNs for local servers. An advantage with vSphere Replication is that it can be done on a per-VM level as opposed to per datastore.
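Because array-based replication works per LUN, a quick sanity check of the layout is to compare where each VM lives against which LUNs are replicated. The sketch below does this with plain dictionaries and sets; every name in it (function, LUNs, VMs) is hypothetical.

```python
def check_lun_layout(vm_datastore, replicated_datastores, vms_needing_replication):
    """Return (wasted, unprotected) VM name lists.

    wasted: VMs on a replicated LUN that do not need replication.
    unprotected: VMs that need replication but sit on an un-replicated LUN.
    """
    wasted = sorted(
        vm for vm, ds in vm_datastore.items()
        if ds in replicated_datastores and vm not in vms_needing_replication
    )
    unprotected = sorted(
        vm for vm, ds in vm_datastore.items()
        if ds not in replicated_datastores and vm in vms_needing_replication
    )
    return wasted, unprotected


placement = {
    "db01": "lun_repl_01",
    "web01": "lun_repl_01",
    "print01": "lun_repl_01",   # local-only server on a replicated LUN
    "test01": "lun_local_01",
    "app01": "lun_local_01",    # needs protection but is not replicated
}
wasted, unprotected = check_lun_layout(
    placement, {"lun_repl_01"}, {"db01", "web01", "app01"}
)
print(wasted)       # ['print01']
print(unprotected)  # ['app01']
```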
VR cannot automatically reprotect.
vSphere Replication Setup:
Important: vCenter needs to be configured with a managed IP Address
Done via the vSphere Client and Site Recovery plug-in
Fig. 5: vSphere Replication Deployment Example
SRM Video 5: Storage Replication – part 2 (vSphere Replication)
VRMS = vSphere Replication Management Server
VRS = vSphere Replication Server (does the heavy lifting)
In a large deployment, you may have multiple VRS instances, but you only need one VRMS per site.
SRM Video 6: Inventory Mapping
Inventory mappings are optional but recommended.
An inventory mapping specifies the resources that a recovered virtual machine uses.
Four elements can be mapped:
- Compute resources, such as resource pools
- Virtual machine folders
- Networks
- Placeholder datastore
If a virtual machine is not fully mapped, its mappings must be configured individually when you protect it!
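The idea of a VM being "fully mapped" can be sketched as a lookup: every resource the VM uses at the protected site needs a counterpart at the recovery site. The structure below is an illustrative model only; the mapping kinds, names, and the `unmapped_resources` function are assumptions, not SRM objects.

```python
def unmapped_resources(vm_resources, inventory_mappings):
    """Return the (kind, name) pairs of a VM's resources that have no mapping."""
    missing = []
    for kind, protected_name in vm_resources.items():
        if protected_name not in inventory_mappings.get(kind, {}):
            missing.append((kind, protected_name))
    return missing


# Hypothetical site-wide mappings: protected-site name -> recovery-site name.
mappings = {
    "resource_pool": {"Prod-RP": "DR-RP"},
    "folder": {"Prod-VMs": "DR-VMs"},
}

vm_ok = {"resource_pool": "Prod-RP", "folder": "Prod-VMs"}
vm_partial = {"resource_pool": "Prod-RP", "folder": "Test-VMs"}

print(unmapped_resources(vm_ok, mappings))       # []
print(unmapped_resources(vm_partial, mappings))  # [('folder', 'Test-VMs')]
```

A VM with an empty result would pick up the site-wide mappings; anything else would need configuring by hand when the VM is protected.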
Fig. 6: vSphere Client: Home > Solutions and Applications > Site Recovery – Sites > Getting Started
Fig. 7: Example Resource Mappings
Placeholder datastores are where we put the virtual machine configuration files on the recovery site in anticipation of that VM failing over.
Fig. 8: Placeholder_VMs datastore (in recovery site)
SRM Video 7: Protection Groups
A protection group:
Typically contains virtual machines that are related in some way:
- A three-tier application (application server, database server, web server)
- Virtual machines whose virtual disk files are part of the same datastore group
Note: For VMs that share a datastore group, you cannot just fail-over one, you must fail-over all!
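To see what this means in practice, the sketch below expands a selection of VMs to everything that shares a datastore group with them. It is a toy model with made-up VM and group names, not a query against SRM.

```python
def failover_set(selected_vms, vm_datastore_group):
    """Return every VM that must fail over along with the selected ones."""
    groups = {vm_datastore_group[vm] for vm in selected_vms}
    return sorted(vm for vm, grp in vm_datastore_group.items() if grp in groups)


# Hypothetical layout: two VMs share datastore group 'dsg-a'.
layout = {"web01": "dsg-a", "app01": "dsg-a", "db01": "dsg-b"}
print(failover_set(["web01"], layout))  # ['app01', 'web01']
```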
Fig. 9: Protection Groups
SRM Video 8: Creating a Recovery Plan
What was the failure?
- complete site failure
- SAN failure
- ISP failure
You can have many recovery plans (for different scenarios) or just one (for a complete site failure).
An SRM recovery plan includes:
- A list of virtual machines from the protection groups
- A startup and priority order for those virtual machines
- Any custom steps added before or after virtual machine startup
SRM recovery plans enable an administrator to determine the following:
- Which protection groups are to be included in the recovery plan
- Which virtual machines at the recovery site are to be suspended
- The power-on order of protected virtual machines during failover
A recovery plan is created at the recovery site (if you lost the main site with your recovery plans in…)
vSphere Client: Home > Solutions and Applications > Site Recovery – Recovery Plans: Create Recovery Plan
You get to choose a Test Network!
Can leverage Priorities and Dependencies to get VMs to start in the correct order!
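A rough way to reason about the resulting start order is to process priority groups from highest to lowest and, within each group, start a VM only after the VMs it depends on. The sketch below assumes that dependencies are only honoured between VMs in the same priority group and that 1 is the highest priority; both are assumptions of this example, as are all the names. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def startup_order(priorities, depends_on):
    """Order VMs by priority group, then by dependencies within each group."""
    order = []
    for level in sorted(set(priorities.values())):
        group = sorted(vm for vm, p in priorities.items() if p == level)
        sorter = TopologicalSorter()
        for vm in group:
            # Ignore dependencies on VMs outside this priority group.
            deps = [d for d in depends_on.get(vm, []) if d in group]
            sorter.add(vm, *deps)
        order.extend(sorter.static_order())
    return order


priorities = {"dc01": 1, "db01": 2, "app01": 2, "web01": 3}
depends_on = {"app01": ["db01"]}  # app01 waits for db01
print(startup_order(priorities, depends_on))
# ['dc01', 'db01', 'app01', 'web01']
```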
Add Non-Critical VM – suspends a VM at the recovery site to release resources for more critical servers.
SRM Video 9: Testing a Recovery Plan
A recovery plan is essentially a script!
Fig. 10: Recovery Plan Test Button
Fig. 11: Option to replicate recent changes to the recovery site
Fig. 12: Recovery Plan Cleanup Button
SRM Video 10: SRM Failover
Fig. 13: Recovery Plans – Recovery Button
The ‘Recovery’ button is clicked at the recovery site (because the protected site is down).
SRM automates many of the tasks required at the time of failover:
- Powers down the protected virtual machines
- Stops data replication and enables the recovery site datastores
- Rescans the ESXi hosts at the recovery site
- Suspends nonessential virtual machines if specified in the recovery plan
- Powers on virtual machines at the recovery site
Running the Recovery plan:
Tick ‘I understand that this process will permanently alter the virtual machines and infrastructure of both the protected and recovery datacenters’
- Planned Migration (if errors occur it will stop)
- Disaster Recovery
SRM Video 11: SRM Failback
One button reprotect when using Array-Based Replication:
- Reverse the replication of storage
- Reprotect the VMs in the reverse direction
- The original protected site becomes the recovery site
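The role swap performed by reprotect can be pictured as follows. This is a conceptual model only: `ProtectionState` and `reprotect` are hypothetical names and the site names are invented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProtectionState:
    protected_site: str  # where the VMs run and replicate from
    recovery_site: str   # where the replicas and placeholders live


def reprotect(state):
    """Swap the roles so replication and protection run in reverse."""
    return replace(
        state,
        protected_site=state.recovery_site,
        recovery_site=state.protected_site,
    )


before = ProtectionState(protected_site="london", recovery_site="manchester")
after = reprotect(before)
print(after.protected_site, "->", after.recovery_site)  # manchester -> london
```

Running the same recovery plan again after reprotect would then move the VMs back to the original site, which is the failback.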
Of course, if the original protected site is gone, there is nothing to fail back to, in which case – using the templates from the recovery site – we can rebuild the configuration in the reverse direction!