Friday, 27 September 2013

Resizing SnapMirror Volumes

Here we run through re-sizing a SnapMirror source and destination volume in our lab:


We have a source volume (v_src) that’s 300g in size and a destination volume (v_dst) that’s also 300g; we want to increase the size of the source volume to 1200g.
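(Optional) Before touching anything, it’s worth confirming the starting sizes and that the destination has fs_size_fixed set - just a quick sanity check using the same volume names:

NASRC> vol size v_src
NADST> vol size v_dst
NADST> vol options v_dst

First, grow the source volume: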

NASRC> vol size v_src 1200g

NADST> snapmirror update -S nasrc:v_src nadst:v_dst
Transfer aborted: destination volume too small; it must be equal to or larger than the source volume.

NADST> snapmirror break v_dst  
snapmirror break: Destination v_dst is now writable.
Volume size is being retained for potential snapmirror resync.  If you would like to grow the volume and do not expect to resync, set vol option fs_size_fixed to off.

NADST> vol size v_dst 1200g    
vol size: Volume has the fixed filesystem size option set.
    
NADST> vol options v_dst fs_size_fixed off
NADST> vol size v_dst 1200g         
vol size: Flexible volume 'v_dst' size set to 1200g.
    
NADST> snapmirror resync -S nasrc:v_src nadst:v_dst 
The resync base snapshot will be: nadst(4055372815)_v_dst.6    
These newer snapshots will be deleted from the destination:
    hourly.0   
These older snapshots have already been deleted from the source and will be deleted from the destination:
    nadst(4055372815)_v_dst.5  
Are you sure you want to resync the volume? y

Volume v_dst will be briefly unavailable before coming back online.

Revert to resync base snapshot was successful!
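To verify the relationship and the new sizes, something like the below should do - SnapMirror normally turns fs_size_fixed back on for the destination once the relationship is re-established, and a manual update should now go through:

NADST> snapmirror status -l v_dst
NADST> vol size v_dst
NADST> vol options v_dst
NADST> snapmirror update -S nasrc:v_src nadst:v_dst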

The End!

Volume Reallocate on a SnapMirror Destination

NOTE: This is just a NetApp enthusiast’s blog - please don’t take anything here as gospel!

Let’s consider a scenario where we’ve got a SnapMirror destination, we’ve added some disks to the aggregate, and now we want to reallocate so that the volume isn’t just on the original disks. The diagram below illustrates this.

Image: Expanding an Aggregate, then Reallocating to “Spread” the Volume

Why might you want to do this? Well, it’s usually for performance reasons. The SnapMirror updates are unlikely to be that big, so writes here aren’t really the problem; but what if you’re using NDMP to back up from the destination and it’s running slowly because it’s reading from a volume that still sits only on the original disks of the aggregate...

Here we’d use:

reallocate start -f -p /vol/v_dst

{-f “force” only because we’ve added additional disks, -p for “physical reallocate” to “spread” the snapshots also - see here for more info}

But you can’t run reallocate on a read-only SnapMirror destination; the SnapMirror must first be broken, otherwise the error below appears:

Unable to start reallocation scan on '/vol/v_dst':
 readonly volume

Without further ado, the procedure is:

NADST> snapmirror break v_dst
NADST> reallocate start -f -p /vol/v_dst
NADST> reallocate status

{Wait for the reallocate to complete}

NADST> snapmirror resync -S NASRC:v_src NADST:v_dst
The resync base snapshot will be: nadst(4055372815)_v_dst.3
These older snapshots have already been deleted from the source and will be deleted from the destination:
    nadst(4055372815)_v_dst.2
Are you sure you want to resync the volume? y

NADST> snapmirror status

{Wait for the resync to complete, then check the last transfer size}

NADST> snapmirror status -l

Note: If you have deduped volumes, exercise caution!
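For example, to check whether the destination volume is deduped, and how much space dedupe is saving, before kicking off a physical reallocate (just a suggested check):

NADST> sis status /vol/v_dst
NADST> df -s v_dst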

Friday, 13 September 2013

A New SIM Recipe!

No, this isn’t anything to do with “The Sims” - perhaps if it was I’d get more hits ;-)

Below is my recipe for building what might be an optimally useful Clustered ONTAP 8.2 SIM, right up to the point where it can be halted and a snapshot taken in a pre-cluster setup - so you only ever need to do this once (twice to make the second node, which needs a different system ID for a two-node cluster). Then the SIM’s ready to clone to your heart’s content for future use. Of course, the simulator works wonderfully without modification - here we’re just tinkering :)

Recipe - Overview

1) Obtain the Clustered ONTAP 8.2 Simulator
2) Unpack, copy to working folder, copy again for node 2, and import into VMware Workstation
3) VMware Workstation - edit Virtual Hardware (e.g. add additional NICs and resize HDD)
4) First boot - set bootargs (for power off on halt, and node 2’s sysid)
5) Zero Disks
6) Abort cluster setup, trash the 2 virtual shelves of 14 x 1GB disks, add a shelf of virtual SSD, and 3 shelves of virtual 9GB HDDs
7) Assign 3 virtual SSD disks for root aggregate
8) Re-zero disks
9) Abort cluster setup, halt, and take snapshots!

Note: You might wonder why I’m using virtual SSD for the root aggregate. Well, this isn’t something you’d ever do in real life; it’s just so I have a shelf with SSDs to play around with Flash Pool in the future.

Recommended Reading and References

Since this material has already been pretty much blogged to death on this blog, I’m just going to refer to the previous posts below with additional notes in the section afterwards:

[1] NetApp Simulator 8.1.2 Cluster Mode Installation Walkthrough

[2] VMware Workstation – Cannot Assign Available PCI Slot

[3] Lab Series 01: Part 4 – NetApp Data ONTAP 8.1.2 Clustered ONTAP Simulator Build Recipe 01

[4] NetApp Simulator for Clustered ONTAP 8.2 RC Setup Notes

[5] Clustered ONTAP 8.2 SIM: Maximizing Available Usable Space

Recipe - In More Detail

1) Obtain the Clustered ONTAP 8.2 Simulator

See [1]

2) Unpack, copy once (for node 2), and import into VMware Workstation

See [1]

3) VMware Workstation - Edit Virtual Hardware (add NICs and resize HDD)

Here I’ve added four additional Network Adapters to the simulator. The NICs all use the NAT network. Also, the ‘Hard Disk 4 (IDE)’ is expanded from 250GB to 444GB.
The first time you try to power on the simulator it will error; see [2] for how to edit the VMX file to make it work.

Image: Modifying the Data ONTAP Simulator Hardware

4) First boot - set bootargs

Press Ctrl-C at the “Hit [Enter] to boot immediately, or any other key for command prompt” message.

An option I like to set is:

VLOADER> setenv bootarg.vm.no_poweroff_on_halt false

(Just means the SIM powers off when you tell it to!)
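If you want to double-check that the variable took before carrying on, the VLOADER prompt should let you list the environment and then continue booting - something like:

VLOADER> printenv
VLOADER> boot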

See [4] for how to set the system ID for the second node.

5) Zero Disks

Ctrl-C and Option 4 from the boot menu

See [5]

6) Abort cluster setup, trash the 2 virtual shelves of 14 x 1GB disks, add a shelf of virtual SSD, and 3 shelves of virtual 9GB HDDs

Follow [5] - the only difference is that we’re adding a shelf of SSD (500MB SSD disks are type 35):

sudo vsim_makedisks -n 14 -t 35 -a 0
sudo vsim_makedisks -n 14 -t 36 -a 1
sudo vsim_makedisks -n 14 -t 36 -a 2
sudo vsim_makedisks -n 14 -t 36 -a 3

7) Assign 3 virtual SSD disks for root aggregate

Still following [5]:

storage show disk
disk assign v4.16 v4.17 v4.18
disk show
halt

8) Re-zero disks

Still following [5]

9) Abort cluster setup, halt, and take snapshots!

APPENDIX A: Thoughts on a simple and elegant naming convention

Not related to this post - just something I was thinking about at the same time: rebuilding the lab is a great opportunity to start afresh with a new naming convention!

Characters 1&2 for the system’s brand of OS
Characters 3&4 for the system’s type
Characters 5&6 just numbers starting 01

For example:

MSDC01 = MicroSoft Domain Controller 01
NA7M01 = NetApp 7-Mode system 01
NAC1-01 = NetApp Cluster 1 C-Mode system 01

APPENDIX B: A Few Starter Commands to Try on Your New CDOT Lab

security login password

storage aggregate rename -aggregate aggr0 -newname a_01

storage disk option modify -autoassign off -node *

storage disk assign -owner NAC1-01 -disk NAC1-01:v5.*
storage disk assign -owner NAC1-01 -disk NAC1-01:v6.*
storage disk assign -owner NAC1-01 -disk NAC1-01:v7.*

storage disk option modify -autoassign on -node *

system node autosupport modify -node * -state disable

storage aggregate add-disks -aggregate a_01 -diskcount 3 -disktype SSD

system node run -node NAC1-01 vol size vol0 1250m
system node run -node NAC1-01 reallocate start -f -p /vol/vol0


storage aggregate create -aggregate a_03 -node NAC1-01 -diskcount 42 -disktype FCAL -maxraidsize 14 -raidtype raid4
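A few (entirely optional) checks I like to run afterwards, to confirm the aggregate changes, the disk ownership, and the vol0 reallocate - node name NAC1-01 as per the examples above:

storage aggregate show
storage disk show -fields owner,aggregate
system node run -node NAC1-01 reallocate status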

Wednesday, 11 September 2013

CDOT 8.2: How to Demonstrate QoS

The following post briefly details how to demonstrate QoS on NetApp Clustered ONTAP 8.2. A cluster called cluster1 has been configured, along with a CIFS enabled Vserver with volume and share. The demonstration is run from a Windows Server 2012 Domain Controller in the lab.

1) Map a drive to the CIFS volume

net use V: \\vserver1\vs1_vol1

2) Iometer Setup

Download iometer from http://www.iometer.org and install.

i) Run Iometer
ii) Select the server (in this case ADDC1010)
iii) ‘Disk Targets’ tab, tick V only, and configure:

Maximum Disk Size = 20000
Starting Disk Sector = 0
No. of Outstanding IOs = 16

Image: Configuring Iometer 1 - ‘Disk Targets’ tab

iv) ‘Access Specifications’ tab, choose “4K; 50% Read; 0% random”

Image: Configuring Iometer 2 - ‘Access Specifications’ tab

v) ‘Test Setup’ tab

Test Description = QoS Test
Ramp Up Time = 4 seconds
Record Results = none

Image: Configuring Iometer 3 - ‘Test Setup’ tab

vi) ‘Results Display’ tab

Change ‘Update Frequency’ to 4 seconds, and click the Green flag to start.

Image: Configuring Iometer 4 - ‘Results Display’ tab

NOTE: Iometer creates an 80’000KB (20’000 x 4KB) file called iobw.tst on the V: drive.

3) QoS Configuration

NOTE: PuTTY output from this step is in APPENDIX A below.

i) Login as admin to the Cluster Management IP of cluster1
ii) Run these commands:

cluster1::> QoS
cluster1::qos> policy-group show
cluster1::qos> statistics performance show -iterations 4

NOTE: The IOPS you see here should be close to what Iometer is showing.

cluster1::qos> policy-group create -policy-group pg_vserver1 -vserver vserver1 -max-throughput 1000iops

cluster1::qos> policy-group show
cluster1::qos> vserver modify -vserver vserver1 -qos-policy-group pg_vserver1
cluster1::qos> statistics performance show -iterations 4

NOTE: The IOPS you see now should be close to the policy-group setting.

iii) Notice that the IOPS registered by Iometer drop to roughly the policy-group limit
iv) Stop Iometer
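Once you’re done with the demonstration, you’ll probably want to remove the limit again. Something like the below should tidy up (the policy group needs to be unassigned from the Vserver before it can be deleted):

cluster1::qos> vserver modify -vserver vserver1 -qos-policy-group none
cluster1::qos> policy-group delete -policy-group pg_vserver1
cluster1::qos> policy-group show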

APPENDIX A: QoS Test Output

cluster1::> QoS

cluster1::qos> policy-group show
This table is currently empty.

cluster1::qos>
cluster1::qos> statistics performance show -iterations 4
Policy Group           IOPS      Throughput   Latency
-------------------- -------- --------------- ----------
-total-                  4709       18.02MB/s     2.71ms
User-Best-Effort         4581       17.89MB/s     2.78ms
_System-Best-Effort        93      141.86KB/s   535.00us
_System-Background         35           0KB/s        0ms
-total-                  4424       17.13MB/s     2.67ms
User-Best-Effort         4366       17.04MB/s     2.68ms
_System-Background         35           0KB/s        0ms
_System-Best-Effort        23       87.90KB/s     4.85ms
-total-                  4934       19.10MB/s     2.86ms
User-Best-Effort         4881       19.06MB/s     2.85ms
_System-Background         35           0KB/s        0ms
_System-Best-Effort        18       42.70KB/s     9.61ms
-total-                  4356       16.85MB/s     2.96ms
User-Best-Effort         4314       16.84MB/s     2.96ms
_System-Background         35           0KB/s        0ms
_System-Best-Effort         7       12.11KB/s    18.13ms

cluster1::qos> policy-group create -policy-group pg_vserver1 -vserver vserver1 -max-throughput 1000iops

cluster1::qos> policy-group show
Name             Vserver     Class        Wklds Throughput
---------------- ----------- ------------ ----- ------------
pg_vserver1      vserver1    user-defined -     0-1000IOPS

cluster1::qos> vserver modify -vserver vserver1 -qos-policy-group pg_vserver1

cluster1::qos> statistics performance show -iterations 4
Policy Group           IOPS      Throughput   Latency
-------------------- -------- --------------- ----------
-total-                  1099        4.10MB/s    15.24ms
pg_vserver1              1047        4.08MB/s    16.00ms
_System-Background         30           0KB/s        0ms
_System-Best-Effort        22       14.87KB/s   283.00us
-total-                  1100        4.09MB/s    15.21ms
pg_vserver1              1044        4.06MB/s    16.02ms
_System-Background         28           0KB/s        0ms
_System-Best-Effort        28       25.13KB/s   111.00us
-total-                  1109        4.08MB/s    15.05ms
pg_vserver1              1043        4.07MB/s    16.00ms
_System-Background         35           0KB/s        0ms
_System-Best-Effort        31       12.56KB/s    99.00us
-total-                  1096        4.09MB/s    15.35ms
pg_vserver1              1051        4.09MB/s    16.01ms
_System-Background         35           0KB/s        0ms
_System-Best-Effort        10           0KB/s        0ms

APPENDIX B: Some Useful Space Reporting Commands

cluster1::> volume show
cluster1::> volume show -vserver vserver1 -volume vs1_vol1 -fields space-guarantee,space-guarantee-enabled
cluster1::> volume show-space -vserver vserver1 -volume vs1_vol1
cluster1::> volume show-footprint -vserver vserver1 -volume vs1_vol1
cluster1::> volume show-space vs1_vol1

NOTE: The above commands provide tons of information - inodes, inode percentages, filesystem metadata size, and on and on...
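And one more suggestion (not part of the original list) - pulling just the headline space numbers for the volume with -fields:

cluster1::> volume show -vserver vserver1 -volume vs1_vol1 -fields size,used,available,percent-used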

Saturday, 7 September 2013

Notes on Planning, Implementing, and Using MetroCluster

Following on from the previous two posts on SyncMirror (Part 1 & Part 2), this post is my home for a few notes on MetroCluster - planning, implementing and using. MetroCluster is about as complicated as it gets with NetApp - but, when you get your head around it, it is a surprisingly simple and very elegant solution!

First things first

You’ll want to read:

Best Practices for MetroCluster Design and Implementation (78 pages)
By Jonathan Bell, April 2013, TR-3548
NOTE: I’ve found that the US links tend to be the most up-to-date ones!

Brief Synopsis

Two types of MetroCluster - Stretch (SMC) and Fabric-Attached (FMC):
1) SMC spans maximum distance of 500m*
2) FMC spans maximum distance of 200km*
*Correct at the time of writing.

Image: Types of NetApp MetroCluster
MetroCluster uses SyncMirror to replicate data.
                                                                                                                                          
FMC requires 4 fabric switches (NetApp provided Brocade/Cisco only) - 2 in each site (and additionally 4 ATTO bridges - 2 in each site- if using SAS shelves.)

1. Planning

Essential Reading/Research:

1.1) MetroCluster Planning Worksheet
NOTE: This can be found on pages 70-71 of the document below.

1.2) High-Availability and MetroCluster Configuration Guide (232 pages)

NOTE: The above link is for Data ONTAP 8.1. For Data ONTAP documentation specific to your 8.X version, go to http://support.netapp.com > Documentation > Product Documentation > Data ONTAP 8

1.3) MetroCluster Product Documentation

This link currently points to:

Configuring a MetroCluster system with SAS disk shelves and FibreBridge 6500N bridges (28 pages)

Fabric-attached MetroCluster Systems Cisco Switch Configuration Guide (35 pages)

Fabric-attached MetroCluster™ Systems: Brocade® Switch Configuration Guide (32 pages)

Instructions for installing Cisco 9148 switch into NetApp cabinet (5 pages)

Specifications for the X1922A Dual-Port, 2-Gb, MetroCluster Adapter (2 pages)

1.4) Interoperability Matrix Tool (new location for the MetroCluster Compatibility Matrix)

The old PDF - which goes up to ONTAP versions 7.3.6/8.0.4/8.1.2 - is here:

1.5) Product Documentation for FAS/V-Series Controller Model
http://support.netapp.com > Documentation > Product Documentation

1.6) Product Documentation for Shelves
http://support.netapp.com > Documentation > Product Documentation

2. Implementing

NOTE: This post is only intended as rough notes and is missing a lot of detail; for more detail please refer to the above documents.

2.1) Rack and Stack
2.2) Cabling
2.3) Shelf power-up and setting shelf IDs
2.4) Configuring ATTO SAS bridges {FMC}
2.5) Configuring fabric switches {FMC}
2.6) Controller power-up and setup
2.7) Licensing
2.8) Assigning Disks
2.9) Configuring SyncMirror*
*Check out my previous SyncMirror posts Part 1 & Part 2

2.7) Licensing

license add XXXXXXX # cf
license add XXXXXXX # cf_remote
license add XXXXXXX # syncmirror_local

NOTE: syncmirror_local requires a reboot to enable, but you can just use cf takeover/giveback!
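To check the licenses went on - and, once cf and cf_remote are licensed on both controllers, that controller failover is happy - a quick check might be:

license
cf status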

2.8) Assigning Disks

sysconfig
disk show -v
disk assign -p 1 -s 1234567890 0b.23 {SMC}
disk assign -p 0 sw1:3* {FMC}
disk assign -p 1 sw3:5* {FMC}
storage show disk -p

NOTE: “When assigning disk ownership, always assign all disks on the same loop or stack to the same storage system and pool as well as the same adaptor.” Use Pool 0 for the local disks and Pool 1 for the remote disks.
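A quick way to double-check the pools ended up as intended is the spare disk listing, which is broken out per pool (as shown in the SyncMirror post’s appendix):

aggr status -r
sysconfig -r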

3. Using

3.1) Recovering from a Site Failure

3.1.1) Restrict access to the failed site (FENCING - IMPORTANT)
3.1.2) Force the surviving node into takeover mode

SiteB> cf forcetakeover -d

3.1.3) Remount volumes from the failed node (if needed)
3.1.4) Recover LUNs of the failed node (if needed)

NOTE: If you have an iSCSI-attached host and “options cf.takeover.change_fsid” is on (the default), you will need to recover LUNs from the failed node.
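It’s worth checking in advance how this option is set (running options with just the option name prints its current value), so you know whether LUN recovery will be needed:

SiteB> options cf.takeover.change_fsid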

3.1.5) Fix failures caused by the disaster
3.1.6) Reestablish the MetroCluster configuration (including giveback)

NOTE: Here, the SiteB controller has “taken over” the failed SiteA controller (which is in the “disaster site”).

To validate you can access the storage in Site A:

SiteB(takeover)> aggr status -r

To switch to the console of the recovered Site A controller:

SiteB(takeover)> partner

Once you’ve determined the remote site is accessible, turn the Site A controller on. To check the status of aggregates for both sites:

SiteB/SiteA> aggr status

If the aggregates in the disaster site are showing online, you need to change their state to offline:

SiteB/SiteA> aggr offline aggr_SiteA_01

To re-create the mirrored aggregate (here we choose the “disaster site” aggregate as the victim):

SiteB/SiteA> aggr mirror aggr_SiteA_rec_SiteB_01 -v aggr_SiteA_01

To check resyncing progress:

SiteB/SiteA> aggr status

NOTE: When the aggregates have finished resyncing, they transition to mirrored.

After all aggregates have been re-joined, return to the SiteB node and do the giveback:

SiteB/SiteA> partner
SiteB(takeover)> cf giveback
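Once the giveback completes, a quick sanity check that the HA pair is healthy again:

SiteB> cf status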

One final thing you might want to do is rename the aggregates back to how they were before the disaster:

SiteA> aggr rename aggr_SiteA_rec_SiteB_01 aggr_SiteA_01

3.2) Maintenance

If you’re just after simple maintenance (not site fail-over or anything like that):

cf disable

Do your work (e.g. re-cabling, powering things down or up), then - when finished:

aggr status

Wait* for the aggregates to transition from resyncing to mirrored:
*Here you need to watch for the rate of change on either side - too many changes and it might take forever for the resync to complete, hence the need for a maintenance window!

cf enable

You see, I told you it was simple :)

NOTE: You might want to do an options autosupport.doit “MC Maintenance” to let NetApp support know.

NetApp SyncMirror Part 1/2: How to Configure SyncMirror

Walkthrough

In the following post, the Data ONTAP 8.1.2 7-Mode Simulator is used to demonstrate how to configure SyncMirror.
The starting point is two shelves of 14 virtual disks each. One shelf is going to act as disk Pool0, containing one half of the mirrored aggregate; the other shelf as disk Pool1, containing the other half of the mirrored aggregate.

To find the System ID:

NTAP> sysconfig

To show disk ownership and pools:

NTAP> disk show -v

Out of the box, the SIM has 14 disks assigned to Pool0 - that is, all disks v5.*. To assign 14 disks to Pool1:

NTAP> disk assign -p 1 -s 4055372815 v4.*

{Where 4055372815 is the System ID found from sysconfig}
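Re-running disk show -v at this point is an easy way to confirm the assignment took - the Pool column should now show 14 disks in Pool 1:

NTAP> disk show -v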

NOTE: For a physical HA Pair: “When assigning disk ownership, always assign all disks on the same loop or stack to the same storage system and pool as well as to the same adaptor.”

The SyncMirror license must be enabled to create a mirrored aggregate; add the license and reboot to enable:

NTAP> license add RIQTKCL #syncmirror_local
NTAP> reboot

NOTE: For a physical HA Pair: Use cf takeover and cf giveback to non-disruptively reboot both nodes in the HA Pair.

To display disks per pool with their physical size:

NTAP> aggr status -r

The simulator starts off with 3 disks in aggr0 in Pool0 containing the root volume; this leaves 11 disks spare in Pool0. To create an 11-disk mirrored aggregate:

NTAP> aggr create syncmirror_aggr1 -m -n 22@1027m

The output of the above provides the next command:

NTAP> aggr create syncmirror_aggr1 -m -d v5.32 v5.29 v5.28 v5.27 v5.26 v5.25 v5.24 v5.22 v5.21 v5.20 v5.19 -d v4.27 v4.26 v4.25 v4.24 v4.22 v4.21 v4.20 v4.19 v4.18 v4.17 v4.16

NTAP> aggr status
           Aggr State    Status         Options
syncmirror_aggr1 online  raid_dp, aggr
                         mirrored
                         64-bit
          aggr0 online   raid_dp, aggr  root
                         64-bit

The next bit is where we move the root volume to our mirrored aggregate; destroy aggr0; zero spares (the 3 disks used by aggr0 need zeroing); add four more disks to the mirrored aggregate (2 per pool), leaving one spare per pool; run a volume reallocate on vol0 (see here for why.)

df -h vol0
vol create vol0new syncmirror_aggr1 808m
ndmpd on
ndmpcopy /vol/vol0 /vol/vol0new
vol options vol0new root
reboot

{System reboots}

vol status
vol offline vol0
vol destroy vol0
aggr status
aggr offline aggr0
aggr destroy aggr0
disk zero spares
vol rename vol0new vol0

aggr status -r
aggr add syncmirror_aggr1 -d v4.28 v4.29 -d v5.16 v5.17
reallocate -f -p /vol/vol0
reallocate status

NOTE: For a physical MetroCluster environment - where you would have the cf and cf_remote licenses installed - having the root volume on a mirrored aggregate is a requirement. Otherwise the system would error with “Root volume is not mirrored. A takeover of this filer may not be possible in case of a disaster”.

NOTE 2: There are a couple of (pretty obvious) rules for adding disks to a mirrored aggregate
“1) The number of disks must be an even number and must be evenly divided between the two plexes.”
“2) Each plex must have disks from different pools and have equivalent bytes-per-sector sizes.”

Image: How Disk Pools and Plexes Make up a SyncMirror Mirrored Aggregate

APPENDIX: Output of vol status, aggr status, aggr status -r

ntap> vol status
         Volume State   Status            Options
           vol0 online  raid_dp, flex     root
                        mirrored
                        64-bit
ntap> aggr status
           Aggr State   Status            Options
syncmirror_aggr1 online  raid_dp, aggr     root
                        mirrored
                        64-bit
ntap> aggr status -r
Aggregate syncmirror_aggr1 (online, raid_dp, mirrored) (block checksums)
  Plex /syncmirror_aggr1/plex0 (online, normal, active, pool0)
    RAID group /syncmirror_aggr1/plex0/rg0 (normal, block checksums)

      RAID Disk Device   Pool     Phys(MB)
      --------- ------   ----     --------
      dparity   v5.32      0      1027
      parity    v5.29      0      1027
      data      v5.28      0      1027
      data      v5.27      0      1027
      data      v5.26      0      1027
      data      v5.25      0      1027
      data      v5.24      0      1027
      data      v5.22      0      1027
      data      v5.21      0      1027
      data      v5.20      0      1027
      data      v5.19      0      1027
      data      v5.16      0      1027
      data      v5.17      0      1027

  Plex /syncmirror_aggr1/plex1 (online, normal, active, pool1)
    RAID group /syncmirror_aggr1/plex1/rg0 (normal, block checksums)

      RAID Disk Device   Pool     Phys(MB)
      --------- ------   ----     --------
      dparity   v4.27      1      1027
      parity    v4.26      1      1027
      data      v4.25      1      1027
      data      v4.24      1      1027
      data      v4.22      1      1027
      data      v4.21      1      1027
      data      v4.20      1      1027
      data      v4.19      1      1027
      data      v4.18      1      1027
      data      v4.17      1      1027
      data      v4.16      1      1027
      data      v4.28      1      1027
      data      v4.29      1      1027


Pool1 spare disks

RAID Disk       Device   Pool     Phys(MB)
---------       ------   ----     --------
spare           v4.32      1      1027

Pool0 spare disks

RAID Disk       Device   Pool     Phys(MB)
---------       ------   ----     --------
spare           v5.18      0      1027

CONTINUED IN PART 2...

NetApp SyncMirror Part 2/2: How to do Data Recovery with SyncMirror

Carrying on from Part 1...

The Resynchronize Process is in three steps:

1. Split the aggregate
2. Change the state of the plex
3. Rejoin a plex (which resynchronizes the mirrored aggregate)

1. Split the aggregate

To check the current status of the volumes, aggregate, and their plexes:

NTAP> vol status
NTAP> aggr status
NTAP> aggr status -v

NOTE: In this lab, both the volume vol0 and the aggregate syncmirror_aggr1 show State = online and Status = mirrored. Abridged outputs from the CLI are in Appendix B below.

To perform the split:

NTAP> aggr split syncmirror_aggr1/plex0 aggr1_split

To review the new status of the volumes, aggregate, and their plexes:

NTAP> vol status
NTAP> aggr status
NTAP> aggr status -v

Image: Splitting Aggregates

2. Change the state of the plex

In my lab, I only had the root volume on the mirrored aggregate. I’m going to set the volume that exists on aggr1_split as root - since there was already a vol0, on splitting the aggregate the volume on the split mirror was called vol0(1) - then reboot the node to make that volume root, and then offline syncmirror_aggr1 (which is now left with just plex1).

vol options vol0(1) root
reboot

{System reboots}

vol status
vol offline vol0
vol destroy vol0
vol rename vol0(1) vol0
vol status
aggr status -v
aggr offline syncmirror_aggr1

3. Rejoin a Plex

Here you have to choose which aggregate is to be the victim. The victim aggregate is overwritten (re-synced to) and needs to be offline first. We already have syncmirror_aggr1 offline, so:

aggr status
aggr mirror aggr1_split -v syncmirror_aggr1
This will destroy the contents of syncmirror_aggr1.  Are you sure? Y
aggr status
aggr status -v
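Once the resync completes and the aggregate shows as mirrored again, you may want to rename it back to its original name (just as we renamed aggregates back after the MetroCluster recovery above) - for example:

aggr status
aggr rename aggr1_split syncmirror_aggr1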

APPENDIX A: States of a Plex

Active = The plex is online and available for use
Failed = One or more of the RAID groups in the plex have failed
Inactive = The plex is not available for use
Normal = All RAID groups are functional
Out-of-Date = The plex is not available for reads or writes
Re-synching = The plex contents are resynchronizing with the contents of the other plex in the aggregate

APPENDIX B: Lab CLI Output (Abridged)

ntap> vol status

         Volume State           Status            Options
           vol0 online          raid_dp, flex     root
                                mirrored
                                64-bit

ntap> aggr status

           Aggr State           Status            Options
syncmirror_aggr1 online         raid_dp, aggr     root
                                mirrored
                                64-bit
ntap> aggr status -v

           Aggr State           Status            Options
syncmirror_aggr1 online         raid_dp, aggr     root...
                                mirrored
                                64-bit

Volumes: vol0

Plex /syncmirror_aggr1/plex0: online, normal, active
RAID group /syncmirror_aggr1/plex0/rg0: normal, block checksums

Plex /syncmirror_aggr1/plex1: online, normal, active
RAID group /syncmirror_aggr1/plex1/rg0: normal, block checksums

ntap> aggr split syncmirror_aggr1/plex0 aggr1_split

ntap> vol status

         Volume State           Status            Options
           vol0 online          raid_dp, flex     root
                                64-bit
        vol0(1) online          raid_dp, flex
                                64-bit
ntap> aggr status

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr
                                64-bit
syncmirror_aggr1 online         raid_dp, aggr     root
                                64-bit
ntap> aggr status -v

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr     ...
                                64-bit

Volumes: vol0(1)

Plex /aggr1_split/plex0: online, normal, active
RAID group /aggr1_split/plex0/rg0: normal, block checksums

syncmirror_aggr1 online         raid_dp, aggr     root...
                                64-bit

Volumes: vol0

Plex /syncmirror_aggr1/plex1: online, normal, active
RAID group /syncmirror_aggr1/plex1/rg0: normal, block checksums

ntap> vol options vol0(1) root
ntap> reboot

login as: root

ntap> vol status

         Volume State           Status            Options
           vol0 online          raid_dp, flex
                                64-bit
        vol0(1) online          raid_dp, flex     root
                                64-bit
ntap> vol offline vol0

Volume 'vol0' is now offline.

ntap> vol destroy vol0

Are you sure you want to destroy volume 'vol0'? y
Volume 'vol0' destroyed.

ntap> vol rename vol0(1) vol0

ntap> vol status

         Volume State           Status            Options
           vol0 online          raid_dp, flex     root
                                64-bit
ntap> aggr status

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr     root
                                64-bit
syncmirror_aggr1 online          raid_dp, aggr
                                64-bit

ntap> aggr status -v

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr     root...
                                64-bit

Volumes: vol0

Plex /aggr1_split/plex0: online, normal, active
RAID group /aggr1_split/plex0/rg0: normal, block checksums

syncmirror_aggr1 online         raid_dp, aggr     ...
                                64-bit

Volumes:

Plex /syncmirror_aggr1/plex1: online, normal, active
RAID group /syncmirror_aggr1/plex1/rg0: normal, block checksums

ntap> aggr offline syncmirror_aggr1

Aggregate 'syncmirror_aggr1' is now offline.

ntap> aggr status

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr     root
                                64-bit
syncmirror_aggr1 offline        raid_dp, aggr     lost_write_protect=off
                                64-bit

ntap> aggr mirror aggr1_split -v syncmirror_aggr1

This will destroy the contents of syncmirror_aggr1. Are you sure? Y

ntap> aggr status

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr     root
                                resyncing
                                64-bit

ntap> aggr status -v

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr     root...
                                resyncing        
                                64-bit

Volumes: vol0

Plex /aggr1_split/plex0: online, normal, active
RAID group /aggr1_split/plex0/rg0: normal, block checksums

Plex /aggr1_split/plex1: online, normal, resyncing
RAID group /aggr1_split/plex1/rg0: normal, block checksums

ntap> aggr status

           Aggr State           Status            Options
    aggr1_split online          raid_dp, aggr     root
                                mirrored
                                64-bit

THE END!