“Aggregate Relocate” with Data ONTAP 8.1 7-Mode


UPDATE: As pointed out by Mizen (thank you very much Mizen), there's a NetApp KB article that details this much better if you've got an HA-pair - How to move an aggregate between software disk-owned HA pairs - and it also includes the following commands to offline an aggregate that contains volumes:
FILER1>priv set diag
FILER1*>aggr offline -a

Introduction

Aggregate Relocate (ARL) was introduced in Clustered Data ONTAP 8.2, allowing a storage admin to non-disruptively relocate aggregate ownership within an HA-pair. The main use case is nondisruptive controller upgrade operations.

The Clustered ONTAP CLI command for ARL is*:
storage aggregate relocation start
*Check out the “Clustered Data ONTAP 8.2 High-Availability Configuration Guide” for more information.
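
For reference, a typical invocation in Clustered Data ONTAP 8.2 looks something like the below (the cluster, node, and aggregate names here are just placeholders - check the HA guide for the full parameter list):
cluster1::> storage aggregate relocation start -node node1 -destination node2 -aggregate-list aggr1
cluster1::> storage aggregate relocation show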

This post though is about 7-Mode. ARL is not a feature of Data ONTAP 8.2 7-Mode, and it is extremely unlikely it will ever be a feature of Data ONTAP 7-Mode (search for “relocate” in the “Data ONTAP 8.2 Release Notes For 7-Mode” or the “Data ONTAP 8.2 High Availability and MetroCluster Configuration Guide For 7-Mode” and you will find no mention of it, because it does not exist in 7-Mode!)

So, what is the title getting at?

Well, one feature of 7-Mode that is not a feature of Clustered ONTAP** is that you can offline an aggregate, remove ownership of its disks, then assign those disks to another 7-Mode controller, online the aggregate, and have all the volumes, LUNs, etcetera return. Of course, this operation is disruptive.

**Why is it not possible in Clustered ONTAP? Well, Clustered ONTAP has four replicated database (RDB) units, one of which is the VLDB (volume location database). If disks holding a foreign aggregate and its volumes are added to a controller running Clustered ONTAP, those volumes do not exist in the VLDB and so cannot be brought back!

Walkthrough: How to Disruptively Relocate an Aggregate from one 7-Mode Controller to Another

In the following lab illustration, we do not have a physical HA-Pair to play with, just the 8.1.2 7-Mode Simulator. We have a controller (ctra) with two aggregates - aggr0 (3 disks) and aggr1 (24 disks); aggr0 has the root volume (vol0) and will not be touched, aggr1 has 3 volumes and each volume has a LUN. To demonstrate the process, we will offline aggr1 and remove ownership of all its disks from ctra, then do a 4a wipe of ctra and bring it up as ctrb, then assign aggr1’s disks to ctrb, bring aggr1 online, and verify the volumes and LUNs are still there!

Note: A fuller CLI output is contained in the Appendix - here we essentially just record the commands.

# Verify Aggregates, Volumes, LUNs and Disks #

ctra> aggr status
          aggr1 online
          aggr0 online

ctra> vol status
           vol0 online
           vol1 online 
           vol2 online
           vol3 online

ctra> aggr status -v aggr1
        Volumes: vol1, vol2, vol3

ctra> lun show
        /vol/vol1/lun1   (r/w, online, mapped)
        /vol/vol2/lun2   (r/w, online, mapped)
        /vol/vol3/lun3   (r/w, online, mapped)

ctra> aggr status -r aggr0
Aggregate aggr0 (online, raid_dp)
Device
------
v5.16 v5.17 v5.18

ctra> aggr status -r aggr1
Aggregate aggr1 (online, raid_dp)
Device
------
v4.16 v4.17 v4.18 v4.19 v4.20 v4.21 v4.22 v4.24 v4.25 v4.26 v4.27 v4.28 v4.29  
v5.19 v5.20 v5.21 v5.22 v5.24 v5.25 v5.26 v5.27 v5.28 v5.29 v5.32  

ctra> aggr status -s
Pool0 spare disks
Device
------
v4.32
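
An optional extra step (not part of the original walkthrough, but useful): record the LUN mappings and igroup definitions now, as the igroups and mappings will need to be re-created on the destination controller later:

ctra> lun show -m
ctra> igroup show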

# Unmap LUNs, Offline LUNs, Offline Volumes, and try to Offline aggr1 #

ctra> lun unmap /vol/vol1/lun1 igroup1
ctra> lun unmap /vol/vol2/lun2 igroup1
ctra> lun unmap /vol/vol3/lun3 igroup1


ctra> lun offline /vol/vol1/lun1
ctra> lun offline /vol/vol2/lun2
ctra> lun offline /vol/vol3/lun3

ctra> vol offline vol1
ctra> vol offline vol2
ctra> vol offline vol3

ctra> priv set advanced
ctra*> aggr offline aggr1
aggr offline: Cannot offline aggregate 'aggr1' because it contains one or more flexible volumes.

Note 1: aggr1 could not be set offline!
Note 2: If you try to remove_ownership of a disk from an online aggregate you get:

ctra*> disk remove_ownership v4.16 -f
Disk v4.16 will have its ownership removed
Ownership Remove request failed for disk v4.16. Reason:Disk is part of an online aggregate or volume. Changing its owner is not allowed, because that may cause aggregate, volume, or filer outage.
Disk ownership remove request failed.
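
Note 3: As per the UPDATE at the top of this post, the NetApp KB method for HA-pairs works around this by offlining the aggregate at the diag privilege level (priv set diag); in this walkthrough we use maintenance mode instead.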

# For a physical HA-pair disable controller failover #

ctra> cf disable
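
To confirm failover really is disabled before continuing, you can check with:

ctra> cf status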

# Disable disk autoassign #

ctra> options disk.auto_assign off
ctrb> options disk.auto_assign off
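
Running the option name without a value simply displays its current setting, which is a quick way to verify the change:

ctra> options disk.auto_assign
ctrb> options disk.auto_assign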

# Boot into maintenance mode #

ctra> reboot

Press Ctrl-C for Boot Menu.
Selection (5) for Maintenance mode boot

# Offline aggregate aggr1 and remove disk ownership of its disks #

*> aggr status
*> aggr offline aggr1
*> disk remove_ownership -f v4.16 v4.17 v4.18 v4.19 v4.20 v4.21 v4.22 v4.24 v4.25 v4.26 v4.27 v4.28 v4.29
*> disk remove_ownership -f v5.19 v5.20 v5.21 v5.22 v5.24 v5.25 v5.26 v5.27 v5.28 v5.29 v5.32
*> disk show -a
*> halt
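
A quick sanity check at this point (not captured in the recorded output): after remove_ownership the disks should list as unowned, which can be confirmed with:

*> disk show -n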

# Reassign the disks to ctrb and bring aggregates, volumes, and LUNs online #

ctrb> aggr status -r
ctrb> disk assign v4.16 v4.17 v4.18 v4.19 v4.20 v4.21 v4.22 v4.24 v4.25 v4.26 v4.27 v4.28 v4.29
ctrb> disk assign v5.19 v5.20 v5.21 v5.22 v5.24 v5.25 v5.26 v5.27 v5.28 v5.29 v5.32
ctrb> aggr status
ctrb> aggr status -r
ctrb> aggr online aggr1
ctrb> vol status
ctrb> vol online vol1
ctrb> vol online vol2
ctrb> vol online vol3
ctrb> lun show
ctrb> lun online /vol/vol1/lun1
ctrb> lun online /vol/vol2/lun2
ctrb> lun online /vol/vol3/lun3

ctrb> lun show
        /vol/vol1/lun1   (r/w, online)
        /vol/vol2/lun2   (r/w, online)
        /vol/vol3/lun3   (r/w, online)
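
Remember that the LUNs were unmapped from igroup1 back on ctra. Before any hosts can see them again, the igroup would need to be re-created on ctrb (igroup create, using the initiators recorded earlier) and the LUNs re-mapped - for example:

ctrb> lun map /vol/vol1/lun1 igroup1
ctrb> lun map /vol/vol2/lun2 igroup1
ctrb> lun map /vol/vol3/lun3 igroup1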

# Re-enable disk autoassign #

ctra> options disk.auto_assign on
ctrb> options disk.auto_assign on

# For a physical HA-pair re-enable controller failover #

ctrb> cf enable
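
And a final check that failover is back:

ctrb> cf status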

THE END: 7-Mode Disruptive Aggregate Relocate is done!

Comments

  1. Hello:
    I have a question, if you can help me: in the case where one controller has broken down, can we migrate its aggregates to the partner controller with this method in 7-Mode?
    Regards
    Sofane

  2. Hello Sofane,

    Did the storage takeover work?

    If cf was enabled ("cf enable" in 7-mode) then one controller failing shouldn't be a problem - its partner will take it over.

    If you halt the whole system and boot into maintenance mode, then you should be able to assign all disks to the one working node (no guarantees there), but you'd need to recreate CIFS shares/NFS exports etc. (for the imported aggregates) once the system comes up.

    Once you assign all disks, check you can see all the aggregates ("aggr status" in maintenance mode). If you can, all good.

    Cheers, VC

  3. Hi VC,
    Thanks for your reply and for your time.
    Yes, the second node is in takeover mode and the first node is down, and I want to manage its disks from the 2nd node. It doesn't matter if I lose the volumes - I don't have shares, just VMware datastores that I can recreate.
    My question: if I enter the second node into maintenance mode, can I manage the disks of the broken node and assign them to the 2nd node as you said? Because in normal mode there is no way to get access to them.

    My 2nd question, if you don't mind: if a node's hardware is OK but I think there is a problem in the firmware/software, is there a way to get it back? When I connect by console nothing is shown.

    Regards
    Sofiane



  4. Hi VC,

    Thanks for your time and reply.

    So as I understand it, in takeover mode I have to go into maintenance mode on the surviving node and assign the dead partner node's disks to it. For me it doesn't matter if I lose the data on those disks.

    My 2nd question, if you don't mind: is there a way to bring the node back up if it is a firmware problem? The node is up but there is no signal when I connect the console.

    Regards
    Sofiane

