CAVEAT LECTOR! This is a totally unofficial post by a NetApp enthusiast, and these actions were performed on a SIM, not a real live production system!
Setup
Our starting point is a brand new, post-‘Cluster Setup Wizard’, Clustered ONTAP 8.2.1 single-node cluster. We’ve completed a very basic cluster setup so that we have something to get back when we do the cluster restore:
storage disk assign -node NACLU7-01 -all
storage aggregate create aggr1 -diskcount 12
system license add NFSLICENCECODE
system timeout modify 0
vserver setup
In the Vserver setup script, we create a Vserver for NFS, with 3 data volumes and a data LIF for NFS ...
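For reference, here is a rough, unofficial sketch of the sort of individual commands the Vserver setup script ends up running. The Vserver name vs1, volume name, port e0c and IP address below are purely illustrative assumptions, not values from the original setup:
vserver create -vserver vs1 -rootvolume vs1_root -aggregate aggr1 -ns-switch file -rootvolume-security-style unix
vserver nfs create -vserver vs1
volume create -vserver vs1 -volume vol1 -aggregate aggr1 -size 1g -junction-path /vol1 # repeat for the other two data volumes
network interface create -vserver vs1 -lif vs1_nfs_lif1 -role data -data-protocol nfs -home-node NACLU7-01 -home-port e0c -address 10.10.10.50 -netmask 255.255.255.0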
Configuring System Configuration Backup
The open source FileZilla FTP Server is used as the destination to upload System Configuration Backups to. The following commands configure System Configuration Backup and test it:
set -privilege advanced
system configuration backup settings modify -destination ftp://10.10.10.11 -username ftpuser
system configuration backup settings set-password
system configuration backup show
system configuration backup create -node NACLU7-01 -backup-type node -backup-name 20141009NACLU7-01
system configuration backup create -node NACLU7-01 -backup-type cluster -backup-name 20141009NACLU7
system configuration backup upload -node NACLU7-01 -backup 20141009NACLU7-01.7z -destination ftp://10.10.10.11
system configuration backup upload -node NACLU7-01 -backup 20141009NACLU7.7z -destination ftp://10.10.10.11
Breaking the Single-Node Cluster
We reboot the node, go into Maintenance mode, and destroy the root aggregate aggr0!
::> reboot
Ctrl-C for Boot Menu
Selection (5) Maintenance mode boot
*> aggr status
*> aggr offline aggr0
*> aggr destroy aggr0
*> halt
When the node reboots it will hit the error below, then reboot, and continue in this loop until we fix it:
[raid.assim.tree.noRootVol:error]: No usable root volume was found!
Fixing the Single-Node Cluster
If we were moving the root aggregate we would have pre-created a new root aggregate. As it is in this instance, we have to run (4) from the Boot Menu to create a new root aggregate. For this to work without wiping the data aggregate that is still intact, we must unassign all disks and assign just the 3 disks we want used for the new root aggregate.
An Aside...
Note: If it is unknown which disks are the spares, it is possible to temporarily set that data aggregate as root, boot into it, and run:
::> storage disk show -container spare
This requires selecting option (5) from the Boot Menu:
Ctrl-C for Boot Menu
Selection (5) Maintenance mode boot
We set the existing data aggregate - aggr1 - to ha_policy cfo and root. After we reboot, the node will boot from a newly created skeleton root volume (AUTOROOT).
*> aggr options aggr1 ha_policy cfo
*> aggr options aggr1 root
*> halt
Fixing the Single-Node Cluster (Continued)
Ctrl-C for Boot Menu
Selection (5) Maintenance mode boot
*> aggr offline aggr1
*> disk remove_ownership all
*> disk show # should show no disks
*> aggr status # should show no aggregates
*> disk assign v5.28
*> disk assign v5.29
*> disk assign v5.32
*> halt
After the node reboots:
Ctrl-C for Boot Menu
Selection (4) Clean configuration and initialize all disks
The node will boot to a login prompt, and when you log in with the original credentials you will get this System Message:
A new root volume was detected. This node is not fully operational.
Fixing the Single-Node Cluster: Recovering the Data Aggregate(s)
Reboot again and go back into Maintenance mode to reassign disks (this cannot be done in the Clustershell in the current cluster state) and get the data aggregate back:
::> reboot
Ctrl-C for Boot Menu
Selection (5) Maintenance mode boot
*> disk assign all
*> aggr status
*> aggr online aggr1
*> aggr options aggr0 root
*> halt
Fixing the Single-Node Cluster: Recovery
After the system reboots and we’ve logged back into the ‘temporary’ Clustershell, run these commands (Note: the node mgmt1 LIF is remembered):
storage aggregate show
network interface show
set diag
system configuration backup download -node local -source ftp://10.10.10.11/20141009NACLU7.7z
system configuration backup download -node local -source ftp://10.10.10.11/20141009NACLU7-01.7z
system configuration recovery cluster recreate -from backup -backup 20141009NACLU7.7z
Note: We recover from the cluster backup since this contains the node information.
Warning: This command will destroy your existing cluster. It will rebuild a new single-node cluster consisting of this node by using the contents of the specified backup package. This command should only be used to recover from a disaster. Do not perform any other recovery operations while this operation is in progress. This command will cause all the cluster applications on this node to restart, causing an interruption in CLI and Web interface.
Then halt the node:
::> halt
On reboot, interrupt the boot to get to the loader prompt, and run these commands:
VLOADER> unsetenv bootarg.init.boot_recovery
VLOADER> boot_ontap
It’s Fixed!
When the Single-Node Cluster boots up this time, you log in and run:
cluster show
set advanced
cluster ring show
volume show
...
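As an extra sanity check (not part of the original walkthrough), a few more standard show commands can confirm that the data Vserver, its LIF and the aggregates have all come back:
vserver show
network interface show
storage aggregate show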
Everything should be back to full health!