Thursday, 18 December 2014

Root Volume Not Working Properly: Recovery Required (On a SIM)

Note: This post is based on NetApp Clustered ONTAP 8.2.1 Simulators

If you use NetApp Simulators a lot, you’ll likely come across the following at some point:

Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED

- or -

SYSTEM MESSAGES
The root volume (/mroot) is dangerously low on space (less than 10MB). To make space available, delete old Snapshot copies, delete unneeded files, and/or expand the root volume’s capacity. After enough space is made available, reboot this controller...

This error is very unlikely to ever happen on a production system. It happens on SIMs because the root volume is so tiny (less than 900MB), whereas on production systems it will be over 250GB and probably much more. CDOT systems come out of the factory with vol0 set at 95% of the size of the 3-disk root aggregate, so pretty much at least 95% of the size of the smallest disk, which is usually > 900GB these days!

So, if you’re reading this post, odds are you’ve got this error. How do you fix it?

1) If you’ve not logged into the console already, do so.
2) You’ll find yourself at a NODENAME::> prompt. At that prompt, type ::>

node run local

3) Disable vol0’s snapshot schedule >

snap sched vol0 0 0 0

4) Delete any snapshots of vol0 >

snap delete -a vol0

5) Set vol0’s snap reserve to 0% >

snap reserve vol0 0

Now if you do a >
df vol0
- you should have plenty of available capacity, and we could reboot the node to get the CDOT SIM back up again, but there’s more we can do in the nodeshell!

6) Disable aggregate snapshots >

aggr status
snap sched -A aggr0 0 0 0

{Replace aggr0 with the correct name if it is not the name of the root aggregate}

7) Delete all aggregate snapshots >

snap delete -a -A aggr0

8) Verify/set aggr0’s snap reserve to 0% >

snap reserve -A aggr0 0

9) Check the size of aggr0, and attempt to set vol0 to be 100% of that size >

df -A aggr0
df vol0
vol size vol0 921600k

- but it will error and tell you “Cannot grow root volume to more than 95% of the available aggregate size which is currently ...”; set vol0 to be that size >

vol size vol0 870664k

{Replace the sizes above if your SIM reports different values}

Finally check the size of vol0 with >

df vol0

- and reboot the CDOT SIM to get the cluster back up and running again.

NODENAME> exit
NODENAME::> reboot
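
If you’d rather have a ballpark figure before ONTAP rejects your first attempt, the 95% rule from step 9 is easy to sketch in Python. This is only a rough aid and not something the SIM provides: the function name is made up for illustration, and ONTAP applies the limit to the available aggregate size, so the figure in its error message can come out lower than this estimate (as it did above, where 870664k was the size that worked). Treat the result as an upper bound and trust the error message over this calculation.

```python
def root_vol_ceiling_kb(aggr_kb: int, limit_percent: int = 95) -> int:
    """Rough upper bound for vol0 (in KB) given an aggregate size in KB.

    Hypothetical helper for illustration; ONTAP's own limit is based on the
    available aggregate size and may be lower than this value.
    """
    # Integer arithmetic, rounded down so the estimate never overshoots.
    return aggr_kb * limit_percent // 100


if __name__ == "__main__":
    # 921600k is the aggregate size used in step 9 of this post.
    ceiling = root_vol_ceiling_kb(921600)
    print(f"Estimated ceiling for vol0: {ceiling}k")
```

The same helper can be reused after adding disks in the later steps, again only as a starting guess for the vol size command.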

When the CDOT SIM is back up, there is yet more we can do!

10) If you have spare disks, add them to aggr0, and expand vol0 to 95% of aggr0’s size. The SIM comes with 3 x 1GB disks in a RAID-DP. I reckon 7 is an excellent number, so we’ll increase aggr0 to 7 disks as below:

Log in to the cluster ::>

storage disk show -container-type unassigned
storage disk assign -node NODENAME -all true
storage aggr add-disks -aggregate aggr0 -diskcount 4
df -A aggr0
system node run local vol size vol0 4372700k

11) If you didn’t have spare disks (or even if you did), you could convert aggr0 from RAID-DP to RAID4 (again, it’s just a SIM), which frees up one parity disk, then add that disk to aggr0.

From the clustershell ::>

storage aggregate modify -aggregate aggr0 -raidtype raid4
storage aggregate show -aggregate aggr0 -fields state
storage aggregate add-disks -aggregate aggr0 -diskcount 1
disk show -container-type aggregate -aggregate aggr0

And again, increase the size of vol0 to 95% of aggr0’s size as we did in steps 9 and 10.

12) Finally, there is some advanced tidying up of vol0 we can do via the Systemshell. (Note: This blog post is just about Simulators. In the real world, the Systemshell should only be used under advice and guidance from NetApp Support!)

::> set -privilege diagnostic
::*> security login unlock -username diag
::*> security login password -username diag
::*> systemshell

login: diag
Password: {As set above}

% cd /mroot/etc/log
% pwd
% ls
% rm *.log.*
% ls

% cd /mroot/etc/log/mlog
% pwd
% ls
% rm *.log.*
% ls

% cd /mroot/etc/software
% pwd
% ls
% rm *
% ls

% exit

::*> set -privilege admin
::> df vol0

Note: /mroot/etc/software will not exist unless some software has been downloaded to the SIM in its lifetime.
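
If you’re nervous about what rm *.log.* in step 12 will actually hit, here is a small Python sketch of the pattern. It uses the standard library’s fnmatch module as a stand-in for the shell’s globbing, which behaves the same for names like these (the two differ for names starting with a dot). The file names are hypothetical and only there for illustration; your SIM will have its own. The point is that the pattern only matches files with something after ".log.", so the live logs ending in plain ".log" are left alone.

```python
from fnmatch import fnmatchcase

PATTERN = "*.log.*"  # the pattern used with rm in step 12

# Hypothetical file names, for illustration only.
SAMPLE_NAMES = [
    "mgwd.log",
    "mgwd.log.0000000001",
    "messages.log",
    "messages.log.2",
    "notes.txt",
]


def would_be_removed(name: str, pattern: str = PATTERN) -> bool:
    """Return True if the name matches the glob pattern (case-sensitive)."""
    return fnmatchcase(name, pattern)


if __name__ == "__main__":
    for name in SAMPLE_NAMES:
        verdict = "removed" if would_be_removed(name) else "kept"
        print(f"{name}: {verdict}")
```

Running ls before and after the rm, as in the steps above, remains the real check.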

THE END...

Recreating the Error?

This is super simple if you really want to see it:

::> system node run local
> df vol0
> vol size vol0 XXXXXXXXk

{Where XXXXXXXXk is a size just 1k more than the space currently used in vol0, as reported by df!}
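
Why does that trigger the message? A rough Python sketch of the arithmetic, on my reading that the new size sits 1k above the space used in vol0. The 10MB threshold comes from the system message quoted at the top of this post; treating it as 10 x 1024 KB, ignoring snap reserve, and the example numbers are all assumptions for illustration, not how ONTAP necessarily measures it.

```python
# "less than 10MB", from the system message; assumed here to mean 10 * 1024 KB.
LOW_SPACE_THRESHOLD_KB = 10 * 1024


def free_kb_after_resize(new_size_kb: int, used_kb: int) -> int:
    """Free space (KB) left in the volume if it is resized to new_size_kb."""
    return new_size_kb - used_kb


def triggers_low_space_warning(new_size_kb: int, used_kb: int) -> bool:
    """True if the resize would leave less free space than the threshold."""
    return free_kb_after_resize(new_size_kb, used_kb) < LOW_SPACE_THRESHOLD_KB


if __name__ == "__main__":
    used_kb = 500000  # hypothetical "used" figure; read yours from df vol0
    for new_size_kb in (used_kb + 1, 870664):
        low = triggers_low_space_warning(new_size_kb, used_kb)
        print(f"vol size vol0 {new_size_kb}k -> low on space: {low}")
```

So shrinking vol0 to just above its used space leaves about 1k free, far under the threshold, while the size from step 9 leaves plenty of room.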
