Wednesday, 1 January 2014

Clustered ONTAP Storage Admins' Health Checks


Here is a very rough and ready list of a few health-check type commands for the Clustered Data ONTAP storage admin. Of course, with OnCommand Unified Manager, there shouldn’t be much need to run manual checks; still, it’s nice to keep a list up your sleeve. The commands are grouped by object type. Some of them require the advanced or diag privilege level (set -privilege advanced, or set -privilege diag).

Note: Some of these commands only work in Clustered ONTAP 8.2 or later.

Dashboard Show Commands

dashboard alarm show
dashboard health vserver show
dashboard performance show
dashboard storage show

Cluster Health Check Commands

cluster show
cluster ring show
cluster ha show
cluster ping-cluster -node NODENAME
date {or} cluster date show
system license show
system license show -fields expiration-date
debug vreport show
event log show -severity EMERGENCY
event log show -severity ALERT
event log show -severity CRITICAL
event log show -severity ERROR
event log show -severity WARNING
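The five event log checks above differ only in the severity value, so they can be generated from one list. A minimal Python sketch, assuming you just want command strings to paste into the clustershell; the helper names are hypothetical. The combined form assumes your ONTAP release accepts the `|` (OR) query operator on the -severity field.

```python
# Severity levels used in the event log checks above, most severe first.
SEVERITIES = ["EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING"]


def event_log_commands(severities=SEVERITIES):
    """Return one 'event log show' command per severity (hypothetical helper)."""
    return [f"event log show -severity {level}" for level in severities]


def combined_event_log_command(severities=SEVERITIES):
    """Return a single command that ORs all severities with the '|' query operator."""
    return "event log show -severity " + "|".join(severities)


if __name__ == "__main__":
    for command in event_log_commands():
        print(command)
    print(combined_event_log_command())
```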

Node Health Check Commands

node show
storage failover show
storage failover show -instance
storage failover show -fields hwassist,hwassist-partner-ip,hwassist-partner-port,hwassist-health-check-interval,hwassist-retry-count,hwassist-status
system node image show
event route show
event destination show
system node autosupport show
ndpmd status
spm show -node * -state !running
system health ?
system health node-connectivity shelf show -node NODENAME
system health node-connectivity disk show -node NODENAME -status !OK
system node run -node NODENAME storage show acp
system node run -node NODENAME sysconfig -c
system node run -node NODENAME sysconfig -V
system node run -node NODENAME netstat -s
system node run -node NODENAME netstat -p tcp
system node run -node NODENAME fru_led status
system environment sensors show -node NODENAME
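The nodeshell checks above have to be repeated for every node in the cluster. A minimal Python sketch that expands them into `system node run` commands for a list of node names; the helper name and the node names in the usage note are hypothetical.

```python
# Nodeshell checks from the list above; each runs against a single node.
NODESHELL_CHECKS = [
    "storage show acp",
    "sysconfig -c",
    "sysconfig -V",
    "netstat -s",
    "netstat -p tcp",
    "fru_led status",
]


def nodeshell_commands(nodes, checks=NODESHELL_CHECKS):
    """Wrap every check in 'system node run -node NODE' for each node (hypothetical helper)."""
    return [
        f"system node run -node {node} {check}"
        for node in nodes
        for check in checks
    ]
```

For example, `nodeshell_commands(["node1", "node2"])` yields twelve commands, all of node1's checks first, then node2's.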

Aggregate Health Check Commands

storage aggregate show
storage aggregate show -state !online
storage aggregate show -aggregate * -percent-used >75
storage aggregate show -aggregate * -raidstatus !"raid_dp,normal"
storage aggregate show -fields free-space-realloc
storage aggregate show -fields percent-snapshot-space
system node run -node NODENAME snap sched -A
system node run -node NODENAME snap reserve -A
system node run -node NODENAME snap list -A
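If you have copied aggregate usage and RAID status out of the cluster (for example from `storage aggregate show -fields percent-used,raidstatus`), the same two checks as the -percent-used >75 and -raidstatus queries above can be reproduced offline. A minimal Python sketch; the `Aggregate` record, the helper name and the sample values in the usage note are illustrative assumptions, not ONTAP output.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    # Illustrative record: values transcribed by hand, not parsed from ONTAP.
    name: str
    percent_used: int
    raidstatus: str


def aggregates_needing_attention(aggregates, max_used=75, expected_raid="raid_dp,normal"):
    """Return names of aggregates above max_used percent or not in the expected RAID state."""
    flagged = []
    for aggr in aggregates:
        # Strip spaces so "raid_dp, normal" and "raid_dp,normal" compare equal.
        raid_ok = aggr.raidstatus.replace(" ", "") == expected_raid
        if aggr.percent_used > max_used or not raid_ok:
            flagged.append(aggr.name)
    return flagged
```

With made-up inputs aggr1 (80%, normal), aggr2 (50%, normal) and aggr3 (40%, degraded), the function flags aggr1 and aggr3. Keep `expected_raid` as a parameter if some aggregates are not RAID-DP.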

Disk Health Check Commands

storage disk show -state broken
storage disk show -state reconstructing
storage disk show -state spare
storage disk show -average-latency >10 -fields average-latency,aggregate

Volume Health Check Commands

vol show
vol show -state !online
vol show -vserver * -volume * -percent-used >79
vol show -vserver * -volume * -percent-used <60
vol show -snapshot-policy none -type !DP
vol show -fields percent-snapshot-space
vol show -snapshot-space-used >80 -fields percent-snapshot-space,snapshot-space-used
vol show -percent-snapshot-space 0
df -gigabyte -volume VOLUMENAME {with VOLUMENAME taken from the output of the command above, to check for volumes consuming a lot of snapshot space but with no snapshot reserve}
vol show -space-guarantee volume
vol show -is-sis-logging-enabled false
vol show -fields read-realloc
vol show -snapshot-count 0
vol show -snapshot-count >200
vol snap show -create-time <"Sun Dec 29 00:00:00 2013"
vol snap show -snapshot *snapmirror* -create-time <"Sun Dec 29 00:00:00 2013"
vol snap show -snap !hourly.*,!weekly.*,!daily.*
df -i -vserver * -volume * -percent-inodes-used >79
vol efficiency show
vol efficiency show -fields schedule,last-op-end
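The -create-time queries above use a hard-coded date. A minimal Python sketch that builds the cutoff string for "N days ago at midnight" in the same layout as the examples (weekday, month, day, time, year). The zero-padded day of month and the helper names are assumptions; check the result against how your cluster displays snapshot create times.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def snapshot_cutoff(now, days):
    """Format midnight 'days' before 'now' like the -create-time values above."""
    cutoff = (now - timedelta(days=days)).replace(hour=0, minute=0, second=0, microsecond=0)
    # Names come from fixed lists rather than %a/%b so the result
    # does not depend on the system locale.
    weekday = WEEKDAYS[cutoff.weekday()]
    month = MONTHS[cutoff.month - 1]
    return f"{weekday} {month} {cutoff.day:02d} {cutoff:%H:%M:%S} {cutoff.year}"


def old_snapshot_command(now, days):
    """Build a 'vol snap show' query for snapshots older than the cutoff (hypothetical helper)."""
    return f'vol snap show -create-time <"{snapshot_cutoff(now, days)}"'
```

For instance, `snapshot_cutoff(datetime(2014, 1, 1), 3)` returns "Sun Dec 29 00:00:00 2013", the date used in the queries above.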

Network Port/Interface Health Check Commands

net port show
net int show
net int show -is-home false
net int show -status-oper !up
net int failover-groups show
net int show -fields failover-policy,failover-group,use-failover-group

LUN Health Check Commands

lun show -mapped unmapped
lun show -lun *.rws
lun show -lun *.aux

SnapMirror Health Check Commands

snapmirror show
snapmirror show -healthy false
snapmirror show -status !Idle
snapmirror check -destination-path PATH -foreground true
snapmirror show -schedule "-"
snapmirror show -state !Snapmirrored

Performance Health Check Commands

statistics show -object ?
statistics show -node ? -object ? -instance ?
statistics periodic
statistics periodic -object lif -node NODENAME -instance NODENAME:LIFNAME

::> system node run -node NODENAME
> sysstat -x 1
> sysstat -M 1
> stats
> stats start
> stats stop
