Wednesday, 29 January 2014

Clustered ONTAP Daily Health Checks Script

The following post is partly based on the earlier post from 1st January: Clustered ONTAP Storage Admins' Health Checks. Here we present a set of daily health checks for Clustered ONTAP storage admins (really, there's too much here for a daily checklist). Please feel free to modify it as you see fit. Some of the commands nicely display the power of the Clustershell CLI.

Note:

## Two or more hashes mark a comment
# One hash marks a command you can uncomment (it needs Data ONTAP 8.2+, a date filled in, or the local cluster name filled in)

##############################
## CDOT DAILY CHECKS SCRIPT ##
##############################

## Disable output paging, then enter diagnostic privilege (y answers the confirmation prompt)
rows 0
set diag
y

###########################
## Analyze The Event Log ##
###########################

event log show -severity emergency
event log show -severity alert
event log show -severity critical
event log show -severity error
event log show -severity warning
## Example for the last 24 hours - CHANGE THE DATES
# event log show -time "01/21/2014 09:00:00".."01/22/2014 09:00:00" -severity !informational,!notice,!debug

#############################
## Display Some Dashboards ##
#############################

dashboard alarm show
dashboard performance show

####################
## Cluster Checks ##
####################

cluster show
storage failover show
## 2-Node Clusters
# cluster ha show
date
## CDOT 8.2+
# cluster date show

###############################################
## License Checks (not really a daily check) ##
###############################################

system license show -fields expiration-date

#################
## Node Checks ##
#################

node show -fields health
system health alert show -fields indication-time
## ... and if they're old alerts you can delete them with
# system health alert delete -node * -monitor * -alert-id * -alerting-resource *
system node run -node * -command fru_led status

###############################################
## NDMPD check for jobs running and snapshot ##
###############################################

ndmpd status -fields data-state,data-operation,mover-state,mover-mode
snapshot show -snapshot snapshot_for_backup.* -fields create-time

########################
## Autosupport checks ##
########################

system node autosupport show -state !enable
system node autosupport history show -status !ignore -fields status,last-update

###############################
## Aggregate and Disk Checks ##
###############################

storage aggregate show -state !online
storage aggregate show -aggregate * -percent-used >75
storage aggregate show -aggregate * -raidstatus !"raid_dp,normal"
storage disk show -state broken
storage disk show -container unassigned
sto disk show -container-type aggregate -average-latency > 20 -fields average-latency,aggregate

###################
## Volume Checks ##
###################

vol show -state !online
vol show -vserver * -volume * -percent-used >79 -fields state,size,available,percent-used,space-guarantee -type RW
vol show -vserver * -volume * -percent-used <33 -fields state,size,available,percent-used,space-guarantee -type RW 
vol show -snapshot-policy none -type RW -fields volume,size,available,used
vol show -snapshot-space-used > 99 -type RW -fields percent-snapshot-space,snapshot-space-used
vol show -space-guarantee volume -type RW -fields volume,size,available,used
vol show -is-sis-logging-enabled true -type RW -fields volume,sis-space-saved-percent
vol show -is-sis-logging-enabled false -type RW -volume !vol0 -fields volume,size,used
df -i -vserver * -volume * -percent-inodes-used >79
vol efficiency show -fields progress,schedule,policy,last-op-end,state

#####################
## Snapshot Checks ##
#####################

## CDOT 8.2+
# vol show -snapshot-count 0
## CDOT 8.2+
# vol show -snapshot-count > 200
## CHANGE THE DATE - use http://www.timeanddate.com/date/dateadd.html
# vol snap show -create-time <"Wed Oct 09 00:00:00 2013" -fields state,size,create-time,owners
vol snap show -snap !hourly.*,!weekly.*,!daily.*,!snapmirror.*,!*smvi*,!eloginfo*,!exchsnap* -fields state,size,create-time,owners

####################
## Network Checks ##
####################

net port show -link !up
net int show -is-home false
net int show -status-oper !up

################
## SAN Checks ##
################

lun show -mapped unmapped -lun !*rws,!*aux
lun show -lun *.rws # Example - SMBR flexclones
lun show -lun *.aux # Example - failed SME jobs
fcp int show -status-oper !up
iscsi int show -status-oper !up

#######################
## SnapMirror Checks ##
#######################

snapmirror show -healthy false
snapmirror show -status !Idle
snapmirror show -state !snapmirrored
## CHANGE LOCAL to the name of the CLUSTER you're running the command from
## ... for the snapmirror command below, since the schedule displays on the destination cluster only!
# snapmirror show -schedule "-" -fields state,status -source-cluster !LOCAL
## Compare the following two outputs: the number of non-RW vols should roughly match the number of snapmirrors to this cluster
# snapmirror show -destination-cluster LOCAL -fields destination-volume
# vol show -type !RW
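
Two of the single-hash commands above need a date filled in before they can be run: the event log time window and the snapshot create-time cutoff. As an alternative to working the dates out by hand, here is a minimal Python sketch that prints both lines ready to paste. The function names and the 90-day default are hypothetical choices for illustration, and the date formats are simply copied from the examples in the script. It also assumes the clock on the machine running it is close enough to the cluster's clock, and that Python is running with its default (English) day and month names.

```python
from datetime import date, datetime, timedelta


def event_log_command(now, hours=24):
    """Build the 'event log show' line for the window ending at `now`.

    The time format (MM/DD/YYYY HH:MM:SS) mirrors the example in the script.
    """
    fmt = "%m/%d/%Y %H:%M:%S"
    start = now - timedelta(hours=hours)
    return (
        'event log show -time "{}".."{}" '
        "-severity !informational,!notice,!debug"
    ).format(start.strftime(fmt), now.strftime(fmt))


def snapshot_cutoff_command(today, days=90):
    """Build the 'vol snap show' line for snapshots older than `days` days.

    The 90-day default is an arbitrary illustration; pick your own retention.
    The date format mirrors the script's "Wed Oct 09 00:00:00 2013" example.
    """
    cutoff = today - timedelta(days=days)
    stamp = cutoff.strftime("%a %b %d 00:00:00 %Y")
    return (
        'vol snap show -create-time <"{}" '
        "-fields state,size,create-time,owners"
    ).format(stamp)


if __name__ == "__main__":
    # Print both lines for pasting into the Clustershell session.
    print(event_log_command(datetime.now().replace(microsecond=0)))
    print(snapshot_cutoff_command(date.today()))
```

Run it just before the daily checks and paste the two printed lines in place of the single-hash examples.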


4 comments:

  1. I am new to cDOT, and this is very useful. Thank you!!

    I am also looking for cDOT administration commands, performance commands, and cDOT ONTAP upgrade commands. (solaipoovan@yahoo.com)

    I appreciate your help!

  2. Hi. First of all, you have a really great blog. Thanks for your posts.

    Can you explain the syntax of this line: "vol show -vserver * -volume * -percent-used <33 -fields="" -type="" available="" o:p="" percent-used="" rw="" size="" space-guarantee="" state="">"
    It's not clear to me.

    Replies
    1. Hello Smasher, I'm afraid it was Blogger not liking the less-than sign. It should be corrected now. Cheers, vCosonok
