Patch Upgrade Script for 6-Node Cluster from 8.2.X to 8.2.3P5

I'm catching up on some old material I meant to add to the blog. Below is the script I created to upgrade a 6-node NAS cluster from 8.2.2P1 to 8.2.3P5. It includes a very thorough (some might say obsessive) set of checks.

For a look at what the latest version of cDOT can do, check out this video of automated nondisruptive upgrade using System Manager 8.3.1.

################################################################
## CDOT PATCH UPGRADE: CLU1N1/2/3/4/5/6 from 8.2.X to 8.2.3P5 ##
################################################################

###########
## NOTES ##
###########

# 1) Verify there are no faults/issues in ASUP that require fixing.
# 2) Obtain and review the Upgrade Advisor output for the upgrade from 8.2.X to 8.2.3P5.
# 3) Connect to the Service Processors (SPs) of the HA pair you're upgrading so you can watch the upgrade process.
# 4) The commands below can be run from a session on the cluster management LIF (the SnapMirror steps say which cluster to connect to).
# 5) In this scenario CLU1 is only a SnapMirror source, so we need to update and quiesce the SnapMirrors on the DR cluster (CLU1DR), and quiesce the SnapMirrors on the vault cluster (CLU1SV).

##############################################
## i: PREP 1 (DO BEFORE THE UPGRADE WINDOW) ##
##############################################

## CHECK THE EVENT LOGS:
event log show -severity CRITICAL -time >24h
event log show -severity EMERGENCY -time >24h
event log show -severity ALERT -time >24h
event log show -severity ERROR -time >24h
event log show -severity WARNING -time >24h

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
set admin

## REMOVE/REPLACE ANY BROKEN DISKS:
storage disk show -state broken
storage disk show -state maintenance|pending|reconstructing -fields state
# Wait for any maintenance/reconstruction to complete!

## VERIFY NODE HEALTH, ELIGIBILITY AND EPSILON:
set advanced
y
cluster show
set admin

## CHECK FOR OFFLINE AGGREGATES:
storage aggregate show -state !online

## CHECK FOR OFFLINE VOLUMES:
volume show -state !online

## VERIFY DATA SVMS ARE RUNNING:
vserver show -admin-state !running -type data

## VERIFY CPU AND DISK UTILIZATION DON'T EXCEED 50%:
node run -node * -command sysstat -c 10 -x 3
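# Note: sysstat prints ten samples, one every 3 seconds, so eyeballing the CPU and
# "Disk util" columns is usually enough. If you would rather check the numbers
# mechanically, the Python sketch below flags any sample above the 50% limit.
# The sample values are made up. Copying the two columns by hand into
# (node, cpu, disk) tuples is an assumption of this sketch; sysstat does not do it.

```python
# Sketch: flag sysstat samples where CPU or disk utilization exceeds 50%.
# The (node, cpu_percent, disk_util_percent) tuples are hypothetical values
# copied by hand from the "CPU" and "Disk util" columns of "sysstat -x".
THRESHOLD = 50


def busy_samples(samples, threshold=THRESHOLD):
    """Return every sample whose CPU or disk utilization is above the threshold."""
    return [s for s in samples if s[1] > threshold or s[2] > threshold]


samples = [
    ("CLU1N1", 23, 31),
    ("CLU1N2", 18, 27),
    ("CLU1N3", 55, 40),  # made-up busy sample
]

for node, cpu, disk in busy_samples(samples):
    print(f"{node}: CPU {cpu}% / disk {disk}% exceeds {THRESHOLD}%")
```

# A value of exactly 50% is not flagged, matching "don't exceed 50%".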

## VERIFY LIFS ARE HOME:
network interface show
network interface show -is-home false

## And revert if not ...
# network interface revert -vserver VSERVERNAME -lif LIFNAME
# network interface revert *

## VERIFY LIF FAILOVER (i.e. data LIFs fail over to other data ports on the same network):
network interface show -role data -failover

## VERIFY DATA LIFS DON'T AUTO-REVERT:
network interface show -role data -fields auto-revert

## VERIFY NTP CONFIGURATION, DATE AND TIME:
system services ntp server show
cluster date show

## VERIFY CURRENT SOFTWARE IMAGES:
system node image show

## VERIFY SFO IS ENABLED AND TAKEOVER IS POSSIBLE:
storage failover show -fields enabled,possible

###################################################################
## ii: PREP 2 (CAN DO WITH PERMISSION BEFORE THE UPGRADE WINDOW) ##
###################################################################

## DOWNLOAD THE SOFTWARE IMAGE TO CONTROLLERS:
system node image get -node * -package http://YOURWEBSERVER/823P5_q_image.tgz -replace-package true -background true
system node image show-update-progress -node *
# Note: You can Ctrl-C out of show-update-progress

## INSTALL THE DOWNLOADED SOFTWARE IMAGE:
system node image update -node * -package 823P5_q_image.tgz -background true
system node image show-update-progress -node *

## VERIFY SOFTWARE IMAGES HAVE BEEN UPDATED:
system node image show

###################################
## iii: PREP 3 - SNAPMIRROR WORK ##
###################################

## CONNECT TO CLU1::>
## VERIFY AND QUIESCE SNAPMIRRORS ON CLU1 (IF THERE ARE ANY):
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror show -status Quiesced
snapmirror quiesce *
snapmirror show -status !Quiesced

## CONNECT TO CLU1SV::>
## VERIFY AND QUIESCE SNAPMIRRORS ON CLU1SV (THE VAULT OF CLU1):
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror show -status Quiesced
snapmirror quiesce *
snapmirror show -status !Quiesced

## CONNECT TO CLU1DR::>
## VERIFY, UPDATE, AND QUIESCE SNAPMIRRORS ON CLU1DR (THE DR OF CLU1)
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror update *
snapmirror show -status !Idle
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror show -status Quiesced
snapmirror quiesce *
snapmirror show -status !Quiesced

#############################################################################################
## NOTE: IF THERE ARE ANY LONG-RUNNING SNAPMIRRORS, USE THE FOLLOWING COMMAND TO ABORT THEM #
# snapmirror abort -destination-path DESTINATION -h                                         #
#############################################################################################
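# Note: the three SnapMirror blocks above differ only in whether a final update is
# pushed (on the DR cluster) before quiescing. As a Python sketch, with the
# cluster-to-role mapping taken from Note 5, the per-cluster command lists can be
# generated from one template. This makes it harder to skip a step on one cluster.

```python
# Sketch: build the SnapMirror prep commands for each cluster from its role.
HEALTH = [
    "snapmirror show -healthy false -fields healthy,lag-time",
    "snapmirror show -healthy true -fields healthy,lag-time",
]
UPDATE = ["snapmirror update *", "snapmirror show -status !Idle"]
QUIESCE = [
    "snapmirror show -status Quiesced",
    "snapmirror quiesce *",
    "snapmirror show -status !Quiesced",
]


def snapmirror_prep(role):
    """Return the prep commands for a cluster whose role is 'source', 'vault' or 'dr'."""
    if role == "dr":
        # DR destination: push a final update, wait for Idle, re-check health, then quiesce.
        return HEALTH + UPDATE + HEALTH + QUIESCE
    if role in ("source", "vault"):
        return HEALTH + QUIESCE
    raise ValueError(f"unknown role: {role}")


# Roles as described in Note 5 (same order as the blocks above).
PLAN = {"CLU1": "source", "CLU1SV": "vault", "CLU1DR": "dr"}

for cluster, role in PLAN.items():
    print(f"## CONNECT TO {cluster}::>")
    for cmd in snapmirror_prep(role):
        print(cmd)
```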

################
## iv: PREP 4 ##
################

## CONNECT BACK TO CLU1::>

## JOBS:
job show
job delete *
job show
# Note: Jobs which don't support delete will not be deleted!

## SET THE NEW IMAGE AS DEFAULT:
system node image show
system node image modify {-node * -iscurrent false} -isdefault true
system node image show
# Note: Verify the new image is set as default on all nodes!
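# Note: with six nodes it is easy to miss one row in "system node image show".
# The Python sketch below checks that every node's default image is the target
# version. The rows are hypothetical (node, image, version, is_default) tuples
# transcribed from the command output. It is an illustration, not a parser for
# the real output.

```python
# Sketch: report nodes whose default image is not the target release.
TARGET = "8.2.3P5"

# Hypothetical rows: (node, image, version, is_default)
rows = [
    ("CLU1N1", "image1", "8.2.2P1", False),
    ("CLU1N1", "image2", "8.2.3P5", True),
    ("CLU1N2", "image1", "8.2.2P1", True),   # new image not yet default
    ("CLU1N2", "image2", "8.2.3P5", False),
]


def nodes_not_defaulting(rows, target=TARGET):
    """Return sorted node names whose default image is missing or not the target."""
    default = {node: version for node, _image, version, is_default in rows if is_default}
    nodes = {row[0] for row in rows}
    return sorted(node for node in nodes if default.get(node) != target)


print(nodes_not_defaulting(rows))
```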

###############################
## v: UPGRADE: INITIAL STEPS ##
###############################

## HEALTH CHECKS INCLUDING:
event log show -severity CRITICAL|EMERGENCY|ALERT|ERROR|WARNING -time >1h

## SEND ASUP:
autosupport invoke -node * -type all -message "Starting_NDU_823P5"
autosupport history show -last-update >1h -fields status
# Proceed once all nodes have ASUP history status of "sent-successful"!
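# Note: "all nodes" here means all six. A Python sketch of the go/no-go check,
# assuming you have transcribed the history output into hypothetical
# (node, status) pairs. It is deliberately strict: a node with no rows at all,
# or with any row that is not "sent-successful", counts as pending.

```python
# Sketch: decide whether every node has a "sent-successful" ASUP.
NODES = ["CLU1N1", "CLU1N2", "CLU1N3", "CLU1N4", "CLU1N5", "CLU1N6"]


def asup_pending(history, nodes=NODES):
    """Return nodes with no history rows or with any row not 'sent-successful'."""
    pending = []
    for node in nodes:
        statuses = [status for n, status in history if n == node]
        if not statuses or any(s != "sent-successful" for s in statuses):
            pending.append(node)
    return pending


# Hypothetical transcription of "autosupport history show" output.
history = [(node, "sent-successful") for node in NODES if node != "CLU1N4"]
history.append(("CLU1N4", "transmitting"))

pending = asup_pending(history)
print("proceed" if not pending else f"wait for: {', '.join(pending)}")
```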

## DISABLE ASUP:
autosupport modify -node * -state disable

## IF NOT ALREADY DONE, DISABLE AUTO-GIVEBACK:
storage failover show -fields auto-giveback
storage failover modify -node * -auto-giveback false

############################
## NAMES IN THIS TEXTFILE ##
############################

# CONTROLLER PARTNER
####################
# CLU1N1     CLU1N2
# CLU1N2     CLU1N1
# CLU1N3     CLU1N4
# CLU1N4     CLU1N3
# CLU1N5     CLU1N6
# CLU1N6     CLU1N5
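# Note: the takeover/giveback steps rely on this table being right. A small
# Python sketch that checks the mapping is symmetric (each node is its partner's
# partner) and lists each HA pair once:

```python
# Sketch: sanity-check the CONTROLLER/PARTNER table above.
PARTNERS = {
    "CLU1N1": "CLU1N2", "CLU1N2": "CLU1N1",
    "CLU1N3": "CLU1N4", "CLU1N4": "CLU1N3",
    "CLU1N5": "CLU1N6", "CLU1N6": "CLU1N5",
}


def ha_pairs(partners):
    """Raise if the table is not symmetric; otherwise return each HA pair once."""
    for node, partner in partners.items():
        if node == partner or partners.get(partner) != node:
            raise ValueError(f"{node} -> {partner} is not a valid HA partnership")
    return sorted({tuple(sorted((node, partner))) for node, partner in partners.items()})


print(ha_pairs(PARTNERS))
```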

######################
## ORDER OF UPGRADE ##
######################

########################################################################################
# Using "::> set adv; cluster ring show" and "::> set adv; cluster show"               #
# ... we have already determined that Epsilon is on N1, and N1 is also the RDB master. #
# So the order of upgrade will be:                                                     #
# N3,N4 (move Epsilon to N3) N5,N6 and finally N2,N1                                   #
########################################################################################
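# Note: the ordering rule used here can be written down as a Python sketch.
# Every HA pair that does not hold epsilon goes first, then the epsilon holder's
# partner, and the epsilon holder (N1, also the RDB master) is upgraded last.
# The pair order and the "first-listed node first" convention are assumptions
# taken from this cluster, not a general NetApp rule.

```python
# Sketch: derive the upgrade order from the HA pairs and the epsilon holder.
PAIRS = [("CLU1N1", "CLU1N2"), ("CLU1N3", "CLU1N4"), ("CLU1N5", "CLU1N6")]


def upgrade_order(pairs, epsilon_node):
    """Non-epsilon pairs first (in listed order), then the epsilon holder's
    partner, with the epsilon holder itself upgraded last."""
    eps_pair = next(pair for pair in pairs if epsilon_node in pair)
    order = [node for pair in pairs if pair != eps_pair for node in pair]
    partner = eps_pair[1] if eps_pair[0] == epsilon_node else eps_pair[0]
    return order + [partner, epsilon_node]


order = upgrade_order(PAIRS, "CLU1N1")
print(order)  # ['CLU1N3', 'CLU1N4', 'CLU1N5', 'CLU1N6', 'CLU1N2', 'CLU1N1']
print("move epsilon to", order[0], "after", order[1], "is upgraded")
```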

##################################
## A: UPGRADE: 1ST NODE: CLU1N3 ##
##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N3
network interface migrate-all -node CLU1N3

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE: 
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N3

## TAKEOVER:
storage failover takeover -ofnode CLU1N3
# -override-vetoes true (if necessary - usually CIFS)

## CLU1N3
## ... boots up to the Waiting for giveback state
## Verify that the takeover was successful
storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:
storage failover giveback -ofnode CLU1N3
# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N3
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps
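# Note: the output lists one row per upgrade version ("vers") and phase. A Python
# sketch of the completeness check, assuming hypothetical (vers, phase, status)
# tuples transcribed from the output and a status of "complete" for finished
# phases (check the actual wording on your system):

```python
# Sketch: confirm every "vers" finished pre-root, pre-apps and post-apps.
PHASES = ("pre-root", "pre-apps", "post-apps")


def incomplete_phases(rows):
    """Return (vers, phase) pairs that are missing or not marked complete."""
    done = {(vers, phase) for vers, phase, status in rows if status == "complete"}
    versions = sorted({vers for vers, _phase, _status in rows})
    return [(vers, phase) for vers in versions for phase in PHASES if (vers, phase) not in done]


# Hypothetical rows for one node: vers 2 has not finished post-apps yet.
rows = [
    (1, "pre-root", "complete"), (1, "pre-apps", "complete"), (1, "post-apps", "complete"),
    (2, "pre-root", "complete"), (2, "pre-apps", "complete"),
]

print(incomplete_phases(rows) or "all phases complete")
```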

## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it
# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N3 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin

##################################
## B: UPGRADE: 2ND NODE: CLU1N4 ##
##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N4
network interface migrate-all -node CLU1N4

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N4

## TAKEOVER:
storage failover takeover -ofnode CLU1N4
# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5
# -override-vetoes true (if necessary - usually CIFS)

## CLU1N4
## ... boots up to the Waiting for giveback state
## Verify that the takeover was successful
storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:
storage failover giveback -ofnode CLU1N4
# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N4
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it
# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS: 
network interface show
network port show
system node run -node CLU1N4 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin

################
## C: EPSILON ##
################

## MOVE EPSILON TO THE 1ST HA PAIR THAT'S BEEN UPGRADED:
set advanced
y
cluster modify -node CLU1N1 -epsilon false
cluster modify -node CLU1N3 -epsilon true
cluster show
set admin
# IMPORTANT: VERIFY THAT EPSILON IS TRUE ON THE UPGRADED NODE (CLU1N3)
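# Note: "cluster show" should report epsilon on exactly one node. A Python
# sketch of that check, assuming a hypothetical dict that maps node name to the
# value of the epsilon column:

```python
# Sketch: confirm exactly one node holds epsilon and that it is the expected one.
def epsilon_holder(epsilon_by_node):
    """Return the single node with epsilon=true; raise if there are zero or several."""
    holders = [node for node, epsilon in epsilon_by_node.items() if epsilon]
    if len(holders) != 1:
        raise RuntimeError(f"expected exactly one epsilon holder, got {holders}")
    return holders[0]


# Hypothetical "cluster show" epsilon column after the move.
after_move = {"CLU1N1": False, "CLU1N2": False, "CLU1N3": True,
              "CLU1N4": False, "CLU1N5": False, "CLU1N6": False}

assert epsilon_holder(after_move) == "CLU1N3", "epsilon is not on CLU1N3"
print("epsilon is on", epsilon_holder(after_move))
```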

#############################################################
## N.B. For additional HA pairs we effectively repeat A, B ##
#############################################################
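# Note: sections A and B are the same apart from the node name (and the
# allow-version-mismatch comment on B). One way to avoid copy-and-paste slips is
# to generate each block from a template. A Python sketch, which omits the
# optional override-vetoes and SPI comments:

```python
# Sketch: emit the per-node A/B command sequence for a given node.
def node_upgrade_steps(node):
    """Return the commands used in sections A and B, in order, for one node."""
    return [
        f"network interface show -data-protocol nfs|cifs -role data -curr-node {node}",
        f"network interface migrate-all -node {node}",
        f"network interface show -data-protocol nfs|cifs -role data -home-node {node}",
        f"storage failover takeover -ofnode {node}",
        "storage failover show",
        "## IMPORTANT: WAIT 8 MINUTES!",
        f"storage failover giveback -ofnode {node}",
        "storage failover show-giveback",
        "set advanced",
        "y",
        f"system node upgrade-revert show -node {node}",
        "set admin",
        "vserver services web show -name spi",
        "network interface show -is-home false",
        "network interface revert *",
        "network interface show",
        "network port show",
        f"system node run -node {node} -command uptime",
        "set advanced",
        "y",
        "cluster ring show -unitname vldb",
        "cluster ring show -unitname mgmt",
        "cluster ring show -unitname vifmgr",
        "cluster ring show -unitname bcomd",
        "cluster show",
        "set admin",
    ]


for node in ["CLU1N5", "CLU1N6"]:
    print(f"## UPGRADE: {node}")
    print("\n".join(node_upgrade_steps(node)))
```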

##################################
## A: UPGRADE: 3RD NODE: CLU1N5 ##
##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N5
network interface migrate-all -node CLU1N5

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N5

## TAKEOVER:
storage failover takeover -ofnode CLU1N5
# -override-vetoes true (if necessary - usually CIFS)

## CLU1N5
## ... boots up to the Waiting for giveback state
## Verify that the takeover was successful
storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:
storage failover giveback -ofnode CLU1N5
# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N5
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it
# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N5 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin

##################################
## B: UPGRADE: 4TH NODE: CLU1N6 ##
##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N6
network interface migrate-all -node CLU1N6

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N6

## TAKEOVER:
storage failover takeover -ofnode CLU1N6
# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5
# -override-vetoes true (if necessary - usually CIFS)

## CLU1N6
## ... boots up to the Waiting for giveback state
## Verify that the takeover was successful
storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:
storage failover giveback -ofnode CLU1N6
# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N6
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it
# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N6 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin

##################################
## A: UPGRADE: 5TH NODE: CLU1N2 ##
##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N2
network interface migrate-all -node CLU1N2

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N2

## TAKEOVER:
storage failover takeover -ofnode CLU1N2
# -override-vetoes true (if necessary - usually CIFS)

## CLU1N2
## ... boots up to the Waiting for giveback state
## Verify that the takeover was successful
storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:
storage failover giveback -ofnode CLU1N2
# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N2
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it
# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N2 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin

##################################
## B: UPGRADE: 6TH NODE: CLU1N1 ##
##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N1
network interface migrate-all -node CLU1N1

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N1

## TAKEOVER:
storage failover takeover -ofnode CLU1N1
# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5
# -override-vetoes true (if necessary - usually CIFS)

## CLU1N1
## ... boots up to the Waiting for giveback state
## Verify that the takeover was successful
storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:
storage failover giveback -ofnode CLU1N1
# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N1
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it
# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N1 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin

#####################
## X: POST UPGRADE ##
#####################

## CHECK VERSION:
version

## VERIFY PROTOCOL SERVICE:
vserver nfs show
vserver cifs show

## RE-ENABLE AND SEND ASUP:
autosupport modify -node * -state enable
autosupport show
autosupport invoke -node * -type all -message "Finished_NDU_823P5"
autosupport history show -last-update >1h -fields status -status sent-successful
# Want to see ASUP history status of "sent-successful"!

## RESUME QUIESCED SNAPMIRRORS ON CLU1DR::>
snapmirror show
snapmirror resume *
snapmirror show -status Quiesced
snapmirror show -healthy False

## RESUME QUIESCED SNAPMIRRORS ON CLU1SV::>
snapmirror show
snapmirror resume *
snapmirror show -status Quiesced
snapmirror show -healthy False

## RESUME QUIESCED SNAPMIRRORS ON CLU1::>
snapmirror show
snapmirror resume *
snapmirror show -status Quiesced
snapmirror show -healthy False

## HEALTH CHECKS INCLUDING:
event log show -severity CRITICAL|EMERGENCY|ALERT|ERROR|WARNING -time >1h
