Patch Upgrade Script for 6 Node Cluster from 8.2.X to 8.2.3P5

Catching up on some old material I meant to add to the blog. Below is a script written for upgrading a 6-node NAS cluster from 8.2.2P1 to 8.2.3P5. It includes a deliberately thorough set of pre-checks and post-checks.

To see what the latest version of cDOT can do, check out the video "Automated nondisruptive upgrade using System Manager 8.3.1".

################################################################

## CDOT PATCH UPGRADE: CLU1N1/2/3/4/5/6 from 8.2.X to 8.2.3P5 ##

################################################################

###########

## NOTES ##

###########

# 1) Verify there are no faults/issues in ASUP that require fixing.

# 2) Obtain and review the Upgrade Advisor output for upgrade from 8.2.X to 8.2.3P5

# 3) Connect to the SP for the HA pair you're upgrading so you can watch the upgrade process.

# 4) The commands below can be run connected to the cluster Mgmt LIF.

# 5) In this scenario CLU1 is only a SnapMirror source, so we should update and quiesce snapmirrors on DR (CLU1DR), and quiesce snapmirrors on the vault (CLU1SV).

##############################################

## i: PREP 1 (DO BEFORE THE UPGRADE WINDOW) ##

##############################################

## CHECK THE EVENT LOGS:

event log show -severity CRITICAL -time >24h

event log show -severity EMERGENCY -time >24h

event log show -severity ALERT -time >24h

event log show -severity ERROR -time >24h

event log show -severity WARNING -time >24h
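The five event log checks above differ only in the severity name, so they can be generated rather than typed. A minimal Python sketch, assuming you paste the printed lines into the cluster shell yourself; the helper name event_log_commands is hypothetical and not part of any NetApp tooling:

```python
# Severity names are copied from the checks above.
SEVERITIES = ["CRITICAL", "EMERGENCY", "ALERT", "ERROR", "WARNING"]


def event_log_commands(severities, window="24h"):
    """Return one 'event log show' command per severity for the look-back window."""
    return [f"event log show -severity {s} -time >{window}" for s in severities]


# Print the commands so they can be pasted into the clustershell.
for cmd in event_log_commands(SEVERITIES):
    print(cmd)
```

Passing window="1h" gives the shorter look-back used later in the upgrade window.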

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:

set advanced

cluster ring show -unitname vldb

cluster ring show -unitname mgmt

cluster ring show -unitname vifmgr

cluster ring show -unitname bcomd

set admin

## REMOVE/REPLACE ANY BROKEN DISKS:

storage disk show -state broken

storage disk show -state maintenance|pending|reconstructing -fields state

# Wait for any maintenance/reconstruction to complete!

## VERIFY NODE HEALTH, ELIGIBILITY AND EPSILON:

set advanced

cluster show

set admin

## CHECK FOR OFFLINE AGGREGATES:

storage aggregate show -state !online

## CHECK FOR OFFLINE VOLUMES:

volume show -state !online

## VERIFY DATA SVMS ARE RUNNING:

vserver show -admin-state !running -type data

## VERIFY CPU AND DISK UTIL NEVER EXCEEDS 50%:

node run -node * -command sysstat -c 10 -x 3

## VERIFY LIFS ARE HOME:

network interface show

network interface show -is-home false

## And revert if not ...

# network interface revert -vserver VSERVERNAME -lif LIFNAME

# network interface revert *

## VERIFY LIF FAILOVER (i.e. they failover to other data ports on the same network):

network interface show -role data -failover

## VERIFY DATA LIFS DON'T AUTO-REVERT:

network interface show -role data -field auto-revert

## VERIFY NTP CONFIGURATION, DATE AND TIME:

system services ntp server show

cluster date show

## VERIFY CURRENT SOFTWARE IMAGES:

system node image show

## VERIFY SFO IS ENABLED AND TAKEOVER IS POSSIBLE:

storage failover show -fields enabled,possible

###################################################################

## ii: PREP 2 (CAN DO WITH PERMISSION BEFORE THE UPGRADE WINDOW) ##

###################################################################

## DOWNLOAD THE SOFTWARE IMAGE TO CONTROLLERS:

system node image get -node * -package http://YOURWEBSERVER/823P5_q_image.tgz -replace-package true -background true

system node image show-update-progress -node *

# Note: You can Ctrl-C out of show-update-progress
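The download command only needs the image URL, and the package file name used afterwards is the last path component of that URL. A small Python sketch of that relationship, assuming the placeholder web server name from the command above; both helper names are hypothetical:

```python
import posixpath
from urllib.parse import urlparse


def image_get_command(url):
    """Build the 'system node image get' command for all nodes from an image URL."""
    return (f"system node image get -node * -package {url} "
            "-replace-package true -background true")


def package_name(url):
    """Return the file name part of the image URL."""
    return posixpath.basename(urlparse(url).path)


# YOURWEBSERVER is the same placeholder used in the script.
url = "http://YOURWEBSERVER/823P5_q_image.tgz"
print(image_get_command(url))
print(package_name(url))
```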

## INSTALL THE DOWNLOADED SOFTWARE IMAGE:

system node image update -node * -package 823P5_q_image.tgz -background true

system node image show-update-progress -node *

## VERIFY SOFTWARE IMAGES HAVE BEEN UPDATED:

system node image show

###################################

## iii: PREP 3 - SNAPMIRROR WORK ##

###################################

## CONNECT TO CLU1::>

## VERIFY AND QUIESCE SNAPMIRRORS ON CLU1 (IF THERE ARE ANY):

snapmirror show -healthy false -fields healthy,lag-time

snapmirror show -healthy true -fields healthy,lag-time

snapmirror show -status Quiesced

snapmirror quiesce *

snapmirror show -status !Quiesced

## CONNECT TO CLU1SV::>

## VERIFY AND QUIESCE SNAPMIRRORS ON CLU1SV (THE VAULT OF CLU1):

snapmirror show -healthy false -fields healthy,lag-time

snapmirror show -healthy true -fields healthy,lag-time

snapmirror show -status Quiesced

snapmirror quiesce *

snapmirror show -status !Quiesced

## CONNECT TO CLU1DR::>

## VERIFY, UPDATE, AND QUIESCE SNAPMIRRORS ON CLU1DR (THE DR OF CLU1)

snapmirror show -healthy false -fields healthy,lag-time

snapmirror show -healthy true -fields healthy,lag-time

snapmirror update *

snapmirror show -status !Idle

snapmirror show -healthy false -fields healthy,lag-time

snapmirror show -healthy true -fields healthy,lag-time

snapmirror show -status Quiesced

snapmirror quiesce *

snapmirror show -status !Quiesced

#############################################################################################

## NOTE: IF THERE ARE ANY LONG-RUNNING SNAPMIRRORS, USE THE FOLLOWING COMMAND TO ABORT THEM #

# snapmirror abort -destination-path DESTINATION -h #

#############################################################################################
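The SnapMirror prep above is the same on each cluster, except that the DR cluster gets an update before the quiesce. A rough Python sketch of that plan as data, assuming the cluster names and roles from the notes; the PLAN structure and the snapmirror_prep helper are illustrative only:

```python
# (cluster name, update before quiesce?) in the order used above.
PLAN = [
    ("CLU1", False),    # source cluster
    ("CLU1SV", False),  # vault of CLU1
    ("CLU1DR", True),   # DR of CLU1: update first, then quiesce
]


def snapmirror_prep(cluster, update_first):
    """Return the prep steps for one cluster as a list of lines."""
    steps = [
        f"## CONNECT TO {cluster}::>",
        "snapmirror show -healthy false -fields healthy,lag-time",
    ]
    if update_first:
        # Bring the DR copies up to date and wait until nothing is transferring.
        steps += ["snapmirror update *", "snapmirror show -status !Idle"]
    steps += ["snapmirror quiesce *", "snapmirror show -status !Quiesced"]
    return steps


for cluster, update_first in PLAN:
    print("\n".join(snapmirror_prep(cluster, update_first)))
    print()
```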

################

## iv: PREP 4 ##

################

## CONNECT BACK TO CLU1::>

## JOBS:

job show

job delete *

job show

# Note: Jobs which don't support delete will not be deleted!

## SET THE NEW IMAGE AS DEFAULT:

system node image show

system node image modify {-node * -iscurrent false} -isdefault true

system node image show

# Note: Verify the new image is set as default on all nodes!

###############################

## v: UPGRADE: INITIAL STEPS ##

###############################

## HEALTH CHECKS INCLUDING:

event log show -severity CRITICAL,EMERGENCY,ALERT,ERROR,WARNING -time >1h

## SEND ASUP:

autosupport invoke -node * -type all -message "Starting_NDU_823P5"

autosupport history show -last-update >1h -field status

# Proceed once all nodes have ASUP history status of "sent-successful"!
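The go/no-go rule above is simply "every node reports sent-successful". A minimal Python sketch of that check, assuming you have already copied each node's latest status into a dictionary by hand; the helper name and the sample statuses are hypothetical:

```python
def nodes_not_sent(status_by_node):
    """Return the nodes whose latest ASUP status is not 'sent-successful'."""
    return sorted(n for n, s in status_by_node.items() if s != "sent-successful")


# Hypothetical sample: one node is still transmitting.
sample = {
    "CLU1N1": "sent-successful",
    "CLU1N2": "sent-successful",
    "CLU1N3": "transmitting",
}
pending = nodes_not_sent(sample)
print("Proceed" if not pending else f"Wait for: {', '.join(pending)}")
```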

## DISABLE ASUP:

autosupport modify -node * -state disable

## IF NOT ALREADY DONE, DISABLE AUTO-GIVEBACK:

storage failover show -fields auto-giveback

storage failover modify -node * -auto-giveback false

############################

## NAMES IN THIS TEXTFILE ##

############################

# CONTROLLER PARTNER

####################

# CLU1N1 CLU1N2

# CLU1N2 CLU1N1

# CLU1N3 CLU1N4

# CLU1N4 CLU1N3

# CLU1N5 CLU1N6

# CLU1N6 CLU1N5

######################

## ORDER OF UPGRADE ##

######################

########################################################################################

# Using "::> set adv; cluster ring show" and "::> set adv; cluster show" #

# ... we have already determined that Epsilon is on N1, and N1 is also the RDB master. #

# So the order of upgrade will be: #

# N3,N4 (move Epsilon to N3) N5,N6 and finally N2,N1 #

########################################################################################
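The ordering rule above (HA pairs that do not hold Epsilon first, the Epsilon pair last, and the Epsilon node itself last of all) can be written down as a small function. A Python sketch, assuming the pairings listed in this text file; upgrade_order is a hypothetical helper and it only encodes the ordering, not the Epsilon move:

```python
def upgrade_order(ha_pairs, epsilon_node):
    """Order nodes so the pair holding Epsilon goes last, Epsilon node last of all."""
    other_pairs = [p for p in ha_pairs if epsilon_node not in p]
    eps_pair = next(p for p in ha_pairs if epsilon_node in p)
    order = [node for pair in other_pairs for node in pair]
    # Within the Epsilon pair, upgrade the partner before the Epsilon node.
    order += [n for n in eps_pair if n != epsilon_node] + [epsilon_node]
    return order


pairs = [("CLU1N1", "CLU1N2"), ("CLU1N3", "CLU1N4"), ("CLU1N5", "CLU1N6")]
print(upgrade_order(pairs, "CLU1N1"))
```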

##################################

## A: UPGRADE: 1ST NODE: CLU1N3 ##

##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:

network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N3

network interface migrate-all -node CLU1N3

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:

network interface show -data-protocol nfs|cifs -role data -home-node CLU1N3

## TAKEOVER:

storage failover takeover -ofnode CLU1N3

# -override-vetoes true (if necessary - usually CIFS)

## CLU1N3

## ... boots up to the Waiting for giveback state

## Verify that the takeover was successful

storage failover show

## IMPORTANT: WAIT 8 MINUTES!
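To avoid guessing at the wait, note the time the takeover completed and work out the earliest giveback time from it. A tiny Python sketch, assuming the 8-minute figure from this script; the timestamp is a made-up example:

```python
from datetime import datetime, timedelta

# Wait time taken from the script above.
WAIT = timedelta(minutes=8)


def earliest_giveback(takeover_done):
    """Return the earliest time to issue the giveback after a completed takeover."""
    return takeover_done + WAIT


# Hypothetical example: takeover finished at 22:00.
done = datetime(2016, 1, 1, 22, 0)
print(earliest_giveback(done).strftime("%H:%M"))
```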

## GIVEBACK:

storage failover giveback -ofnode CLU1N3

# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:

storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:

set advanced

system node upgrade-revert show -node CLU1N3

set admin

# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:

vserver services web show -name spi

## If the SPI isn't enabled for any vserver, then enable it

# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:

network interface show -is-home false

network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:

network interface show

network port show

system node run -node CLU1N3 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:

set advanced

cluster ring show -unitname vldb

cluster ring show -unitname mgmt

cluster ring show -unitname vifmgr

cluster ring show -unitname bcomd

cluster show

set admin

##################################

## B: UPGRADE: 2ND NODE: CLU1N4 ##

##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:

network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N4

network interface migrate-all -node CLU1N4

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:

network interface show -data-protocol nfs|cifs -role data -home-node CLU1N4

## TAKEOVER:

storage failover takeover -ofnode CLU1N4

# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5

# -override-vetoes true (if necessary - usually CIFS)

## CLU1N4

## ... boots up to the Waiting for giveback state

## Verify that the takeover was successful

storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:

storage failover giveback -ofnode CLU1N4

# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:

storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:

set advanced

system node upgrade-revert show -node CLU1N4

set admin

# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:

vserver services web show -name spi

## If the SPI isn't enabled for any vserver, then enable it

# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:

network interface show -is-home false

network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:

network interface show

network port show

system node run -node CLU1N4 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:

set advanced

cluster ring show -unitname vldb

cluster ring show -unitname mgmt

cluster ring show -unitname vifmgr

cluster ring show -unitname bcomd

cluster show

set admin

################

## C: EPSILON ##

################

## MOVE EPSILON TO THE 1ST HA PAIR THAT'S BEEN UPGRADED:

set advanced

cluster modify -node CLU1N1 -epsilon false

cluster modify -node CLU1N3 -epsilon true

cluster show

set admin

# IMPORTANT: VERIFY THAT EPSILON IS TRUE ON THE UPGRADED NODE (CLU1N3)
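The Epsilon move always follows the same pattern: clear it on the current holder, then set it on the new one, all at advanced privilege. A short Python sketch that emits the sequence for any two nodes, using the full command name cluster modify; epsilon_move is a hypothetical helper:

```python
def epsilon_move(current_holder, new_holder):
    """Return the command sequence that moves Epsilon between two nodes."""
    if current_holder == new_holder:
        raise ValueError("Epsilon is already on that node")
    return [
        "set advanced",
        # Clear Epsilon first so only one node ever holds it.
        f"cluster modify -node {current_holder} -epsilon false",
        f"cluster modify -node {new_holder} -epsilon true",
        "cluster show",
        "set admin",
    ]


print("\n".join(epsilon_move("CLU1N1", "CLU1N3")))
```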

#############################################################

## N.B. For additional HA pairs we effectively repeat A, B ##

#############################################################
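Since sections A and B repeat for every node with only the node name changing, the core commands can be templated. A Python sketch of that idea, assuming the command order used in this script; node_upgrade_steps is a hypothetical helper, and the RDB and SPI checks are left out for brevity:

```python
def node_upgrade_steps(node):
    """Return the core per-node upgrade steps, in the order used in this script."""
    return [
        f"network interface show -data-protocol nfs|cifs -role data -curr-node {node}",
        f"network interface migrate-all -node {node}",
        f"network interface show -data-protocol nfs|cifs -role data -home-node {node}",
        f"storage failover takeover -ofnode {node}",
        "storage failover show",
        "## IMPORTANT: WAIT 8 MINUTES!",
        f"storage failover giveback -ofnode {node}",
        "storage failover show-giveback",
        # upgrade-revert show needs advanced privilege.
        "set advanced",
        f"system node upgrade-revert show -node {node}",
        "set admin",
        "network interface show -is-home false",
        "network interface revert *",
    ]


for step in node_upgrade_steps("CLU1N5"):
    print(step)
```

Printing the list for each remaining node gives a per-node checklist to paste from, one command at a time.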

##################################

## A: UPGRADE: 3RD NODE: CLU1N5 ##

##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:

network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N5

network interface migrate-all -node CLU1N5

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:

network interface show -data-protocol nfs|cifs -role data -home-node CLU1N5

## TAKEOVER:

storage failover takeover -ofnode CLU1N5

# -override-vetoes true (if necessary - usually CIFS)

## CLU1N5

## ... boots up to the Waiting for giveback state

## Verify that the takeover was successful

storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:

storage failover giveback -ofnode CLU1N5

# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:

storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:

set advanced

system node upgrade-revert show -node CLU1N5

set admin

# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:

vserver services web show -name spi

## If the SPI isn't enabled for any vserver, then enable it

# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:

network interface show -is-home false

network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:

network interface show

network port show

system node run -node CLU1N5 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:

set advanced

cluster ring show -unitname vldb

cluster ring show -unitname mgmt

cluster ring show -unitname vifmgr

cluster ring show -unitname bcomd

cluster show

set admin

##################################

## B: UPGRADE: 4TH NODE: CLU1N6 ##

##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:

network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N6

network interface migrate-all -node CLU1N6

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:

network interface show -data-protocol nfs|cifs -role data -home-node CLU1N6

## TAKEOVER:

storage failover takeover -ofnode CLU1N6

# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5

# -override-vetoes true (if necessary - usually CIFS)

## CLU1N6

## ... boots up to the Waiting for giveback state

## Verify that the takeover was successful

storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:

storage failover giveback -ofnode CLU1N6

# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:

storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:

set advanced

system node upgrade-revert show -node CLU1N6

set admin

# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:

vserver services web show -name spi

## If the SPI isn't enabled for any vserver, then enable it

# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:

network interface show -is-home false

network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:

network interface show

network port show

system node run -node CLU1N6 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:

set advanced

cluster ring show -unitname vldb

cluster ring show -unitname mgmt

cluster ring show -unitname vifmgr

cluster ring show -unitname bcomd

cluster show

set admin

##################################

## A: UPGRADE: 5TH NODE: CLU1N2 ##

##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:

network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N2

network interface migrate-all -node CLU1N2

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:

network interface show -data-protocol nfs|cifs -role data -home-node CLU1N2

## TAKEOVER:

storage failover takeover -ofnode CLU1N2

# -override-vetoes true (if necessary - usually CIFS)

## CLU1N2

## ... boots up to the Waiting for giveback state

## Verify that the takeover was successful

storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:

storage failover giveback -ofnode CLU1N2

# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:

storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:

set advanced

system node upgrade-revert show -node CLU1N2

set admin

# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:

vserver services web show -name spi

## If the SPI isn't enabled for any vserver, then enable it

# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:

network interface show -is-home false

network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:

network interface show

network port show

system node run -node CLU1N2 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:

set advanced

cluster ring show -unitname vldb

cluster ring show -unitname mgmt

cluster ring show -unitname vifmgr

cluster ring show -unitname bcomd

cluster show

set admin

##################################

## B: UPGRADE: 6TH NODE: CLU1N1 ##

##################################

## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:

network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N1

network interface migrate-all -node CLU1N1

## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:

network interface show -data-protocol nfs|cifs -role data -home-node CLU1N1

## TAKEOVER:

storage failover takeover -ofnode CLU1N1

# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5

# -override-vetoes true (if necessary - usually CIFS)

## CLU1N1

## ... boots up to the Waiting for giveback state

## Verify that the takeover was successful

storage failover show

## IMPORTANT: WAIT 8 MINUTES!

## GIVEBACK:

storage failover giveback -ofnode CLU1N1

# -override-vetoes true (if necessary - usually CIFS)

## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:

storage failover show-giveback

## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:

set advanced

system node upgrade-revert show -node CLU1N1

set admin

# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps

## (???) RE-ENABLE THE SPI WEB SERVICE:

vserver services web show -name spi

## If the SPI isn't enabled for any vserver, then enable it

# vserver services web modify -name spi -enabled true -vserver *

## REVERT THE LIFS BACK TO THE NODE:

network interface show -is-home false

network interface revert *

## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:

network interface show

network port show

system node run -node CLU1N1 -command uptime

## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:

set advanced

cluster ring show -unitname vldb

cluster ring show -unitname mgmt

cluster ring show -unitname vifmgr

cluster ring show -unitname bcomd

cluster show

set admin

#####################

## X: POST UPGRADE ##

#####################

## CHECK VERSION:

version

## VERIFY PROTOCOL SERVICE:

vserver nfs show

vserver cifs show

## RE-ENABLE AND SEND ASUP:

autosupport modify -node * -state enable

autosupport show

autosupport invoke -node * -type all -message "Finished_NDU_823P5"

autosupport history show -last-update >1h -field status -status sent-successful

# Want to see ASUP history status of "sent-successful"!

## RESUME QUIESCED SNAPMIRRORS ON CLU1DR::>

snapmirror show

snapmirror resume *

snapmirror show -status Quiesced

snapmirror show -healthy False

## RESUME QUIESCED SNAPMIRRORS ON CLU1SV::>

snapmirror show

snapmirror resume *

snapmirror show -status Quiesced

snapmirror show -healthy False

## RESUME QUIESCED SNAPMIRRORS ON CLU1::>

snapmirror show

snapmirror resume *

snapmirror show -status Quiesced

snapmirror show -healthy False

## HEALTH CHECKS INCLUDING:

event log show -severity CRITICAL,EMERGENCY,ALERT,ERROR,WARNING -time >1h

Cosonok's IT Blog
