Catching up on some old stuff I was going to add to the blog. Below is an upgrade script created for upgrading a 6-node NAS cluster from 8.2.2P1 to 8.2.3P5. It includes a fairly exhaustive set of checks. Check out the video "Automated nondisruptive upgrade using System Manager 8.3.1" for what the latest version of cDOT can do.
################################################################
## CDOT PATCH UPGRADE: CLU1N1/2/3/4/5/6 from 8.2.X to 8.2.3P5 ##
################################################################
###########
## NOTES ##
###########
# 1) Verify there are no faults/issues in ASUP that require fixing.
# 2) Obtain and review the Upgrade Advisor output for the upgrade from 8.2.X to 8.2.3P5.
# 3) Connect to the SP of the HA pair you're upgrading so you can watch the upgrade process.
# 4) The commands below can be run connected to the cluster Mgmt LIF.
# 5) In this scenario CLU1 is only a SnapMirror source, so we need to update and quiesce snapmirrors on DR (CLU1DR), and quiesce snapmirrors on the vault (CLU1SV).
##############################################
## i: PREP 1 (DO BEFORE THE UPGRADE WINDOW) ##
##############################################
## CHECK THE EVENT LOGS:
event log show -severity CRITICAL -time >24h
event log show -severity EMERGENCY -time >24h
event log show -severity ALERT -time >24h
event log show -severity ERROR -time >24h
event log show -severity WARNING -time >24h
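The five severity sweeps above differ only in the `-severity` value, so they can be generated in a loop. A minimal Python sketch; the `print` at the bottom is a stand-in for however you actually run commands against the cluster (e.g. an SSH wrapper to the Mgmt LIF), which is not part of the original text:

```python
# Build the pre-upgrade "event log show" checks in one loop instead of
# typing five near-identical commands.
SEVERITIES = ["CRITICAL", "EMERGENCY", "ALERT", "ERROR", "WARNING"]

def event_log_checks(window: str = ">24h") -> list[str]:
    """Return the 'event log show' command for each severity level."""
    return [
        f"event log show -severity {sev} -time {window}"
        for sev in SEVERITIES
    ]

for cmd in event_log_checks():
    print(cmd)  # replace with your own run-over-SSH helper
```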
## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
set admin
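"Synchronized" here means every node in a ring reports the same epoch/DB epoch and agrees on one master. A hedged sketch of that check; the tuple layout assumes you have parsed the tabular `cluster ring show` output (Node, Epoch, DB Epoch, Master, Online) yourself, and the sample rows are invented for illustration:

```python
def ring_in_sync(rows):
    """rows: (node, epoch, db_epoch, master, role) per node, where role is
    the Online column value ('master' or 'secondary')."""
    epochs = {(epoch, db_epoch) for _, epoch, db_epoch, _, _ in rows}
    masters = {master for _, _, _, master, _ in rows}      # one master per ring
    roles_ok = all(role in ("master", "secondary") for *_, role in rows)
    return len(epochs) == 1 and len(masters) == 1 and roles_ok

# Illustrative data only -- not real cluster output:
rows = [
    ("CLU1N1", 5, 5, "CLU1N1", "master"),
    ("CLU1N2", 5, 5, "CLU1N1", "secondary"),
]
print(ring_in_sync(rows))
```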
## REMOVE/REPLACE ANY BROKEN DISKS:
storage disk show -state broken
storage disk show -state maintenance|pending|reconstructing -fields state
# Wait for any maintenance/reconstruction to complete!
## VERIFY NODE HEALTH, ELIGIBILITY AND EPSILON:
set advanced
y
cluster show
set admin
## CHECK FOR OFFLINE AGGREGATES:
storage aggregate show -state !online
## CHECK FOR OFFLINE VOLUMES:
volume show -state !online
## VERIFY DATA SVMS ARE RUNNING:
vserver show -admin-state !running -type data
## VERIFY CPU AND DISK UTIL NEVER EXCEEDS 50%:
node run -node * -command sysstat -c 10 -x 3
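Eyeballing ten `sysstat` samples per node is error-prone, so a small threshold check helps. A sketch, assuming you read the CPU and disk-utilisation percentage columns off the `sysstat -c 10 -x 3` output yourself; the sample readings are made up:

```python
def over_threshold(samples, limit=50):
    """samples: list of (cpu_pct, disk_util_pct) readings.
    Return the samples that exceed the limit on either metric."""
    return [s for s in samples if s[0] > limit or s[1] > limit]

samples = [(12, 30), (48, 22), (61, 35)]  # invented readings
print(over_threshold(samples))
```

An empty result means the node stayed under 50% for every sample, which is what the runbook wants to see before proceeding.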
## VERIFY LIFS ARE HOME:
network interface show
network interface show -is-home false
## And revert if not ...
# network interface revert -vserver VSERVERNAME -lif LIFNAME
# network interface revert *
## VERIFY LIF FAILOVER (i.e. they failover to other data ports on the same network):
network interface show -role data -failover
## VERIFY DATA LIFS DON'T AUTO-REVERT:
network interface show -role data -fields auto-revert
## VERIFY NTP CONFIGURATION, DATE AND TIME:
system services ntp server show
cluster date show
## VERIFY CURRENT SOFTWARE IMAGES:
system node image show
## VERIFY SFO IS ENABLED AND TAKEOVER IS POSSIBLE:
storage failover show -fields enabled,possible
###################################################################
## ii: PREP 2 (CAN DO WITH PERMISSION BEFORE THE UPGRADE WINDOW) ##
###################################################################
## DOWNLOAD THE SOFTWARE IMAGE TO CONTROLLERS:
system node image get -node * -package http://YOURWEBSERVER/823P5_q_image.tgz -replace-package true -background true
system node image show-update-progress -node *
# Note: You can Ctrl-C out of show-update-progress
## INSTALL THE DOWNLOADED SOFTWARE IMAGE:
system node image update -node * -package 823P5_q_image.tgz -background true
system node image show-update-progress -node *
## VERIFY SOFTWARE IMAGES HAVE BEEN UPDATED:
system node image show
###################################
## iii: PREP 3 - SNAPMIRROR WORK ##
###################################
## CONNECT TO CLU1::>
## VERIFY AND QUIESCE SNAPMIRRORS ON CLU1 (IF THERE ARE ANY):
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror show -status Quiesced
snapmirror quiesce *
snapmirror show -status !Quiesced
## CONNECT TO CLU1SV::>
## VERIFY AND QUIESCE SNAPMIRRORS ON CLU1SV (THE VAULT OF CLU1):
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror show -status Quiesced
snapmirror quiesce *
snapmirror show -status !Quiesced
## CONNECT TO CLU1DR::>
## VERIFY, UPDATE, AND QUIESCE SNAPMIRRORS ON CLU1DR (THE DR OF CLU1):
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror update *
snapmirror show -status !Idle
snapmirror show -healthy false -fields healthy,lag-time
snapmirror show -healthy true -fields healthy,lag-time
snapmirror show -status Quiesced
snapmirror quiesce *
snapmirror show -status !Quiesced
#############################################################################################
## NOTE: IF THERE ARE ANY LONG-RUNNING SNAPMIRRORS, USE THE FOLLOWING COMMAND TO ABORT THEM #
# snapmirror abort -destination-path DESTINATION -h                                         #
#############################################################################################
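To decide which relationships count as "long-running", the `lag-time` field from `snapmirror show -fields healthy,lag-time` is the obvious input. A sketch that converts an `hh:mm:ss` lag string to seconds and lists abort candidates over a chosen limit; the destination paths and lags below are invented for illustration, and the `-h` flag is the one shown in the note above:

```python
def lag_seconds(lag: str) -> int:
    """Convert an 'hh:mm:ss' lag-time string to seconds."""
    h, m, s = (int(p) for p in lag.split(":"))
    return h * 3600 + m * 60 + s

def abort_candidates(rels, max_lag="12:00:00"):
    """rels: list of (destination-path, lag-time) pairs.
    Return destinations whose lag exceeds max_lag."""
    limit = lag_seconds(max_lag)
    return [dest for dest, lag in rels if lag_seconds(lag) > limit]

# Invented relationships for illustration:
rels = [("svm1:vol1_dst", "0:42:10"), ("svm1:vol2_dst", "26:05:00")]
for dest in abort_candidates(rels):
    print(f"snapmirror abort -destination-path {dest} -h")
```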
################
## iv: PREP 4 ##
################
## CONNECT BACK TO CLU1::>
## JOBS:
job show
job delete *
job show
# Note: Jobs which don't support delete will not be deleted!
## SET THE NEW IMAGE AS DEFAULT:
system node image show
system node image modify {-node * -iscurrent false} -isdefault true
system node image show
# Note: Verify the new image is set as default on all nodes!
###############################
## v: UPGRADE: INITIAL STEPS ##
###############################
## HEALTH CHECKS INCLUDING:
event log show -severity CRITICAL,EMERGENCY,ALERT,ERROR,WARNING -time >1h
## SEND ASUP:
autosupport invoke -node * -type all -message "Starting_NDU_823P5"
autosupport history show -last-update >1h -fields status
# Proceed once all nodes have ASUP history status of "sent-successful"!
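"Proceed once all nodes show sent-successful" is a poll-until-true step, like several other waits in this runbook. A generic sketch of such a loop; `all_sent` here checks a canned status list standing in for real `autosupport history show` output, which you would replace with your own command runner:

```python
import time

def wait_until(check, timeout=600, interval=15):
    """Poll check() until it returns True or timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

statuses = ["sent-successful"] * 6  # pretend output for a 6-node cluster

def all_sent():
    return all(s == "sent-successful" for s in statuses)

print(wait_until(all_sent, timeout=1, interval=0))
```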
## DISABLE ASUP:
autosupport modify -node * -state disable
## IF NOT ALREADY DONE, DISABLE AUTO-GIVEBACK:
storage failover show -fields auto-giveback
storage failover modify -node * -auto-giveback false
############################
## NAMES IN THIS TEXTFILE ##
############################
# CONTROLLER PARTNER
####################
# CLU1N1     CLU1N2
# CLU1N2     CLU1N1
# CLU1N3     CLU1N4
# CLU1N4     CLU1N3
# CLU1N5     CLU1N6
# CLU1N6     CLU1N5
######################
## ORDER OF UPGRADE ##
######################
########################################################################################
# Using "::> set adv; cluster ring show" and "::> set adv; cluster show" ...           #
# ... we have already determined that Epsilon is on N1, and N1 is also the RDB master. #
# So the order of upgrade will be:                                                     #
# N3,N4 (move Epsilon to N3), N5,N6, and finally N2,N1                                 #
########################################################################################
##################################
## A: UPGRADE: 1ST NODE: CLU1N3 ##
##################################
## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N3
network interface migrate-all -node CLU1N3
## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N3
## TAKEOVER:
storage failover takeover -ofnode CLU1N3
# -override-vetoes true (if necessary - usually CIFS)
## CLU1N3 ... boots up to the Waiting for giveback state
## Verify that the takeover was successful:
storage failover show
## IMPORTANT: WAIT 8 MINUTES!
## GIVEBACK:
storage failover giveback -ofnode CLU1N3
# -override-vetoes true (if necessary - usually CIFS)
## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback
## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N3
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps
## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it:
# vserver services web modify -name spi -enabled true -vserver *
## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *
## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N3 -command uptime
## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin
##################################
## B: UPGRADE: 2ND NODE: CLU1N4 ##
##################################
## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N4
network interface migrate-all -node CLU1N4
## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N4
## TAKEOVER:
storage failover takeover -ofnode CLU1N4
# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5
# -override-vetoes true (if necessary - usually CIFS)
## CLU1N4 ... boots up to the Waiting for giveback state
## Verify that the takeover was successful:
storage failover show
## IMPORTANT: WAIT 8 MINUTES!
## GIVEBACK:
storage failover giveback -ofnode CLU1N4
# -override-vetoes true (if necessary - usually CIFS)
## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback
## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N4
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps
## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it:
# vserver services web modify -name spi -enabled true -vserver *
## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *
## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N4 -command uptime
## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin
################
## C: EPSILON ##
################
## MOVE EPSILON TO THE 1ST HA PAIR THAT'S BEEN UPGRADED:
set advanced
y
cluster modify -node CLU1N1 -epsilon false
cluster modify -node CLU1N3 -epsilon true
cluster show
set admin
# IMPORTANT: VERIFY THAT EPSILON IS TRUE ON THE UPGRADED NODE (CLU1N3)
#############################################################
## N.B. For additional HA pairs we effectively repeat A, B ##
#############################################################
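Since the per-node A/B procedure repeats verbatim with only the node name changing, the command sequence can be generated rather than copy-pasted. A condensed sketch using commands from the sections above (it omits the CIFS/veto judgement calls, the 8-minute wait, and the SPI check, which still need a human eye):

```python
def node_upgrade_steps(node: str) -> list[str]:
    """Core command sequence for upgrading one node, parameterised by name."""
    return [
        f"network interface show -data-protocol nfs|cifs -role data -curr-node {node}",
        f"network interface migrate-all -node {node}",
        f"storage failover takeover -ofnode {node}",
        "storage failover show",
        "# WAIT 8 MINUTES, then:",
        f"storage failover giveback -ofnode {node}",
        "storage failover show-giveback",
        f"system node upgrade-revert show -node {node}",
        "network interface revert *",
        f"system node run -node {node} -command uptime",
    ]

for step in node_upgrade_steps("CLU1N5"):
    print(step)
```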
##################################
## A: UPGRADE: 3RD NODE: CLU1N5 ##
##################################
## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N5
network interface migrate-all -node CLU1N5
## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N5
## TAKEOVER:
storage failover takeover -ofnode CLU1N5
# -override-vetoes true (if necessary - usually CIFS)
## CLU1N5 ... boots up to the Waiting for giveback state
## Verify that the takeover was successful:
storage failover show
## IMPORTANT: WAIT 8 MINUTES!
## GIVEBACK:
storage failover giveback -ofnode CLU1N5
# -override-vetoes true (if necessary - usually CIFS)
## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback
## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N5
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps
## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it:
# vserver services web modify -name spi -enabled true -vserver *
## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *
## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N5 -command uptime
## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin
##################################
## B: UPGRADE: 4TH NODE: CLU1N6 ##
##################################
## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N6
network interface migrate-all -node CLU1N6
## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N6
## TAKEOVER:
storage failover takeover -ofnode CLU1N6
# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5
# -override-vetoes true (if necessary - usually CIFS)
## CLU1N6 ... boots up to the Waiting for giveback state
## Verify that the takeover was successful:
storage failover show
## IMPORTANT: WAIT 8 MINUTES!
## GIVEBACK:
storage failover giveback -ofnode CLU1N6
# -override-vetoes true (if necessary - usually CIFS)
## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback
## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N6
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps
## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it:
# vserver services web modify -name spi -enabled true -vserver *
## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *
## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N6 -command uptime
## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin
##################################
## A: UPGRADE: 5TH NODE: CLU1N2 ##
##################################
## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N2
network interface migrate-all -node CLU1N2
## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N2
## TAKEOVER:
storage failover takeover -ofnode CLU1N2
# -override-vetoes true (if necessary - usually CIFS)
## CLU1N2 ... boots up to the Waiting for giveback state
## Verify that the takeover was successful:
storage failover show
## IMPORTANT: WAIT 8 MINUTES!
## GIVEBACK:
storage failover giveback -ofnode CLU1N2
# -override-vetoes true (if necessary - usually CIFS)
## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback
## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N2
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps
## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it:
# vserver services web modify -name spi -enabled true -vserver *
## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *
## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N2 -command uptime
## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin
##################################
## B: UPGRADE: 6TH NODE: CLU1N1 ##
##################################
## MIGRATE LIFS AWAY FROM THE NODE TO BE UPGRADED:
network interface show -data-protocol nfs|cifs -role data -curr-node CLU1N1
network interface migrate-all -node CLU1N1
## VERIFY THAT THE LIFS MIGRATED TO ANOTHER NODE:
network interface show -data-protocol nfs|cifs -role data -home-node CLU1N1
## TAKEOVER:
storage failover takeover -ofnode CLU1N1
# -option allow-version-mismatch is not required for 8.2.2P1 to 8.2.3P5
# -override-vetoes true (if necessary - usually CIFS)
## CLU1N1 ... boots up to the Waiting for giveback state
## Verify that the takeover was successful:
storage failover show
## IMPORTANT: WAIT 8 MINUTES!
## GIVEBACK:
storage failover giveback -ofnode CLU1N1
# -override-vetoes true (if necessary - usually CIFS)
## VERIFY THAT ALL AGGREGATES HAVE BEEN RETURNED:
storage failover show-giveback
## VERIFY THE 3 UPGRADE PHASES HAVE COMPLETED:
set advanced
y
system node upgrade-revert show -node CLU1N1
set admin
# Note: Each "vers" has 3 phases - pre-root, pre-apps, post-apps
## (???) RE-ENABLE THE SPI WEB SERVICE:
vserver services web show -name spi
## If the SPI isn't enabled for any vserver, then enable it:
# vserver services web modify -name spi -enabled true -vserver *
## REVERT THE LIFS BACK TO THE NODE:
network interface show -is-home false
network interface revert *
## VERIFY THAT DATA IS BEING SERVED TO CLIENTS:
network interface show
network port show
system node run -node CLU1N1 -command uptime
## VERIFY RDBs ARE ONLINE AND SYNCHRONIZED:
set advanced
y
cluster ring show -unitname vldb
cluster ring show -unitname mgmt
cluster ring show -unitname vifmgr
cluster ring show -unitname bcomd
cluster show
set admin
#####################
## X: POST UPGRADE ##
#####################
## CHECK VERSION:
version
## VERIFY PROTOCOL SERVICE:
vserver nfs show
vserver cifs show
## RE-ENABLE AND SEND ASUP:
autosupport modify -node * -state enable
autosupport show
autosupport invoke -node * -type all -message "Finished_NDU_823P5"
autosupport history show -last-update >1h -fields status -status sent-successful
# Want to see ASUP history status of "sent-successful"!
## RESUME QUIESCED SNAPMIRRORS ON CLU1DR::>
snapmirror show
snapmirror resume *
snapmirror show -status Quiesced
snapmirror show -healthy false
## RESUME QUIESCED SNAPMIRRORS ON CLU1SV::>
snapmirror show
snapmirror resume *
snapmirror show -status Quiesced
snapmirror show -healthy false
## RESUME QUIESCED SNAPMIRRORS ON CLU1::>
snapmirror show
snapmirror resume *
snapmirror show -status Quiesced
snapmirror show -healthy false
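After `snapmirror resume *` on each cluster, the two `snapmirror show` checks above should both come back empty. A sketch that expresses that as one pass over parsed relationship rows; the tuple fields mirror destination-path, status, and healthy from `snapmirror show`, and the entries are invented for illustration:

```python
def resume_leftovers(rels):
    """rels: list of (destination, status, healthy) tuples.
    Return destinations still Quiesced or unhealthy after resume."""
    return [dest for dest, status, healthy in rels
            if status == "Quiesced" or not healthy]

rels = [
    ("svm1:vol1_dst", "Idle", True),
    ("svm1:vol2_dst", "Quiesced", True),   # resume didn't take effect
]
print(resume_leftovers(rels))
```

An empty list means every relationship resumed cleanly and is healthy, so the post-upgrade snapmirror work is done.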
## HEALTH CHECKS INCLUDING:
event log show -severity CRITICAL,EMERGENCY,ALERT,ERROR,WARNING -time >1h