BES-53248 Upgrade from 3.4.4.6 to 3.10.0.3

Note: The RCF upgrade procedure has since changed and no longer needs clear config; see: Install the Reference Configuration File (RCF)

If you need to upgrade your BES-53248 cluster switches from 3.4.4.6 to 3.10.0.3, you can do it in one step, but there are a couple of NetApp KBs you need to be aware of:

  1. Error! in configuration script file at line number XX when applying a new RCF
  2. BES-53248 ISL down when upgrading to EFOS 3.7.0.4 or later

The walkthrough below takes into account the need to fix the ISL when one switch is running 3.4.4.6 and the other 3.10.0.3, and also the need to run "clear config" when applying the updated RCF file.

  • Note 1: Here we're going from EFOS 3.4.4.6 to 3.10.0.3 and RCF from 1.6 to 1.9. For the latest version of the matrix, see here.
  • Note 2: Official documentation is here: AFF and FAS Switch Documentation
  • Note 3: I did this for a cluster on ONTAP 9.9.1. If you're on 9.9.1 and your switch is on 3.4.4.6, you need to upgrade the switch before you can upgrade ONTAP (to have a supported configuration).

BES-53248 Switch Compatibility Matrix (April 2023)

High-Level Steps
  1. Prepare for BES-53248 switch upgrade
  2. Install EFOS and Download RCF
  3. Reboot Switch 1 to apply new EFOS version
  4. Reconfigure Switch 1 (needs a console connection)
    1. Clear config
    2. Restore basic config
    3. Apply new RCF
    4. Enable SSH
  5. Fix ISL
  6. Reboot Switch 1 after applying RCF
  7. Bring up ports on Switch 1
  8. Repeat 2 to 7 for Switch 2 (skip 5 as you won't need to fix the ISL)
  9. Final Steps
Note: Could we do it with just one reboot per switch!? Well, not for Switch 1 in the above process, as it needs a reboot to get the ISL working. Also, I like to apply the RCF onto an EFOS version supported for that RCF to avoid unexpected errors, hence rebooting once for the EFOS and once after applying the RCF. Arguably it only needs 3 reboots (2 for Switch 1 and 1 for Switch 2), but consistency in a process is good 😊.
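
To make the ordering explicit, here is a minimal Python sketch of the high-level plan. It is only an illustration: the step names are paraphrased from the list above, and the function name plan and the (switch, step) pair layout are my own assumptions, not anything from NetApp tooling.

```python
# Step numbers follow the high-level list above; names are paraphrased.
STEPS = {
    1: "Prepare for BES-53248 switch upgrade",
    2: "Install EFOS and download RCF",
    3: "Reboot switch to apply new EFOS version",
    4: "Reconfigure switch (clear config, basic config, RCF, SSH)",
    5: "Fix ISL",
    6: "Reboot switch after applying RCF",
    7: "Bring up ports on switch",
    9: "Final steps",
}


def plan(switches=("SW1", "SW2")):
    """Return an ordered list of (switch, step_number) pairs.

    Steps 1 and 9 are cluster-wide, so they are listed with switch None.
    Step 5 (Fix ISL) only appears for the first switch in the list.
    """
    ordered = [(None, 1)]
    for index, switch in enumerate(switches):
        for step in range(2, 8):
            if step == 5 and index > 0:
                continue  # no ISL fix needed once both switches match
            ordered.append((switch, step))
    ordered.append((None, 9))
    return ordered


for switch, step in plan():
    print(f"{switch or 'cluster':8} step {step}: {STEPS[step]}")
```

Step 8 in the list is simply the second pass through the loop, which is why it has no entry of its own in STEPS.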

Low-Level Walkthrough

Note: The switches are SW1 and SW2 below.

1. Prepare for BES-53248 switch upgrade

Connect to the ONTAP cluster and run a few checks and commands like below:
  • timeout show
  • timeout modify -timeout 0
  • cluster show
  • net int show -is-home false
  • net int show -ipspace Cluster
  • net port show -ipspace Cluster
  • device-discovery show 
  • autosupport invoke -node * -type all -message "MAINT=4h Upgrading BES-53248 to 3.10.0.3 and RCF 1.9"
  • net int show -ipspace Cluster -fields auto-revert
  • net int modify -vserver Cluster -auto-revert false -lif *

Basically, we're checking cluster LIFs and cluster ports are fine, and the nodes are cabled to both switches. We disable auto-revert as we'll down the ports on the cluster switch we're working on. Also, applying the RCF re-enables the ports and we don't want Cluster LIFs auto-reverting until the switch is 100% ready for them.
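
The AutoSupport message is the one command here that changes per job (window length and target versions), so it can be handy to build it rather than retype it. The sketch below is an assumption-laden illustration: the helper names maint_start, maint_end and autosupport_command are hypothetical, and the message wording simply mirrors the commands used in this post.

```python
def maint_start(hours, efos, rcf):
    """Message that opens a maintenance window of `hours` hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return f"MAINT={hours}h Upgrading BES-53248 to {efos} and RCF {rcf}"


def maint_end(efos, rcf):
    """Message that closes the maintenance window early."""
    return f"MAINT=END Upgraded BES-53248 to {efos} and RCF {rcf}"


def autosupport_command(message):
    """Wrap a message in the ONTAP command line used in this walkthrough."""
    return f'autosupport invoke -node * -type all -message "{message}"'


print(autosupport_command(maint_start(4, "3.10.0.3", "1.9")))
print(autosupport_command(maint_end("3.10.0.3", "1.9")))
```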

2. Install EFOS and Download RCF

Firstly, we run a few commands on the switch to again verify node connectivity and get an output of the running config:

  • SW1> enable
  • SW1# terminal length 0
  • SW1# show isdp neighbors
  • SW1# show run

Then we install the new EFOS software and download the RCF file:

  • SW1# show version
  • SW1# show bootvar
  • SW1# copy active backup
  • SW1# show bootvar
  • SW1# ping HTTP_SERVER_IP
  • SW1# copy http://HTTP_SERVER_IP/EFOS-3.10.0.3.stk active
  • SW1# show bootvar
  • SW1# copy http://HTTP_SERVER_IP/BES-53248-RCF-v1.9-Cluster-HA.txt nvram:script BES-53248-RCF-v1.9-Cluster-HA.scr
  • SW1# script list

Note: Just answer Y to the "The file being downloaded has potential problems. Do you want to save this file?" prompt, as we'll apply the script later.
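
If you do this for several switch pairs, the two copy commands are easy to template. A minimal sketch, assuming the RCF is saved on the switch under the same name with a .scr extension (as in the commands above); the function name download_commands is hypothetical.

```python
def download_commands(http_server, efos_image, rcf_file):
    """Return the two EFOS `copy` commands used to stage the image and RCF."""
    # Swap the extension (e.g. .txt) for .scr, as done in the walkthrough.
    rcf_script = rcf_file.rsplit(".", 1)[0] + ".scr"
    return [
        f"copy http://{http_server}/{efos_image} active",
        f"copy http://{http_server}/{rcf_file} nvram:script {rcf_script}",
    ]


for line in download_commands(
    "HTTP_SERVER_IP", "EFOS-3.10.0.3.stk", "BES-53248-RCF-v1.9-Cluster-HA.txt"
):
    print(line)
```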

Then we shut down the node ports on SW1 (which will make the LIFs move to ports connected to SW2):

  • SW1# show port all
  • SW1# config
  • SW1(config)# interface 0/1-0/12
  • SW1(config-if)# shutdown
  • SW1(config-if)# exit
  • SW1(config)# exit
  • SW1# show port all
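
The same port range gets shut down here and after the RCF is applied, then re-enabled in step 7, so the command sequence can be generated from one place. This is a rough sketch: the helper name port_range_commands is hypothetical, and the 0/1-0/12 range is just what this walkthrough uses, so check which ports your nodes are actually on.

```python
def port_range_commands(first_port, last_port, shutdown):
    """Return EFOS CLI lines that shut down or re-enable ports 0/first-0/last."""
    if not 1 <= first_port <= last_port:
        raise ValueError("expected 1 <= first_port <= last_port")
    action = "shutdown" if shutdown else "no shutdown"
    return [
        "configure",
        f"interface 0/{first_port}-0/{last_port}",
        action,
        "exit",  # leave interface config mode
        "exit",  # leave global config mode
    ]


print("\n".join(port_range_commands(1, 12, shutdown=True)))
```

Calling it with shutdown=False gives the "no shutdown" variant used when bringing the ports back up.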

3. Reboot Switch 1 to apply new EFOS version

Firstly, check the ONTAP cluster is still okay (all ports attached to Cluster Switch 1 will be down):

  • cluster show
  • net int show -role cluster

And then we reboot the switch after doing a write memory to save the running config as it is (answer y to the prompts):

  • SW1# write memory
  • SW1# reload

And when the switch is back up, validate the new software:
  • SW1> enable
  • SW1# terminal length 0
  • SW1# show bootvar
  • SW1# show version
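
If you script the version check, be careful to compare EFOS versions numerically and not as plain strings, because "3.10.0.3" sorts before "3.4.4.6" as text. A small sketch under that assumption (the function names are hypothetical, and the running version is expected to be copied by hand from the show version output):

```python
def parse_version(text):
    """Turn a dotted version such as '3.10.0.3' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_upgraded(running, target="3.10.0.3"):
    """True if the running version is at least the target version."""
    return parse_version(running) >= parse_version(target)


print(is_upgraded("3.4.4.6"))   # still on the old EFOS
print(is_upgraded("3.10.0.3"))  # upgrade took effect
```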

4. Reconfigure Switch 1 (needs a console connection)

4-1. Clear Switch 1's config

  • SW1# clear config
Answer Y to the "Are you sure you want to clear the configuration?" prompt.

The switch will reset back to a login prompt. The username will be admin with no password. Follow the prompts to reset the admin password.

4-2. Restore basic config

The basic config of these switches is very simple. The details will come from the output of 'show run' that you got previously.

en
hostname SWITCH_NAME
serviceport protocol none
y
network protocol none
y
serviceport ip SWITCH_IP NETWORK_MASK GATEWAY
show serviceport
show network

For more on the basic config, please see: NetApp (Broadcom) BES-53248 Cluster Switch Notes: How to Setup
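
Since the basic config is typed over a console after a clear config, a typo in the IP details is annoying to chase down. The sketch below fills in the lines above and does a sanity check with Python's ipaddress module. The function name basic_config is hypothetical, and the check that the gateway sits in the service port's subnet is my own assumption about a sensible setup, not an EFOS requirement I have verified.

```python
import ipaddress


def basic_config(hostname, ip, netmask, gateway):
    """Return the basic-config lines with the placeholders filled in."""
    # Raises ValueError if the address or mask is malformed.
    interface = ipaddress.ip_interface(f"{ip}/{netmask}")
    if ipaddress.ip_address(gateway) not in interface.network:
        raise ValueError(f"gateway {gateway} is not in {interface.network}")
    return [
        "en",
        f"hostname {hostname}",
        "serviceport protocol none",
        "y",
        "network protocol none",
        "y",
        f"serviceport ip {ip} {netmask} {gateway}",
        "show serviceport",
        "show network",
    ]


# Example values only; take the real ones from your saved 'show run' output.
print("\n".join(basic_config("SW1", "192.168.1.10", "255.255.255.0", "192.168.1.1")))
```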

4-3. Apply new RCF
  • SW1# script list
  • SW1# script apply BES-53248-RCF-v1.9-Cluster-HA.scr
When we apply the script file, the switch ports used for the cluster network will be re-enabled and we want them to stay down for now.
  • SW1# show port all
  • SW1# config
  • SW1(config)# interface 0/1-0/12
  • SW1(config-if)# shutdown
  • SW1(config-if)# exit
  • SW1(config)# exit
  • SW1# show port all
4-4. Enable SSH

To enable SSH on Switch 1, run the following commands:

config
crypto key generate rsa
crypto key generate dsa
crypto key generate ecdsa 521
exit
ip ssh server enable
show ip ssh

And let's do a write memory to save the new config:
  • SW1# write memory
5. Fix ISL

After the RCF has been applied, the ISL does not come up. This is a known issue (see the second KB listed at the top). To restore the ISL, run this command on Switch 2:
  • SW2> enable
  • SW2# driv "port ce an=1 speed=100000"
6. Reboot Switch 1

Then we return to Switch 1 and do a reload:
  • SW1# reload
7. Bring up ports on Switch 1

Firstly, verify the port-channel is up:
  • SW1# show port-channel all
Then we return to the cluster and re-enable auto-revert (so when we up the ports on the cluster switch, the LIFs automatically re-home):
  • net int modify -vserver Cluster -auto-revert true -lif *
  • net int show -role cluster
Then we re-enable the ports on the cluster switch:
  • SW1> enable
  • SW1# configure
  • SW1(config)# interface 0/1-0/12
  • SW1(config-if)# no shutdown
  • SW1(config-if)# exit
  • SW1(config)# exit
  • SW1# write memory
And validate the cluster is healthy:
  • cluster show
  • net int show -role cluster
Now we have two BES-53248 cluster switches running different EFOS and RCF versions: SW1 on EFOS 3.10.0.3 with RCF 1.9, and SW2 still on EFOS 3.4.4.6 with RCF 1.6.

8. Repeat 2 to 7 for Switch 2 (skip 5 as you won't need to fix the ISL)

Disable auto-revert on cluster LIFs again:
  • net int show -ipspace Cluster -fields auto-revert
  • net int modify -vserver Cluster -auto-revert false -lif *
Then repeat steps 2 to 7 for Switch 2 (skip 5).

9. Final Steps

The final step is to check log collection is enabled in ONTAP for the cluster switches.
  • system switch ethernet log setup-password
  • system switch ethernet log setup-password
  • system switch ethernet log enable-collection
Note: The setup-password is done for both cluster switches.

Re-enable the CLI timeout and send autosupports.
  • timeout modify -timeout 30
  • autosupport invoke -node * -type all -message "MAINT=END Upgraded BES-53248 to 3.10.0.3 and RCF 1.9"
All in all, this work should be completed in well under 4 hours.

THE END

Other Notes on EFOS 3.10.0.3

1) sntp commands are now ntp
2) enable password YOURPASSWORD no longer works. You must do the following (pick which encryption type you want):

# enable password encryption-type aes / md5 / sha256 / sha512

And then it will prompt for old password, new password, and confirm new password.

Troubleshooting Utilization Pre-ONTAP Upgrade


node run -node node_name -command sysstat -c 10 -x 3

sysstat -m 1

Check which domain is high using the node shell command:

sysstat -M 1

Ensure that Nwk_Exmpt and/or WAFL_Ex are the busiest domains. This means ONTAP is processing foreground or background WAFL workloads. hostOS can be excluded as this is background.

Use this to check for top consumers:

wafltop show -v cpu -i 10 -n 10
