Sunday, 11 February 2018

How to Upgrade CN1610 Firmware from 1.1.0.X to 1.2.0.7

Towards the end of last year I was involved in upgrading a load of production NetApp CN1610 cluster switches, running various versions of FASTPATH 1.1.0.X, to 1.2.0.7, and I had to document a procedure. The upgrade procedure here is slightly different from the official procedure (which I think is a bit convoluted, and introduces a risk of the wrong switch being rebooted and causing an outage). This is easy to do, non-disruptive, and shouldn’t take longer than 2 hours.

Image: CN1610 rear view

Preparation

- Run Config Advisor and verify no issues are flagged regarding cluster cabling
- Run the following commands from Clustershell to double-check that each node has a connection to both cluster switches::>


node run * options cdpd.enable


If cdpd is not enabled, enable it temporarily. After enabling cdpd it may take a few minutes to discover neighbors::>


node run * options cdpd.enable on


Inspect the show-neighbors outputs::>


node run * cdpd show-neighbors
node run * cdpd show-neighbors -v


If cdpd was previously not enabled, disable it::>


node run * options cdpd.enable off


- Verify the cluster ping-cluster output::>


set advanced
cluster ping-cluster -node NODE1
set admin


You should have no basic connectivity failures and ‘Larger than PMTU communication’ success on all paths.
Note: It is sufficient to run ping-cluster on one node (it takes a while to run). If issues are detected, run ping-cluster against the node(s) with the issue.

- Acquire login credentials to the cluster switches

Pre-Requisites

1) TFTP Server for uploading firmware/RCF to the switch (I personally use 'SolarWinds Free TFTP Server')
2) Required software:

From the NetApp CN1610 Cluster Switch Software Download Instructions page, download:

- FASTPATH 1.2.0.7 for CN1610 (NetApp_CN1610_1.2.0.7.stk)
- RCF 1.2 (this is the text file CN1610_CS_RCF_v1.2.txt)
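
Before touching a switch, it can save time to confirm that both files are actually sitting in the folder your TFTP server serves. A minimal Python sketch, assuming you pass in that folder yourself (the two file names are the ones listed above; the function name and the "empty file counts as missing" rule are my own choices, not part of any NetApp tooling):

```python
from pathlib import Path

# File names as listed in this post. The TFTP root folder is NOT known
# here; it is whatever folder your TFTP server is configured to serve.
REQUIRED_FILES = (
    "NetApp_CN1610_1.2.0.7.stk",
    "CN1610_CS_RCF_v1.2.txt",
)


def missing_files(tftp_root, required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in tftp_root."""
    root = Path(tftp_root)
    missing = []
    for name in required:
        candidate = root / name
        # A zero-byte file usually means a failed download, so treat it as missing.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_files()` with your TFTP root folder should return an empty list when both files are in place.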

Upgrading Cluster Switch Firmware and RCF

1) Pre-Upgrade Clustershell Tasks

1.1) Send AutoSupports::>


autosupport invoke -node * -type all -message "MAINT=2h Upgrading Cluster Switch Firmware"


1.2) Check Cluster Health (all nodes should be healthy)::>


cluster show


1.3) Check Cluster Ports (all cluster ports should be up)::>


net port show -role cluster


1.4) Check Cluster LIFs (all cluster LIFs should be home)::>


net int show -role cluster


1.5) Verify that Cluster LIFs are set to auto-revert (auto-revert should be set to true)::>


net int show -role cluster -fields auto-revert


2) Upgrading Cluster Switch 1 Firmware and RCF

2.1) Connect to cluster switch 1

Note: To find the cluster switch management address use the command ‘show serviceport’.

2.2) To check current FASTPATH and RCF versions, run the below commands:


(CN1610-SW1) > enable
(CN1610-SW1) # show version
(CN1610-SW1) # show running-config


Note: The RCF version is listed in the description for interface 3/64 in the ‘show running-config’ output.
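
If you are checking a lot of switches, picking the 3/64 description out of a saved ‘show running-config’ capture by eye gets tedious. A rough Python sketch, assuming the capture is laid out as an `interface 3/64` line followed by a `description '...'` line and closed by `exit` (that layout is my assumption about the saved text, so check it against your own capture before relying on it):

```python
def interface_description(running_config, interface="3/64"):
    """Return the description text of one interface block, or None.

    Assumes blocks of the form:
        interface <slot/port>
        description '<text>'
        exit
    which is an assumption about the saved capture, not a documented format.
    """
    in_block = False
    for raw in running_config.splitlines():
        line = raw.strip()
        lowered = line.lower()
        if lowered.startswith("interface "):
            # Entering a new interface block; remember whether it is ours.
            in_block = line.split(None, 1)[1] == interface
        elif in_block and lowered == "exit":
            in_block = False
        elif in_block and lowered.startswith("description "):
            # Drop the keyword and any surrounding quotes.
            return line.split(None, 1)[1].strip("'\"")
    return None
```

The function returns `None` when the interface has no description, which is worth treating as "RCF version unknown" rather than guessing.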

2.3) To upgrade FASTPATH, the commands are:


(CN1610-SW1) # show bootvar
(CN1610-SW1) # copy active backup


This will take a few minutes...


(CN1610-SW1) # show bootvar
(CN1610-SW1) # copy tftp://{YOUR_TFTP_SERVER}/NetApp_CN1610_1.2.0.7.stk active


This will take a few minutes...


(CN1610-SW1) # show bootvar


The next-active boot-image version should show as 1.2.0.7.
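
When you are noting down versions for a dozen switch pairs, it helps to compare them as numbers rather than as text (as text, "1.1.0.10" sorts before "1.1.0.9"). A minimal Python sketch, assuming you type in the version strings yourself from the ‘show version’ or ‘show bootvar’ output; the helper names are hypothetical:

```python
TARGET_VERSION = "1.2.0.7"  # the FASTPATH release this post upgrades to


def parse_version(text):
    """Turn a dotted version such as '1.1.0.5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(current, target=TARGET_VERSION):
    """True if the current version is older than the target version."""
    return parse_version(current) < parse_version(target)


def is_expected_start(current):
    """True for the 1.1.0.X releases this procedure was written for."""
    return parse_version(current)[:3] == (1, 1, 0)
```

For example, `needs_upgrade("1.1.0.5")` is true, while `needs_upgrade("1.2.0.7")` is false once the switch has been upgraded.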

2.4) If the RCF version was not 1.2 (most likely it will be 1.1), you need to upgrade the RCF - the commands to do this are:


(CN1610-SW1) # show running-config config_backup.scr
(CN1610-SW1) # copy tftp://{YOUR_TFTP_SERVER}/CN1610_CS_RCF_v1.2.txt nvram:script CN1610_CS_RCF_v1.2.scr
(CN1610-SW1) # script list
(CN1610-SW1) # script apply CN1610_CS_RCF_v1.2.scr
(CN1610-SW1) # show running-config
(CN1610-SW1) # show port-channel 3/1


Note: The RCF version is listed in the description for interface 3/64 in the ‘show running-config’ output.
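
Most of my typos in this procedure happen in the two long ‘copy tftp://...’ lines, so it can be worth generating them once and pasting them. A small Python sketch that only fills the TFTP server address into the two copy commands shown in this post (the function name is hypothetical, and 192.0.2.10 in the usage note is a documentation address, not a real server):

```python
FIRMWARE_FILE = "NetApp_CN1610_1.2.0.7.stk"
RCF_SOURCE = "CN1610_CS_RCF_v1.2.txt"
RCF_SCRIPT = "CN1610_CS_RCF_v1.2.scr"


def copy_commands(tftp_server):
    """Return the firmware and RCF copy commands for one TFTP server."""
    return [
        # Firmware image goes into the active image slot.
        f"copy tftp://{tftp_server}/{FIRMWARE_FILE} active",
        # RCF text file is stored on the switch as a script.
        f"copy tftp://{tftp_server}/{RCF_SOURCE} nvram:script {RCF_SCRIPT}",
    ]
```

Running `print("\n".join(copy_commands("192.0.2.10")))` gives the two lines ready to paste at the switch prompt.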

2.5) Save running configuration (so it becomes the startup configuration when you reboot) and reboot cluster switch 1:


(CN1610-SW1) # write memory
(CN1610-SW1) # reload


SWITCH REBOOTS

2.6) After reboot, verify FASTPATH and RCF versions have been updated, and the port-channel is working:


(CN1610-SW1) > enable
(CN1610-SW1) # show version
(CN1610-SW1) # show running-config
(CN1610-SW1) # show port-channel 3/1


Note: The RCF version is listed in the description for interface 3/64 in the ‘show running-config’ output.

2.7) Additionally: enable SSH and verify that it is enabled (RCF 1.2 disables telnet):


(CN1610-SW1) # ip ssh protocol 2
(CN1610-SW1) # config
(CN1610-SW1) (Config)# crypto key generate rsa
(CN1610-SW1) (Config)# crypto key generate dsa
(CN1610-SW1) (Config)# exit
(CN1610-SW1) # ip ssh server enable
(CN1610-SW1) # show ip ssh
(CN1610-SW1) # write memory
(CN1610-SW1) # exit
(CN1610-SW1) > logout


3) Confirm the cluster is healthy after the Cluster Switch 1 upgrade

3.1) Check Cluster Health (all nodes should be healthy)::>


cluster show


3.2) Check Cluster Ports (all cluster ports should be up)::>


net port show -role cluster


3.3) Check Cluster LIFs (all cluster LIFs should be home)::>


net int show -role cluster
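
The whole approach rests on one rule: only one switch is touched at a time, and the cluster must look healthy before moving on to the next. A minimal Python sketch of that ordering, assuming `upgrade` and `cluster_healthy` are stand-ins you supply yourself (for instance, functions that prompt a person to carry out the steps and confirm the checks); nothing here talks to a switch or a cluster:

```python
def rolling_upgrade(switches, upgrade, cluster_healthy):
    """Upgrade switches one at a time, checking cluster health around each.

    upgrade(name) and cluster_healthy() are hypothetical callables
    provided by the caller; this function only enforces the ordering.
    """
    completed = []
    for switch in switches:
        # Never start on a switch while the cluster is unhealthy.
        if not cluster_healthy():
            raise RuntimeError(f"cluster not healthy; stopping before {switch}")
        upgrade(switch)
        completed.append(switch)
    # Final check after the last switch has come back.
    if not cluster_healthy():
        raise RuntimeError("cluster not healthy after the final switch")
    return completed
```

If the health check fails after switch 1, the function stops before switch 2 is touched, which is exactly the point of the checks in this section.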


4) Upgrading Cluster Switch 2 Firmware and RCF

4.1) Connect to cluster switch 2

Note: To find the cluster switch management address use the command ‘show serviceport’.

4.2) To check current FASTPATH and RCF versions, run the below commands:


(CN1610-SW2) > enable
(CN1610-SW2) # show version
(CN1610-SW2) # show running-config


Note: The RCF version is listed in the description for interface 3/64 in the ‘show running-config’ output.

4.3) To upgrade FASTPATH, the commands are:


(CN1610-SW2) # show bootvar
(CN1610-SW2) # copy active backup


This will take a few minutes...


(CN1610-SW2) # show bootvar
(CN1610-SW2) # copy tftp://{YOUR_TFTP_SERVER}/NetApp_CN1610_1.2.0.7.stk active


This will take a few minutes...


(CN1610-SW2) # show bootvar


The next-active boot-image version should show as 1.2.0.7.

4.4) If the RCF version was not 1.2 (most likely it will be 1.1), you need to upgrade the RCF - the commands to do this are:


(CN1610-SW2) # show running-config config_backup.scr
(CN1610-SW2) # copy tftp://{YOUR_TFTP_SERVER}/CN1610_CS_RCF_v1.2.txt nvram:script CN1610_CS_RCF_v1.2.scr
(CN1610-SW2) # script list
(CN1610-SW2) # script apply CN1610_CS_RCF_v1.2.scr
(CN1610-SW2) # show running-config
(CN1610-SW2) # show port-channel 3/1


Note: The RCF version is listed in the description for interface 3/64 in the ‘show running-config’ output.

4.5) Save running configuration (so it becomes the startup configuration when you reboot) and reboot cluster switch 2:


(CN1610-SW2) # write memory
(CN1610-SW2) # reload


SWITCH REBOOTS

4.6) After reboot, verify FASTPATH and RCF versions have been updated, and the port-channel is working:


(CN1610-SW2) > enable
(CN1610-SW2) # show version
(CN1610-SW2) # show running-config
(CN1610-SW2) # show port-channel 3/1


Note: The RCF version is listed in the description for interface 3/64 in the ‘show running-config’ output.

4.7) Additionally: enable SSH and verify that it is enabled (RCF 1.2 disables telnet):


(CN1610-SW2) # ip ssh protocol 2
(CN1610-SW2) # config
(CN1610-SW2) (Config)# crypto key generate rsa
(CN1610-SW2) (Config)# crypto key generate dsa
(CN1610-SW2) (Config)# exit
(CN1610-SW2) # ip ssh server enable
(CN1610-SW2) # show ip ssh
(CN1610-SW2) # write memory
(CN1610-SW2) # exit
(CN1610-SW2) > logout


5) Post-Upgrade ClusterShell Tasks

5.1) Send AutoSupports::>


autosupport invoke -node * -type all -message MAINT=END


5.2) Check Cluster Health (all nodes should be healthy)::>


cluster show


5.3) Check Cluster Ports (all cluster ports should be up)::>


net port show -role cluster


5.4) Check Cluster LIFs (all cluster LIFs should be home)::>


net int show -role cluster


5.5) Verify the cluster network with cluster ping-cluster::>


set advanced
cluster ping-cluster -node NODE1
set admin


You should have no basic connectivity failures and ‘Larger than PMTU communication’ success on all paths.
Note: It is sufficient to run ping-cluster on one node (it takes a while to run). If issues are detected, run ping-cluster against the node(s) with the issue.

5.6) Check cluster-switch output::>


cluster-switch show -type cluster-network


THE END

5 comments:

  1. Hello Vidad,

    So in this procedure, you are avoiding the reallocation of the cluster LIFs? I have already done this a couple of times, and I always had to double-check everything twice. It is a simple procedure, but it involves some risks!

    1. Hi Pedro,
      Many thanks for reading. I'm very surprised to get a comment on Sunday.
      Indeed, I don't recommend following the official procedure. It is completely unnecessary to disable LIF failover, move the LIFs, down the ports, ... CDOT is resilient enough to operate fine if a switch goes down (for whatever reason); if it cannot, then that's a bug that would need engineering to fix.
      I remember in 8.1.2 (I think), there was a bug which meant the cluster sessions didn't get set up correctly, and rebooting/powering off a switch could cause an outage (this happened to a customer of mine). But that particular bug was fixed ages ago (I can't remember the version - maybe 8.1.4).
      I've seen a customer use the official procedure once, and they caused an outage because they rebooted the wrong switch (which is a disaster if it's hosting all the cluster LIFs and the procedure has disabled failover.)
      Also, for this particular piece of work, the customer didn't want us logging into the clustershell, so it's much easier to just request checks than to have to liaise with an offshore team to move LIFs around and down/up ports. We must have upgraded around a dozen cluster switch pairs the way above, without any issue.
      Those are my reasons. As always, this blog is unofficial - NetApp would most definitely want you to follow the official process (doesn't matter that Cosonok thinks it's unnecessarily convoluted.)
      Cheers,
      VC

  2. Why would NetApp recommend two switches if the cluster network cannot survive with only one? I always use the official procedures, and in this case I've followed it a couple of times, but you have certain good reasons to do it another way. Let me say that this specific procedure regarding the LIFs comes up in the exam.

    On Sunday? Is it a different day for reading, or for taking in some relaxed knowledge? After a month without any post, I was expecting some good material :D

    Kind Regards
    PF

    1. I agree, Sunday is a very good day for catching up with stuff (I often post stuff on a Sunday, unless I'm busy with my new passion - a Lotus Elise). Apologies for no post for a month, and I hope the new material hasn't been absolute crap. This may be a quiet year on the blog front... Cheers, VC

  3. Wow... that's a hugely sensational car!! Where I live... it is impossible to afford one of those. At least in an IT professional job, eheheh. There is material for everyone, so don't worry!
