What Happens when Ports Go Missing ...

In this post we cover five scenarios that you might encounter in the field when removing network cards or doing headswaps (non-disruptive using aggregate relocation, ARL, or disruptive). The NetApp ONTAP version used here is 8.3.2.

1) After removing an Ethernet card, we have lost a port that was the home port for a data LIF.

2) After removing an Ethernet card, we have lost one port that was part of an IFGRP.

3) After removing Ethernet cards, we have lost both ports that were part of an IFGRP.

4) After a headswap, we have lost both cluster ports on the Epsilon node (2-node cluster).

5) After a headswap, we have lost both cluster ports on the non-Epsilon/out-of-quorum node (2-node cluster).

I’m using a 2-node simulator cluster to demonstrate these. The cluster is called CLU, and the two nodes are CLU-01 and CLU-02.

1) Lost e0j on CLU-01

Initial Setup:

CLU::> network interface show SVM1_NFS1

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------ ------- ------- ----

SVM1 SVM1_NFS1 up/up 10.3.6.1/8 CLU-01 e0j true

CLU::> network port show -node CLU-01 -port e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node port link mtu speed-admin speed-oper ipspace broadcast-domain

------ ---- ---- ---- ----------- ---------- ------- ----------------

CLU-01 e0j up 1500 auto 1000 Default Default

After halting the system, removing the port, and booting back up, this is what we have:

CLU::> network port show -node CLU-01 -port e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node port link mtu speed-admin speed-oper ipspace broadcast-domain

------ ---- ---- --- ----------- ---------- ------- ----------------

CLU-01 e0j - - auto - Default Default

CLU::> network interface show SVM1_NFS1

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------ ------- ------- ----

SVM1 SVM1_NFS1 up/up 10.3.6.1/8 CLU-01 e0c false
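The removed port is still listed, but its link, mtu and speed-oper fields all read "-". A minimal Python sketch of spotting such rows in `-fields` style output is shown here. The function names are hypothetical, the e0c row in the sample is made up for contrast, and the parser assumes plain whitespace-separated columns with no empty cells, as in the output above.

```python
def parse_fields_output(text):
    """Parse whitespace-separated `-fields` style table text into a list of dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Drop the separator row, which consists only of dashes and spaces.
    lines = [ln for ln in lines if not set(ln) <= set("- ")]
    header, rows = lines[0].split(), lines[1:]
    return [dict(zip(header, row.split())) for row in rows]


def missing_ports(records):
    """Return (node, port) pairs whose link state is reported as '-'."""
    return [(r["node"], r["port"]) for r in records if r["link"] == "-"]


# Sample table: the e0j row is from the post, the e0c row is hypothetical.
sample = """
node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-01 e0j  -    -    auto        -          Default Default
CLU-01 e0c  up   1500 auto        1000       Default Default
"""

print(missing_ports(parse_fields_output(sample)))  # prints [('CLU-01', 'e0j')]
```

Only the table itself should be passed in; the prompt and command echo are not handled by this sketch.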

To tidy up/resolve:

CLU::> set adv

CLU::*> network port delete -node CLU-01 -port e0j

Error: command failed: Operation can't be completed because port is either the home port or failover target of a LIF.

CLU::*> network interface modify -lif SVM1_NFS1 -vserver SVM1 -home-node CLU-01 -home-port e0c

CLU::*> net port delete -node CLU-01 -port e0j

2) Lost e0i on CLU-01 (e0h is also in the ifgrp)

Initial setup:

CLU::> network interface show SVM1_NFS1

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------ ------- ------- ----

SVM1 SVM1_NFS1 up/up 10.3.6.1/8 CLU-01 a0a true

CLU::> ifgrp show -node CLU-01 -ifgrp a0a -fields ports

node ifgrp ports

------ ----- -------

CLU-01 a0a e0h,e0i

CLU::> network port show -node CLU-01 -port e0i -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node port link mtu speed-admin speed-oper ipspace broadcast-domain

------ ---- ---- ---- ----------- ---------- ------- ----------------

CLU-01 e0i up 1500 auto 1000 Default -

After halting the system, removing the port, and booting back up, this is what we have:

CLU::> network interface show SVM1_NFS1

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------ ------- ------- ----

SVM1 SVM1_NFS1 up/up 10.3.6.1/8 CLU-01 a0a true

CLU::> ifgrp show -node CLU-01 -ifgrp a0a -fields ports

node ifgrp ports

------ ----- -------

CLU-01 a0a e0h,e0i

CLU::> network port show -node CLU-01 -port e0i -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node port link mtu speed-admin speed-oper ipspace broadcast-domain

------ ---- ---- --- ----------- ---------- ------- ----------------

CLU-01 e0i - - auto - Default -

To tidy up/resolve:

CLU::> set adv

CLU::*> ifgrp remove-port -ifgrp a0a -node CLU-01 -port e0i

Error: command failed: Port already has a lif bound.

CLU::*> net int modify -lif SVM1_NFS1 -home-node CLU-01 -home-port e0c -vserver SVM1

CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1

CLU::*> ifgrp remove-port -ifgrp a0a -node CLU-01 -port e0i

CLU::*> net port delete -node CLU-01 -port e0i

3) Lost e0i + e0j on CLU-02 (both ports in the ifgrp)

Initial Setup:

CLU::> network interface show SVM1_NFS1

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------ ------- ------- ----

SVM1 SVM1_NFS1 up/up 10.3.6.1/8 CLU-02 a0a true

CLU::> ifgrp show -node CLU-02 -ifgrp a0a -fields ports

node ifgrp ports

------ ----- -------

CLU-02 a0a e0i,e0j

CLU::> network port show -node CLU-02 -port e0i,e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node port link mtu speed-admin speed-oper ipspace broadcast-domain

------ ---- ---- ---- ----------- ---------- ------- ----------------

CLU-02 e0i up 1500 auto 1000 Default -

CLU-02 e0j up 1500 auto 1000 Default -

After halting the system, removing the ports, and booting back up, this is what we have:

CLU::> network interface show SVM1_NFS1

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------ ------- ------- ----

SVM1 SVM1_NFS1 up/up 10.3.6.1/8 CLU-01 e0c false

CLU::> ifgrp show -node CLU-02 -ifgrp a0a -fields ports

node ifgrp ports

------ ----- -------

CLU-02 a0a e0i,e0j

CLU::> network port show -node CLU-02 -port e0i,e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node port link mtu speed-admin speed-oper ipspace broadcast-domain

------ ---- ---- --- ----------- ---------- ------- ----------------

CLU-02 e0i - - auto - Default -

CLU-02 e0j - - auto - Default -

To tidy up/resolve:

CLU::> set adv

CLU::*> net int modify -lif SVM1_NFS1 -home-node CLU-02 -home-port e0c -vserver SVM1

CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1

CLU::*> ifgrp delete -node CLU-02 -ifgrp a0a

CLU::*> net port delete -node CLU-02 -port e0i

CLU::*> net port delete -node CLU-02 -port e0j
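Scenarios 2 and 3 follow the same pattern: re-home and revert the LIF first, then either remove the lost member from the ifgrp (if another member survives) or delete the ifgrp (if all members are gone), and finally delete the stale ports. The following is a minimal Python sketch of that ordering. The function name and parameters are hypothetical, and it only builds the command strings seen in this post; it does not talk to a cluster.

```python
def ifgrp_cleanup_plan(node, ifgrp, members, lost, lif, vserver, new_home_port):
    """Return the ordered CLI commands to tidy up an ifgrp after losing ports."""
    lost_members = [p for p in members if p in lost]
    if not lost_members:
        return []  # nothing to clean up
    # The LIF must leave the ifgrp before its ports can be removed.
    plan = [
        f"net int modify -lif {lif} -vserver {vserver} "
        f"-home-node {node} -home-port {new_home_port}",
        f"net int revert -lif {lif} -vserver {vserver}",
    ]
    if set(lost_members) == set(members):
        # Every member is gone: delete the whole ifgrp (scenario 3).
        plan.append(f"ifgrp delete -node {node} -ifgrp {ifgrp}")
    else:
        # Some members survive: remove only the lost ones (scenario 2).
        plan += [
            f"ifgrp remove-port -ifgrp {ifgrp} -node {node} -port {p}"
            for p in lost_members
        ]
    plan += [f"net port delete -node {node} -port {p}" for p in lost_members]
    return plan


for cmd in ifgrp_cleanup_plan(
    "CLU-02", "a0a", ["e0i", "e0j"], {"e0i", "e0j"}, "SVM1_NFS1", "SVM1", "e0c"
):
    print(cmd)
```

With the scenario 3 inputs this prints five commands in the same order as the tidy-up above. The commands still need to be run at the advanced privilege level, as in the transcripts.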

Preliminaries for 4 and 5:

Cluster, Cluster LIFs, and Cluster Ports setup:

CLU::*> cluster show

Node Health Eligibility Epsilon

------ ------- ------------ -------

CLU-01 true true true

CLU-02 true true false

CLU::*> network interface show -role cluster

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------------ ------- ------- ----

Cluster

CLU-01_clus1 up/up 169.254.76.193/16 CLU-01 e0g true

CLU-01_clus2 up/up 169.254.126.4/16 CLU-01 e0h true

CLU-02_clus1 up/up 169.254.33.108/16 CLU-02 e0g true

CLU-02_clus2 up/up 169.254.130.213/16 CLU-02 e0h true

CLU::*> network port show -role cluster

Speed (Mbps)

Node Port IPspace Broadcast Domain Link MTU Admin/Oper

------ --------- ------- ---------------- ----- ------- ------------

CLU-01

e0g Cluster Cluster up 1500 auto/1000

e0h Cluster Cluster up 1500 auto/1000

CLU-02

e0g Cluster Cluster up 1500 auto/1000

e0h Cluster Cluster up 1500 auto/1000

Then we halt both nodes in the cluster:

CLU::*> halt !local -inhi -igno -skip

CLU::*> halt local -inhi -igno -skip

4) Lost e0g,e0h on CLU-01 (Node had Epsilon prior to 2-node cluster shutdown)

What we have:

CLU::> set adv

CLU::*> cluster show

Node Health Eligibility Epsilon

------ ------- ------------ -------

CLU-01 true true true

CLU-02 false true false

CLU::*> network interface show -role cluster

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------------ ------- ------- ----

Cluster

CLU-01_clus1 up/down 169.254.76.193/16 CLU-01 e0g true

CLU-01_clus2 up/down 169.254.126.4/16 CLU-01 e0h true

CLU-02_clus1 up/- 169.254.33.108/16 CLU-02 e0g true

CLU-02_clus2 up/- 169.254.130.213/16 CLU-02 e0h true

CLU::*> network port show -role cluster

Speed (Mbps)

Node Port IPspace Broadcast Domain Link MTU Admin/Oper

------ ---- ------- ---------------- ----- ------- ------------

CLU-01

e0g Cluster Cluster - - auto/-

e0h Cluster Cluster - - auto/-

Warning: Unable to list entries for vifmgr on node "CLU-02": RPC: Port mapper failure - RPC: Unable to send.

To fix (we are connected via CLU-01's node management LIF):

CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -IPspace Cluster -ports CLU-01:e0a

CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -IPspace Cluster -ports CLU-01:e0b

CLU::*> net int modify -lif CLU-01_clus1 -vserver Cluster -home-port e0a -home-node CLU-01

CLU::*> net int modify -lif CLU-01_clus2 -vserver Cluster -home-port e0b -home-node CLU-01

CLU::*> net int revert -lif CLU-01_clus1 -vserver Cluster

CLU::*> net int revert -lif CLU-01_clus2 -vserver Cluster

CLU::*> net port delete -port e0g -node CLU-01

CLU::*> net port delete -port e0h -node CLU-01

The result:

CLU::*> cluster show

Node Health Eligibility Epsilon

------ ------- ------------ -------

CLU-01 true true true

CLU-02 false true false

CLU::*> network interface show -role cluster

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------------ ------- ------- ----

Cluster

CLU-01_clus1 up/up 169.254.76.193/16 CLU-01 e0a true

CLU-01_clus2 up/up 169.254.126.4/16 CLU-01 e0b true

CLU-02_clus1 up/- 169.254.33.108/16 CLU-02 e0g true

CLU-02_clus2 up/- 169.254.130.213/16 CLU-02 e0h true

CLU::*> network port show -role cluster

Speed (Mbps)

Node Port IPspace Broadcast Domain Link MTU Admin/Oper

------ ---- ------- ---------------- ----- ------ ------------

CLU-01

e0a Cluster Cluster up 1500 auto/1000

e0b Cluster Cluster up 1500 auto/1000

Warning: Unable to list entries for vifmgr on node "CLU-02": RPC: Port mapper failure - RPC: Timed out.

2 entries were displayed.
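CLU-01 stays healthy and accepts configuration changes here even though it cannot reach CLU-02. One way to reason about this is a simplified majority rule in which epsilon acts as a tie-breaker. The Python sketch below is an assumed model for illustration only, not ONTAP's actual implementation, and `has_quorum` is a hypothetical name.

```python
def has_quorum(reachable_voters, total_voters, holds_epsilon):
    """Simplified quorum model (an assumption): majority wins, epsilon breaks ties."""
    if 2 * reachable_voters > total_voters:
        return True  # strict majority of voters is reachable
    # Exactly half of the voters: only the side holding epsilon keeps quorum.
    return 2 * reachable_voters == total_voters and holds_epsilon


# A node in a 2-node cluster that can only see itself (1 of 2 voters).
print(has_quorum(1, 2, holds_epsilon=True))   # prints True
print(has_quorum(1, 2, holds_epsilon=False))  # prints False
```

Under this model an isolated node that holds epsilon keeps quorum, while an isolated node without epsilon does not.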

5) Lost e0g,e0h on CLU-02 (Node didn't have Epsilon prior to 2-node cluster shutdown)

What we have:

CLU::> set adv

CLU::*> cluster show

Node Health Eligibility Epsilon

------ ------- ------------ -------

CLU-01 false true true

CLU-02 false true false

CLU::*> network interface show -role cluster

Logical Status Network Current Current Is

Vserver Interface Admin/Oper Address/Mask Node Port Home

------- ---------- ---------- ------------------ ------- ------- ----

Cluster

CLU-02_clus1 up/down 169.254.33.108/16 CLU-02 e0g true

CLU-02_clus2 up/down 169.254.130.213/16 CLU-02 e0h true

CLU::*> network port show -role cluster

Speed (Mbps)

Node Port IPspace Broadcast Domain Link MTU Admin/Oper

------ ---- ------- ---------------- ----- ------- ------------

CLU-02

e0g Cluster - - 1500 auto/-

e0h Cluster - - 1500 auto/-

To fix (we are connected via CLU-02's node management LIF):

CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -IPspace Cluster -ports CLU-02:e0a

Error: command failed: Cannot run this command because the system is not fully initialized. Wait a few minutes, and then try the command again.

OH SH*T!

The point of this post is to show how crucial the cluster ports are. If you’ve performed a headswap (ARL or disruptive) and haven’t fully considered how the cluster ports are going to work on the new platform, you’ll be stuck with an out-of-quorum node on which you can’t make any changes. At that point it is either a support case (support may have diag-level commands to fix it), or you physically restore the ports. For example, if you were doing a headswap from a FAS32XX with cluster ports on e1a and e2a to a FAS80XX with cluster ports on e0a and e0c, you could move the 10 GbE cards from the FAS32XX to the FAS80XX.
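A pre-check along these lines can be sketched in Python: before the swap, compare each cluster LIF's home port with the ports that will exist on the new controller, and flag any node that would be left without a single working cluster port. This is a minimal sketch under stated assumptions. `nodes_at_risk` is a hypothetical helper, and the port inventory in `surviving_ports` is invented for illustration (it mirrors scenario 5, where CLU-02 loses e0g and e0h).

```python
def nodes_at_risk(cluster_lifs, surviving_ports):
    """Return nodes on which no cluster LIF keeps its home port after the swap."""
    by_node = {}
    for lif in cluster_lifs:
        keeps_port = lif["home_port"] in surviving_ports.get(lif["node"], set())
        by_node.setdefault(lif["node"], []).append(keeps_port)
    return sorted(node for node, flags in by_node.items() if not any(flags))


# Cluster LIFs and home ports as listed in the preliminaries for 4 and 5.
cluster_lifs = [
    {"lif": "CLU-01_clus1", "node": "CLU-01", "home_port": "e0g"},
    {"lif": "CLU-01_clus2", "node": "CLU-01", "home_port": "e0h"},
    {"lif": "CLU-02_clus1", "node": "CLU-02", "home_port": "e0g"},
    {"lif": "CLU-02_clus2", "node": "CLU-02", "home_port": "e0h"},
]

# Hypothetical post-swap port inventory: CLU-02 no longer has e0g/e0h.
surviving_ports = {
    "CLU-01": {"e0a", "e0b", "e0c", "e0g", "e0h"},
    "CLU-02": {"e0a", "e0b", "e0c"},
}

print(nodes_at_risk(cluster_lifs, surviving_ports))  # prints ['CLU-02']
```

Any node this flags needs its cluster LIFs re-homed to surviving ports, or the original ports physically restored, before it is brought up on the new platform.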

Comments

Unknown, 8 June 2017 at 23:56
Hi, I just read through your procedure. Very nice tests and helpful information.

I will do a headswap in this way next week and as far as I understand cDOT and this guide it will be sufficient to have one working cluster-network port (after the swap). I would then migrate the cluster-lif of the port that will disappear (e4a) to a port that will survive (e1a) before I do the swap and that should do the trick and the nodes will be able to form quorum with the two lifs on the single port.

Would you agree with that?

Kind regards
Christian
Milther, 10 July 2019 at 09:24
Hi Vidad,
I followed your guide and did a disruptive headswap from a FAS3240 to a FAS8040. I ran into the issue where I was not able to move the cluster ports from e1a and e2a ahead of time, because the PCI card was not in the FAS8040. It turned out that the 10 Gb PCI card in the FAS3240 was not compatible with the FAS8040, so we could not move the card over. We had no choice but to proceed, knowing that one of the nodes was going to be out of quorum due to these cluster ports. For anyone going through this: we got through it by adding the e0a and e0c ports to the Cluster broadcast domain and creating new LIFs in the Cluster SVM on both nodes after almost completing the headswap. We waited a couple of minutes and both nodes were in quorum. We verified this with the "cluster show" and "cluster ring show" commands. So it is possible to fix up your cluster towards the end of the headswap process, assuming you follow the instructions correctly and make it that far.

Cosonok's IT Blog
