In this post we cover 5 scenarios that you might encounter in the field when removing network cards, doing headswaps (non-disruptive ARL or disruptive), and so on. The version of NetApp ONTAP here is 8.3.2.
1) After Ethernet card removal, we have lost a port that was the home port for a data LIF.
2) After Ethernet card removal, we have lost a port that was part of an IFGRP.
3) After Ethernet card removals, we have lost both ports that were part of an IFGRP.
4) After a headswap, we have lost both cluster ports on the Epsilon node (2-node cluster).
5) After a headswap, we have lost both cluster ports on the non-Epsilon/out-of-quorum node (2-node cluster).
I’m using a 2-node simulator cluster to demonstrate
these. The cluster is called CLU, and the two nodes are CLU-01 and CLU-02.
1) Lost e0j on CLU-01
Initial Setup:
CLU::> network interface show SVM1_NFS1
        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1  up/up      10.3.6.1/8   CLU-01  e0j     true
CLU::> network port show -node CLU-01 -port e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper
node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-01 e0j  up   1500 auto        1000       Default Default
After halting the system, removing the card, and booting back up, this is what we have:
CLU::> network port show -node CLU-01 -port e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper
node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-01 e0j  -    -    auto        -          Default Default
CLU::> network interface show SVM1_NFS1
        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1  up/up      10.3.6.1/8   CLU-01  e0c     false
To tidy up/resolve:
CLU::> set adv
CLU::*> network port delete -node CLU-01 -port e0j

Error: command failed: Operation can't be completed because port is either the home port or failover target of a LIF.

CLU::*> network interface modify -lif SVM1_NFS1 -vserver SVM1 -home-node CLU-01 -home-port e0c
CLU::*> net port delete -node CLU-01 -port e0j
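Note: after the modify above, the LIF's current port and its new home port are both e0c, so it should already report as home, but it does no harm to check. A quick verification sketch (the revert is a no-op if the LIF is already home):

CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1
CLU::*> net int show -vserver SVM1 -lif SVM1_NFS1 -fields home-node,home-port,curr-node,curr-port,is-home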
2) Lost e0i on CLU-01 (e0h is also in the ifgrp)
Initial setup:
CLU::> network interface show SVM1_NFS1
        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1  up/up      10.3.6.1/8   CLU-01  a0a     true
CLU::> ifgrp show -node CLU-01 -ifgrp a0a -fields ports
node   ifgrp ports
------ ----- -------
CLU-01 a0a   e0h,e0i
CLU::> network port show -node CLU-01 -port e0i -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper
node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-01 e0i  up   1500 auto        1000       Default -
After halting the system, removing the card, and booting back up, this is what we have:
CLU::> network interface show SVM1_NFS1
        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1  up/up      10.3.6.1/8   CLU-01  a0a     true
CLU::> ifgrp show -node CLU-01 -ifgrp a0a -fields ports
node   ifgrp ports
------ ----- -------
CLU-01 a0a   e0h,e0i
CLU::> network port show -node CLU-01 -port e0i -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper
node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-01 e0i  -    -    auto        -          Default -
To tidy up/resolve:
CLU::> set adv
CLU::*> ifgrp remove-port -ifgrp a0a -node CLU-01 -port e0i

Error: command failed: Port already has a lif bound.

CLU::*> net int modify -lif SVM1_NFS1 -home-node CLU-01 -home-port e0c -vserver SVM1
CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1
CLU::*> ifgrp remove-port -ifgrp a0a -node CLU-01 -port e0i
CLU::*> net port delete -node CLU-01 -port e0i
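If a replacement card is fitted later, the ifgrp can be repopulated and the LIF re-homed onto it. A sketch, assuming the new port enumerates as e0i again (the port name depends on the slot):

CLU::*> ifgrp add-port -node CLU-01 -ifgrp a0a -port e0i
CLU::*> net int modify -lif SVM1_NFS1 -vserver SVM1 -home-node CLU-01 -home-port a0a
CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1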
3) Lost e0i + e0j on CLU-02 (both ports in the ifgrp)
Initial Setup:
CLU::> network interface show SVM1_NFS1
        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1  up/up      10.3.6.1/8   CLU-02  a0a     true
CLU::> ifgrp show -node CLU-02 -ifgrp a0a -fields ports
node   ifgrp ports
------ ----- -------
CLU-02 a0a   e0i,e0j
CLU::> network port show -node CLU-02 -port e0i,e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper
node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-02 e0i  up   1500 auto        1000       Default -
CLU-02 e0j  up   1500 auto        1000       Default -
After halting the system, removing the cards, and booting back up, this is what we have:
CLU::> network interface show SVM1_NFS1
        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1  up/up      10.3.6.1/8   CLU-01  e0c     false
CLU::> ifgrp show -node CLU-02 -ifgrp a0a -fields ports
node   ifgrp ports
------ ----- -------
CLU-02 a0a   e0i,e0j
CLU::> network port show -node CLU-02 -port e0i,e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper
node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-02 e0i  -    -    auto        -          Default -
CLU-02 e0j  -    -    auto        -          Default -
To tidy up/resolve:
CLU::> set adv
CLU::*> net int modify -lif SVM1_NFS1 -home-node CLU-02 -home-port e0c -vserver SVM1
CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1
CLU::*> ifgrp delete -node CLU-02 -ifgrp a0a
CLU::*> net port delete -node CLU-02 -port e0i
CLU::*> net port delete -node CLU-02 -port e0j
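If CLU-02 later gets replacement ports, the ifgrp can be rebuilt and the LIF sent home again. A sketch, assuming the new ports come up as e0i and e0j, and assuming the original ifgrp was multimode with IP load distribution (the mode and distribution function here are assumptions; check what was configured before):

CLU::*> ifgrp create -node CLU-02 -ifgrp a0a -distr-func ip -mode multimode
CLU::*> ifgrp add-port -node CLU-02 -ifgrp a0a -port e0i
CLU::*> ifgrp add-port -node CLU-02 -ifgrp a0a -port e0j
CLU::*> broadcast-domain add-ports -broadcast-domain Default -ipspace Default -ports CLU-02:a0a
CLU::*> net int modify -lif SVM1_NFS1 -vserver SVM1 -home-node CLU-02 -home-port a0a
CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1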
Preliminaries for 4 and 5:
Cluster, Cluster LIFs, and Cluster Ports setup:
CLU::*> cluster show
Node   Health  Eligibility  Epsilon
------ ------- ------------ -------
CLU-01 true    true         true
CLU-02 true    true         false
CLU::*> network interface show -role cluster
        Logical      Status     Network            Current Current Is
Vserver Interface    Admin/Oper Address/Mask       Node    Port    Home
------- ------------ ---------- ------------------ ------- ------- ----
Cluster
        CLU-01_clus1 up/up      169.254.76.193/16  CLU-01  e0g     true
        CLU-01_clus2 up/up      169.254.126.4/16   CLU-01  e0h     true
        CLU-02_clus1 up/up      169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/up      169.254.130.213/16 CLU-02  e0h     true
CLU::*> network port show -role cluster
                                                          Speed (Mbps)
Node   Port      IPspace     Broadcast Domain Link  MTU   Admin/Oper
------ --------- ----------- ---------------- ----- ----- ------------
CLU-01
       e0g       Cluster     Cluster          up    1500  auto/1000
       e0h       Cluster     Cluster          up    1500  auto/1000
CLU-02
       e0g       Cluster     Cluster          up    1500  auto/1000
       e0h       Cluster     Cluster          up    1500  auto/1000
Then we halt both nodes in the cluster:
CLU::*> halt !local -inhi -igno -skip
CLU::*> halt local -inhi -igno -skip
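For reference, !local matches every node except the one you are logged into, so the partner halts first and the local node last. The abbreviated flags expand to something like the following (a sketch; confirm the exact parameter names on your release with "halt ?"):

CLU::*> system node halt -node !local -inhibit-takeover true -ignore-quorum-warnings true -skip-lif-migration true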
4) Lost e0g,e0h on CLU-01 (Node had Epsilon prior to 2-node cluster shutdown)
What we have:
CLU::> set adv
CLU::*> cluster show
Node   Health  Eligibility  Epsilon
------ ------- ------------ -------
CLU-01 true    true         true
CLU-02 false   true         false
CLU::*> network interface show -role cluster
        Logical      Status     Network            Current Current Is
Vserver Interface    Admin/Oper Address/Mask       Node    Port    Home
------- ------------ ---------- ------------------ ------- ------- ----
Cluster
        CLU-01_clus1 up/down    169.254.76.193/16  CLU-01  e0g     true
        CLU-01_clus2 up/down    169.254.126.4/16   CLU-01  e0h     true
        CLU-02_clus1 up/-       169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/-       169.254.130.213/16 CLU-02  e0h     true
CLU::*> network port show -role cluster
                                                          Speed (Mbps)
Node   Port      IPspace     Broadcast Domain Link  MTU   Admin/Oper
------ --------- ----------- ---------------- ----- ----- ------------
CLU-01
       e0g       Cluster     Cluster          -     -     auto/-
       e0h       Cluster     Cluster          -     -     auto/-

Warning: Unable to list entries for vifmgr on node "CLU-02": RPC: Port mapper failure - RPC: Unable to send.
To fix (we are connected via CLU-01's node management LIF):
CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -ipspace Cluster -ports CLU-01:e0a
CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -ipspace Cluster -ports CLU-01:e0b
CLU::*> net int modify -lif CLU-01_clus1 -vserver Cluster -home-port e0a -home-node CLU-01
CLU::*> net int modify -lif CLU-01_clus2 -vserver Cluster -home-port e0b -home-node CLU-01
CLU::*> net int revert -lif CLU-01_clus1 -vserver Cluster
CLU::*> net int revert -lif CLU-01_clus2 -vserver Cluster
CLU::*> net port delete -port e0g -node CLU-01
CLU::*> net port delete -port e0h -node CLU-01
This shows:
CLU::*> cluster show
Node   Health  Eligibility  Epsilon
------ ------- ------------ -------
CLU-01 true    true         true
CLU-02 false   true         false
CLU::*> network interface show -role cluster
        Logical      Status     Network            Current Current Is
Vserver Interface    Admin/Oper Address/Mask       Node    Port    Home
------- ------------ ---------- ------------------ ------- ------- ----
Cluster
        CLU-01_clus1 up/up      169.254.76.193/16  CLU-01  e0a     true
        CLU-01_clus2 up/up      169.254.126.4/16   CLU-01  e0b     true
        CLU-02_clus1 up/-       169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/-       169.254.130.213/16 CLU-02  e0h     true
CLU::*> network port show -role cluster
                                                          Speed (Mbps)
Node   Port      IPspace     Broadcast Domain Link  MTU   Admin/Oper
------ --------- ----------- ---------------- ----- ----- ------------
CLU-01
       e0a       Cluster     Cluster          up    1500  auto/1000
       e0b       Cluster     Cluster          up    1500  auto/1000

Warning: Unable to list entries for vifmgr on node "CLU-02": RPC: Port mapper failure - RPC: Timed out.
2 entries were displayed.
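At the advanced privilege level, quorum can also be sanity-checked from the replication-ring side once the cluster LIFs are back up (output omitted here; CLU-02 will still show as out of quorum until its side is fixed):

CLU::*> cluster ring show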
5) Lost e0g,e0h on CLU-02 (Node didn't have Epsilon prior to 2-node cluster shutdown)
What we have:
CLU::> set adv
CLU::*> cluster show
Node   Health  Eligibility  Epsilon
------ ------- ------------ -------
CLU-01 false   true         true
CLU-02 false   true         false
CLU::*> network interface show -role cluster
        Logical      Status     Network            Current Current Is
Vserver Interface    Admin/Oper Address/Mask       Node    Port    Home
------- ------------ ---------- ------------------ ------- ------- ----
Cluster
        CLU-02_clus1 up/down    169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/down    169.254.130.213/16 CLU-02  e0h     true
CLU::*> network port show -role cluster
                                                          Speed (Mbps)
Node   Port      IPspace     Broadcast Domain Link  MTU   Admin/Oper
------ --------- ----------- ---------------- ----- ----- ------------
CLU-02
       e0g       Cluster     -                -     1500  auto/-
       e0h       Cluster     -                -     1500  auto/-
To fix (we are connected via CLU-02's node management LIF):
CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -ipspace Cluster -ports CLU-02:e0a

Error: command failed: Cannot run this command because the system is not fully initialized. Wait a few minutes, and then try the command again.
OH SH*T!
The point of this post was to show how crucial the cluster ports are. If you've performed a headswap (ARL or disruptive) and haven't fully considered how the cluster ports are going to work on the new platform, you'll be stuck with an out-of-quorum node on which you can't make any changes. At that point it would either be a support case (support may have some secret diag commands to fix it), or you could physically restore the ports (i.e. if you were doing a headswap from a FAS32XX with cluster ports on e1a and e2a, to a FAS80XX with cluster ports on e0a and e0c, you could move the 10 GbE cards from the FAS32XX to the FAS80XX).
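The safest approach is to do the work before the swap: re-home the cluster LIFs onto ports that will still exist on the new platform. A sketch, assuming e0a and e0b are ports that survive the swap (substitute your own target ports, and repeat for the partner node):

CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -ipspace Cluster -ports CLU-01:e0a,CLU-01:e0b
CLU::*> net int modify -lif CLU-01_clus1 -vserver Cluster -home-node CLU-01 -home-port e0a
CLU::*> net int modify -lif CLU-01_clus2 -vserver Cluster -home-node CLU-01 -home-port e0b
CLU::*> net int revert -vserver Cluster -lif *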
Hi, I just read through your procedure. Very nice tests and helpful information.
I will do a headswap in this way next week, and as far as I understand cDOT and this guide, it will be sufficient to have one working cluster-network port (after the swap). I would then migrate the cluster LIF of the port that will disappear (e4a) to a port that will survive (e1a) before I do the swap; that should do the trick, and the nodes will be able to form quorum with the two LIFs on the single port.
Would you agree with that?
Kind regards,
Christian
Hi Christian, yes, 1 cluster port is perfectly fine. As long as you have one that maps, you're good. Cheers, VC
Hi Vidad,
I followed your guide and did a disruptive headswap from a FAS3240 over to a FAS8040. I ran into an issue where I was not able to move the cluster ports from e1a and e2a ahead of time, due to not having the PCI card in the FAS8040. It turned out the 10 GbE PCI card in the FAS3240 was not compatible with the FAS8040, so I was not able to move the card over. We had no choice but to proceed, knowing that one of the nodes was going to be out of quorum due to these cluster ports. I just wanted to point out, to anyone going through this, that the way we got through it was by adding the e0a and e0c ports to the Cluster broadcast domain and creating new LIFs in the Cluster SVM on both nodes after almost completing the headswap. We waited a couple of minutes and both nodes were then in quorum, verified with the "cluster show" and "cluster ring show" commands. So it is possible to fix up your cluster towards the end of the headswap process, assuming you follow the instructions correctly and make it that far.
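For reference, the recovery described in the comment above would look something like this. A sketch with hypothetical ports and a hypothetical link-local address (cluster LIFs take addresses from 169.254.0.0/16; repeat for each node and port):

CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -ipspace Cluster -ports CLU-01:e0a,CLU-01:e0c
CLU::*> net int create -vserver Cluster -lif CLU-01_clus3 -role cluster -home-node CLU-01 -home-port e0a -address 169.254.10.1 -netmask 255.255.0.0
CLU::*> cluster show
CLU::*> cluster ring show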