Tuesday, 6 December 2016

What Happens when Ports Go Missing ...

In this post we cover five scenarios that you might encounter in the field when removing network cards or doing headswaps (non-disruptive ARL, or disruptive). The NetApp ONTAP version used here is 8.3.2.

1) After an Ethernet card removal, we have lost a port that was the home port for a data LIF.
2) After an Ethernet card removal, we have lost one port that was part of an IFGRP.
3) After Ethernet card removals, we have lost both ports that were part of an IFGRP.
4) After a head swap, we have lost both cluster ports on the Epsilon node (2-node cluster).
5) After a head swap, we have lost both cluster ports on the non-Epsilon/out-of-quorum node (2-node cluster).

I’m using a 2-node simulator cluster to demonstrate these. The cluster is called CLU, and the two nodes are CLU-01 and CLU-02.

1) Lost e0j on CLU-01

Initial Setup:


CLU::> network interface show SVM1_NFS1

        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1    up/up    10.3.6.1/8   CLU-01  e0j     true

CLU::> network port show -node CLU-01 -port e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-01 e0j  up   1500 auto        1000       Default Default


After halting the system, removing the port, and booting back up, this is what we have:


CLU::> network port show -node CLU-01 -port e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node   port link mtu speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- --- ----------- ---------- ------- ----------------
CLU-01 e0j  -    -   auto        -          Default Default

CLU::> network interface show SVM1_NFS1

        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1    up/up    10.3.6.1/8   CLU-01  e0c     false
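
The tell-tale sign in the output above is a `-` in the link column. As a minimal sketch (not a NetApp tool), the check can be scripted against pasted `network port show -fields ...` output. The function name `find_missing_ports` is hypothetical, and the parser assumes whitespace-separated columns with a header row starting with `node`, as in the sample from this post.

```python
def find_missing_ports(port_show_output):
    """Return (node, port) pairs whose link column is '-'.

    Assumes the text comes from `network port show -fields ...` with
    at least the node, port and link fields, laid out as in this post.
    """
    missing = []
    header = None
    for line in port_show_output.splitlines():
        cols = line.split()
        if not cols:
            continue
        if cols[0] == "node":
            header = cols  # remember the column names
            continue
        # Skip anything before the header, and the dashed separator row.
        if header is None or set(line.strip()) <= {"-", " "}:
            continue
        row = dict(zip(header, cols))
        if row.get("link") == "-":
            missing.append((row["node"], row["port"]))
    return missing


# Output copied from scenario 1 after the card was removed.
AFTER_REMOVAL = """\
node   port link mtu speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- --- ----------- ---------- ------- ----------------
CLU-01 e0j  -    -   auto        -          Default Default
"""

print(find_missing_ports(AFTER_REMOVAL))
```

Run against the "before" output, where the link is `up`, the same function returns an empty list.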


To tidy up/resolve:


CLU::> set adv
CLU::*> network port delete -node CLU-01 -port e0j

Error: command failed: Operation can't be completed because port is either the home port or failover target of a LIF.

CLU::*> network interface modify -lif SVM1_NFS1 -vserver SVM1 -home-node CLU-01 -home-port e0c
CLU::*> net port delete -node CLU-01 -port e0j


2) Lost e0i on CLU-01 (e0h is also in the ifgrp)

Initial setup:


CLU::> network interface show SVM1_NFS1

        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1    up/up    10.3.6.1/8   CLU-01  a0a     true

CLU::> ifgrp show -node CLU-01 -ifgrp a0a -fields ports

node   ifgrp ports
------ ----- -------
CLU-01 a0a   e0h,e0i

CLU::> network port show -node CLU-01 -port e0i -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-01 e0i  up   1500 auto        1000       Default -


After halting the system, removing the port, and booting back up, this is what we have:


CLU::> network interface show SVM1_NFS1

        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1    up/up    10.3.6.1/8   CLU-01  a0a     true

CLU::> ifgrp show -node CLU-01 -ifgrp a0a -fields ports

node   ifgrp ports
------ ----- -------
CLU-01 a0a   e0h,e0i

CLU::> network port show -node CLU-01 -port e0i -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node   port link mtu speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- --- ----------- ---------- ------- ----------------
CLU-01 e0i  -    -   auto        -          Default -


To tidy up/resolve:


CLU::> set adv
CLU::*> ifgrp remove-port -ifgrp a0a -node CLU-01 -port e0i

Error: command failed: Port already has a lif bound.

CLU::*> net int modify -lif SVM1_NFS1 -home-node CLU-01 -home-port e0c -vserver SVM1
CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1
CLU::*> ifgrp remove-port -ifgrp a0a -node CLU-01 -port e0i
CLU::*> net port delete -node CLU-01 -port e0i
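
The order matters here: the LIF has to be re-homed and reverted before the missing port can be removed from the ifgrp, and the port can only be deleted once it is out of the ifgrp. A minimal sketch of that ordering, using only the commands shown above, is a small Python helper that prints the sequence for review. The function name and its parameters are hypothetical, and the commands are assumed to be run at advanced privilege as in this post.

```python
def ifgrp_member_cleanup(node, ifgrp, lost_port, vserver, lif, new_home_port):
    """Build the command sequence used in scenario 2, in the order
    that avoids the 'Port already has a lif bound' error."""
    return [
        # 1. Give the LIF a home port that still exists.
        f"net int modify -lif {lif} -vserver {vserver} "
        f"-home-node {node} -home-port {new_home_port}",
        # 2. Send the LIF to its new home port.
        f"net int revert -lif {lif} -vserver {vserver}",
        # 3. The ifgrp no longer hosts a LIF, so the member can go.
        f"ifgrp remove-port -ifgrp {ifgrp} -node {node} -port {lost_port}",
        # 4. Finally remove the stale port entry.
        f"net port delete -node {node} -port {lost_port}",
    ]


for command in ifgrp_member_cleanup(
    node="CLU-01", ifgrp="a0a", lost_port="e0i",
    vserver="SVM1", lif="SVM1_NFS1", new_home_port="e0c",
):
    print(command)
```

The helper only formats strings; nothing is sent to the cluster, so the output can be checked before pasting it into a session.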


3) Lost e0i + e0j on CLU-02 (both ports in the ifgrp)

Initial Setup:


CLU::> network interface show SVM1_NFS1

        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1    up/up    10.3.6.1/8   CLU-02  a0a     true

CLU::> ifgrp show -node CLU-02 -ifgrp a0a -fields ports

node   ifgrp ports
------ ----- -------
CLU-02 a0a   e0i,e0j

CLU::> network port show -node CLU-02 -port e0i,e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node   port link mtu  speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- ---- ----------- ---------- ------- ----------------
CLU-02 e0i  up   1500 auto        1000       Default -
CLU-02 e0j  up   1500 auto        1000       Default -


After halting the system, removing the ports, and booting back up, this is what we have:


CLU::> network interface show SVM1_NFS1

        Logical    Status     Network      Current Current Is
Vserver Interface  Admin/Oper Address/Mask Node    Port    Home
------- ---------- ---------- ------------ ------- ------- ----
SVM1    SVM1_NFS1    up/up    10.3.6.1/8   CLU-01  e0c     false

CLU::> ifgrp show -node CLU-02 -ifgrp a0a -fields ports

node   ifgrp ports
------ ----- -------
CLU-02 a0a   e0i,e0j

CLU::> network port show -node CLU-02 -port e0i,e0j -fields node,port,ipspace,broadcast-domain,link,mtu,speed-admin,speed-oper

node   port link mtu speed-admin speed-oper ipspace broadcast-domain
------ ---- ---- --- ----------- ---------- ------- ----------------
CLU-02 e0i  -    -   auto        -          Default -
CLU-02 e0j  -    -   auto        -          Default -


To tidy up/resolve:


CLU::> set adv
CLU::*> net int modify -lif SVM1_NFS1 -home-node CLU-02 -home-port e0c -vserver SVM1
CLU::*> net int revert -lif SVM1_NFS1 -vserver SVM1
CLU::*> ifgrp delete -node CLU-02 -ifgrp a0a
CLU::*> net port delete -node CLU-02 -port e0i
CLU::*> net port delete -node CLU-02 -port e0j


Preliminaries for 4 and 5:

Cluster, Cluster LIFs, and Cluster Ports setup:

CLU::*> cluster show

Node   Health  Eligibility   Epsilon
------ ------- ------------  -------
CLU-01 true    true          true
CLU-02 true    true          false

CLU::*> network interface show -role cluster

        Logical    Status     Network            Current Current Is
Vserver Interface  Admin/Oper Address/Mask       Node    Port    Home
------- ---------- ---------- ------------------ ------- ------- ----
Cluster
        CLU-01_clus1 up/up    169.254.76.193/16  CLU-01  e0g     true
        CLU-01_clus2 up/up    169.254.126.4/16   CLU-01  e0h     true
        CLU-02_clus1 up/up    169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/up    169.254.130.213/16 CLU-02  e0h     true

CLU::*> network port show -role cluster
                                                        Speed (Mbps)
Node   Port      IPspace Broadcast Domain Link   MTU    Admin/Oper
------ --------- ------- ---------------- ----- ------- ------------
CLU-01
       e0g       Cluster Cluster          up       1500  auto/1000
       e0h       Cluster Cluster          up       1500  auto/1000
CLU-02
       e0g       Cluster Cluster          up       1500  auto/1000
       e0h       Cluster Cluster          up       1500  auto/1000
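
Scenarios 4 and 5 differ only in which node holds Epsilon, so it is worth recording that before any shutdown. As a rough sketch, the `cluster show` output above can be parsed with a few lines of Python. The function name `parse_cluster_show` is hypothetical, and the parser assumes the four-column advanced-privilege layout shown in this post.

```python
def parse_cluster_show(output):
    """Map node name -> health/eligibility/epsilon booleans, parsed
    from pasted advanced-privilege `cluster show` output."""
    nodes = {}
    for line in output.splitlines():
        cols = line.split()
        # Keep only four-column data rows; drop header and separator.
        if len(cols) != 4 or cols[0] == "Node" or cols[0].startswith("-"):
            continue
        name, health, eligibility, epsilon = cols
        nodes[name] = {
            "health": health == "true",
            "eligibility": eligibility == "true",
            "epsilon": epsilon == "true",
        }
    return nodes


# Output copied from the preliminaries in this post.
CLUSTER_SHOW = """\
Node   Health  Eligibility   Epsilon
------ ------- ------------  -------
CLU-01 true    true          true
CLU-02 true    true          false
"""

state = parse_cluster_show(CLUSTER_SHOW)
epsilon_holders = [name for name, info in state.items() if info["epsilon"]]
print(epsilon_holders)
```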


Then we halt both nodes in the cluster:


CLU::*> halt !local -inhi -igno -skip
CLU::*> halt local -inhi -igno -skip


4) Lost e0g,e0h on CLU-01 (Node had Epsilon prior to 2-node cluster shutdown)

What we have:

CLU::> set adv
CLU::*> cluster show

Node   Health  Eligibility   Epsilon
------ ------- ------------  -------
CLU-01 true    true          true
CLU-02 false   true          false

CLU::*> network interface show -role cluster

        Logical    Status     Network            Current Current Is
Vserver Interface  Admin/Oper Address/Mask       Node    Port    Home
------- ---------- ---------- ------------------ ------- ------- ----
Cluster
        CLU-01_clus1 up/down  169.254.76.193/16  CLU-01  e0g     true
        CLU-01_clus2 up/down  169.254.126.4/16   CLU-01  e0h     true
        CLU-02_clus1 up/-     169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/-     169.254.130.213/16 CLU-02  e0h     true

CLU::*> network port show -role cluster
                                                   Speed (Mbps)
Node   Port IPspace Broadcast Domain Link   MTU    Admin/Oper
------ ---- ------- ---------------- ----- ------- ------------
CLU-01
       e0g  Cluster Cluster          -           -  auto/-
       e0h  Cluster Cluster          -           -  auto/-

Warning: Unable to list entries for vifmgr on node "CLU-02": RPC: Port mapper failure - RPC: Unable to send.


To fix (we are connected via CLU-01's node management LIF):


CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -IPspace Cluster -ports CLU-01:e0a
CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -IPspace Cluster -ports CLU-01:e0b
CLU::*> net int modify -lif CLU-01_clus1 -vserver Cluster -home-port e0a -home-node CLU-01
CLU::*> net int modify -lif CLU-01_clus2 -vserver Cluster -home-port e0b -home-node CLU-01
CLU::*> net int revert -lif CLU-01_clus1 -vserver Cluster
CLU::*> net int revert -lif CLU-01_clus2 -vserver Cluster
CLU::*> net port delete -port e0g -node CLU-01
CLU::*> net port delete -port e0h -node CLU-01


Afterwards, this is what we see:


CLU::*> cluster show

Node   Health  Eligibility   Epsilon
------ ------- ------------  -------
CLU-01 true    true          true
CLU-02 false   true          false

CLU::*> network interface show -role cluster

        Logical    Status     Network            Current Current Is
Vserver Interface  Admin/Oper Address/Mask       Node    Port    Home
------- ---------- ---------- ------------------ ------- ------- ----
Cluster
        CLU-01_clus1 up/up    169.254.76.193/16  CLU-01  e0a     true
        CLU-01_clus2 up/up    169.254.126.4/16   CLU-01  e0b     true
        CLU-02_clus1 up/-     169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/-     169.254.130.213/16 CLU-02  e0h     true

CLU::*> network port show -role cluster
                                                  Speed (Mbps)
Node   Port IPspace Broadcast Domain Link  MTU    Admin/Oper
------ ---- ------- ---------------- ----- ------ ------------
CLU-01
       e0a  Cluster Cluster          up    1500  auto/1000
       e0b  Cluster Cluster          up    1500  auto/1000

Warning: Unable to list entries for vifmgr on node "CLU-02": RPC: Port mapper failure - RPC: Timed out.
2 entries were displayed.


5) Lost e0g,e0h on CLU-02 (Node didn't have Epsilon prior to 2-node cluster shutdown)

What we have:

CLU::> set adv

CLU::*> cluster show

Node   Health  Eligibility   Epsilon
------ ------- ------------  -------
CLU-01 false   true          true
CLU-02 false   true          false

CLU::*> network interface show -role cluster

        Logical    Status     Network            Current Current Is
Vserver Interface  Admin/Oper Address/Mask       Node    Port    Home
------- ---------- ---------- ------------------ ------- ------- ----
Cluster
        CLU-02_clus1 up/down  169.254.33.108/16  CLU-02  e0g     true
        CLU-02_clus2 up/down  169.254.130.213/16 CLU-02  e0h     true

CLU::*> network port show -role cluster
                                                   Speed (Mbps)
Node   Port IPspace Broadcast Domain Link   MTU    Admin/Oper
------ ---- ------- ---------------- ----- ------- ------------
CLU-02
       e0g  Cluster -                -        1500  auto/-
       e0h  Cluster -                -        1500  auto/-


To fix (we are connected via CLU-02's node management LIF):


CLU::*> broadcast-domain add-ports -broadcast-domain Cluster -IPspace Cluster -ports CLU-02:e0a

Error: command failed: Cannot run this command because the system is not fully initialized. Wait a few minutes, and then try the command again.


OH SH*T!

The point of this post is to show how crucial the cluster ports are. If you’ve performed a headswap (ARL or disruptive) and haven’t fully considered how the cluster ports are going to work on the new platform, you’ll be stuck with an out-of-quorum node on which you can’t make any changes. At this point it is either a support case (support may have some secret diag commands to fix it), or you physically restore the ports (for example, if you were doing a headswap from a FAS32XX with cluster ports on e1a and e2a to a FAS80XX with cluster ports on e0a and e0c, you could move the 10 GbE cards from the FAS32XX to the FAS80XX).
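
One way to catch this before the headswap is to compare the home ports of the cluster LIFs with the ports that will exist on the new controller. The following is a minimal planning sketch, not a NetApp tool: the function name and both input structures are hypothetical, and the port list for the new platform is an illustrative assumption that you would fill in from the actual hardware.

```python
def lifs_losing_home_port(cluster_lifs, ports_after_swap):
    """Return the cluster LIFs whose home port will not exist.

    cluster_lifs:     {lif_name: (node, home_port)}
    ports_after_swap: {node: set of port names on the new hardware}
    """
    return sorted(
        lif
        for lif, (node, port) in cluster_lifs.items()
        if port not in ports_after_swap.get(node, set())
    )


# Cluster LIF home ports as shown in the preliminaries of this post.
cluster_lifs = {
    "CLU-01_clus1": ("CLU-01", "e0g"),
    "CLU-01_clus2": ("CLU-01", "e0h"),
    "CLU-02_clus1": ("CLU-02", "e0g"),
    "CLU-02_clus2": ("CLU-02", "e0h"),
}

# Assumed (illustrative) port inventory after the swap: CLU-02 loses
# e0g and e0h, as in scenario 5.
ports_after_swap = {
    "CLU-01": {"e0a", "e0b", "e0c", "e0g", "e0h"},
    "CLU-02": {"e0a", "e0b", "e0c"},
}

print(lifs_losing_home_port(cluster_lifs, ports_after_swap))
```

Any LIF the check reports should be re-homed to a port that survives the swap while the node is still in quorum, because scenario 5 shows that the change can no longer be made afterwards.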

