Something to be aware of in Clustered ONTAP (and pretty much any other clustered file system) is that if your cluster goes out of quorum, you will lose data availability until quorum is restored. The following post demonstrates this.
The Demonstration
We have a 4-node cluster running Data ONTAP 8.1.2P4. As is normal with a new cluster, epsilon sits on the first node that was added to the cluster.
Note 1: All epsilon does is add voting weight to its holder, so that, in normal circumstances, there is always a victor in any election for quorum ownership. Think of the epsilon node as having a voting weight of say 1.1, and all other nodes 1. In this 4-node cluster that means the epsilon node plus any one partner (2.1 out of 4.1) forms a majority, whereas two non-epsilon nodes (2.0) do not, which is exactly what the demonstration below shows.
Note 2: Epsilon can be moved; contact NetApp Global Support (NGS) to have this done.
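For reference only, and as an assumption based on later documentation rather than anything specific to 8.1.2: newer ONTAP releases expose epsilon reassignment at the advanced privilege level along the lines of the commands below. For the release used here, follow the NGS guidance above.
clust::*> cluster modify -node clust-01 -epsilon false
clust::*> cluster modify -node clust-02 -epsilon true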
To see where Epsilon is, drop down into the advanced privilege level and run cluster show:
clust::> set -priv advanced
clust::*> cluster show
Node                  Health  Eligibility   Epsilon
--------------------- ------- ------------  ------------
clust-01              true    true          true
clust-02              true    true          false
clust-03              true    true          false
clust-04              true    true          false
Also, run cluster ring show to display the cluster nodes' replication rings:
clust::*> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master
--------- -------- -------- -------- -------- ---------
clust-01  mgmt     5        5        259      clust-01
clust-01  vldb     3        3        60       clust-01
clust-01  vifmgr   3        3        145      clust-01
clust-01  bcomd    4        4        35       clust-01
clust-02  mgmt     5        5        259      clust-01
clust-02  vldb     3        3        60       clust-01
clust-02  vifmgr   3        3        145      clust-01
clust-02  bcomd    4        4        35       clust-01
clust-03  mgmt     5        5        259      clust-01
clust-03  vldb     3        3        60       clust-01
clust-03  vifmgr   3        3        145      clust-01
clust-03  bcomd    4        4        35       clust-01
clust-04  mgmt     5        5        259      clust-01
clust-04  vldb     3        3        60       clust-01
clust-04  vifmgr   3        3        145      clust-01
clust-04  bcomd    4        4        35       clust-01
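As a side note, if you only care about one replication unit, the output can usually be narrowed with the -unitname parameter of cluster ring show (not used in the rest of this post), for example:
clust::*> cluster ring show -unitname vldb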
In our test lab, we simply have two NFS volumes, one on clust-03 and one on clust-04, presented to an ESXi host as NFS datastores.
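If you want to confirm which aggregate (and hence which node) hosts each volume, something like the following should do the job; the vserver name below is a placeholder for whatever exists in your environment:
clust::*> volume show -vserver <vserver_name> -fields aggregate,state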
Image: NFS datastores available (active)
Power Down of Epsilon Node and Partner
After powering down the epsilon node (clust-01) and its partner (clust-02), we get:
clust::*> cluster show
Node                  Health  Eligibility   Epsilon
--------------------- ------- ------------  ------------
clust-01              false   true          true
clust-02              false   true          false
clust-03              false   true          false
clust-04              false   true          false
clust::*> cluster ring show
Node      UnitName Epoch    DB Epoch DB Trnxs Master
--------- -------- -------- -------- -------- ---------
Warning: Unable to list entries on node clust-01. RPC: Port mapper failure - RPC: Timed out
Warning: Unable to list entries on node clust-02. RPC: Port mapper failure - RPC: Timed out
clust-03  mgmt     0        5        277      -
clust-03  vldb     0        3        60       -
clust-03  vifmgr   0        3        145      -
clust-03  bcomd    0        4        35       -
clust-04  mgmt     0        5        277      -
clust-04  vldb     0        3        60       -
clust-04  vifmgr   0        3        145      -
clust-04  bcomd    0        4        35       -
Note: You might lose your connection to the cluster management LIF if it is homed on one of those two nodes, and it won't fail over while the cluster is out of quorum. Connect via one of the surviving nodes' node management interfaces instead.
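It helps to have the node management addresses (and the home of the cluster management LIF) recorded ahead of time; a sketch of the kind of commands that help here, which can also be run from the console of a surviving node:
clust::*> network interface show -role cluster-mgmt
clust::*> network interface show -role node-mgmt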
The Result
The NFS datastores are unavailable.
Image: NFS datastores unavailable (inactive)
In this situation, the fix is simply to bring up one of the downed nodes, which restores quorum, or to contact NGS to have epsilon moved.
Note: This is only applicable to clusters of more than 2 nodes. In 2-node clusters, cluster HA is enabled with the command:
clust::*> cluster ha modify -configured true
This must be set to false for clusters of more than 2 nodes.
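To check how a cluster is currently configured, and to correct the setting on a cluster that has grown beyond two nodes, something along these lines should be all that is needed:
clust::*> cluster ha show
clust::*> cluster ha modify -configured false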