Clustered ONTAP Data Availability in the Event of 'Going Out of Quorum'

Something to be aware of in Clustered ONTAP (and in pretty much every other clustered file system): if your cluster goes out of quorum, you will lose data availability until quorum is restored. The following post demonstrates this.

The Demonstration

We have a 4-node cluster running Data ONTAP 8.1.2P4. As is normal with a new cluster, epsilon sits on the first node that was entered into the cluster.

Note 1: All epsilon does is add a little extra voting weight to its holder so that, in normal circumstances, there will always be a victor in any quorum vote, even when the nodes split evenly. The epsilon node has a voting weight of, say, 1.1, and all the others 1.
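The arithmetic in Note 1 can be sketched in a few lines of Python. This is only a model of the idea, not ONTAP's actual implementation: the function name has_quorum is hypothetical, and the 0.1 bonus is just the "say 1.1" figure from the note, not a documented constant. It assumes quorum means holding strictly more than half of the total voting weight.

```python
from fractions import Fraction

# Illustrative weight only: the "say 1.1" figure from Note 1,
# not a documented ONTAP constant.
EPSILON_BONUS = Fraction(1, 10)


def has_quorum(all_nodes, live_nodes, epsilon_holder):
    """Return True if the live nodes hold a strict majority of the voting weight."""
    def weight(node):
        # Every node votes with weight 1; the epsilon holder gets a small bonus.
        return 1 + (EPSILON_BONUS if node == epsilon_holder else 0)

    total = sum(weight(n) for n in all_nodes)
    live = sum(weight(n) for n in live_nodes)
    return live > total / 2


nodes = ["clust-01", "clust-02", "clust-03", "clust-04"]

# Half the nodes are up, including the epsilon holder: the tie is broken.
print(has_quorum(nodes, ["clust-01", "clust-02"], "clust-01"))  # True
# Half the nodes are up, but epsilon sits on a downed node: no majority.
print(has_quorum(nodes, ["clust-03", "clust-04"], "clust-01"))  # False
```

Exact fractions are used instead of floats so that the comparison against half the total weight cannot be thrown off by rounding.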

Note 2: Epsilon can be moved; please contact NetApp Global Support (NGS) for this.

To see where Epsilon is, drop down into the advanced privilege level and run cluster show:

clust::> set -priv advanced

clust::*> cluster show
Node     Health Eligibility Epsilon
-------- ------ ----------- -------
clust-01 true   true        true
clust-02 true   true        false
clust-03 true   true        false
clust-04 true   true        false
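If you capture this output in a script (over SSH, for example), picking out the epsilon holder is straightforward. The following is a minimal sketch that assumes the four-column layout shown in this post; parse_cluster_show and epsilon_holder are hypothetical helper names, not part of any NetApp tooling, and the sample text is simply the output from above.

```python
def parse_cluster_show(text):
    """Parse `cluster show` rows into dicts; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields, the last three being true/false.
        if len(fields) == 4 and all(f in ("true", "false") for f in fields[1:]):
            node, health, eligibility, epsilon = fields
            rows.append({
                "node": node,
                "health": health == "true",
                "eligibility": eligibility == "true",
                "epsilon": epsilon == "true",
            })
    return rows


def epsilon_holder(rows):
    """Return the name of the node holding epsilon, or None if no row reports it."""
    for row in rows:
        if row["epsilon"]:
            return row["node"]
    return None


SAMPLE = """\
Node     Health Eligibility Epsilon
-------- ------ ----------- -------
clust-01 true   true        true
clust-02 true   true        false
clust-03 true   true        false
clust-04 true   true        false
"""

rows = parse_cluster_show(SAMPLE)
print(epsilon_holder(rows))  # clust-01
```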

Also, run cluster ring show to display the replication rings of the cluster's member nodes:

clust::cluster*> cluster ring show
Node      UnitName Epoch DB Epoch DB Trnxs Master
--------- -------- ----- -------- -------- --------
clust-01  mgmt     5     5        259      clust-01
clust-01  vldb     3     3        60       clust-01
clust-01  vifmgr   3     3        145      clust-01
clust-01  bcomd    4     4        35       clust-01
clust-02  mgmt     5     5        259      clust-01
clust-02  vldb     3     3        60       clust-01
clust-02  vifmgr   3     3        145      clust-01
clust-02  bcomd    4     4        35       clust-01
clust-03  mgmt     5     5        259      clust-01
clust-03  vldb     3     3        60       clust-01
clust-03  vifmgr   3     3        145      clust-01
clust-03  bcomd    4     4        35       clust-01
clust-04  mgmt     5     5        259      clust-01
clust-04  vldb     3     3        60       clust-01
clust-04  vifmgr   3     3        145      clust-01
clust-04  bcomd    4     4        35       clust-01

In our test lab, we simply have two NFS volumes, one on clust-03 and one on clust-04, each presented to an ESXi host as an NFS datastore.

Image: NFS datastores available (active)

Power Down of Epsilon Node and Partner

After powering down the Epsilon node (clust-01) and its partner (clust-02), we get:

clust::*> cluster show
Node     Health Eligibility Epsilon
-------- ------ ----------- -------
clust-01 false  true        true
clust-02 false  true        false
clust-03 false  true        false
clust-04 false  true        false

clust::*> cluster ring show
Node     UnitName Epoch DB Epoch DB Trnxs Master
-------- -------- ----- -------- -------- ------
Warning: Unable to list entries on node clust-01. RPC: Port mapper failure - RPC: Timed out
Warning: Unable to list entries on node clust-02. RPC: Port mapper failure - RPC: Timed out
clust-03 mgmt     0     5        277      -
clust-03 vldb     0     3        60       -
clust-03 vifmgr   0     3        145      -
clust-03 bcomd    0     4        35       -
clust-04 mgmt     0     5        277      -
clust-04 vldb     0     3        60       -
clust-04 vifmgr   0     3        145      -
clust-04 bcomd    0     4        35       -
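In the output above, every ring on the surviving nodes shows an Epoch of 0 and a Master of "-". A monitoring script could flag that condition with something like the sketch below. Treating a "-" master as the out-of-quorum symptom is an assumption drawn from this one demonstration, not a documented contract, and units_without_master is a hypothetical helper name.

```python
def units_without_master(text):
    """Return (node, unit) pairs whose Master column is '-' in `cluster ring show` output."""
    missing = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: node, unit, epoch, db epoch, db trnxs, master.
        # The three middle columns are numeric, which filters out the
        # header, the separator, and the RPC warning lines.
        if len(fields) == 6 and all(f.isdigit() for f in fields[2:5]):
            node, unit, _epoch, _db_epoch, _db_trnxs, master = fields
            if master == "-":
                missing.append((node, unit))
    return missing


SAMPLE = """\
Node     UnitName Epoch DB Epoch DB Trnxs Master
-------- -------- ----- -------- -------- ------
Warning: Unable to list entries on node clust-01. RPC: Port mapper failure - RPC: Timed out
clust-03 mgmt     0     5        277      -
clust-03 vldb     0     3        60       -
clust-04 mgmt     0     5        277      -
"""

for node, unit in units_without_master(SAMPLE):
    print(f"{node}: ring {unit} has no master")
```

Run against the healthy output from earlier in the post, the same function returns an empty list, because every row names clust-01 as master.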

Note: You might lose your connection to cluster management if the cluster management interface is on one of those two nodes; it won’t fail over while the cluster is out of quorum. Connect via the node management interface of one of the surviving nodes instead.

The Result

The NFS Datastores are unavailable.

Image: NFS datastores unavailable (inactive)

In this situation, the fix is simply to bring up one of the downed nodes, thus restoring quorum, or to contact NGS to move Epsilon.

Note: This applies only to clusters of more than 2 nodes. In 2-node clusters, cluster HA is enabled with the command:

clust::*> cluster ha modify -configured true

The -configured option must be set to false for clusters of more than 2 nodes.

Cosonok's IT Blog
