Monday, 16 January 2017

Understanding Disk Auto Assignment

There was a time when Disk Auto Assignment on NetApp FAS systems was either stack-based or nothing. Then, in a later version of ONTAP, we got three different autoassign policies:

[-autoassign-policy {default|bay|shelf|stack}] - Auto Assignment Policy

This parameter defines the granularity at which auto assign should work. This option is ignored if the -autoassign option is off. Auto assignment can be done at the stack/loop, shelf, or bay level. The possible values for the option are default, stack, shelf, and bay. The default value is platform dependent. It is stack for all non-entry platforms and single-node systems, whereas it is bay for entry-level platforms.

And that doesn’t tell the full story. There were some very good reasons why auto assign at only the stack level was sub-optimal: on a lot of systems you didn’t have the luxury of dedicating entire disk stacks to nodes, there were small HA pairs with a single internal shelf of disks, and then AFF came along.

What follows is my understanding, illustrated through 5 different scenarios.

Note: This is covered officially (and differently) in the NetApp library: “How automatic ownership assignment works for disks”, which links to “Which disk autoassignment policy to use”, and that page includes a table.

Scenario 1) Bay

This works on Entry-Level systems only (FAS2XXX) and assigns even and odd disks to different nodes (even disks to node 2, odd disks to node 1), as in the diagram below:

Image: Auto Assignment Policy of bay
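To make the even/odd split concrete, here is a minimal Python sketch of the rule as described above. Numbering bays from 0 and the node names node1/node2 are assumptions for illustration; it models the outcome, not how ONTAP implements it.

```python
def bay_owner(bay: int, node1: str = "node1", node2: str = "node2") -> str:
    """Return the owner of a bay under the bay policy as described in this post."""
    if bay < 0:
        raise ValueError("bay numbers start at 0")
    # Even-numbered bays go to node 2, odd-numbered bays go to node 1.
    return node2 if bay % 2 == 0 else node1


if __name__ == "__main__":
    # A 24-bay internal shelf (bays 0-23) splits 12/12 between the two nodes.
    owners = [bay_owner(b) for b in range(24)]
    print(owners.count("node1"), owners.count("node2"))  # 12 12
```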

Note: I tried to enable “bay” on a non-Entry-Level system and got the following error:

cluster::> disk option modify -node * -autoassign-policy bay

Error: command failed on node "cluster-01": Failed to modify autoassignpolicy.
Error: command failed on node "cluster-02": Failed to modify autoassignpolicy.

Scenario 2) Half-Shelf Drive Assignment

Half-shelf drive assignment was not listed among the autoassign policies above, but it does exist: it is an automatic policy for AFF systems only. Best practice (for performance reasons) with AFF is to assign half a shelf of disks to node 1 and the other half to node 2, as in the diagram below:

Image: AFF Half-Shelf Drive Assignment
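A similar Python sketch for half-shelf assignment on a 24-bay shelf. Which half goes to which node is an assumption here (lower bays to node 1, node names hypothetical); the diagram remains the reference for the actual layout.

```python
def half_shelf_owner(bay: int, bays_per_shelf: int = 24,
                     node1: str = "node1", node2: str = "node2") -> str:
    """Owner of a bay under half-shelf assignment (assumed split: low half to node 1)."""
    if not 0 <= bay < bays_per_shelf:
        raise ValueError(f"bay must be in 0..{bays_per_shelf - 1}")
    # Assumption: bays 0..11 -> node 1, bays 12..23 -> node 2 on a 24-bay shelf.
    return node1 if bay < bays_per_shelf // 2 else node2


if __name__ == "__main__":
    print([half_shelf_owner(b) for b in (0, 11, 12, 23)])
```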

Scenario 3) Half-Stack

I can’t say for sure whether this works (it needs testing), but I’ve been informed that “When there is only one stack that is shared between both nodes and an odd number of shelves, drives in the middle shelf will be assigned 50-50 to each node by default.”

Image: Half-Stack Drive Assignment
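Extending the same idea to a single shared stack: whole shelves go to each node and, with an odd number of shelves, the middle shelf is split 50-50 as quoted above. This Python sketch assumes the first half of the stack and the lower bays of the middle shelf go to node 1 (node names hypothetical). Since the behaviour itself is untested, treat it as a model of the quoted description only.

```python
def half_stack_owner(shelf: int, bay: int, shelves_in_stack: int,
                     bays_per_shelf: int = 24,
                     node1: str = "node1", node2: str = "node2") -> str:
    """Owner of a disk when one stack is shared by both nodes (assumed layout)."""
    if not 0 <= shelf < shelves_in_stack or not 0 <= bay < bays_per_shelf:
        raise ValueError("shelf or bay out of range")
    middle = shelves_in_stack // 2
    if shelves_in_stack % 2 == 1 and shelf == middle:
        # Odd shelf count: the middle shelf is split 50-50 between the nodes.
        return node1 if bay < bays_per_shelf // 2 else node2
    # Assumption: shelves before the middle -> node 1, shelves after it -> node 2.
    return node1 if shelf < middle else node2


if __name__ == "__main__":
    # A three-shelf stack: 24 + 12 disks for node 1, 12 + 24 disks for node 2.
    owners = [half_stack_owner(s, b, 3) for s in range(3) for b in range(24)]
    print(owners.count("node1"), owners.count("node2"))  # 36 36
```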

Scenario 4) Shelf

Shelf disk auto assignment policy works at a per-shelf level, as in the diagram below:

Image: Disk Auto Assignment at a Per-Shelf Level

Scenario 5) Stack

Finally, the traditional stack disk auto assignment policy works on a per-stack level, as in the diagram below:

Image: Disk Auto Assignment at a Per-Stack Level
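The shelf and stack policies differ only in the unit of assignment. A Python sketch of both, assuming that autoassign gives an unowned disk to whichever node already owns the other disks in its unit (shelf or stack), and leaves it unowned if the unit has no owner or mixed owners. The dictionary keys and node names are hypothetical, and this is a model of the behaviour, not ONTAP code.

```python
def autoassign(disks, policy):
    """Assign unowned disks to the single existing owner of their shelf or stack.

    disks: list of dicts with hypothetical keys "stack", "shelf", "bay", "owner"
    (owner is None for an unowned disk). Returns new dicts; the input is unchanged.
    """
    if policy not in ("shelf", "stack"):
        raise ValueError("this sketch only models the shelf and stack policies")

    def unit(disk):
        # The assignment unit: a shelf within a stack, or the whole stack.
        if policy == "shelf":
            return (disk["stack"], disk["shelf"])
        return (disk["stack"],)

    existing = {}
    for disk in disks:
        if disk["owner"] is not None:
            existing.setdefault(unit(disk), set()).add(disk["owner"])

    result = []
    for disk in disks:
        disk = dict(disk)
        owners = existing.get(unit(disk), set())
        # Assumption: only assign when the unit has exactly one existing owner.
        if disk["owner"] is None and len(owners) == 1:
            disk["owner"] = next(iter(owners))
        result.append(disk)
    return result


if __name__ == "__main__":
    disks = [
        {"stack": 1, "shelf": 1, "bay": 0, "owner": "node1"},
        {"stack": 1, "shelf": 1, "bay": 1, "owner": None},
        {"stack": 1, "shelf": 2, "bay": 0, "owner": "node2"},
        {"stack": 1, "shelf": 2, "bay": 1, "owner": None},
    ]
    for policy in ("shelf", "stack"):
        print(policy, [d["owner"] for d in autoassign(disks, policy)])
```

With the sample data, the shelf policy assigns both unowned disks, while the stack policy leaves them alone because that stack already has two owners; this is the situation where stack-level auto assign falls short.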

Tuesday, 10 January 2017

7 to C Project Visio Diagrams - part II

Based on the (updated) diagrams in the previous post, here are the low-level steps:

Low level steps i:
Cluster Build -> Cluster X -> Cluster Switches
- Racked
- Powered
- Correct OS and RCF Version
- Basic configuration
- Cluster Cabling
- Front-End Cabling
- Advanced Configuration

Low level steps ii:
Cluster Build -> Cluster X -> HA-Pair X
- Racked
- Back-end Cabling
- Front-end Cabling
- Powered
- Correct OS Version
- Basic Configuration
- Config Advisor (physical check)

Low level steps iii:
Cluster Build -> Cluster X -> Cluster Configuration
- DNS and NTP
- AutoSupport Configure and Test
- Storage Failover Settings
- Aggregates
- Cluster Networking
- Cluster Roles and Users
- NDMP Backup configuration
- Anti-Virus Configuration
- SSL Certificates

Low level steps iv:
Data SVM Build -> Cluster X -> Data SVM X
- Vserver Creation
- SVM Networking
- Anti-Virus
- Multi-Protocol
- FPolicy
- Load-Sharing Mirrors
- Data Protection
- Volume Configurations

Low level steps v:
Testing -> Cluster X -> Cluster Switches
- Cluster Switch 1 Failure
- Cluster Switch 2 Failure

Low level steps vi:
Testing -> Cluster X -> HA-Pair X
- Config Advisor
- Local LIF Failover
- Controller Resiliency
- Node Failure
- Disk Failure

Low level steps vii:
Testing -> Cluster X -> General Cluster Test
- Cluster Mgmt LIF Failover
- Software Upgrade
- Non-NetApp IMM

Low level steps viii:
Testing -> Cluster X -> Data SVM X
- Data LIF Failover
- CIFS Protocol Tests
- NFS Protocol Tests
- Multi-Protocol Tests
- Anti-Virus
- Volume Move
- Qtrees and Quotas
- Load-Sharing Mirrors
- DP Mirrors
- Data Restore
- NDMP Backup and Restore
- Disaster Recovery
- Storage Efficiency

Low level steps ix:
7 to C 7MTT CBT Migration Projects -> Project X -> Filer X -> Volume(s) List
- 7MTT Project Create
- 7MTT Prechecks
- 7MTT Start
- 7MTT Pre-Cutover Testing
- Schedule Cutover
- Change Control Approvals
- Client Readiness
- 7MTT Cutover
- Client Reconnect
- Post Cutover Tidy Up

Image: Phases of a 7 to C Project (some can run in parallel)

Monday, 9 January 2017

7 to C Project Visio Diagrams

The following Visio diagram images help to plan a 7 to C Migration Project (written with Enterprise NAS customers and 7MTT CBT migrations in mind). They are helpful if you’re a big-picture person. Some of the diagrams can also help with new ONTAP projects where no 7 to C transition is involved.

The Visio diagrams help to lay out an Excel spreadsheet that will be the project planner/tracker. Adding the low-level steps would make the Visio diagrams huge and time-consuming to maintain, hence the “low level steps X” placeholders, which are detailed in part II of this post.

The colour scheme:

- Orange for things not yet started
- Yellow for things started/in progress
- Green for things finished
- Red for anything critical
- Grey for low-level steps
- Blue is an example of how the diagrams can be expanded

Note: In the diagrams below, only Orange, Grey, and Blue are used.
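If you want the tracker spreadsheet to mirror the diagrams, a small script can flatten each box and its low-level steps into rows. This is a hypothetical Python sketch (the Box and Status names and the CSV columns are my own), using the colour scheme above for status; Grey and Blue are left out because they describe diagram shapes rather than progress.

```python
import csv
import io
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Status colours from the scheme above (Grey and Blue are diagram-only).
    NOT_STARTED = "Orange"
    IN_PROGRESS = "Yellow"
    FINISHED = "Green"
    CRITICAL = "Red"


@dataclass
class Box:
    """One box in a diagram, e.g. Cluster Build -> Cluster X -> HA-Pair X."""
    path: tuple
    steps: list
    status: Status = Status.NOT_STARTED


def tracker_csv(boxes) -> str:
    """Flatten diagram boxes and their low-level steps into CSV for a spreadsheet."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Area", "Low-level step", "Status"])
    for box in boxes:
        for step in box.steps:
            writer.writerow([" -> ".join(box.path), step, box.status.value])
    return out.getvalue()


if __name__ == "__main__":
    boxes = [
        Box(("Cluster Build", "Cluster X", "HA-Pair X"),
            ["Racked", "Back-end Cabling", "Front-end Cabling"]),
        Box(("Testing", "Cluster X", "Cluster Switches"),
            ["Cluster Switch 1 Failure", "Cluster Switch 2 Failure"],
            Status.IN_PROGRESS),
    ]
    print(tracker_csv(boxes))
```

The resulting CSV opens directly in Excel; one row per low-level step keeps the spreadsheet sortable by area or status.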

List of 7 to C Project Visio Diagrams

1) 7 to C Migration Preparation
2) ONTAP Infrastructure
3) Cluster Build
4) Data SVM Build
5) Testing
6) 7 to C 7MTT CBT Migration Projects

Image: 1) 7 to C Migration Preparation

Image: 2) ONTAP Infrastructure

Image: 3) Cluster Build

Image: 4) Data SVM Build

Image: 5) Testing

Image: 6) 7 to C 7MTT CBT Migration Projects

Saturday, 7 January 2017

Something about the DS4486

The DS4486 is unique amongst NetApp disk shelves in having dual disk carriers: it has 24 bays of dual disk carriers, allowing 48 disks in a 4U enclosure (currently 6 to 10 TB MSATA drives are available, allowing up to 480 TB, roughly 436.6 TiB, of physical capacity in one enclosure). When a disk in a dual disk carrier fails, and the other disk in that carrier is part of a RAID group (necessarily a different one, since two disks in the same carrier can’t be in the same RAID group), the healthy disk is copied to another disk so that the dual disk carrier can eventually be replaced. The disk failure isn’t flagged until this evacuation is complete, and both disks in the carrier are replaced at the same time, including the one that was good.
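The capacity arithmetic is easy to check, and the unit matters: drives are sold in decimal terabytes (10^12 bytes), while TiB is a binary unit (2^40 bytes). A small Python sketch for a full shelf, assuming every bay holds a dual disk carrier with two drives of the same size; it gives raw capacity only and ignores right-sizing, parity and spares.

```python
DS4486_BAYS = 24
DISKS_PER_CARRIER = 2   # dual disk carriers
BYTES_PER_TB = 10**12   # drive capacities are quoted in decimal terabytes
BYTES_PER_TIB = 2**40   # binary tebibyte


def ds4486_raw_capacity(drive_tb):
    """Return (disks, raw TB, raw TiB) for a fully populated DS4486."""
    disks = DS4486_BAYS * DISKS_PER_CARRIER
    raw_tb = disks * drive_tb
    return disks, raw_tb, raw_tb * BYTES_PER_TB / BYTES_PER_TIB


if __name__ == "__main__":
    for size in (6, 10):
        disks, tb, tib = ds4486_raw_capacity(size)
        print(f"{disks} x {size} TB = {tb} TB = {tib:.1f} TiB")
```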

One best practice I learnt recently, and struggled to find documented anywhere except in the Syslog Translator, is that:

All disks within a multidisk carrier should belong to one owner.

If you see dual disk carriers with disk 1 assigned to, say, node 1 and disk 2 assigned to node 2, it is technically fine (it will work, and it is supported), but it’s not best practice. Personally, I’m always keen to get disk autoassign working where possible, and disk autoassign won’t work with disk 1 owned by node 1 and disk 2 owned by node 2. Also, you can’t actually assign the disks within a multidisk carrier to different owners without forcing it:

cluster::> disk assign -disk -owner cluster-01
cluster::> disk assign -disk -owner cluster-02

Error: command failed: Failed to assign disks. Reason: Unable to assign disk Another disk enclosed in the same disk carrier is assigned to another node or is in a failed state. All disks in one disk carrier should be assigned to the same node. Override is not recommended but is possible via the -force option.
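To find carriers that break this rule, you can group disks by carrier and look for more than one owner. A Python sketch, assuming you have already collected an inventory of (shelf, bay, position in carrier, owner) tuples; that shape and the sample shelf ID are hypothetical, and unowned disks are ignored since the error above is about disks assigned to another node.

```python
from collections import defaultdict


def mixed_owner_carriers(disks):
    """Return the (shelf, bay) carriers whose owned disks belong to different nodes.

    disks: iterable of (shelf, bay, position_in_carrier, owner) tuples; this shape
    is hypothetical, and owner is None for an unowned disk (ignored here).
    """
    owners = defaultdict(set)
    for shelf, bay, _position, owner in disks:
        if owner is not None:
            owners[(shelf, bay)].add(owner)
    return sorted(carrier for carrier, names in owners.items() if len(names) > 1)


if __name__ == "__main__":
    inventory = [
        ("1.10", 0, 1, "cluster-01"), ("1.10", 0, 2, "cluster-01"),
        ("1.10", 1, 1, "cluster-01"), ("1.10", 1, 2, "cluster-02"),
    ]
    print(mixed_owner_carriers(inventory))  # [('1.10', 1)]
```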

Image: DS4486 Dual Disk Carrier

Some Other Things of Note:
1) You can have a maximum of 5 x DS4486 in a stack. The limit is actually 240 disks per stack, so you could, for example, have 1 x DS4486 and 8 x DS4246s in a stack (48 + 8 x 24 = 240 disks).
2) The recommended minimum is 4 spares per node that has DS4486 shelves, since one disk failure effectively takes two disks out of action.
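The stack limit is a simple sum of disk slots. A Python sketch of that check, using the 240-disk limit and shelf sizes given above; it only checks the disk count and does not model any other stack rules.

```python
DISKS_PER_SHELF = {"DS4486": 48, "DS4246": 24}
MAX_DISKS_PER_STACK = 240  # the per-stack disk limit described above


def stack_disk_count(shelves):
    """Total disk slots in a stack, given a list of shelf model names."""
    return sum(DISKS_PER_SHELF[model] for model in shelves)


def stack_fits(shelves):
    """True if the stack stays within the 240-disk limit."""
    return stack_disk_count(shelves) <= MAX_DISKS_PER_STACK


if __name__ == "__main__":
    print(stack_fits(["DS4486"] * 5))               # True: 5 x 48 = 240
    print(stack_fits(["DS4486"] + ["DS4246"] * 8))  # True: 48 + 8 x 24 = 240
    print(stack_fits(["DS4486"] * 6))               # False: 6 x 48 = 288
```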

1 HA-Pair, 1 Storage Pool, and 4 Flash Pool Aggregates

In the following scenario, we have a NetApp ONTAP 8.3.2 HA-Pair with one DS2246 shelf half-populated with 12 SSDs, and 4 pre-existing SATA data aggregates (2 per node). We will create a storage pool using 11 disks (leaving one SSD spare across the HA-Pair*), and use this one storage pool to hybrid-enable the 4 data aggregates.

Note: Most of this information can be found in the Physical Storage Management Guide.

The following commands:
- Determine the names of the spare SSDs
- Create the storage pool (simulate)
- Create the storage pool
- Show the storage pool
- Hybrid enable aggregate (x4)
- Add 1 storage pool allocation unit** to the aggregate (x4)
- Rename the aggregate to reflect that it is now hybrid SATA (x4)

storage aggregate show-spare-disks -disk-type SSD
storage pool create -storage-pool SP1 -disk-list disk1,disk2,...,disk11 -simulate true
storage pool create -storage-pool SP1 -disk-list disk1,disk2,...,disk11
storage pool show -storage-pool SP1
storage aggregate modify -aggregate N1_sata_1 -hybrid-enabled true
storage aggregate modify -aggregate N1_sata_2 -hybrid-enabled true
storage aggregate modify -aggregate N2_sata_1 -hybrid-enabled true
storage aggregate modify -aggregate N2_sata_2 -hybrid-enabled true
storage aggregate add-disks -aggregate N1_sata_1 -storage-pool SP1 -allocation-units 1
storage aggregate add-disks -aggregate N1_sata_2 -storage-pool SP1 -allocation-units 1
storage aggregate add-disks -aggregate N2_sata_1 -storage-pool SP1 -allocation-units 1
storage aggregate add-disks -aggregate N2_sata_2 -storage-pool SP1 -allocation-units 1
storage aggregate rename -aggregate N1_sata_1 -newname N1_hata_1
storage aggregate rename -aggregate N1_sata_2 -newname N1_hata_2
storage aggregate rename -aggregate N2_sata_1 -newname N2_hata_1
storage aggregate rename -aggregate N2_sata_2 -newname N2_hata_2

That’s it!
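If you repeat this for other pools or aggregate names, the sequence can be generated rather than typed. A hedged Python sketch that only builds the command strings (it does not connect to the cluster). The function name is hypothetical, the rename rule is this post's own sata-to-hata naming convention, and the commands are written in full-length clustershell form (storage aggregate add-disks, storage aggregate rename), so check them against your ONTAP version's command reference.

```python
MAX_ALLOCATION_UNITS = 4  # an SSD storage pool is divided into 4 allocation units


def flash_pool_commands(pool, ssds, aggregates, units_each=1):
    """Build the clustershell commands for hybrid-enabling SATA aggregates from a pool."""
    if len(aggregates) * units_each > MAX_ALLOCATION_UNITS:
        raise ValueError("a storage pool only has 4 allocation units")
    disk_list = ",".join(ssds)
    cmds = [
        "storage aggregate show-spare-disks -disk-type SSD",
        f"storage pool create -storage-pool {pool} -disk-list {disk_list} -simulate true",
        f"storage pool create -storage-pool {pool} -disk-list {disk_list}",
        f"storage pool show -storage-pool {pool}",
    ]
    cmds += [f"storage aggregate modify -aggregate {a} -hybrid-enabled true"
             for a in aggregates]
    cmds += [f"storage aggregate add-disks -aggregate {a} -storage-pool {pool} "
             f"-allocation-units {units_each}" for a in aggregates]
    # Naming convention from this post: "sata" becomes "hata" (hybrid SATA).
    cmds += [f"storage aggregate rename -aggregate {a} "
             f"-newname {a.replace('_sata_', '_hata_')}" for a in aggregates]
    return cmds


if __name__ == "__main__":
    ssds = [f"disk{i}" for i in range(1, 12)]  # placeholder names, as in the post
    aggrs = ["N1_sata_1", "N1_sata_2", "N2_sata_1", "N2_sata_2"]
    print("\n".join(flash_pool_commands("SP1", ssds, aggrs)))
```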

* “When a storage pool is used to provision cache, and each node has at least one allocation unit from the storage pool, only 1 spare SSD is needed for the HA pair. When dedicated SSDs are used for Flash Pool cache, each node needs a hot spare. There is no global spare for non-partitioned, non-shared drives.”
** Storage from an SSD storage pool is divided into 4 allocation units, hence one storage pool can be shared by up to 4 aggregates. In an HA pair, each node initially has 2 allocation units; these allocation units can be reassigned.
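To make footnote ** concrete, here is a small Python sketch of how the pool divides, assuming each allocation unit is built from one partition (a quarter) of every SSD in the pool. The 400 GB SSD size and the node names are hypothetical, and the sizes are raw: RAID parity and right-sizing are ignored.

```python
ALLOCATION_UNITS = 4  # an SSD storage pool is divided into 4 allocation units


def allocation_unit_layout(ssd_count, ssd_size_gb, nodes=("node1", "node2")):
    """Return the raw size of one allocation unit and each unit's initial owner.

    Assumes each allocation unit is one partition (a quarter) of every SSD in
    the pool, and ignores RAID parity and right-sizing.
    """
    unit_gb = ssd_count * ssd_size_gb / ALLOCATION_UNITS
    per_node = ALLOCATION_UNITS // len(nodes)  # 2 each in an HA pair
    owners = [node for node in nodes for _ in range(per_node)]
    return unit_gb, owners


if __name__ == "__main__":
    # 11 SSDs as in this scenario; the 400 GB size is hypothetical.
    print(allocation_unit_layout(11, 400))  # (1100.0, ['node1', 'node1', 'node2', 'node2'])
```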

Other notes:
i) Remember, once you’ve added a storage pool allocation unit (or 2/3/4) to an aggregate, you can’t delete the storage pool without first deleting the data aggregate.
ii) It is easy to add an SSD to a storage pool (storage pool add), but you cannot remove SSDs from a storage pool without deleting it.
iii) Each SSD in a storage pool is split into 4 partitions, hence the 4 allocation units. This also allows the storage pool to be shared by RAID-4 and RAID-DP HDD aggregates, as in the diagram below.

Image: SSD Storage Pool providing cache to two Flash Pool aggregates

iv) For Flash Pool throughput metrics, use: statistics cache flash-pool show
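Notes i) and ii) amount to a few simple rules. A toy Python model of them (the StoragePoolModel class is hypothetical and is not an ONTAP API) makes the order of operations explicit: disks can be added but not removed, and allocation units must be released by deleting the aggregates before the pool itself can be deleted.

```python
class StoragePoolModel:
    """Toy model of the storage pool rules in notes i) and ii); not an ONTAP API."""

    MAX_UNITS = 4

    def __init__(self, name, ssds):
        self.name = name
        self.ssds = list(ssds)
        self.allocations = {}  # aggregate name -> allocation units

    def add_disks(self, ssds):
        # Growing a pool is allowed (the "storage pool add" case).
        self.ssds.extend(ssds)

    def remove_disk(self, ssd):
        # Shrinking is not: the pool has to be deleted instead.
        raise PermissionError(f"cannot remove {ssd} from {self.name}; delete the pool instead")

    def allocate(self, aggregate, units=1):
        if sum(self.allocations.values()) + units > self.MAX_UNITS:
            raise ValueError("no free allocation units left")
        self.allocations[aggregate] = self.allocations.get(aggregate, 0) + units

    def delete_aggregate(self, aggregate):
        # Deleting the aggregate releases its allocation units.
        self.allocations.pop(aggregate, None)

    def delete(self):
        if self.allocations:
            raise RuntimeError(f"{self.name} still provides cache to: {sorted(self.allocations)}")
        self.ssds.clear()


if __name__ == "__main__":
    pool = StoragePoolModel("SP1", [f"disk{i}" for i in range(1, 12)])
    for aggr in ("N1_hata_1", "N1_hata_2", "N2_hata_1", "N2_hata_2"):
        pool.allocate(aggr)
    try:
        pool.delete()
    except RuntimeError as err:
        print(err)
```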