Tuesday, 31 January 2017

NetApp StorageGRID Webscale Brief Notes

I was originally planning to write a ‘StorageGRID Webscale Primer’ blog post, but in doing my research I found there is an official and very good ~70 page “Grid Primer” (see the StorageGRID Webscale 10.3 documentation) which you should read; hence this post became brief notes and links instead. At the time of writing (2017.01.31), the latest version of StorageGRID Webscale is 10.3.0.

0) Contents

1) Interoperability Matrix Tool Results
2) Software and Documentation Links
3) Brief Notes on SGW

1) Interoperability Matrix Tool Results

http://mysupport.netapp.com/matrix/#welcome > Component Search > StorageGRID
StorageGRID Webscale 10.3 supported on/with:
Hypervisor (OS):
VMware ESXi 6.0u1 / 6.0 / 5.5u2 / 5.5u1 / 5.5 / 5.1u3 / 5.1u2 / 5.1u1 / 5.1 / 5.0u3 / 5.0u2 / 5.0u1 / 5.0 & OpenStack Kilo
TSM Client (Backup)
 IBM Tivoli Storage Manager 6.4.1
Protocol:
SMB 2.1, SMB 2.0, NFS v3.0, HTTPS
Appliance (Storage Node):
StorageGRID Appliance 5660/5612
StorageGRID NAS Bridge:
StorageGRID NAS Bridge 2.0.1
API:
Swift 1.0, S3 2006-03-01, CDMI v1.02, CDMI v1.01

2) Software and Documentation Links

StorageGRID Webscale 10.3.0 download link:
This link contains links to:
- StorageGRID-Webscale-10.3.0-20160818.2333.94beb49.tgz
- SGW 10.3 Appliance Installation and Maintenance Guide
- SGW 10.3 Software Installation Guide for OpenStack Deployments
- SGW 10.3 Software Installation Guide for VMware Deployments
Note: The download is 3.4GB

StorageGRID Webscale 10.3 documentation link:
This link contains links to:
- Release Notes for StorageGRID Webscale 10.3.0
- Release Notes for StorageGRID Webscale NAS Bridge 2.0.1
- Administrator Guide
- Appliance Installation and Maintenance Guide
- Audit Message Reference
- Cloud Data Management Interface Implementation Guide
- Expansion Guide for OpenStack Deployments
- Expansion Guide for VMware Deployments
- Grid Primer
- Maintenance Guide for OpenStack Deployments
- Maintenance Guide for VMware Deployments
- NAS Bridge 2.0 Administration Guide
- NAS Bridge 2.0 Installation and Setup Guide
- NAS Bridge 2.0 Management API Guide
- Simple Storage Service Implementation Guide
- Software Installation Guide for OpenStack Deployments
- Software Installation Guide for VMware Deployments
- Software Upgrade Guide
- Swift Implementation Guide
- Troubleshooting Guide

Other useful resources: the StorageGRID Webscale product page and the NetApp Field Portal.

Also, don’t forget to check out NetApp University (learningcenter.netapp.com) and Lab on Demand (labondemand.netapp.com) if you have access.

3) Brief Notes on SGW

Introduction to the StorageGRID Webscale System
- SGW is a massively scalable, software-defined, object-based storage solution for media-intensive workloads such as video, images, and PDF documents
- SGW has 10+ years of production object-storage deployments
- SGW has the industry’s most advanced policy framework for data lifecycle management
- SGW has a distributed storage architecture
- SGW has true geographically distributed and geographically selective object placement
- SGW offers the flexibility for customers to choose their storage hardware
- SGW also allows tape or public cloud as an active tier (object-granularity retrieval from tape and cloud)
- An object-based storage system does not organize objects in a hierarchical structure.
- What is object-based storage? Think of the valet parking analogy: you hand over your car (the object) and receive a ticket (the object’s identifier); you neither know nor care exactly where the car gets parked, you just present the ticket to get it back.
- Typical use cases:
-- Web Data Repositories
-- Data Archives
-- Media Repositories

Deployment Topologies, Grid Nodes, and Services
- Single Data Center Site (Grid Nodes and the Primary Admin Node at one data center site)
- Multiple Data Center Sites (Grid Nodes at each site; one Primary Admin Node for the whole grid, with optional non-primary Admin Nodes at other sites)
- Grid Nodes:
-- Admin Node
-- API Gateway Node (Optional)
-- Storage Node
-- Archive Node (Optional)
- Admin Node services:
-- NMS: Network Management System
-- CMN: Configuration Management Node
-- AMS: Audit Management System
-- SSM: Server Status Monitor
- Storage Node services:
-- LDR: Local Distribution Router
-- DDS: Distributed Data Store
-- ADC: Administrative Domain Controller
-- CMS: Content Management System
-- SSM: Server Status Monitor
Note: Each Data Center Site should have, at a minimum, three Storage Nodes.
- API Gateway Node services:
-- CLB: Connection Load Balancer
-- SSM: Server Status Monitor
- Archive Node services:
-- ARC: The Archive service
-- SSM: Server Status Monitor
- For storage and retrieval operations, Archive Nodes can be configured to interface with either Amazon Simple Storage Service (S3) or Tivoli Storage Manager (TSM)

Data Management
- A ‘Storage Pool’ is a logical grouping of storage media
- Metadata is the data that describes the object data that is stored in a StorageGRID Webscale system
- Metadata is used to create the matching criteria for ILM rule filters
- An ILM rule is not evaluated until it is activated in the ILM policy configuration
- An Information Lifecycle Management (ILM) rule is a description of when and where to store objects, how many copies are stored, and for how long
- ‘Storage Grade’ is the type of storage media
- Erasure coding is a data protection method that is best applied to objects larger than 1MB
- An erasure-coded copy contains portions of object data (data fragments) and information for reconstructing object data (parity fragments)
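
A rough worked example of why erasure coding suits larger objects (the 6+3 scheme here is purely illustrative, not an SGW default): a 3 MB object split into 6 data fragments of 0.5 MB plus 3 parity fragments of 0.5 MB consumes 4.5 MB of raw capacity (1.5x overhead) and survives the loss of any 3 fragments, whereas protecting the same object with 3 full replicated copies consumes 9 MB (3x overhead). For very small objects the per-fragment overhead and retrieval cost outweigh the capacity saving, hence the ~1MB guidance above.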

Data Flow
- SGW supports the Swift, S3, and CDMI APIs
- SGW supports standard RESTful HTTP protocols and interfaces
- SGW delivers NFS and SMB (via the StorageGRID Webscale NAS Bridge)
- When a client application sends a delete request to the StorageGRID Webscale system, a content handle is released. However, if no ILM rule exists that requests the purge of the object, the object is not deleted permanently.
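
Since SGW exposes a standard S3-compatible REST API, a generic S3 client should in principle be able to talk to it. The sketch below uses the AWS CLI purely as an illustration; the hostname and bucket name are hypothetical, port 8082 is the CLB S3 port listed in the next section, and it assumes a tenant S3 access key and secret key have already been issued and configured in the client (none of this is taken from the SGW documentation, so verify against the Simple Storage Service Implementation Guide):

aws s3 mb s3://demo-bucket --endpoint-url https://gateway.example.com:8082 --no-verify-ssl
aws s3 cp ./photo.jpg s3://demo-bucket/photo.jpg --endpoint-url https://gateway.example.com:8082 --no-verify-ssl
aws s3 ls s3://demo-bucket --endpoint-url https://gateway.example.com:8082 --no-verify-ssl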

Information Lifecycle Management: Key Concepts
Storage Pool = A storage pool is a logical grouping of Storage Nodes or Archive Nodes. These logical groupings can be located at a single data center site or spread across multiple sites.
Link Cost Group = Link cost refers to the relative cost of communication between data center sites within the StorageGRID Webscale system. Link cost is used to determine which grid nodes should provide a requested service.
Replication = A data protection method that duplicates complete instances of object data and stores the data on distributed storage pools. Replication is best suited for small or frequently accessed objects.
Erasure Coding = A data protection method that protects object data from loss by splitting an object into data fragments. Erasure coding is best suited for objects that average more than 1 MB in size.
ILM Rule = A set of instructions for placing objects in the StorageGRID Webscale system over time. The rule defines when, where, and how to store objects, how many copies to store, and for how long, by evaluating object metadata.
ILM Policy = A set of prioritized ILM rules. Multiple ILM rules can be configured to manage content placements for various types of objects. The logical order of ILM rules is defined by an ILM policy.

Network Management System
- CLB HTTP Port 8080
- CLB S3 Port 8082
- LDR HTTP Port 18080
- LDR S3 Port 18082 (see the quick port-check sketch after this list)
- On a grid node, the ‘Resources’ component under the SSM service provides the IP address of the grid node
- Under Grid Configuration, select the ‘Storage’ component to view the port numbers
- With the S3 and CDMI APIs, certificates and security partitions are the mechanisms for controlling access to objects
- Object metadata is always mapped to object store 0
- The key function of the SSM service is monitoring the condition of a server and related hardware
- The SSM service is present on every Storage Node
- In the NMS MI, if a service icon is blue, it indicates that the service is in an unknown state
- In the NMS MI, the Unknown service state overrides the Administratively Down service state
- A global custom alarm against an attribute does not override a custom alarm or a default alarm that was configured against this attribute
- The Audit Management System service on the Admin node manages audit logs
- If an object does not match the filter criteria of any ILM rule in the active ILM policy, the content placement instructions defined by the default ILM rule are applied to the object
- After a new ILM policy is activated, this new active ILM policy is not applied to all previously ingested objects.
- The ‘Re-evaluate Content’ button (on the Proposed Policy page, in the Configuration tab of the ILM Management branch) triggers the SGW system to apply the active ILM policy to objects that were ingested before that policy was activated.
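
A quick way to sanity-check the CLB/LDR ports above from a client machine is a plain curl against them (a generic reachability check, not an SGW-specific tool; the IP addresses are hypothetical, and it assumes HTTPS on those ports). An HTTP error response from the service is still proof the port is listening:

curl -kv https://10.0.0.50:8082/   # CLB S3 port on an API Gateway Node
curl -kv https://10.0.0.60:18082/  # LDR S3 port directly on a Storage Node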

Server Manager Command Shell (CLI) Notes
- Notable key presses: Ctrl+Alt+F1 and [Space], Alt+F7


/etc/init.d/servermanager status # show the status of Server Manager and all services
/usr/local/servermanager/reader.rb # Server Manager status reader output
/etc/init.d/cms stop # stop the CMS service on this node
/etc/init.d/cms status # check the CMS service status
/etc/init.d/cms start # start the CMS service
/etc/init.d/ldr restart # restart the LDR service
/etc/init.d/ldr status # check the LDR service status
storagegrid-status # Dynamic Status Update (Ctrl+C to escape)
/etc/init.d/servermanager stop # stop all services (graceful)
shutdown -h now # shutdown the node (after stopping services)
/etc/init.d/servermanager start # start all services
/etc/init.d/servermanager restart # restart all services


Miscellaneous
- If you enable security partitions you cannot disable them
- As of 10.x, StorageGRID is StorageGRID Webscale. Earlier versions were just StorageGRID.

Tuesday, 24 January 2017

We Can Assign Disks for you Wholesale (DiskAutoAssigner.ps1)

Following on from the subject matter of the previous post Understanding Disk Auto Assignment, here’s a Proof of Concept PowerShell script for those situations where disk Auto Assign won’t work. It’s a fairly simple script. Run it as:

.\DiskAutoAssigner.ps1 -Clusters {enter a comma separated list of clusters here - or just the one}
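
For example (the first cluster name is hypothetical; 10.3.5.50 is the cluster referred to in the example runs below):

.\DiskAutoAssigner.ps1 -Clusters lab-cluster1,10.3.5.50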

An example of using it is illustrated in the image below.

Image: Example of DiskAutoAssigner.ps1
First run: We target two clusters. We only have the credential for one cluster so the program just connects to that. We’ve not connected to this cluster before so we save a disk assignments XML file.

Second run: Like above, but since this is our second/a subsequent run, we have a disk history file for 10.3.5.50. The program checks for unassigned disks and compares them against the previous run. If there are no unassigned disks that previously had an owner recorded, the program does nothing.

Third run: I’d unassigned disk NET-2.20, and the program detects this and assigns it back to the original owner.

Note: This was tested against a NetApp Clustered ONTAP 2-node 8.3.2 SIM cluster.

The Script

Copy into a text editor and save as DiskAutoAssigner.ps1.


Param([Parameter(Mandatory=$True)][System.Array]$Clusters)

FUNCTION Wr{
  Param([String]$P="",[String]$C="WHITE")
  Write-Host $P -ForegroundColor $C
  [String]$Category = "INFO "
  If($C -eq "RED"){ $Category = "ERROR" }             
  ([String](Get-Date -uformat %c) + " | " + $Category + " | " + $P) >> "DiskAutoAssigner.log.txt"
}

If(!(Get-Module DataONTAP)){ [Void](Import-Module DataONTAP -ErrorAction SilentlyContinue) }
If(!(Get-Module DataONTAP)){ Wr "Failed to load DataONTAP PSTK!" RED; EXIT }
Wr "Loaded DataONTAP PSTK" GREEN

[System.Object]$FoundCredential = @{}
$Clusters | Foreach {
  $FoundCredential.$_ = $FALSE
  If(Get-NcCredential $_){ $FoundCredential.$_ = $TRUE }
  else{ Wr "No credential for cluster $_ - supply credentials with Add-NcCredential!" RED }
}

$Clusters | Foreach {
  $Global:CurrentNcController = $NULL
  If($FoundCredential.$_){
    [Void](Connect-NcController $_ -ErrorAction SilentlyContinue)
    If(!$Global:CurrentNcController){ Wr "Failed to connect to $_!" RED }
    else{
      Wr "Connected to $_" GREEN
      $SavedDisks = $NULL
      # Build a query template so Get-NcDisk returns each disk's ownership info
      # (used both for the assignment checks and for the saved history file)
      $DiskAttrs = Get-NcDisk -Template
      Initialize-NcObjectProperty -Object $DiskAttrs -Name DiskOwnershipInfo
      $DiskAttrs.DiskOwnershipInfo.OwnerNodeName = ""
      If(Test-Path "$_.disklist.xml"){
        Wr "Found disk history file $_.disklist.xml" GREEN
        $SavedDisks = Import-Clixml "$_.disklist.xml"
        $Disks = Get-NcDisk -Attributes $DiskAttrs
        $Unassigned = $Disks | where{ $_.DiskOwnershipInfo.OwnerNodeName -eq $NULL }
        Foreach ($Disk in $Unassigned){
          $SavedDisk = $SavedDisks | Where{ $_.Name -eq $Disk.Name }
          If($SavedDisk.DiskOwnershipInfo.OwnerNodeName){
            [Void](Set-NcDiskOwner $Disk.Name -Owner $SavedDisk.DiskOwnershipInfo.OwnerNodeName)
            Wr ("Assigned Disk " + $Disk.Name + " to " + $SavedDisk.DiskOwnershipInfo.OwnerNodeName) CYAN
          }
        }
      }
      Get-NcDisk -Attributes $DiskAttrs | Export-Clixml "$_.disklist.xml"
      Wr "Saved disk history file $_.disklist.xml" GREEN
    }
  }
}


Monday, 16 January 2017

Understanding Disk Auto Assignment

There was a time when Disk Auto Assignment on NetApp FAS systems was either stack based or nothing. Then, in a later version of ONTAP, we got a choice of disk autoassign policies:


[-autoassign-policy {default|bay|shelf|stack}] - Auto Assignment Policy

This parameter defines the granularity at which auto assign should work. This option is ignored if the -autoassign option is off. Auto assignment can be done at the stack/loop, shelf, or bay level. The possible values for the option are default, stack, shelf, and bay. The default value is platform dependent. It is stack for all non-entry platforms and single-node systems, whereas it is bay for entry-level platforms.
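
For reference, checking and changing the policy looks something like the below (the node name is a placeholder, and the field names are as I recall them from clustered ONTAP 8.3, so check against your version): the first command shows the current auto assignment settings per node, the second changes the policy on one node, and the third lets you verify which node owns the disks in each shelf/bay.

cluster::> storage disk option show -fields autoassign,autoassign-policy
cluster::> storage disk option modify -node cluster-01 -autoassign-policy shelf
cluster::> storage disk show -fields owner,shelf,bay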


And that option description doesn’t tell the full story. There were very good reasons why having Auto Assign only at the stack level was sub-optimal: on a lot of systems you didn’t have the luxury of dedicating entire disk stacks to nodes, there were small HA pairs with a single internal shelf of disks, and then AFF came along.

The following post is my understanding through 5 different scenarios.

Note: This is covered officially (and a little differently) in the NetApp library article ‘How automatic ownership assignment works for disks’, which links to ‘Which disk autoassignment policy to use’, where there’s a summary table.

Scenario 1) Bay

This works on entry-level systems only (FAS2XXX), and assigns disks in even-numbered and odd-numbered bays to different nodes (even bays to node 2, odd bays to node 1), as in the diagram below:

Image: Auto Assignment Policy of bay
Note: I have tried to enable “bay” on a non-Entry-Level system, and got the following error:

cluster::> disk option modify -node * -autoassign-policy bay

Error: command failed on node "cluster-01": Failed to modify autoassignpolicy.
Error: command failed on node "cluster-02": Failed to modify autoassignpolicy.


Scenario 2) Half-Shelf Drive Assignment

This was not listed in the Disk Auto Assignment Policies above, but it does exist. Half-shelf drive assignment is an automatic policy for AFF systems only. Best practice (for performance reasons) with AFF is to assign half a shelf of disks to node 1, and the other half to node 2. See the diagram below:

Image: AFF Half-Shelf Drive Assignment

Scenario 3) Half-Stack

I can’t say for sure if this works or not (needs testing), but I’ve been informed “When there is only one stack that is shared between both nodes and an odd number of shelves, drives in the middle shelf will be assigned 50-50 to each node by default.”

Image: Half-Stack Drive Assignment

Scenario 4) Shelf

Shelf disk auto assignment policy works at a per-shelf level, as in the diagram below:

Image: Disk Auto Assignment at a Per-Shelf Level

Scenario 5) Stack

Finally, the traditional stack disk auto assignment policy works on a per-stack level, as in the diagram below:

Image: Disk Auto Assignment at a Per-Stack Level

UPDATE: Another possibility?

Scenario 6) AFF Quarter Shelf Drive Assignment

There might be a 6th scenario. This is where you start with a half-shelf, 12 SSD AFF, and later expand to 24 SSDs in the shelf. Initially, with 12 SSDs, 0-5 are assigned to Node A, and 6-11 are assigned to Node B. When you expand with 12 more SSDs, 12-17 are assigned to Node A, and 18-23 are assigned to Node B.

Image: Expanding AFF 12-drive system to 24 SSDs

Tuesday, 10 January 2017

7 to C Project Visio Diagrams - part II

Based on the (updated) diagrams in the previous post, here are the low level steps:

Low level steps i:
Cluster Build -> Cluster X -> Cluster Switches
- Racked
- Powered
- Correct OS and RCF Version
- Basic configuration
- Cluster Cabling
- Front-End Cabling
- Advanced Configuration

Low level steps ii:
Cluster Build -> Cluster X -> HA-Pair X
- Racked
- Back-end Cabling
- Front-end Cabling
- Powered
- Correct OS Version
- Basic Configuration
- Config Advisor (physical check)

Low level steps iii:
Cluster Build -> Cluster X -> Cluster Configuration
- DNS and NTP
- AutoSupport Configure and Test
- Storage Failover Settings
- Aggregates
- Cluster Networking
- Cluster Roles and Users
- NDMP Backup configuration
- Anti-Virus Configuration
- SSL Certificates
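
To give a flavour of what a few of those cluster configuration items look like on the clustered ONTAP CLI (hostnames are hypothetical examples, and the commands are a sketch of a few items only, not the full configuration):

cluster::> cluster time-service ntp server create -server ntp1.example.com
cluster::> system node autosupport modify -node * -state enable -transport https
cluster::> system node autosupport invoke -node * -type test
cluster::> storage failover modify -node * -enabled true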

Low level steps iv:
Data SVM Build -> Cluster X -> Data SVM X
- Vserver Creation
- SVM Networking
- CIFS
- Anti-Virus
- NFS
- Multi-Protocol
- Fpolicy
- Load-Sharing Mirrors
- Data Protection
- Volume Configurations
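
Similarly, the first few Data SVM build items map onto commands along these lines (SVM, aggregate, node, port, address, and domain values are all hypothetical):

cluster::> vserver create -vserver svm1 -rootvolume svm1_root -aggregate n1_aggr1 -rootvolume-security-style ntfs
cluster::> network interface create -vserver svm1 -lif svm1_cifs_lif1 -role data -data-protocol cifs -home-node cluster1-01 -home-port e0c -address 10.0.1.50 -netmask 255.255.255.0
cluster::> vserver cifs create -vserver svm1 -cifs-server SVM1 -domain example.com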

Low level steps v:
Testing -> Cluster X -> Cluster Switches
- Cluster Switch 1 Failure
- Cluster Switch 2 Failure

Low level steps vi:
Testing -> Cluster X -> HA-Pair X
- Config Advisor
- Local LIF Failover
- Controller Resiliency
- Node Failure
- Disk Failure

Low level steps vii:
Testing -> Cluster X -> General Cluster Test
- OCUM
- OPM
- Cluster Mgmt LIF Failover
- Software Upgrade
- Non-NetApp IMM

Low level steps viii:
Testing -> Cluster X -> Data SVM X
- Data LIF Failover
- CIFS Protocol Tests
- NFS Protocol Tests
- Multi-Protocol Tests
- Anti Virus
- Volume Move
- Qtrees and Quotas
- Load-Sharing Mirrors
- DP Mirrors
- Data Restore
- NDMP Backup and Restore
- Disaster Recovery
- Storage Efficiency
- OCUM
- OPM

Low level steps ix:
7 to C 7MTT CBT Migration Projects -> Project X -> Filer X -> Volume(s) List ->
- 7MTT Project Create
- 7MTT Prechecks
- 7MTT Start
- 7MTT Pre-Cutover Testing
- Schedule Cutover
- Change Control Approvals
- Client Readiness
- 7MTT Cutover
- Client Reconnect
- Post Cutover Tidy Up

Image: Phases of a 7 to C Project (some can run in parallel)

Monday, 9 January 2017

7 to C Project Visio Diagrams

The following Visio diagram images help to plan your 7 to C Migration Project (Enterprise NAS customers and 7MTT CBT migrations in mind here.) These are helpful if you’re a big picture person. Some of the diagrams can help for new ONTAP projects where there’s no 7 to C transitions involved.

The Visio diagrams help to lay out an Excel spreadsheet that will be the project planner/tracker. Adding the low level steps would make the Visio diagrams super massive and time consuming to maintain, hence the “low level steps X” which are detailed in part II of this post.

The colour scheme:

- Orange for things not yet started
- Yellow for things started/in progress
- Green for things finished
- Red for anything critical
- Grey for low-level steps
- Blue is an example of how the diagrams can be expanded

Note: In the diagrams below, only Orange, Grey, and Blue are used.

List of 7 to C Project Visio Diagrams

1) 7 to C Migration Preparation
2) ONTAP Infrastructure
3) Cluster Build
4) Data SVM Build
5) Testing
6) 7 to C 7MTT CBT Migration Projects

**Please click the images to make them larger**

Image: 1) 7 to C Migration Preparation

Image: 2) ONTAP Infrastructure

Image: 3) Cluster Build

Image: 4) Data SVM Build

Image: 5) Testing

Image: 6) 7 to C 7MTT CBT Migration Projects

Saturday, 7 January 2017

Something about the DS4486

The DS4486 is unique amongst NetApp disk shelves in having dual-disk carriers: 24 bays of dual-disk carriers, allowing 48 disks in a 6U enclosure (currently 6 to 10 TB MSATA drives are available, allowing up to 447 TiB physical in one enclosure). When a disk in a dual-disk carrier fails, and the other disk in the carrier is part of another RAID group (you can’t have two disks of the same RAID group in the same carrier), a copy of the well disk is first made to another disk so that the carrier can eventually be replaced. The disk failure isn’t flagged until the evacuation is complete, and both disks in the dual-disk carrier are replaced at the same time, including the disk that was good.

One best practice I learnt recently and struggled to find documented anywhere except in the Syslog Translator, is that:

All disks within a multidisk carrier should belong to one owner.

If you see dual-disk carriers with the .1 disk assigned to, say, node 1, and the .2 disk assigned to node 2, it is technically fine (it will work, it is supported), but it’s not best practice. Personally, I’m always keen to get disk autoassign working where possible, and autoassign would not work with the .1 disk on node 1 and the .2 disk on node 2. Also, you can’t actually assign disks within a multidisk carrier to different owners without forcing it:


cluster::> disk assign -disk 1.11.19.1 -owner cluster-01
cluster::> disk assign -disk 1.11.19.2 -owner cluster-02

Error: command failed: Failed to assign disks. Reason: Unable to assign disk 1.11.19.2. Another disk enclosed in the same disk carrier is assigned to another node or is in a failed state. All disks in one disk carrier should be assigned to the same node. Override is not recommended but is possible via the -force option.


Image: DS4486 Dual Disk Carrier
Some other things of note:
1) You can only have a maximum of 5 x DS4486 in a stack. The limitation is actually the 240 disks per stack, so you could, for example, have 1 x DS4486 and 8 x DS4246 in a stack (48 + 192 = 240 disks).
2) The recommended minimum is 4 spares per node that has DS4486 disks (since when one disk fails it effectively takes two disks out of action).

1 HA-Pair, 1 Storage Pool, and 4 Flash Pool Aggregates

In the following scenario, we have a NetApp ONTAP 8.3.2 HA-Pair, with one DS2246 shelf half populated with 12 SSDs, and 4 pre-existing SATA data aggregates (2 per node). We will create a storage pool using 11 disks (leaving one SSD spare across the HA-Pair*), and use this one storage pool to hybrid-enable the 4 data aggregates.

Note: Most of this information can be got from the Physical Storage Management Guide.

The following commands:
- Determine the names of the spare SSDs
- Create the storage pool (simulate)
- Create the storage pool
- Show the storage pool
- Hybrid enable aggregate (x4)
- Add 1 storage pool allocation unit** to the aggregate (x4)
- Rename the aggregate to reflect that it is now hybrid SATA (x4)


storage aggregate show-spare-disk -disk-type SSD
storage pool create -storage-pool SP1 -disk-list disk1,disk2,...,disk11 -simulate true
storage pool create -storage-pool SP1 -disk-list disk1,disk2,...,disk11
storage pool show -storage-pool SP1
storage aggregate modify -aggregate N1_sata_1 -hybrid-enabled true
storage aggregate modify -aggregate N1_sata_2 -hybrid-enabled true
storage aggregate modify -aggregate N2_sata_1 -hybrid-enabled true
storage aggregate modify -aggregate N2_sata_2 -hybrid-enabled true
storage aggregate add N1_sata_1 -storage-pool SP1 -allocation-units 1
storage aggregate add N1_sata_2 -storage-pool SP1 -allocation-units 1
storage aggregate add N2_sata_1 -storage-pool SP1 -allocation-units 1
storage aggregate add N2_sata_2 -storage-pool SP1 -allocation-units 1
aggr rename -aggr N1_sata_1 -newname N1_hata_1
aggr rename -aggr N1_sata_2 -newname N1_hata_2
aggr rename -aggr N2_sata_1 -newname N2_hata_1
aggr rename -aggr N2_sata_2 -newname N2_hata_2


That’s it!
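
If you want a quick sanity check afterwards (not part of the guide’s procedure, and the field names below are from memory, so verify with ‘-fields ?’ on your cluster):

storage aggregate show -fields hybrid-enabled,hybrid-cache-size-total
storage pool show -storage-pool SP1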

Notes:
* “When a storage pool is used to provision cache, and each node has at least one allocation unit from the storage pool, only 1 spare SSD is needed for the HA pair. When dedicated SSDs are used for Flash Pool cache, each node needs a hot spare. There is no global spare for non-partitioned, non-shared drives.”
** Storage from an SSD storage pool is divided into 4 allocation units, hence one storage pool can be shared by up to 4 aggregates. In an HA pair, initially each node owns 2 allocation units; these allocation units can be reassigned (a sketch follows below).
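
A sketch of what reassigning an allocation unit looks like (node names are hypothetical, and the syntax is as I recall it from the 8.3 Physical Storage Management Guide, so verify before use):

storage pool reassign -storage-pool SP1 -from-node cluster1-01 -to-node cluster1-02 -allocation-units 1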

Other notes:
i) Remember, once you’ve added a storage pool allocation unit (or 2/3/4) to an aggregate, you can’t delete the storage pool without first deleting the data aggregate.
ii) It is easy to add an SSD disk to a storage pool (storage pool add), but you cannot remove SSDs from the storage pool without deleting it.
iii) SSDs in a storage pool are partitioned into 4 partitions per disk, hence the 4 allocation units. This also allows the storage pool to be shared by RAID-4 and RAID-DP HDD aggregates as in the diagram below.

Image: SSD Storage Pool providing cache to two Flash Pool aggregates


iv) For flash pool throughput metrics use statistics cache flash-pool show