Sunday, 27 May 2018

Tech Roundup - 27th May 2018

Links collected in the month and a half since the last Tech Roundup, under these headings:
AWS, Broadcom, Docker, Microsoft, NetApp, PowerShell, Storage Industry News, VMware, Miscellaneous

AWS

Amazon Web Services - Builders' Day London 2018 & Keynote presentation

Broadcom

Broadcom, NetApp & SUSE Announce Production Availability of the Industry’s First End-to-End NVMe over Fibre Channel Solution Enabling Groundbreaking Application Performance

Docker

Unveiling Docker Enterprise Edition 2.0

Microsoft

ReFS is fully supported on ANY storage hardware that is listed on Microsoft HCL!
Note: The main obstacle for ReFS on SAN is that ReFS does not yet have ODX and UNMAP implemented for SAN volumes, meaning these configurations are not HLK tested or certified. So, if you do need to use ODX, TRIM/UNMAP or Thin Provisioning for the underlying SAN LUNs, you should continue to use NTFS. Microsoft recommends turning these features off with ReFS to avoid potential issues. Some SANs enable thin provisioning by default, however, so it is important to keep this current restriction in mind.

Windows Admin Center aka Project "Honolulu" is now generally available!

Image: Windows Admin Center

For the first time ever, Microsoft will distribute its own version of Linux

NetApp

NetApp HCI Total Cost of Ownership (TCO) Study

Converged Versus Hyper Converged: What’s the Difference and What’s Right for Me?

How to Install and Scale Your NetApp SolidFire Cluster

How to Use Software Defined Storage in a ROBO Environment

10,000 Reasons Why You Need a Salesforce Backup

How Digital Transformation Paves the Way for Cloud-First Architectures

Great coverage on NetApp at Dell EMC World

SnapCenter YouTube Playlist

Deep Learning in Action: How Vincent Learned to Paint Like Van Gogh

Why SAP HANA Runs Better on NetApp: SAP System Refresh

NetApp: Spring Launch - May 2018

NetApp Cloud Volumes for Google Cloud Platform Strengthens Cloud Data Services Portfolio

Listed Alphabetically:

- 100GbE Cluster Interconnect and MetroCluster Switch
- Active IQ Services: New cloud-based predictive analytics services that deliver additional performance and data protection insights to customers.
- AFF A200
- AFF A800: The industry’s first end-to-end NVMe-based enterprise all-flash system.
- AltaVault 4.4.1
- Brocade 64-Port 32Gbps Blade
- Brocade G630 32Gb Switch
- Digital Support 2.0: Our SupportEdge services combine live, cloud, and digital resources to deliver comprehensive support through Elio with AI-powered predictive analytics.
- E-Series EF280
- FAS2720/FAS2750
- NetApp Manageability SDK 9.4
- OnCommand Unified Manager 9.4
- ONTAP 9.4: Power the newest AI and enterprise workloads with NVMe-accelerated performance, advanced cloud integration, and enhanced security features.
- SolidFire 18.2 Stack
- SolidFire VCP 4.0
- SSDs: The industry’s first 30TB SSDs: Shrink data center footprint and lower data center costs by storing over 2PB in a 2U shelf.
- StorageGRID 11.1: The number one data management solution for distributed organizations, now automates tamper-proof retention of critical financial and personal data.

NetApp: The Pub

CloudMirror replication between StorageGRID WebScale and AWS

NetApp, Cisco and Red Hat announce OpenStack on FlexPod SolidFire!

Ansible support receives massively expanded ONTAP modules

Anonymous/Public Bucket Access in StorageGRID

Hear Ye! Hear Ye! Trident 18.04 is available now!

NetApp: Tech ONTAP Podcast

Episode 140: Quarterly Security Update - ONTAP 9.4 and GDPR

Episode 139: ONTAP 9.4 - NVMe and New Platforms

Episode 138: ONTAP 9.4 Overview

ONTAP 9.4 - Feature Overview

Episode 137: Name Services in ONTAP

Episode 136: Modernizing Dev and QA the NetApp Way

NetApp: TRs



PowerShell

Script Duplicate File Finder and Remover

Storage Industry News

Worldwide Converged Systems Revenue Increased 9.1% During the Fourth Quarter of 2017 with Vendor Revenue Reaching $3.6 Billion, According to IDC

VMware

VMware vSphere 6.7 is GA!

Windows 7 and 2008 virtual machines lose network connectivity on VMware Tools 10.2.0 (54483)

Miscellaneous

Logging with timestamp:

CLU::> set -prompt-timestamp inline

Image: Very Good Putty Session Logging Settings

Sunday, 6 May 2018

System Node Migrate-Root - Experience and Tips

I needed to test out ‘system node migrate-root’ in preparation for potentially using it prior to performing a headswap. There is no physical hardware here, just an ONTAP 9.3 HA Simulator lab. These are some observations, with a couple of tips.

Documentation

System Node Migrate-Root is documented here:

The syntax is very simple, for example:


cluster1::*> system node migrate-root -node cluster1-01 -disklist VMw-1.17,VMw-1.18,VMw-1.19 -raid-type raid_dp

Warning: This operation will create a new root aggregate and replace the existing root on the node "cluster1-01". The existing root aggregate will be discarded.
Do you want to continue? {y|n}: y

Info: Started migrate-root job. Run "job show -id 86 -instance" command to
      check the progress of the job.


The process (see below) starts straight away.

The official documentation mentions:

The command starts a job that backs up the node configuration, creates a new aggregate, set it as new root aggregate, restores the node configuration and restores the names of original aggregate and volume. The job might take as long as a few hours depending on time it takes for zeroing the disks, rebooting the node and restoring the node configuration.

The Process and Timings

The SIM has tiny disks of 10.66GB and 28.44GB usable size. The 10.66GB disks were originally used for the 3-disk root aggregates, and the migrate-root moved the root aggregate to the slightly larger virtual disks. On a physical system with disks much bigger than 28.44GB, I would expect the timings to be considerably longer than those below. The timings below were captured by recording the ‘Execution Progress’ string from the job show output every second.

0-27 seconds: Starting node configuration backup. This might take several minutes.
28-146 seconds: Starting aggregate relocation on the node "cluster1-02"
147-212 seconds: Rebooting the node to create a new root aggregate.
213-564 seconds: Waiting for the node to create a new root and come online. This might take a few minutes.
565-682 seconds: Making the old root aggregate online.
683-686 seconds: Copying contents from old root volume to new root volume.
687-864 seconds: Starting removal of old aggregate and volume and renaming the new root.
865-1653 seconds: Starting node configuration restore.
1654-1772 seconds: Enabling HA and relocating the aggregates. This might take a few minutes.
1773 seconds: Complete: Root aggregate migration successfully completed [0]
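
As a rough cross-check of the timings above, here is a minimal Python sketch that turns the start/end seconds into per-phase durations. The phase labels are shortened, and `PHASES` and `phase_durations` are illustrative names of my own, not anything ONTAP produces.

```python
# Phase boundaries (start_second, end_second, short label) copied from the
# timings listed above. Labels are abbreviated for readability.
PHASES = [
    (0, 27, "Node configuration backup"),
    (28, 146, "Aggregate relocation"),
    (147, 212, "Reboot to create new root aggregate"),
    (213, 564, "Wait for new root and node online"),
    (565, 682, "Old root aggregate online"),
    (683, 686, "Copy old root volume to new root volume"),
    (687, 864, "Remove old aggregate/volume, rename new root"),
    (865, 1653, "Node configuration restore"),
    (1654, 1772, "Enable HA and relocate aggregates"),
]


def phase_durations(phases):
    """Return (label, seconds) pairs; each range includes both end points."""
    return [(label, end - start + 1) for start, end, label in phases]


if __name__ == "__main__":
    durations = phase_durations(PHASES)
    for label, seconds in durations:
        print(f"{seconds:5d}s  {label}")
    total = sum(seconds for _, seconds in durations)
    print(f"{total:5d}s  total")
```

On these numbers the node configuration restore is the longest phase by far (789 of the 1773 seconds), followed by waiting for the new root to come online (352 seconds).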

Nearly 30 minutes for migrate-root on one node of a tiny ONTAP 9.3RC1 HA SIM! And you still need to do a takeover/giveback of the node whose root aggregate was moved (see below).

Tips

1) The process disables HA and Storage Failover, and Aggregate Relocation is used to move the data aggregates to the node that’s staying up. The process does not move data LIFs. These will fail over automatically, but I noticed a bit of a delay (my test CIFS share was down for 45 seconds), so I’d recommend first moving data LIFs onto the node that’s going to stay up.

2) I noticed - consistently - that if you run ‘system health alert show’ after the migrate-root completes, you get ‘RPC: Remote system error’ warnings for the health monitors (schm, shm, cphm and cshm) on the affected node. Perform a takeover/giveback of the affected node after the migrate-root completes to correct this.


cluster1::*> system health alert show
This table is currently empty.

Warning: Unable to list entries for schm on node "cluster1-02": RPC: Remote
         system error [from mgwd on node "cluster1-02" (VSID: -1) to schm at
         127.0.0.1].
         Unable to list entries for shm on node "cluster1-02": RPC: Remote
         system error [from mgwd on node "cluster1-02" (VSID: -1) to shm at
         127.0.0.1].
         Unable to list entries for cphm on node "cluster1-02": RPC: Remote
         system error [from mgwd on node "cluster1-02" (VSID: -1) to cphm at
         127.0.0.1].
         Unable to list entries for cshm on node "cluster1-02": RPC: Remote
         system error [from mgwd on node "cluster1-02" (VSID: -1) to cshm at
         127.0.0.1].

cluster1::*> Replaying takeover WAFL log
May 06 14:01:04 [cluster1-01:monitor.globalStatus.critical:EMERGENCY]: This node has taken over cluster1-02.

cluster1::*> system health alert show
This table is currently empty.

cluster1::*>


Image: Remote system error [from mgwd on node...

PowerShell to Record Job Execution Progress Per Second

Sometimes when you’re working with NetApp ONTAP, you’ll be given a job ID and you’ll be curious to know all the phases the job goes through. So, I wrote this little script to find out just that. The image at the bottom shows my use case (which will be the subject of the next blog post).


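############################
## RecordJobExecution.ps1 ##
############################

# Requires the NetApp PowerShell Toolkit (DataONTAP module)
Import-Module DataONTAP
$C = Read-Host "Enter Cluster FQDN/IP"
[Void](Connect-NcController $C)
$J = Read-Host "Enter Job ID"
# Poll once per second until interrupted with Ctrl+C
while($TRUE){
  $D = Get-Date -UFormat %Y%m%d
  $T = Get-Date -UFormat %T
  # Progress string reported for the job at this moment
  $P = (Get-NcJob -Id $J).JobProgress
  # Append "date time progress" to the log in the current directory
  "$D $T $P" >> execution_log.txt
  Start-Sleep -Seconds 1
}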


Image: Use Case: Recording per second the job execution progress of system node migrate-root
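
To turn the per-second log into a list of phases, consecutive lines with the same progress string can be collapsed into one run. Below is a minimal Python sketch of that idea. It assumes each log line looks like `YYYYMMDD HH:MM:SS <progress text>` (the format the script above writes); `collapse_phases` is a hypothetical helper name and the sample lines are made up for illustration.

```python
from datetime import datetime


def collapse_phases(lines):
    """Collapse per-second log lines into (start_s, end_s, progress) runs.

    Offsets are whole seconds elapsed since the first logged line.
    """
    runs = []
    first = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split into date, time and the remainder (the progress text, if any).
        date_part, time_part, *rest = line.split(" ", 2)
        stamp = datetime.strptime(f"{date_part} {time_part}", "%Y%m%d %H:%M:%S")
        progress = rest[0] if rest else ""
        if first is None:
            first = stamp
        offset = int((stamp - first).total_seconds())
        if runs and runs[-1][2] == progress:
            # Same phase as the previous line: extend the current run.
            runs[-1] = (runs[-1][0], offset, progress)
        else:
            runs.append((offset, offset, progress))
    return runs


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "20180506 13:30:00 Starting node configuration backup.",
        "20180506 13:30:01 Starting node configuration backup.",
        "20180506 13:30:02 Rebooting the node to create a new root aggregate.",
    ]
    for start, end, progress in collapse_phases(sample):
        print(f"{start}-{end} seconds: {progress}")
```

When reading the real execution_log.txt, note that Windows PowerShell's `>>` usually writes UTF-16, so the file may need to be opened with `encoding="utf-16"`. That is an assumption to check against your own PowerShell version.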

Saturday, 5 May 2018

One-Liner PowerShell to Record/Log CIFS Share Access Availability

I like one-liners where you don’t need a PowerShell script/PS1 file. It might be a bit of a cheat using semi-colons, but you can still paste it into your PowerShell session and run it as a one-liner.

This one liner can be used to test availability of a CIFS share. Change the path variable as required.

while($TRUE){$T1 = Get-Date; $P = Test-Path "\\192.168.0.120\vol1"; $T2 = Get-Date; $TS = [math]::Floor((New-TimeSpan -Start $T1 -End $T2).TotalSeconds); If($TS -gt 1){"DOWN for $TS seconds" >> log.txt}; $D = Get-Date -UFormat %Y%m%d; $T = Get-Date -UFormat %T; "$D $T $P" >> log.txt; Start-Sleep -Seconds 1}

It will run until you Ctrl+C to exit. This is what it does:

- Gets a time T1
- Tests the path to the CIFS share (Test-Path has a long timeout)
- Gets a time T2
- If T2 minus T1 is greater than 1 second we log “DOWN for $TS seconds”
- Finally, we log a formatted date with the result of Test-Path
- Sleep for 1 second and repeat
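
The steps above can also be sketched in Python, which makes the timing logic easier to test in isolation. This is a minimal sketch and not a replacement for the one-liner: `check_once` and `monitor` are hypothetical names, `os.path.exists` stands in for Test-Path, and the `probe` and `clock` parameters exist only so the logic can be exercised without a real share.

```python
import os
import time
from datetime import datetime


def check_once(path, probe=os.path.exists, clock=time.monotonic, threshold=1):
    """Probe the path once; return (reachable, elapsed_seconds, log_lines)."""
    start = clock()
    reachable = probe(path)
    elapsed = clock() - start
    lines = []
    whole_seconds = int(elapsed)
    # A slow probe is treated as an outage, as in the one-liner.
    if whole_seconds > threshold:
        lines.append(f"DOWN for {whole_seconds} seconds")
    stamp = datetime.now().strftime("%Y%m%d %H:%M:%S")
    lines.append(f"{stamp} {reachable}")
    return reachable, elapsed, lines


def monitor(path, log_file="log.txt", interval=1.0):
    """Repeat check_once until interrupted with Ctrl+C, appending to log_file."""
    while True:
        _, _, lines = check_once(path)
        with open(log_file, "a", encoding="utf-8") as handle:
            handle.write("\n".join(lines) + "\n")
        time.sleep(interval)
```

Calling `monitor(r"\\192.168.0.120\vol1")` on a Windows host would log in roughly the same shape as the one-liner, though how long `os.path.exists` blocks on an unreachable share is not something I have measured.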

Image: Screenshot of the log showing DOWN caused by intentionally downing the data LIF (IP) in NetApp ONTAP clustershell