A Fun Night In with a FAS3140A: This Job went a little bit Pete Tong

The Story

A dual-controller FAS3140 (HA pair) with 11 x DS14 shelves in 5 loops of:

2 x Mk2-AT (Single Path), 2 x Mk4-FC (Single Path), 3 x Mk2-AT (MPHA), 2 x Mk2-AT (Single Path), 2 x Mk2-FC (Single Path)

The unused onboard FC target ports were disabled and their type changed from target to initiator with the commands below (the ports are enabled automatically on reboot and cannot be enabled before then; this is 7-Mode Data ONTAP (7DOT) 8.0.2P5):
*This is so we can later use them for MPHA cabling of the fibre-cabled shelves, since an FC HBA has to be removed to make way for a SAS one!

fcadmin config -d 0a
fcadmin config -d 0a
fcadmin config -d 0a
fcadmin config -d 0a

fcadmin config -t initiator 0a
fcadmin config -t initiator 0a
fcadmin config -t initiator 0a
fcadmin config -t initiator 0a

The controllers were shut down with the commands below. The same commands were run on each controller, except that cf disable was run on only one of them; snapmirror off was additionally run on any remote controller with a SnapMirror pull* relationship with this pair:
*In 7-Mode SnapMirror is pull; in C-Mode it is push!

options autosupport.doit "Maintenance window!"
options autosupport.enable off
snapmirror off
cifs terminate
cf disable
halt
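
The rule above (same commands on both controllers, cf disable on one only, snapmirror off on any remote destination too) can be sketched as a small Python helper that builds the command list per controller. This is only an illustration: the function name and the controller names are made up for the example, and it prints the commands rather than running them against a filer.

```python
# Hypothetical helper: builds the shutdown command list per controller.
# Nothing here talks to a real filer; it only returns strings.

COMMON_PRE = [
    'options autosupport.doit "Maintenance window!"',
    "options autosupport.enable off",
    "snapmirror off",
    "cifs terminate",
]


def shutdown_plan(ha_pair, remote_destinations=()):
    """Return {controller: [commands]} for a planned HA-pair shutdown.

    ha_pair: the two controller names (placeholders, not real hostnames).
    remote_destinations: controllers elsewhere that pull SnapMirror
    updates from this pair; they only get 'snapmirror off'.
    """
    first, second = ha_pair
    plan = {
        # cf disable is issued on one controller only, before its halt.
        first: COMMON_PRE + ["cf disable", "halt"],
        second: COMMON_PRE + ["halt"],
    }
    for remote in remote_destinations:
        plan[remote] = ["snapmirror off"]
    return plan


if __name__ == "__main__":
    for node, cmds in shutdown_plan(("ctrl1", "ctrl2"), ["dr1"]).items():
        print(node)
        for cmd in cmds:
            print("   ", cmd)
```

Keeping the plan as plain data makes it easy to eyeball before a maintenance window, which is about all a sketch like this is good for.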

Both controllers’ hardware was modified by removing the quad-port FC HBA from slot 1 and replacing it with a quad-port SAS HBA (as per the Hardware Universe slot recommendations).

The 11 x DS14 shelves were re-cabled in 3 loops (5 loops consolidated into 3) of:

3 x Mk2-AT (MPHA), 4 x Mk2-AT (MPHA), 4 x Mk2-FC (MPHA)
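
As a quick sanity check on the re-cabling, the Python sketch below confirms that the 5-loop and 3-loop layouts account for the same shelves. It only tracks the AT/FC split (the Mk2/Mk4 module generation is ignored), and the variable and function names are just for illustration.

```python
from collections import Counter

# (shelf count, disk type) per loop, from the layouts described above.
before = [(2, "AT"), (2, "FC"), (3, "AT"), (2, "AT"), (2, "FC")]  # 5 loops
after = [(3, "AT"), (4, "AT"), (4, "FC")]                          # 3 loops


def shelves_by_type(loops):
    """Total the shelves of each disk type across all loops."""
    totals = Counter()
    for count, kind in loops:
        totals[kind] += count
    return totals


if __name__ == "__main__":
    for name, layout in (("before", before), ("after", after)):
        totals = shelves_by_type(layout)
        print(name, len(layout), "loops,", sum(totals.values()), "shelves,", dict(totals))
```

Both layouts come out at 11 shelves (7 AT and 4 FC), so no shelf was lost or double-counted in the consolidation.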

Two new DS4243 shelves were cabled in an MPHA stack.

Shelf IDs set.

Ready to power up!

Shelves powered up (with a few power-cycles on some of the old DS14s to get the shelf ID to stay solid)!

Controllers powered up (after a 5-minute wait following the shelves)!

Controller 1 came up fine and we ran:

aggr status
vol status
storage show disk -p

Which showed all aggregates online, all volumes online, and all disks as multipathed via A and B paths.

Controller 2 Failed to Boot

Had we done something wrong?

After looping through the boot process a couple of times it hit a:

PANIC Uncorrectable Machine Check Error CPU0

And came to a final resting place at the LOADERA> prompt.
Running the below from the LOADER prompt:

boot_diags

First thing we see is a message stating:

Failed NVRAM module. Powercycle system. If message persists replace motherboard.

Further digging into boot_diags > mb > NVRAM showed: “NVRAM IB0 failed to initialize / uninitialized”

So, We Did a Few Things

Powercycled the controller.
Shut down the controller, removed the NVRAM battery, reseated the NVRAM memory.

But all to no avail!

So a motherboard tray was ordered from support (X3540-R5) and later replaced.

In the meantime, half our services were down, since this was a dual-controller HA pair running on one controller with cf disabled. The solution was to run:

cf forcetakeover

So, all services on the downed partner controller became available on the one surviving controller.

END OF STORY!

Actually, not quite the end of the story! After doing the cf forcetakeover and getting some services back up, alas - a bit of a comedy of errors here - the maintenance engineers - who had to move 2 PDUs to make room for the failed controller to slide out - managed to pull out both power cables from the FAS3140A! Fortunately, the surviving controller survived its unplanned hard-reset and came back up again without any complaint.

END OF STORY!!

A bit of background to the story:
These controllers and shelves were over 4 years old and had not been powered off in a good couple of years. It’s almost to be expected that machinery which runs happily while nice, warm and cosy, then is shut down and gets cold, might have a few grumbles on power-up (thermal expansion > contraction > expansion again).
