HP P4000: Management Group Maintenance Mode, Volumes Not Available, and Cache Status Corrupt

on April 14, 2012

An interesting problem with a P4000 SAN last week!

Due to electrical maintenance work, a two-node P4000 SAN cluster with a Failover Manager needed to be completely powered down. This was done in the correct order: the hosts accessing the SAN were powered down first, then the SAN was shut down via the “Shut Down Management Group...” option in the HP StorageWorks P4000 Centralized Management Console (CMC).

Something went wrong when the SAN was powered back up, and the result was:

i: The Management Group was operating in Maintenance Mode.

ii: All the volumes (including the Network RAID tolerant volumes) had a status of 'Not Available'.

iii: One of the Storage Systems had a status of Failed.

iv: Running Diagnostic tests on the "Failed" storage system indicated that the Cache Status of Cache 1 was Corrupt.

The resolution is in two parts:

Part 1: Restoring the Management Group to Normal Mode

1.1 Right-click the Management Group and select 'Edit Management Group'

1.2 In the Edit Management Group dialog box, to the right of Group Mode: Maintenance Mode, click on the button 'Set To Normal'.

Part 2: Restoring the Cache Status from 'Cache 1 is Corrupt' back to PASS

Completing Part 1 and setting the Management Group back to Normal will restore access to volumes and allow services on those volumes (at least the Network RAID tolerant volumes) to resume. Part 2 requires 4th-line (or manufacturer) support; the steps below are how HP support will resolve the issue.

2.1 Use PuTTY or similar to SSH to the affected storage system and log in with the root password and the correct challenge s/key.

Note: Access to the underlying CentOS Linux CLI is only available to HP Techs – only HP Tech Support have access to the root password and the tool to find the correct challenge s/key password.

2.2 At the # prompt type the hpasmcli command and press return, to enter the HP management CLI for Linux

2.3 At the hpasmcli> prompt type clear iml and press return

2.4 At the hpasmcli> prompt type exit and press return

2.5 At the # prompt, enter each line in turn

/etc/lefthand/system/./servicectl --stop-all

rm -f /etc/configs/Controller.cache.discarded

/etc/lefthand/system/./servicectl --start-all

2.6 Close and reopen the CMC and log back into the Management Group, and check the system is now healthy!
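
For reference, the three commands from step 2.5 could be wrapped in a small shell sketch like the one below. The two paths are the ones shown in this post; the variable names, the `run` helper and the `DRY_RUN` switch are illustrative assumptions and not part of HP's procedure. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: wraps the three commands from step 2.5.
# SERVICECTL and FLAG_FILE default to the paths shown in the post.
# The variable names and the DRY_RUN switch are illustrative assumptions.
SERVICECTL="${SERVICECTL:-/etc/lefthand/system/./servicectl}"
FLAG_FILE="${FLAG_FILE:-/etc/configs/Controller.cache.discarded}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

clear_cache_flag() {
    # Stop the services, delete the flag file, then start the services again.
    # Abort early if stopping the services or deleting the file fails.
    run "$SERVICECTL" --stop-all || return 1
    run rm -f "$FLAG_FILE" || return 1
    run "$SERVICECTL" --start-all
}

clear_cache_flag
```

Saved as a script, it would be run as root on the storage system with `DRY_RUN=0` set in the environment once the printed commands look right. Stopping early when `--stop-all` fails is a design choice of this sketch, so the file is never deleted while services may still be running.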

Below is the output from the PuTTY session:

Support Key: ??:??:??:??:??:??

Using keyboard-interactive authentication.

Password:

Using keyboard-interactive authentication.

challenge s/key 99 none28381

password:

[root@SAN02 ~]# hpasmcli

HP management CLI for Linux (v2.0)

hpasmcli> clear iml

IML Log successfully cleared.

hpasmcli> exit

[root@SAN02 ~]# /etc/lefthand/system/./servicectl --stop-all

response: hydra ok: mgmt-gw ok: dbd_agent ok: hpclimon ok: gcagent ok: eman ok: dbd_store ok: hplogmon ok: dbd_manager ok:

[root@SAN02 ~]# rm -f /etc/configs/Controller.cache.discarded

[root@SAN02 ~]# /etc/lefthand/system/./servicectl --start-all

response: hydra ok:started mgmt-gw ok:started dbd_agent ok:started hpclimon ok:started gcagent ok:started eman ok:started dbd_store ok:started hplogmon ok:started dbd_manager ok:started

[root@SAN02 ~]#
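
The `response:` lines above can be checked mechanically rather than by eye. The sketch below assumes the line is a sequence of name/status pairs, as it appears in this session, and treats any status starting with `ok:` as healthy. That format is inferred from the output shown here, not from HP documentation, and the function name `all_services_ok` is hypothetical.

```shell
#!/bin/sh
# Sketch only: checks a servicectl "response:" line like those captured above.
# Assumes alternating "name status" words; the function name is hypothetical.
all_services_ok() {
    # $1: a full response line, e.g. "response: hydra ok:started mgmt-gw ok:started"
    line=${1#response:}
    set -- $line                    # split into alternating name/status words
    [ $# -gt 0 ] || return 1        # an empty response counts as a failure
    while [ $# -ge 2 ]; do
        case $2 in
            ok:*) ;;                # matches both "ok:" and "ok:started"
            *) echo "not ok: $1 ($2)"; return 1 ;;
        esac
        shift 2
    done
    [ $# -eq 0 ]                    # a leftover word means a name without a status
}
```

It could be used as `all_services_ok "$(/etc/lefthand/system/./servicectl --start-all)" && echo "all services started"`, assuming the command prints a single response line as in the session above.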

Credits:

My colleagues Ekim Vopall and Veets Sejon.

HP Storage Division Tech Support.

Comments

Anonymous 1 May 2012 at 05:39
Hi Vidad,

On a similar note, I wish to move my CMC installation to another server. Will removing management groups disconnect my current iSCSI/gateway sessions to my ESXi 5.0 servers?

How does one move a CMC installation?

Regards

Bob
Anonymous 18 May 2012 at 00:45
Hi Vidad,

I don't think back-end commands should be in the public domain.
Carsonok 18 May 2012 at 01:06
Hi Anonymous,

Thank you for the comment.

All the commands do is clear the IML log, stop a service, use a Linux command to delete a file, and then start a service; these are troubleshooting abilities that are commonly available to sysadmins on other systems.

It seems a shame when things are locked down in such a way that it forces a sysadmin to use manufacturer support, but I understand that this is done to prevent customers from doing serious damage to their systems.

Even with the information in this post, a customer must still call HP as this is the only way past the root and challenge key passwords.

Thank you and have a good day!

VCosonok
Anonymous 7 September 2012 at 13:52
Or do what I do and mount the Virtual SAN Appliance into another Linux host and disable their SSH password changer.
Anonymous 19 October 2012 at 16:08
Hi, and exactly how can I contact HP Tech Support? I need the root password.
Anonymous 22 October 2012 at 07:27
What about this: boot the SAN operating system from a Linux live CD, mount the SAN filesystem, chroot, and then run "rm -f /etc/configs/Controller.cache.discarded"?
Anonymous 23 October 2012 at 18:45
Hi, I have this problem and I need your help. A SAN from my P4300 is returning the following message: 'process "dbd_manager" is using obsolete setsockopt SO_BSDCOMPAT'. The main problem is that the node is showing this message in the CMC: 'storage system offline, san/iq disconnected'. Man, I've done everything, but obviously I am missing something. Can you help me?
khen 1 April 2013 at 02:18
Hi. I have a different situation here and hope someone has already experienced our problem. Suddenly the clustered volume was no longer recognized by the host as NTFS. It became RAW, and the host asks me to format the drive before I can use it, but according to the CMC the volume is still there. Is there any solution to this issue? I would appreciate anyone's ideas about this.

Thank you in advance.
buddhika 25 September 2013 at 03:20
I have a P4000 with SAN/iQ version 10 installed. I can't remember the CMC admin login. How do I reset it to the default?
Unknown 10 November 2013 at 04:14
I have an existing cluster with 3 VSAs in it. I have added additional disks on all 3 VSAs so that I can expand my SAN size. On all 3 VSAs in the CMC I can see the new disk as uninitialized. How do I expand my SAN? Thanks.
Test 7 March 2017 at 00:23
Was successfully able to delete the required file using a live CD to overcome this issue (also cleared the IML via iLO rather than using the commands).

Cosonok's IT Blog
