Saturday, 20 February 2021

NetApp (Broadcom) BES-53248 Cluster Switch Notes: How to Setup

The setup of the Broadcom BES-53248 as a NetApp ONTAP Cluster Switch is very similar to the setup of the NetApp CN1610 cluster switch (which was always a Broadcom switch, just re-badged as NetApp). So this post is quite similar to 2017's: NetApp CN1610 Cluster Switch Notes: How to Setup.

BES-53248 and License Options

The default license (or no license) allows 16 * 10/25 GbE ports and 2 * 40/100 GbE ports (for the ISL). This blog post assumes we're just following the default setup. The RCF/ports configuration needs to be modified if you have licenses to apply.
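
If you want to check what the switch thinks it has licensed, and which ports are usable, the below can be run once you're logged in (the 'show license' command is taken from NetApp's BES-53248 license install procedure - verify it against the docs for your EFOS release):

(BES_SW1) # show license
(BES_SW1) # show port all | exclude Detach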

Image: BES-53248 Cluster Switch


BES Software

Alas, it's no longer possible to obtain the switch software (called EFOS) from NetApp's website; you get directed to:
https://www.broadcom.com/support/bes-switch
You'll need to register for an account. If you don't have an account, you need to email:

BES-Support@techdata.com

What you can do from the NetApp website is download the RCF file, SHM_Broadcom_BES_53248, and review the switch compatibility matrix for your version of ONTAP:
https://mysupport.netapp.com/site/products/all/details/broadcom-cluster-switches/downloads-tab
https://mysupport.netapp.com/site/info/broadcom-cluster-switch

1) Initial Cluster Switch Setup Script

Note: Full instructions are available at docs.netapp.com under Configuring a new Broadcom-supported BES-53248 cluster switch.

Connect a laptop to the switch’s console (RJ45) port (115200 baud). Out of the factory, the default BES-53248 login is admin with no password, and you will be prompted to change it to a secure password on first login.

username = admin
password = {blank}

Note: NetApp123#! will work as a secure password.

You will initially be in the USER command mode: >
From >, copy and paste the below script, with the highlighted entries updated accordingly:

en
hostname SWITCH_NAME
serviceport protocol none
y
network protocol none
y
serviceport ip SWITCH_IP NETWORK_MASK GATEWAY
show serviceport
show network

Note: Type ‘en’ or ‘enable’ to get from the USER command mode (>) to the EXEC mode (#).

2) Cluster Switch OS and RCF File

To check the EFOS and RCF versions, run the below commands - the RCF version is listed in the running-config.
Note: Brand new switches might be running the correct EFOS version, but are unlikely to have had the RCF applied.

(BES_SW1) # show version
(BES_SW1) # show running-config

Upgrading EFOS and/or the RCF requires a TFTP/FTP/SFTP server (if the upload fails with one protocol, you sometimes have better luck trying a different one).
Note: In the below, you only need to run 'copy active backup' if the active is different to the backup.

show version
show bootvar
copy active backup
show bootvar

ping {YOUR_TFTP_SERVER}
copy tftp://{YOUR_TFTP_SERVER}/EFOS.3.4.4.6.stk active
show bootvar

copy tftp://{YOUR_TFTP_SERVER}/BES-53248_RCF_v1.6-Cluster-HA.txt nvram:script BES-53248_RCF_v1.6-Cluster-HA.scr
script list
script apply BES-53248_RCF_v1.6-Cluster-HA.scr
show port all | exclude Detach
show running-config
write memory
y

reload
y

show version

3) Configuring DNS, NTP, and SSH

Configure DNS, NTP and SSH using the commands below with the highlighted entries updated accordingly (from the # prompt):

#DNS
configure
ip domain name {YOUR_DOMAIN}
ip name server {DNS_IP_1} {DNS_IP_2}
exit

#NTP
configure
sntp client mode unicast
sntp server {NTP SERVER 1 IP}
sntp server {NTP SERVER 2 IP}
clock timezone 0 zone UK
exit

#SSH
show ip ssh
config
crypto key generate rsa
crypto key generate dsa
crypto key generate ecdsa 521
exit
ip ssh server enable
show ip ssh
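
Optionally, verify the DNS and NTP settings took. These show commands are from memory of EFOS/FASTPATH, so treat them as a sketch and check 'show ?' on your switch if they differ:

show hosts
show sntp client
show sntp
show clock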

4) Passwords

To change the currently logged-in user’s password:

(BES_SW1) > password

If you want to set an enable password:

(BES_SW1) # enable password PASSWORD

IMPORTANT) Saving Changes!

To save changes so that they persist across reboots:

(BES_SW1) # write memory

~~~

2021.09.23: Security recommendation to disable the BMC on BES-53248 switches:

#Disable access to the BMC

#Refer https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Systems/FAS_Systems/BES-53248_BMC_SMASH%2C_SMASHLITE_Scorpio_Console_open_to_SSH_using_default_credentials


(switch1) > enable

(switch1) #

(switch1) # linuxsh

# ipmitool raw 0x32 0x6a 0x20 0x0 0x0 0x0 0x0 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x46 0x0 0xff 0xff 0xff 0xff 0x16 0x0 0x0 0x0 0xe0 0x1 0x0 0x0 0xff 0x0


Optionally, set a bogus IP address to stop the BMC being active on the network:

(switch1) # ipmitool lan set 1 ipsrc static

(switch1) # ipmitool lan set 1 ipaddr 1.1.1.2

(switch1) # ipmitool lan set 1 netmask 255.255.255.252

(switch1) # ipmitool lan set 1 defgw ipaddr 1.1.1.1

(switch1) # ipmitool mc reset cold
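
To check the BMC’s LAN settings before and after making these changes, ipmitool’s standard 'lan print' sub-command should work from the same place you ran the 'lan set' commands above (an assumption on my part that it’s exposed here - it is a standard ipmitool sub-command):

(switch1) # ipmitool lan print 1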

 

Disable Password Lockout:

(switch1) (Config)# passwords lock-out 0

Friday, 12 February 2021

FabricPool Data in NetApp OnCommand Insight 7.3.x DWH

I'm not totally sure when FabricPool data first appeared in the OCI 7.3.x DWH. I was under the impression it would need 7.3.11, but this is not so (the below works in 7.3.10). So did 7.3 Service Pack 9 enable collecting (and subsequent reporting) of object tiering data? I think so (essentially an ONTAP datasource patch). All I can say for sure is that you'll definitely be able to see 'objectStoreUsedSpace' in data acquired from operational OCI servers running 7.3.10 with SP9.
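
If you want to quickly check whether your DWH has this field at all, a simple count against the table used below will tell you (returns 0 if nothing has been collected yet):

SELECT COUNT(*)
FROM dwh_inventory.extended_data
WHERE fieldName = 'objectStoreUsedSpace';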

The simple MySQL query below extracts the NetApp internal volume identifier, the snapshot space used by the internal volume, and objectStoreUsedSpace (the internal volume space tiered to FabricPool) from the OCI Data Warehouse database.


SELECT
  iv.identifier AS 'Internal Volume',
  FLOOR(iv.snapshotUsedCapacityMB / 1024) AS 'Snapshot Used GB',
  ed.fieldValue AS 'Tiered GB'
FROM dwh_inventory.extended_data AS ed
JOIN dwh_inventory.internal_volume AS iv ON iv.id = ed.objectid
WHERE objectType = 'INTERNAL_VOLUME'
AND fieldName = 'objectStoreUsedSpace';


The above is just to demonstrate how to get the 'objectStoreUsedSpace' data; it comes from dwh_inventory.extended_data. In this situation we are only tiering snapshots to FabricPool, so we're just interested in getting 'Snapshot Used GB' from internal_volume to compare with how many GB are tiered.
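
As an example, a ratio column can be bolted onto the same query to make that comparison directly (illustrative only; the NULLIF guards against internal volumes reporting zero snapshot usage):

SELECT
  iv.identifier AS 'Internal Volume',
  FLOOR(iv.snapshotUsedCapacityMB / 1024) AS 'Snapshot Used GB',
  ed.fieldValue AS 'Tiered GB',
  -- percentage of snapshot-used space that has been tiered (illustrative column)
  ROUND(100 * ed.fieldValue / NULLIF(iv.snapshotUsedCapacityMB / 1024, 0), 1) AS 'Tiered vs Snapshot %'
FROM dwh_inventory.extended_data AS ed
JOIN dwh_inventory.internal_volume AS iv ON iv.id = ed.objectid
WHERE ed.objectType = 'INTERNAL_VOLUME'
AND ed.fieldName = 'objectStoreUsedSpace';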

What Else is in dwh_inventory.extended_data?

Whilst I'm on the subject of dwh_inventory.extended_data, I might as well show what other fields are available (in 7.3.10 SP9).

The below comes from the output of -

SELECT DISTINCT objectType,fieldName FROM dwh_inventory.extended_data WHERE objectType != 'SWITCH';

- skipping SWITCH because it's just line cards.

objectType      | fieldName
----------------+----------------------
STORAGE_NODE    | managementIpAddresses
STORAGE_POOL    | isCompactionSavingsEn
STORAGE_POOL    | isEncrypted
VOLUME          | isEncrypted
VOLUME          | storageGroups
DISK            | isEncrypted
INTERNAL_VOLUME | comment
INTERNAL_VOLUME | isEncrypted
INTERNAL_VOLUME | objectStoreTieringPol
INTERNAL_VOLUME | junctionPath
INTERNAL_VOLUME | qosLimitIOPS
INTERNAL_VOLUME | qosLimitRaw
INTERNAL_VOLUME | qosPolicy
INTERNAL_VOLUME | objectStoreUsedSpace

Extending this to Get Total Tiered Data for Aggregate

I was asked to see if I could get ActiveIQ's FabricPool 'Total Tiered Data' from the OCI DWH data. You have to remember that it's always going to be a little bit out: ActiveIQ normally downloads on a Sunday, and the dwh_inventory data is only as old as the last complete ETL. Also, an important note: the units in ActiveIQ are actually TB. Anyway, the below seemed to come to a reasonably accurate calculation. We take the sum of 'objectStoreUsedSpace' for all internal volumes on an aggregate and then multiply it by the compression factor (not totally sure why this works):

SELECT
sp.name AS 'Aggregate',
(sp.totalAllocatedCapacityMB / (1024 * 1024)) AS 'Aggr Allocated TB',
(sp.dataUsedCapacityMB / (1024 * 1024)) AS 'Aggr Used TB',
spcf.dedupeRatio AS 'Aggr Dedupe Ratio',
spcf.compressionRatio AS 'Aggr Compression Ratio',
spcf.compactionRatio AS 'Aggr Compaction Ratio',
SUM(ed.fieldValue) / 1024 AS 'Tiered TB',
SUM(ed.fieldValue) * spcf.compressionRatio / 1024 AS 'Tiered TB * Compression'
FROM dwh_inventory.extended_data AS ed
JOIN dwh_inventory.internal_volume AS iv ON iv.id = ed.objectid
JOIN dwh_inventory.storage_pool AS sp ON sp.id = iv.storagePoolId
JOIN dwh_capacity.storage_pool_dimension AS spd ON spd.id = sp.id
JOIN dwh_capacity.storage_and_storage_pool_capacity_fact AS spcf ON spd.tk = spcf.storagePoolTk
JOIN dwh_capacity.date_dimension AS dd ON dd.tk = spcf.dateTk
WHERE ed.objectType = 'INTERNAL_VOLUME'
AND ed.fieldName = 'objectStoreUsedSpace'
AND dd.latest = 1
GROUP BY sp.id;
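
If you'd rather not use a MySQL GUI, the queries can be fed to the mysql command-line client against the DWH database. The 'dwhuser' account and the saved .sql filename here are assumptions on my part - substitute whatever read account and file you actually use:

mysql -h {DWH_SERVER} -P 3306 -u dwhuser -p dwh_inventory < total_tiered_by_aggregate.sql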

Wednesday, 10 February 2021

Expanding Aggregates with New Full Disks as Partitioned Disks

Scenario:

You have a NetApp AFF with 36 disks (24 in shelf 0, 12 in shelf 1).
The system has been partitioned with ADPv2.
Currently each node has 18 disks assigned.
Each node has 18 P3 partitions for root and 18 P1 and 18 P2 partitions for data (not a completely standard configuration; often you'll see node 1 with the P1's and node 2 with the P2's, but it's fine).

Existing aggregate layout:
Node 1: Root Aggr: rg0 of 16 * P3 partitions
Node 1: Data Aggr: rg0 of 17 * P1 partitions & rg1 of 17 * P2 partitions
Node 2: Root Aggr: rg0 of 16 * P3 partitions
Node 2: Data Aggr: rg0 of 17 * P1 partitions & rg1 of 17 * P2 partitions

You've bought 12 new disks (of the same size as the originals) and want to assign 6 disks to Node 1 and 6 disks to Node 2, then simply expand the existing raidgroups so your aggregate layout will be:

New aggregate layout:
Node 1: Root Aggr: rg0 of 16 * P3 partitions
Node 1: Data Aggr: rg0 of 23 * P1 partitions & rg1 of 23 * P2 partitions
Node 2: Root Aggr: rg0 of 16 * P3 partitions
Node 2: Data Aggr: rg0 of 23 * P1 partitions & rg1 of 23 * P2 partitions

Image: Existing Aggregate Partitions Layout and New Aggregate Layout

Note: 48 partitioned disks is the maximum you can go to as per hwu.netapp.com; after that you have to start using full disks.
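
Before touching anything, it's worth verifying the current layout matches the above. Both of these are standard clustershell commands (the aggregate name aggr1 is assumed, as in the steps below):

storage aggregate show-status -aggregate aggr1
storage disk show -fields owner,container-type,container-name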

How to do it?

Note 1: We use the clustershell to achieve our objective.
Note 2: I'm using a vSIM here, so disk names will look different to what's seen in reality.

1) Disable disk auto assign:

disk option show
disk option modify -node * -autoassign off
disk option show

2) Physically insert the new disks.

3) Assign 6 disks to Node 1:

disk show
disk assign -node CLU01-01 -disklist VMw-1.1,VMw-1.2,VMw-1.3,VMw-1.4,VMw-1.5,VMw-1.6
disk show

4) Make sure your data aggr maxraidsize is sufficient, since it needs to be at least 23 here (17 existing partitions per raidgroup + 6 new = 23):

aggr show -aggregate aggr1 -fields maxraidsize
aggr modify -aggregate aggr1 -maxraidsize 24
aggr show -aggregate aggr1 -fields maxraidsize

5) Add the 6 newly assigned spare (still unpartitioned) disks to your aggregate and watch them get partitioned!

Note: The key things in the output below are: A) disks are being added to the existing raidgroups rg0 & rg1; B) it actually says "The following disks would be partitioned".

::> aggr add-disks -aggregate aggr1 -disklist VMw-1.1,VMw-1.2,VMw-1.3,VMw-1.4,VMw-1.5,VMw-1.6 -simulate

Disks would be added to aggregate "aggr1" on node "CLU01-01" in the following
manner:

First Plex

RAID Group rg0, 6 disks (block checksum, raid_dp)
                                                  Usable Physical
Position   Disk                      Type           Size     Size
---------- ------------------------- ---------- -------- --------
shared     VMw-1.1                   SSD         12.39GB  12.42GB
shared     VMw-1.2                   SSD         12.39GB  12.42GB
shared     VMw-1.3                   SSD         12.39GB  12.42GB
shared     VMw-1.4                   SSD         12.39GB  12.42GB
shared     VMw-1.5                   SSD         12.39GB  12.42GB
shared     VMw-1.6                   SSD         12.39GB  12.42GB

RAID Group rg1, 6 disks (block checksum, raid_dp)
                                                  Usable Physical
Position   Disk                      Type           Size     Size
---------- ------------------------- ---------- -------- --------
shared     VMw-1.1                   SSD         12.39GB  12.42GB
shared     VMw-1.2                   SSD         12.39GB  12.42GB
shared     VMw-1.3                   SSD         12.39GB  12.42GB
shared     VMw-1.4                   SSD         12.39GB  12.42GB
shared     VMw-1.5                   SSD         12.39GB  12.42GB
shared     VMw-1.6                   SSD         12.39GB  12.42GB

Aggregate capacity available for volume use would be increased by 133.8GB.

The following disks would be partitioned: VMw-1.1, VMw-1.2, VMw-1.3, VMw-1.4,
VMw-1.5, VMw-1.6.

::> aggr add-disks -aggregate aggr1 -disklist VMw-1.1,VMw-1.2,VMw-1.3,VMw-1.4,VMw-1.5,VMw-1.6

6) Repeat steps 3 to 5 for Node 2 (a sketch is below).
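
For completeness, a sketch of the Node 2 pass. The node name CLU01-02, aggregate name aggr2 and disk names VMw-1.7 to VMw-1.12 are made up for illustration - substitute your own:

disk assign -node CLU01-02 -disklist VMw-1.7,VMw-1.8,VMw-1.9,VMw-1.10,VMw-1.11,VMw-1.12
aggr show -aggregate aggr2 -fields maxraidsize
aggr modify -aggregate aggr2 -maxraidsize 24
aggr add-disks -aggregate aggr2 -disklist VMw-1.7,VMw-1.8,VMw-1.9,VMw-1.10,VMw-1.11,VMw-1.12 -simulate
aggr add-disks -aggregate aggr2 -disklist VMw-1.7,VMw-1.8,VMw-1.9,VMw-1.10,VMw-1.11,VMw-1.12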

7) Finally, re-enable disk auto assign:

disk option modify -node * -autoassign on

THE END!

Monday, 1 February 2021

You Can Add Larger Disks to an Aggregate and Not Waste Capacity

I sometimes come across situations where people don't know you can add larger disks into an aggregate of smaller disks without the larger disks being right-sized down to keep in line with the smaller disks. If you have larger disks to add to your system, and you want to maintain a system with as few aggregates as possible (or just 1), the key is to use the switch -raidgroup new when adding the new disks to the aggregate.

In the below walk-through we demonstrate this. I am using a NetApp ONTAP 9.7 simulator with 14 * 1GB disks and 14 * 4GB disks.

Note i: Good to know that this 2013 blog post Clustered ONTAP 8.2 SIM: Maximizing Available Usable Space still works to some extent with the ONTAP 9.7 simulator.
Note ii: The 8 year old Home Lab lives on! ;-)

Walk-Through

These 5 bullet points cover the essence of what's happening in the succeeding Clustershell output.

  • Create a RAID-DP aggregate of 10 * 1GB disks and see it has usable space of 7.03GB.
  • See that the aggregate's disks have a usable size of 1020MB (~1GB).
  • See that our 14 * 4GB spare disks have a usable size of 3.93GB.
  • Simulate adding a new RAID-DP raidgroup of 12 * 4GB disks to our aggregate, and see each disk has a usable size of 3.91GB.
  • After adding the 12 disks to our 10-disk 7.03GB aggregate, the aggregate now has a total size of 42GB.
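
A quick sanity check of those numbers, assuming the usual ~10% WAFL reserve on aggregate capacity (my assumption, but it happens to fit):

rg0: 10 disks RAID-DP = 8 data disks * 1000MB = 8000MB; 8000MB * 0.9 = 7200MB ≈ 7.03GB
rg1: 12 disks RAID-DP = 10 data disks * 3.91GB = 39.1GB; 39.1GB * 0.9 ≈ 35.16GB
Total: 7.03GB + 35.16GB ≈ 42GB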

CLU01::> aggr create -aggregate aggr1_CLU01_01 -diskcount 10 -node CLU01-01 -simulate

The layout for aggregate "aggr1_CLU01_01" on node "CLU01-01" would be:

First Plex

RAID Group rg0, 10 disks (block checksum, raid_dp)
                      Usable Physical
Position   Disk         Size     Size
---------- -------- -------- --------
dparity    NET-1.18        -        -
parity     NET-1.19        -        -
data       NET-1.20   1000MB   1.00GB
data       NET-1.21   1000MB   1.00GB
data       NET-1.22   1000MB   1.00GB
data       NET-1.23   1000MB   1.00GB
data       NET-1.24   1000MB   1.00GB
data       NET-1.25   1000MB   1.00GB
data       NET-1.26   1000MB   1.00GB
data       NET-1.27   1000MB   1.00GB

Aggregate capacity available for volume use would be 7.03GB.

CLU01::> aggr create -aggregate aggr1_CLU01_01 -diskcount 10 -node CLU01-01
CLU01::> disk show -container-name aggr1_CLU01_01 -fields usable-size,physical-size
disk     physical usable
         -size    -size
-------- -------- ----------
NET-1.18 1.00GB   1020MB
NET-1.19 1.00GB   1020MB
NET-1.20 1.00GB   1020MB
NET-1.21 1.00GB   1020MB
NET-1.22 1.00GB   1020MB
NET-1.23 1.00GB   1020MB
NET-1.24 1.00GB   1020MB
NET-1.25 1.00GB   1020MB
NET-1.26 1.00GB   1020MB
NET-1.27 1.00GB   1020MB
10 entries were displayed.

CLU01::> disk show -usable-size 3.93G -fields usable-size,container-type
disk     container usable
         -type     -size
-------- --------- ----------
NET-1.1  spare     3.93GB
NET-1.2  spare     3.93GB
NET-1.3  spare     3.93GB
NET-1.4  spare     3.93GB
NET-1.5  spare     3.93GB
NET-1.6  spare     3.93GB
NET-1.7  spare     3.93GB
NET-1.8  spare     3.93GB
NET-1.9  spare     3.93GB
NET-1.10 spare     3.93GB
NET-1.11 spare     3.93GB
NET-1.12 spare     3.93GB
NET-1.13 spare     3.93GB
NET-1.14 spare     3.93GB

CLU01::> aggr add-disks -aggregate aggr1_CLU01_01 -disksize 4 -diskcount 12 -raidgroup new -simulate true

Disks would be added to aggregate "aggr1_CLU01_01" on node "CLU01-01" in the following manner:

First Plex

RAID Group rg1, 12 disks (block checksum, raid_dp)
                      Usable Physical
Position   Disk         Size     Size
---------- -------- -------- --------
dparity    NET-1.2         -        -
parity     NET-1.3         -        -
data       NET-1.4    3.91GB   3.93GB
data       NET-1.5    3.91GB   3.93GB
data       NET-1.6    3.91GB   3.93GB
data       NET-1.7    3.91GB   3.93GB
data       NET-1.8    3.91GB   3.93GB
data       NET-1.9    3.91GB   3.93GB
data       NET-1.10   3.91GB   3.93GB
data       NET-1.11   3.91GB   3.93GB
data       NET-1.12   3.91GB   3.93GB
data       NET-1.13   3.91GB   3.93GB

Aggregate capacity available for volume use would be increased by 35.16GB.

CLU01::> aggr add-disks -aggregate aggr1_CLU01_01 -disksize 4 -diskcount 12 -raidgroup new
CLU01::*> df -A -aggregate aggr1_CLU01_01 -skip-snapshot-lines -gigabyte
Aggregate     total   used  avail   capacity
aggr1_CLU01_01 42GB    0GB   42GB         0%

Image: Adding 14 type 23 disks (1000MB) to the ONTAP VSIM


Image: Adding 14 type 31 disks (4000MB) to the ONTAP VSIM