[StorageGRID][FabricPool] Changing From Dual Copy to EC 2+1

** Caveat lector! Totally unofficial writings. **

Scenario

We have an ONTAP lab cluster (running 9.9.1) using StorageGRID (11.5.0) as a cloud tier.

cluster2::storage aggregate object-store> show-space
                                                            
Aggregate      Object Store Name Provider Type Used Space   
-------------- ----------------- ------------- -----------
aggr1_cluster2 sgws_71           SGWS               3.04GB
aggr2_cluster2 sgws_71           SGWS               2.61GB
aggr3_cluster2 sgws_71           SGWS              771.0MB

Note: This is a lab system, so the used space is quite small. But the volumes are set to the "All" tiering policy, so the above represents pretty much all the data in our lab cluster being on StorageGRID.
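A quick way to confirm which volumes are set to tier (the tiering-policy field is standard ONTAP):

cluster2::> volume show -fields tiering-policy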

Our StorageGRID grid consists of 3 storage nodes. Our ILM policy contains one rule called "Make 2 Copies".

We want to change the "Make 2 Copies" rule to an "EC 2+1" rule.
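As a quick sanity check on the motivation: the three aggregates above hold roughly 6.4GB of tiered data. "Make 2 Copies" consumes 2.0x that in grid capacity (about 12.8GB), while EC 2+1 stores 2 data fragments plus 1 parity fragment, i.e. (2+1)/2 = 1.5x (about 9.6GB), ignoring metadata overhead.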


** See below: "Things to Check Before and After Making the Change" **

Steps

Note: You may need to acquire some information before starting, such as bucket names.

cluster2::storage aggregate object-store> config show

Name: sgws_71
Server: dc1-adm1.demo.company.com
Container Name: fabricpool-cluster2
Provider Type: SGWS
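If you want to double-check the bucket from the S3 side, the AWS CLI works against StorageGRID. A minimal sketch, assuming you have the tenant's S3 keys in a profile and know your grid's S3 endpoint (both are placeholders here):

aws s3 ls --endpoint-url https://<s3-endpoint> --profile <tenant-profile>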
  • 1) Create a Storage Pool to contain All Storage Nodes in the site
    • ILM > Storage Pools
      • Click Create to create a new Storage Pool
        • Name: Data Center 1
        • Site: Data Center 1
        • Storage Grade: All Storage Nodes
        • Click Save
  • 2) Create the new Erasure Coding Profile (skip for 11.8; see below)
    • ILM > Erasure Coding
      • Click Create to create a new EC Profile
        • Profile Name: DC1_EC_2plus1
        • Storage Pool: Data Center 1
        • Scheme: 2+1
        • Click Save
  • 3) Create a new ILM rule with the EC 2+1
    • Note: Here we could have left out the bucket name, because this lab StorageGRID has one site and only serves FabricPool as a cloud tier. But then it would be a default rule that applies to everything, which I didn't want.
    • ILM > Rules
      • Click Create to create a new ILM Rule
        • >> Step 1 of 3 <<
        • Name: DC1_EC_2plus1
        • Description:
        • Tenant Accounts (optional):
        • Bucket Name: fabricpool-cluster2
        • Click Next
        • >> Step 2 of 3 <<
        • Reference Time: Ingest Time (default)
        • From day 0 store forever (default)
        • Type: erasure coded
        • Location: Data Center 1 (DC1_EC_2plus1)
        • Click Next
        • >> Step 3 of 3 <<
        • Select Balanced for the ingest behaviour
        • Click Save
  • 4) Clone the ILM Policy
    • ILM > Policies
      • Click Clone to clone the Active policy
      • Name: 2024_06_25 Policy
      • Reason for change: Converting 2 Copies to EC 2+1 for FabricPool
      • Click the button Select Rules
        • Select Default Rule: Make 2 Copies
        • Select Other Rules: DC1_EC_2plus1
        • Click Apply
        • Verify the order of the rules is correct (the default rule is always last).
        • Click Save
  • 5) Activate the new Proposed ILM policy
    • ILM > Policies
      • Select the new Proposed policy
      • Click Activate
      • Click OK to the "Activate the proposed policy. Errors in an ILM policy can cause irreparable data loss. Review and test the policy carefully before activating. Are you sure you want to activate the proposed policy?"
All being well, the cluster won't notice any change. (The important thing is the "From day 0 store forever" placement, which means StorageGRID itself never deletes anything; objects are only deleted by the S3 client, which is ONTAP in this case.) You can check with:

event log show
system health status show
system health subsystem show
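You can also sanity check from ONTAP that the cloud tier is still reachable (the availability column should report available):

cluster2::storage aggregate object-store> show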

Monitoring ILM Progress

The best way to monitor the ILM change's progress is to watch the ILM queue build up and then settle down. It is also possible to monitor the ILM queue/scan rate using Grafana (but this is not super insightful).
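If you prefer the command line, the Grid Management API can proxy Prometheus queries. A minimal sketch, assuming the admin node from earlier and grid admin credentials (verify the endpoint path and metric names against your version's API docs; storagegrid_ilm_awaiting_client_objects is one of the documented ILM queue metrics):

TOKEN=$(curl -sk -X POST "https://dc1-adm1.demo.company.com/api/v3/authorize" \
  -H "Content-Type: application/json" \
  -d '{"username":"root","password":"<password>","cookie":false,"csrfToken":false}' | jq -r .data)

curl -skG "https://dc1-adm1.demo.company.com/api/v3/grid/metric-query" \
  -H "Authorization: Bearer $TOKEN" \
  --data-urlencode "query=sum(storagegrid_ilm_awaiting_client_objects)"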

Note: In my lab, because the data set is so small and the load non-existent, I saw pretty much nothing in the graphs except a blip on Network Traffic.


Note: In 11.8, under Support > Metrics > ILM, there are Grafana graphs, which are quite good. The example below has no data (no ILM activity).


Difference in StorageGRID 11.8

In StorageGRID 11.8, you can skip (2) above and go straight to creating the rule with an Erasure Coding profile.

Things to Check Before and After Making the Change
  1. Check Alerts
  2. Check Support > Diagnostics
  3. Check Dashboard > ILM Tab*
  4. Check Nodes > Data Center > ILM graph*
  5. Check Nodes > Storage Nodes > ILM statistics and graph*
  6. Check Nodes > Storage Nodes > Hardware For CPU and Memory utilization*
  7. Check Nodes > Storage Nodes > Network for network utilization*
*We expect all these to increase whilst the ILM conversion is happening.

You also want to capture full details of the existing ILM policy and rules, and knowledge of the buckets might be useful too (see the sketch after this list):
  1. ILM > Rules
  2. ILM > Policies
  3. ILM > Erasure coding
  4. Tenants > [Your Tenant] > Bucket details
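To record the bucket's object count and total size (useful for a before/after comparison), this AWS CLI one-liner works against StorageGRID; the endpoint and profile are placeholders for your tenant's S3 credentials:

aws s3 ls s3://fabricpool-cluster2 --recursive --summarize --endpoint-url https://<s3-endpoint> --profile <tenant-profile>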

APPENDIX: Checks Included in StorageGRID 11.8 Diagnostics

StorageGRID 11.8 Diagnostics are under SUPPORT > Diagnostics and they include many checks:
  • Node uptime
  • Cassandra automatic restarts
  • Cassandra blocked task queue too large
  • Cassandra commit log latency
  • Cassandra commit log queue depth
  • Cassandra compaction queue too large
  • Cassandra deleted data errors
  • Cassandra dropped messages
  • Cassandra garbage collection
  • Cassandra imbalanced SSTables
  • Cassandra memory
  • Cassandra memtable flushes
  • Cassandra offheap memory too high
  • Cassandra pending message queue too large
  • Cassandra read latency consistently high
  • Cassandra reclaimable space
  • Cassandra repair progress
  • Cassandra request timeouts
  • Cassandra requests unable to achieve consistency
  • Cassandra table partitions too large
  • CPU IO wait
  • CPU utilization
  • Custom SSH settings
  • Dirty page ratio
  • Disk read latency
  • Disk write latency
  • Erasure-coded groups in repair change over time
  • Erasure-coded groups repair health
  • Erasure-coded groups writable counts
  • Grid options
  • Invalid prefix corrections for bucket listing
  • LDR Storage Desired State
  • Load balancer - request timeouts
  • Load balancer - upstream connection problems
  • Load balancer - upstream retries exceeded
  • Network MTU values
  • Replicated repair jobs not progressing
  • SSD Cache Hit Rate
  • Storage Node client connections
  • Storage used - object data
  • StorageGRID version consistency
  • TCP connection tracking utilization
  • TCP retransmission rate
