[Brief Notes] FabricPool & StorageGRID Best Practices & More...

** Work in progress ** 

A place for some FabricPool notes.

Note: These are just personal notes from reading documentation. By reading I mean skimming through a document trying to pick out the most important bits.

[1] FabricPool Best Practices - ONTAP 9.13.1 - TR 4598 (July 2023)

Requirements:

  • In releases earlier than ONTAP 9.8, FabricPool is only supported on SSD local tiers.

Volume Tiering Policies:

  • By default, volumes use the None volume tiering policy. The exception is newly created FlexVol volumes on FabricPool aggregates, which use the Snapshot-Only volume tiering policy.
  • Auto - moves all cold blocks in the volume to the cloud tier.
  • Snapshot-Only - cold Snapshot copy blocks in the volume are moved to the cloud tier.
  • All - primarily used with secondary data and data protection volumes. NetApp does not recommend using the All volume tiering policy with primary data (read/write volumes).
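As a hedged sketch of the policies above (the SVM, volume, and aggregate names are made up), a tiering policy can be set when the volume is created or changed later:

```
::> volume create -vserver svm1 -volume vol_auto -aggregate fp_aggr1 -size 1TB -tiering-policy auto
::> volume modify -vserver svm1 -volume vol_auto -tiering-policy snapshot-only
```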

Intercluster LIFs:

  • No specific best practice (e.g., if you already have intercluster LIFs for SnapMirror, you can use the same ones)
  • If you are using more than one intercluster LIF on a node with different routing, NetApp recommends placing them in different IPspaces. During configuration, FabricPool can select from multiple IPspaces, but it is unable to select specific intercluster LIFs within an IPspace.
  • Note: Disabling or deleting an intercluster LIF interrupts communication to the cloud tier.
Volumes (must use space-guarantee none):

  • FabricPool cannot attach a cloud tier to a local tier that contains volumes using a space guarantee other than None.
  • Placing FlexGroup constituent volumes on heterogeneous local tiers is not recommended.
  • QoS Min must be turned off on volumes in FabricPool local tiers. Alternatively, tiering must be turned off (-tiering-policy none) on volumes that require QoS Min.
Cloud Tiering license:
  • A Cloud Tiering license is not required when using StorageGRID or ONTAP S3 as the cloud tier or when using Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage as the cloud tier for Cloud Volumes ONTAP.
  • Tiering to the cloud tier stops when the amount of data (used capacity) stored on the cloud tier reaches the licensed capacity.
Certificate authority certification:
Data Movement:
  • volume object-store tiering show
  • Tiering fullness threshold:
    • By default, tiering to the cloud only happens if the local tier is >50% full.
    • Setting the threshold to a lower number may be useful for large local tiers that contain little hot/active data.
    • Setting the threshold to a higher number may be useful for solutions designed to tier only when local tiers are near maximum capacity.
    • Note: The All volume tiering policy ignores the tiering fullness threshold.
    • storage aggregate object-store modify -aggregate AGGR -tiering-fullness-threshold (0%-99%) -object-store-name NAME
  • SnapMirror: Cascading SnapMirror relationships are not supported when using the All volume tiering policy. Only the final destination volume should use the All volume tiering policy.
  • Volume move:
    • If a volume move’s destination local tier uses the same bucket as the source local tier, data on the source volume that is stored in the bucket does not move back to the local tier.
    • Incompatible with optimized volume moves:
      • Source and destination aggregates use different encryption keys
      • FlexClone volumes and FlexClone parent volumes
    • If a volume tiering policy is not specified when performing a volume move, the destination volume uses the tiering policy of the source volume. If a different tiering policy is specified when performing the volume move, the destination volume is created with the specified tiering policy.
    • Note: When in an SVM DR relationship, source and destination volumes must use the same tiering policy.
    • Moving a volume to another local tier resets the inactivity period of blocks on the local tier.
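A hedged sketch of the volume move behavior above (names illustrative): the first move inherits the source volume's tiering policy, while the second explicitly sets a policy on the destination:

```
::> volume move start -vserver svm1 -volume vol1 -destination-aggregate fp_aggr2
::> volume move start -vserver svm1 -volume vol1 -destination-aggregate fp_aggr2 -tiering-policy none
```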
FabricPool Mirror:
  • Enables the attachment of two cloud tiers to a single local tier (allowing you to move from one cloud tier to another).
  • When both buckets are in a mirrored state, newly tiered data is synchronously tiered to both buckets. Because data is being tiered to two buckets synchronously, the effective throughput is half of standard single-bucket tiering.
  • FabricPool Mirror can be used to tier data to multiple cloud vendors for an additional level of resiliency, as the likelihood that multiple cloud providers experience outages at the same time is extremely low.
  • Although essential for FabricPool with NetApp MetroCluster, FabricPool Mirror is a stand-alone feature that does not require MetroCluster to use.
Unreclaimed space threshold (no specific reason to change):
  • StorageGRID 40%
  • storage aggregate object-store modify -aggregate AGGR -object-store-name OBJSTORE -unreclaimed-space-threshold (0%-99%)
ONTAP storage efficiencies:
  • Storage efficiencies are preserved when moving data to the cloud tier.
  • Aggregate inline deduplication is supported on the local tier, but associated storage efficiencies are not carried over to objects stored on the cloud tier.
  • When using the All volume tiering policy, storage efficiencies associated with background deduplication processes may be reduced as data is likely to be tiered before the additional storage efficiencies can be applied.
  • Note: Third-party deduplication has not been qualified by NetApp.
Temperature-sensitive storage efficiency:
  • Beginning in ONTAP 9.10.1:
    • TSSE is supported with FabricPool. It is more space efficient, but smaller blocks require smaller GETs, reducing GET performance from the cloud tier.
    • AFF volumes are created using adaptive compression by default (-storage-efficiency-mode default)
    • TSSE must be manually enabled on volumes (-storage-efficiency-mode efficient)
Configuration - see page 21.
  • Attaching a single cloud tier to multiple local tiers in a cluster is the general best practice. NetApp does not recommend attaching a single cloud tier to local tiers in multiple clusters.
  • Note: Attaching a cloud tier to all FabricPool local tiers is the general best practice and provides significant benefits to environments that value manageability over public object store cloud tier performance.
  • Note: ONTAP and StorageGRID system clocks must not be out of sync by more than a few minutes. Significant clock skew prevents the StorageGRID bucket from being attached to the local tier.
  • Note: Attaching a cloud tier to a local tier is a permanent action. A cloud tier cannot be unattached from a local tier after being attached. (Using FabricPool Mirror, you can attach a different cloud tier.)
More on volume tiering:
  • It takes approximately 31 days for inactive blocks to become cold. The Auto cooling period is adjustable between 2 days and 183 days by using -tiering-minimum-cooling-days.
  • When cold blocks in a volume with a tiering policy set to Auto are read randomly, they are made hot and written to the local tier.
  • When cold blocks in a volume with a tiering policy set to Auto are read sequentially, they stay cold and remain on the cloud tier. They are not written to the local tier.
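For example (illustrative names; the valid range is 2-183 days), shortening the Auto cooling period so that inactive blocks become cold after two weeks:

```
::> volume modify -vserver svm1 -volume vol_auto -tiering-minimum-cooling-days 14
```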
Cloud Retrieval Policies:
  • Default
  • Never
  • On-Read
  • Promote - Setting the cloud retrieval policy to Promote immediately queues tiered data to return to the local tier—provided the tiering policy allows it (e.g. when changing tiering policy from Auto to None, or Auto to Snapshot-Only)
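A hedged sketch of using Promote (illustrative names): changing the tiering policy to None and setting the retrieval policy to Promote queues all tiered data to return to the local tier:

```
::> volume modify -vserver svm1 -volume vol1 -tiering-policy none -cloud-retrieval-policy promote
```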
Default Volume tiering-minimum-cooling-days:
  • For the Auto tiering policy is 31 days.
  • For the Snapshot-Only tiering policy is two days.
Security:
  • FabricPool supports NetApp Storage Encryption (NSE), NetApp Volume Encryption (NVE), and NetApp Aggregate Encryption (NAE). 
  • NetApp highly recommends using client-side NVE or NAE encryption; encrypting data at rest is the recommended best practice. If you need to disable server-side encryption on the cloud tier: storage aggregate object-store config modify -server-side-encryption false
Interoperability - see page 39.

StorageGRID:
  • NetApp recommends provisioning enough StorageGRID nodes to meet or exceed capacity and performance requirements
  • You can use high-availability (HA) groups to provide highly available data connections for S3 clients. You can also use HA groups to provide highly available connections to the Grid Manager and the Tenant Manager. HA groups use virtual IP addresses (VIPs) to provide active-backup access to Gateway Node or Admin Node services.
  • NetApp recommends disabling stored object compression in StorageGRID.
  • NetApp recommends disabling stored object encryption in StorageGRID.
  • Intrasite erasure coding using a 2+1 scheme is the recommended best practice for cost-efficient data protection. Erasure coding uses more CPU, but significantly less storage capacity, than replication. 4+1 and 6+1 schemes use even less capacity than 2+1, but at the cost of lower throughput and less flexibility when adding storage nodes during grid expansion.
  • FabricPool also supports advanced ILM rules such as filtering based on object tags when StorageGRID is used as the cloud tier. ILM rules can be used in conjunction with tags to direct objects to specific nodes, change data protection policies (from replication to erasure coding), etc.
  • ONTAP supports up to four tag key=value pairs per volume: volume modify VOLNAME -tiering-object-tags KEY1=VALUE1,KEY2=VALUE2...
  • NetApp recommends using the default, read-after-new-write, consistency control for buckets used as FabricPool targets. Note: Do not use the available consistency control for buckets used as FabricPool targets.
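The capacity trade-off between replication and the erasure-coding schemes above can be sketched with some simple math (illustrative only; real grids add metadata and filesystem overhead on top of these ratios):

```python
# Raw-capacity cost per usable byte for StorageGRID data-protection schemes.

def replication_overhead(copies: int) -> float:
    """Replication stores full copies of each object."""
    return float(copies)

def erasure_coding_overhead(k: int, m: int) -> float:
    """A k+m EC scheme stores k data fragments + m parity fragments."""
    return (k + m) / k

# Compare 2-copy replication against the EC schemes mentioned above.
for label, factor in [
    ("2-copy replication", replication_overhead(2)),
    ("EC 2+1", erasure_coding_overhead(2, 1)),
    ("EC 4+1", erasure_coding_overhead(4, 1)),
    ("EC 6+1", erasure_coding_overhead(6, 1)),
]:
    print(f"{label}: {factor:.2f}x raw capacity per usable byte")
```

This makes the document's point concrete: 2+1 needs 1.5x raw capacity versus 2.0x for two-copy replication, and 4+1/6+1 shave that further at the cost of throughput and expansion flexibility.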
Performance:
  • Object Store Profiler:
    • You must add the cloud tier to ONTAP before you can use it with the object store profiler.
    • Start the object store profiler:
      • storage aggregate object-store profiler start -object-store-name OSNAME -node NODE
    • View the results:
      • storage aggregate object-store profiler show
  • Sequential read performance: Beginning in ONTAP 9.13.1, FabricPool performance was improved by increasing the concurrency and parallelism of byte-ranged GETs during sequential reads.
  • PUT throttling:
    • FabricPool PUT operations do not compete for resources with other applications. FabricPool PUT operations are automatically placed at a lower priority (bullied) by client applications and other ONTAP workloads, such as SnapMirror.
    • storage aggregate object-store put-rate-limit modify -node NODE -default TRUE|FALSE -put-rate-bytes-limit X[KB|MB|GB|TB|PB]
Loss of Connectivity (p44):
  • If for any reason connectivity to the cloud is lost, the FabricPool local tier remains online, but applications receive an error message when attempting to get data from the cloud tier. Cold blocks that exist exclusively on the cloud tier remain unavailable until connectivity is reestablished.
  • NetApp recommends using the following guidance when tiering data in volumes hosting LUNs:
    • Snapshot-Only is an acceptable tiering policy for most SAN use cases.
    • Auto should only be used for non-critical applications. 
    • All should not be used on volumes hosting LUNs.
Sizing (p45):
  • Writes from the cloud tier to the local tier are disabled if local tier capacity is greater than 90%. If this occurs, blocks are read directly from the cloud tier.
  • Inactive data reporting (IDR):
    • Enabled by default on non-FabricPool SSD local tiers. Enable on HDD using CLI.
    • Uses a 31-day cooling period. Adjust with the volume-level -tiering-minimum-cooling-days option.
    • To enable: storage aggregate modify -aggregate AGGRNAME -is-inactive-data-reporting-enabled true
  • Tiering during data migrations
    • Because of the difference in ingress and egress rates, it is possible to run out of space on a small local tier when attempting to migrate more data to it than it has the capacity to hold. Data usually comes into the local tier at a faster rate than it can be converted into objects and tiered out.
    • For example, if a volume move takes place at 2GBps but tiering takes place at 500MBps, 50TB completes the volume move to the local tier in ~7 hours. However, ~28 hours are required for tiering to an object store. The local tier must have enough capacity to store the data before it is tiered.
  • Do not host virtualized object stores (e.g. virtual StorageGRID storage nodes) in volumes that tier inactive data. Set the tiering policy on those volumes to None.
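The migration-sizing example above can be checked with a quick sketch (decimal units assumed: 1 TB = 1e12 bytes, 1 GBps = 1e9 bytes/s):

```python
# Why the local tier must buffer data during a migration that arrives
# faster than it can be tiered out to the object store.

def hours_to_transfer(total_bytes: float, rate_bytes_per_sec: float) -> float:
    return total_bytes / rate_bytes_per_sec / 3600

DATA = 50e12          # 50 TB being migrated
MOVE_RATE = 2e9       # volume move ingest: 2 GBps
TIER_RATE = 0.5e9     # tiering to the object store: 500 MBps

move_hours = hours_to_transfer(DATA, MOVE_RATE)   # ~6.9 h
tier_hours = hours_to_transfer(DATA, TIER_RATE)   # ~27.8 h

print(f"volume move: ~{move_hours:.1f} h; tiering out: ~{tier_hours:.1f} h")
print(f"backlog window: ~{tier_hours - move_hours:.1f} h of data the local tier must hold")
```

The ~21-hour gap is the window in which the local tier must have enough free capacity to hold data that has landed but not yet been tiered.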
ONTAP CLI
  • storage aggregate object-store show-space
  • volume show-footprint
[2] StorageGRID 11.6 Documentation - April 14, 2023

This is a 2288 page PDF. I've skimmed by searching for Best Practice.
  • If you do not have a gateway, the best practice is to set the gateway address to the IP address of the network interface.
  • The best practice is to specify at least two DNS servers.
  • Default Grid CA certificate:
    • Although you can use the Grid CA certificate for a non-production environment, the best practice for a production environment is to use custom certificates signed by an external certificate authority. 
  • Global/other certificates:
    • The best practice for a production environment is to use custom certificates signed by an external certificate authority.
  • Best practices for StorageGRID load balancing:
    • As a general best practice, each site in your StorageGRID system should include two or more nodes with the Load Balancer service.
    • You must configure a StorageGRID load balancer endpoint to define the port that Gateway Nodes and Admin Nodes will use for incoming and outgoing FabricPool requests.
  • Best practices for the load balancer endpoint certificate:
    • When creating a load balancer endpoint for use with FabricPool, you should use HTTPS as the protocol. Communicating with StorageGRID without TLS encryption is supported but not recommended.
    • You can then either upload a certificate that is signed by either a publicly trusted or a private certificate authority (CA), or you can generate a self-signed certificate. The certificate allows ONTAP to authenticate with StorageGRID.
    • As a best practice, you should use a CA server certificate to secure the connection. Certificates signed by a CA can be rotated nondisruptively.
  • Best practices for high availability (HA) groups:
    • Before attaching StorageGRID as a FabricPool cloud tier, you should use the StorageGRID Grid Manager to configure a high availability (HA) group.
    • If you plan to use FabricPool with primary workload data, you must create an HA group that includes at least two load-balancing nodes to prevent data retrieval interruption.
  • The best practices for creating a traffic classification policy for FabricPool depend on the workload, as follows:
    • If you plan to tier FabricPool primary workload data to StorageGRID, you should ensure that the FabricPool workload has the majority of bandwidth.
    • You can create a traffic classification policy to limit all other workloads.
    • In general, FabricPool read operations are more important to prioritize than write operations. You should not impose quality of service limits on any FabricPool workload; you should only limit the other workloads.
  • Other best practices for StorageGRID and FabricPool:
    • Object encryption: not required (already encrypted)
    • Object compression: not required (already compressed)
    • Consistency level: recommended is Read-after-new-write (default). Do not use available or any other consistency level
    • Never use FabricPool to tier any data related to StorageGRID back to StorageGRID itself.

[3] FabricPool S3 Compatible Test Guide - April 2020
  1. Pre-requisites
    1. Install and Setup ONTAP Cluster
    2. Create SSD aggregate
      1. ::> storage aggregate create
    3. Configure dedicated intercluster LIF per node
      1. ::> net int create -service-policy default-intercluster
    4. Verify LIF connectivity to cloud object storage
      1. ::> network ping
    5. Create volumes
      1. ::> stor aggr object-store modify -aggr AGGR -tiering-fullness-threshold X%
    6. Adding data to volumes
      1. ::> set d
      2. ::> run local mkfile 1g /vol/VOLNAME/FILE [x files]
      3. ::> run local ls -l /vol/VOLNAME
      4. ::> snapshot create
      5. ::> ...
  2. Configuring FabricPool
    1. FabricPool license (not required if StorageGRID is used)
    2. Install CA certificates in ONTAP
      1. % openssl s_client -connect <object-server-ip>:<port> -showcerts
      2. ::> security certificate install -type server-ca
      3. (Optional) Configure Proxy (when using different cloud providers)
      4. Add a Cloud Tier to ONTAP
      5. Configuring other S3 compliant object storage
        1. ::> object-store config create
      6. Attach external capacity tier to an aggregate:
        1. ::> storage aggr object-store attach
      7. Validate FabricPool aggregate:
        1. ::> stor aggr show
  3. Test Plans
    1. Understand IDR in a non-FabricPool Aggregate
    2. Auto Tiering Workflows
    3. Snapshot-only Tiering Workflows
    4. Archive a volume using vol move
    5. Recovery from local / backup / DR volume snapshot
Capture ONTAP Statistics:
  • set d
  • node run -node * "wafl composite stats show"
  • node run -node * "wafl composite stats counter show FPAGGR"
  • node run -node * "wafl cloudio_stats"
  • statistics show -object object_store_client_conn -instance *
  • statistics show -object object_store_client_op -instance * -raw
  • statistics show -object ktls_global -instance ktls_global -raw
  • statistics show -object ktls_session -raw
  • snapmirror show -instance
  • df -aggregates -composite -aggregate AGGR
  • volume show-footprint
  • aggregate show-space
[4] NetApp FabricPool with StorageGRID - Recommendation Guide - TR-4826

Additional to the above.
  • High-availability groups
    • In StorageGRID, high-availability (HA) groups use virtual IP addresses (VIPs) to provide active-backup access. HA groups can consist of a combination of admin nodes, gateway nodes, or both. Gateway nodes are dedicated load balancer nodes, whereas admin nodes run both management and Amazon Simple Storage Service (Amazon S3) load-balancing services. You assign one node in the group to be the active primary. When admin nodes are configured in an HA group, ports 443 and 80 cannot be configured for Amazon S3 access because they are reserved for the Grid Manager and Tenant Manager UIs. 
  • Load balancer endpoints
    • The load balancer service in StorageGRID provides Layer-7 load balancing. It also performs Transport Layer Security (TLS) termination of client requests, inspects the requests, and establishes new secure connections to the storage nodes. First, create a load balancer endpoint to be used for the FabricPool workload with the display name (for example, S3.netapp.com) and the port to be used (for example, 10443). When selecting HTTPS, you must upload or generate a certificate. Make sure to upload the certificate to the ONTAP cluster, as well as the root and any subordinate certificate authority (CA) certificates.
  • Load balancer topology
    • A general best practice is to use two load-balancing nodes in each site in an HA group with at least one load balancer endpoint configured.
[5] StorageGRID Design and Implementation - Guidelines and Best Practices: TR-4889 - January 2023

Additional to the above.
  • Abbreviations:
    • Administrative Domain Controller (ADC). Maintains topology and grid-wide configuration. Typically exists only on 3 Storage Nodes at each site. Requires a minimum of three ADCs per site.
    • Configuration Management Node (CMN). Manages system-wide configuration, exists on a primary Admin Node only.
    • Connection Load Balancer (CLB). Provides layer 3 and 4 load balancing of S3 and Swift traffic from clients to Storage Nodes. This service exists on every Gateway Node. The CLB service is deprecated. The Load Balancer service on a Gateway or Admin Node is recommended, which provides a layer 7 load-balancing mechanism.
    • Local Distribution Router (LDR). Processes object storage protocol (S3 or Swift) requests and manages object data on disk. This service runs on every Storage Node.
  • DDP:
    • For SG5760/SG6060/SGF6024, DDP8 is the recommended RAID volume configuration and is our default. For DDP16, there is a performance decrease compared to DDP8 for certain object sizes. DDP16 allows greater capacity.
  • StorageGRID topology notes:
    • You can deploy a grid with two logical sites at the same physical data center.
    • Understand the trade-offs between a multisite grid and independent grids. StorageGRID provides redundancy for object metadata by storing the metadata for all objects in the system on Storage Nodes at each site. Each site must support the full metadata load of the entire grid.
    • A grid must have one primary Admin Node. You cannot promote a non-primary Admin Node to primary Admin Node. Each additional Admin Node generates extra management-related traffic and activities. The general guideline is one primary and one non-primary Admin Node in a single grid regardless of the number of sites.
    • Each grid site must have a minimum of three Storage Nodes. The general recommendation is one extra node above minimum in case one Storage Node requires maintenance.
    • Cassandra database size. StorageGRID reserves space on object store volume 0 of each Storage Node for the Cassandra database. On a Storage Node with 128GB of RAM or more, 4TB is reserved. The total grid metadata capacity is limited by the smallest site.
  • Network topologies:
    • Grid Network. All nodes on the Grid Network must be able to talk to all other nodes. The Grid Network can consist of multiple subnets. You can also add networks containing critical grid services, such as Network Time Protocol (NTP), as grid subnets. When the Grid Network is the only StorageGRID network, it is also used for all admin traffic and all client traffic. The Grid Network gateway is the node default gateway unless the node has the Client Network configured.
    • Admin Network (optional). Using the Admin Network is recommended if the grid includes StorageGRID appliances.
    • Client Network (optional). The Client Network is an external-facing network that allows S3 clients to access the grid. Using a Client Network, you can keep the Grid Network private and not externally routable (or use it for FabricPool data) while using an external network for other application/client access. You can configure Client Network interfaces as untrusted, exposing only the ports configured for the S3 endpoint (typically 443).
    • Note: Each type of network must be on a different subnet with no overlap.
    • Note: You can enable admin and client networks later, as a post-deployment maintenance task.
  • StorageGRID Implementation Best Practices: Planning and Preparation
    • Understand proposed solution details.
    • Review sales orders.
    • Complete StorageGRID installation workbook from ConfigBuilder.netapp.com.
    • If you plan to set up a high-availability (HA) group, the Admin and/or Gateway Nodes in the same group must have at least one interface (Grid or Client) on the same subnet, and you must configure this interface with a static IP.
    • Download StorageGRID software and the latest hotfix.
    • Download StorageGRID software licenses (grid serials from SO)
    • Download SGA firmware - supported SANtricity and disk firmware (as required)
  • StorageGRID Implementation Best Practices: Grid Installation
    • Configure SGA networks and DDP mode; verify network connectivity.
    • SG100/1000 Admin Node: confirm the time is within 15 minutes of UTC. SSH to port 8022, log in as admin, su - to root, and enter date. To set: date --set="yyyy-mm-dd hh:mm:ss"
    • The StorageGRID appliance serial port must not be connected before you click Start Installation.
    • On the primary Admin Node grid installation interface, confirm that the grid name, site name, node name entries are correct and have no errors. There is no procedure to change these names after you select Start Install.
    • The Grid Network subnets list includes subnets for NTP, DNS, and LDAP if you want to access them through the Grid Network gateway.
    • Node approval page - all present and correct, and in the correct site?
    • Admin Network - if NTP is not accessible through the Grid Network, add the NTP server subnet.
    • NTP Role and ADC Service Settings. NetApp recommends leaving these as Automatic for new grid install.
    • NTP servers must be accessible either on the Grid Network or Admin Network during grid installation. 
    • Passwords:
      • Provisioning Passphrase - used for grid maintenance and topology changes.
      • Grid Management Root User - a superuser login to the Grid Manager UI.
      • For production deployment, do not clear Create Random Command Line Passwords for security reasons.
    • Make sure that the recovery package is kept in a secure place because it contains a login password to each grid node.
    • Do not reboot any grid node if you observe a node stuck at the same status for over 30 minutes.
    • Use SANtricity Management UI to configure DNS, NTP, and SMTP to receive email alerts. To configure AutoSupport Delivery Method and test the configuration, go to Support > AutoSupport.
    • Using the Grid Manager UI, perform basic installation verification:
      • Alerts, Capacity, Nodes, Support > Diagnostics
      • AutoSupport - and confirm it is sent successfully
  • Grid Configuration: Grid Options
    • Compress Stored Object: NO for FabricPool (FabricPool compresses objects before tiering out to StorageGRID.)
    • Stored Object Encryption: NO for FabricPool (FabricPool encrypts objects when tiering data out to StorageGRID, so it is not necessary to enable encryption on the StorageGRID side.)
    • Stored Object Hashing. By default, object data is hashed using the SHA-1 algorithm. The SHA-256 algorithm requires additional CPU resources and is generally not recommended for integrity verification.
    • Enable HTTP Connection. This option does not apply if client applications connect to the grid’s Load Balancer service on the Gateway Node or Admin Node. You can configure HTTP or HTTPS on each load balancer endpoint. Internally, the Load Balancer service connects to Storage Node’s LDR using HTTPS regardless of whether the endpoint is using HTTP or HTTPS.
  • ILM configuration best practices:
    • Create a storage pool per site and use the site-specific storage pool in the rule. Even if there is one site in the grid, create a site-specific storage pool instead of the All Storage Nodes pool.
    • Recommendation of ILM rule for FabricPool. NetApp recommends using local EC, for example, EC 2+1 or 4+1 (depending on the number of Storage Nodes installed at the site).
  • TLS Certificate Configuration: (from page 28)
    • 1) Submit a certificate signing request (CSR) to a publicly known certificate authority (CA) (for a fee). Recommended. Must use this option if the grid is intended for external clients.
    • 2) Submit a CSR to the internal IT security department. Recommended if your customer prefers not to pay for a public CA and all clients are internal within the company. Example commands: https://github.com/NetApp-StorageGRID/SSL-Certificate-Configuration
    • 3) Use the self-signed certificate generated by StorageGRID. Suitable for short-term testing with no external client connections. Only available when you create a load balancer endpoint.
  • Install custom server certificate:
    • Management Interface Server Certificate ...
    • Object Storage API Service Endpoints Certificate
      • Note: This certificate is not used by the grid Load Balancer service. Skip if all client connections are made through the load balancer endpoint.
    • Load balancer endpoint ...
  • TLS Certificate Renewal
    • Most TLS server certificates expire within 1 year.
    • Using a public/internal CA issued server certificate streamlines the renewal process as the new certificate only needs to be installed on the StorageGRID side, not on the client's side.
  • Load Balancer endpoints:
    • The Load Balancer service runs on the Admin Node and Gateway Node. You create load balancer endpoints for client connections.
    • When creating a load balancer endpoint, you specify a port number, whether the endpoint accepts HTTP or HTTPS connections, S3 or Swift, and the certificate to be used for HTTPS connections.
  • High Availability Groups:
    • You can create an HA group of Gateway Nodes or Admin Nodes to create an active-backup configuration, or you can use round-robin DNS or a third-party load balancer and multiple HA groups to achieve an HA configuration. Client connections are made using the virtual IP addresses of HA groups.
    • If an HA group includes different node types, such as an Admin Node and Gateway Nodes, only the services common to both node types can fail over to a standby node when the preferred master fails.
  • Alert notification:
    • Metadata storage and object data storage are two major attributes to monitor.
    • In addition to alert email notifications, you can configure the SNMP agent to send SNMP notifications.
  • S3 bucket consistency guidelines:
    • StorageGRID stores object metadata in a Cassandra database; each Storage Node hosts an instance of Cassandra. StorageGRID stores 3 replicas of the metadata per site.
    • Consistency controls:
      • all
      • strong-global
      • strong-site
      • read-after-new-write (default)
      • available
      • weak (not recommended)
  • What to monitor (D = daily, W = weekly, M = monthly, C = after a change):
    • (D) The system health data show on the Grid Manager Dashboard
    • (D) Verify that there is no attention/caution warning on the Diagnostics page
    • (W) Rate at which Storage Node object and metadata capacity is being consumed
    • (W) ILM operations on Grid Manager Dashboard
    • (W) Performance, networking and system resources
    • (W) Tenant activity
    • (M) Availability of software hotfixes and software upgrades
    • (C) Load balancing operations after a configuration change
Additional Notes


To be continued...
