Here I present a little script I cooked up to estimate, from published MTBF figures, how long you might expect to wait for one disk to fail somewhere in your entire Clustered ONTAP storage estate.
The theory behind the script is slightly shaky. Pretty much the only figure available to us for working out how regularly we might expect a disk to fail in our estate is the published MTBF. For the calculation, I’ve taken the view (probably wrong) that if one disk has an MTBF of 2 million hours, 2 disks will have a combined MTBF of 1 million hours, and so on …
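To make that assumption concrete, here is a minimal sketch of the per-type arithmetic. It is not part of the script below: the function name Get-TimeToFirstFailure and its parameters are made up for illustration, and it needs PowerShell 3.0 or later for [pscustomobject]. The MTBF/N scaling is what you get if you treat every disk as failing independently at a constant rate.

```powershell
# Hypothetical helper, for illustration only: not part of mtbf.ps1.
function Get-TimeToFirstFailure {
    param(
        [double]$MtbfMillionHours,  # published MTBF of one disk, in millions of hours
        [int]$DiskCount             # number of disks of that type
    )
    # Assumption from the text: N identical disks behave like one disk with MTBF/N.
    $hours = 1000000 * $MtbfMillionHours / $DiskCount
    [pscustomobject]@{
        Hours = [math]::Round($hours)
        Days  = [math]::Round($hours / 24)
    }
}

# One disk at 2 million hours becomes 1 million hours for two disks.
Get-TimeToFirstFailure -MtbfMillionHours 2.0 -DiskCount 2
# 400 disks at 1.6 million hours each.
Get-TimeToFirstFailure -MtbfMillionHours 1.6 -DiskCount 400
```

The second call shows why the figure shrinks quickly on a real estate: every extra disk of a type divides the expected wait further.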
Firstly, you’ll want to set up PowerShell to connect to all your clusters (you might consider using my CDOT PowerShell connections manager from this post to do that). Then copy the script below into notepad/notepad++, save it as, say, mtbf.ps1, and load the function into PowerShell by dot-sourcing the script (remember the space between the two dots):
. .\mtbf.ps1
Finally, run the function using:
mtbf
The script scans all your clusters for various types of disks and, for each type found, works out an MTBF-based time for one disk of that type to fail in the estate. Then, the pièce de résistance: it combines all the various disk types into a single MTBF-based value for one disk to fail in the entire estate.
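The combining step is easier to see in isolation. The sketch below is not taken from the script: the function name Get-CombinedTimeToFirstFailure and the hashtable keys Disks and MtbfMillionHours are hypothetical. It assumes failure rates simply add up across disk types, which works out algebraically the same as the multiplier arithmetic the script uses.

```powershell
# Hypothetical helper, for illustration only: not part of mtbf.ps1.
function Get-CombinedTimeToFirstFailure {
    # Each entry looks like @{ Disks = <int>; MtbfMillionHours = <double> }
    param([hashtable[]]$DiskTypes)

    # Expected failures per million hours, summed over every disk type.
    $failuresPerMillionHours = 0.0
    foreach ($type in $DiskTypes) {
        $failuresPerMillionHours += $type.Disks / $type.MtbfMillionHours
    }

    # Invert the combined rate to get hours until one disk fails.
    [math]::Round(1000000 / $failuresPerMillionHours)
}

# Made-up estate: 400 disks at 1.6 million hours and 600 disks at 1.2 million hours.
$estate = @(
    @{ Disks = 400; MtbfMillionHours = 1.6 },
    @{ Disks = 600; MtbfMillionHours = 1.2 }
)
Get-CombinedTimeToFirstFailure -DiskTypes $estate
```

Summing rates first and inverting once avoids multiplying all the MTBF figures together, though both routes give the same answer.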
The Script
### START OF SCRIPT - mtbf V1.3b ###
FUNCTION mtbf {
<# The following two hashed lines contain all disk types. So
as not to waste cluster CPU cycles searching for stuff that isn't there, RECOMMEND
reducing the list by removing the disk types you know you definitely don't
have. #>
# $diskMTBFs = @("ATA",1.2,"BSAS",1.2,"FCAL",1.6,"FSAS",1.6,"LUN",0,"MSATA",1.2,"SAS",1.6,"SATA",1.2,"SSD",2.0) # Figures in millions of hours!
# $diskRecord = @(0,0,0,0,0,0,0,0,0) # Need a 0 for each active type of disk!
$diskMTBFs = @("ATA",1.2,"BSAS",1.2,"FCAL",1.6,"FSAS",1.6,"LUN",0,"MSATA",1.2,"SAS",1.6,"SATA",1.2,"SSD",2.0)
$diskRecord = @(0,0,0,0,0,0,0,0,0)
$diskTypesCount = ($diskMTBFs.count)/2
$diskAttributes = Get-NcDisk -template
$diskAttributes.name = ""
$diskQuery = Get-NcDisk -template
Initialize-NcObjectProperty -object $diskQuery -name DiskInventoryInfo
$count = 0
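# Count the disks of each type across all connected clusters
do {
  $disks = $null
  $diskQuery.DiskInventoryInfo.DiskType = $diskMTBFs[$count*2]
  $disks = Get-NcDisk -Query $diskQuery -attributes $diskAttributes
  # @() makes .count reliable even when only a single disk is returned
  if ($disks){$diskRecord[$count] = @($disks).count}
  $count++
}
until ($count -eq $diskTypesCount)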
$mtbfOutput = @()
$mtbfOutput += " "
$mtbfOutput += "Using MTBF: For a NetApp CDOT estate roughly how much time for 1 disk to fail!"
$mtbfOutput += "##############################################################################"
$mtbfOutput += " "
$mtbfOutput += "The following types of disks were detected in your estate:"
$mtbfOutput += " "
$count = 0
$totalDisks = 0
$diskMTBFMultiplier = 1
# Per-type figures: MTBF of one disk divided by the number of disks of that type
do {
  $diskRecordCount = $diskRecord[$count]
  $diskRecordType = $diskMTBFs[$count*2]
  $diskRecordMTBF = $diskMTBFs[$count*2+1]
  # LUNs have no published MTBF, so they are reported but left out of the calculation
  if (($diskRecordCount -ne 0) -and ($diskRecordType -eq "LUN")){
    $mtbfOutput += "$diskRecordCount disks of type $diskRecordType with an unknown MTBF."}
  if (($diskRecordCount -ne 0) -and ($diskRecordType -ne "LUN")){
    [int]$diskRecordHours = (1000000*($diskMTBFs[$count*2+1])/$diskRecord[$count])
    [int]$diskRecordDays = $diskRecordHours / 24
    $mtbfOutput += "$diskRecordCount disks of type $diskRecordType with an MTBF of $diskRecordMTBF million hours each."
    $mtbfOutput += "MTBF based time for one disk to fail amongst all these disks is $diskRecordHours hours ($diskRecordDays days)."
    $mtbfOutput += " "
    $diskMTBFMultiplier = $diskMTBFMultiplier * $diskRecordMTBF
    $totalDisks = $totalDisks + $diskRecord[$count]}
  $count++
}
until ($count -eq $diskTypesCount)
$count = 0
$diskRecordCountSpecial = 0
$mtbfOutput += "Combined"
$mtbfOutput += "########"
$mtbfOutput += " "
# Sum (count * product of all MTBFs / MTBF) per type; dividing the product by this sum later gives 1 / sum(count / MTBF)
do {
  $diskRecordCount = $diskRecord[$count]
  $diskRecordType = $diskMTBFs[$count*2]
  $diskRecordMTBF = $diskMTBFs[$count*2+1]
  if (($diskRecordCount -ne 0) -and ($diskRecordType -ne "LUN")){
    $diskRecordCountSpecial = $diskRecordCountSpecial + ($diskRecordCount * $diskMTBFMultiplier / $diskRecordMTBF)}
  $count++
}
until ($count -eq $diskTypesCount)
# Guard against a divide by zero when no disks with a known MTBF were found
if ($diskRecordCountSpecial -eq 0){
  $mtbfOutput += "No disks with a known MTBF were found."
  return $mtbfOutput}
[int]$diskRecordTotalHours = (1000000*$diskMTBFMultiplier/$diskRecordCountSpecial)
[int]$diskRecordTotalDays = $diskRecordTotalHours / 24
$mtbfOutput += "Considering all $totalDisks disks:"
$mtbfOutput += " "
$mtbfOutput += "MTBF based time for one disk to fail in the entire disk estate is $diskRecordTotalHours hours ($diskRecordTotalDays days)."
$mtbfOutput += " "
return $mtbfOutput
}
### END OF SCRIPT ###
An Example Output
As an example of the script in action, the below doesn’t give an unreasonable figure for an estate of more than 2000 disks! In reality we’d expect a figure a fair bit lower than given (I did say the theory behind the script was a bit shaky), but as a curiosity it serves its purpose.
Image: An MTBF based calculation of 1 disk to fail amongst an estate of many!