Monday, 4 April 2016

How to Delete Files with Access Time before Year X on NetApp cDOT

Post 666, 13th post of 2016, guess that means it’s time for something wicked...

The following tool scans a NAS volume/path on cDOT, walking every file and folder in the folder hierarchy, and records each item’s size and last-accessed timestamp. What’s more, we can - if we so desire - automatically delete all files with an access time older than year X. All of this is done via the API; no CIFS/NFS protocol access is required.

The tool is just a fling, and I really don’t expect anyone to try this on their production environment. The usual Caveat Utilitor applies. Potentially, you could reclaim a lot of space and save money with this tool. There are likely to be many, many files on a NAS filesystem that have not been accessed in a long time, and they will remain on the filesystem in perpetuity until someone/something deletes them (and even after they’ve been deleted, they remain on disk until the snapshots have aged out).

Note: The recursive function in the script has been used before, check out these posts - Jan 2015, July 2015, July 2015, July 2015, October 2015.

An Example

The outputs below were generated by running the script with the following syntax (note that I overrode the detection of years in the future, to demonstrate deleting some files):


PS> .\ScanFilesAndFoldersForLastAccessedTime.ps1 -Cluster 10.10.10.100 -Vserver SVMA -Volume WDATA001 -SubPathAfterVolume "/TEST FOLDER" -DeleteFilesWithLastAccessTimeBeforeYear 2017 -YesIreallyWantToDeleteFiles
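
The year test behind -DeleteFilesWithLastAccessTimeBeforeYear is a plain comparison on the calendar year of each file's last-accessed timestamp. Here is a minimal sketch of that rule with made-up dates; the helper name Test-AccessedBeforeYear is hypothetical and not part of the script:

```powershell
# Hypothetical helper mirroring the script's rule: a file qualifies for
# deletion when the YEAR of its last-accessed time is less than the cutoff year.
Function Test-AccessedBeforeYear {
  Param([DateTime]$AccessedTime, [Int]$Year)
  $AccessedTime.Year -lt $Year
}

# Made-up sample dates, for illustration only.
$Samples = "2013-12-31", "2014-01-01", "2016-04-04"
Foreach ($Sample in $Samples) {
  $Stamp  = [DateTime]$Sample
  $Delete = Test-AccessedBeforeYear -AccessedTime $Stamp -Year 2014
  "{0:yyyy-MM-dd} -> delete: {1}" -f $Stamp, $Delete
}
```

With a cutoff year of 2014, only the 2013 sample qualifies; anything last accessed on or after 1 January 2014 is left alone.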


Image: PowerShell Output
Image: CSV Output

The Script (ScanFilesAndFoldersForLastAccessedTime.ps1)

Note: As always, formatted for blogger by replacing tabs with double spaces.

############################################
## ScanFilesAndFoldersForLastAccessedTime ##
############################################

Param(
  [parameter(Mandatory=$true)][String]$Cluster,
  [parameter(Mandatory=$true)][String]$Vserver,
  [parameter(Mandatory=$true)][String]$Volume,
  [parameter(HelpMessage="If specified, path is: /vol/Volume + SubPathAfterVolume")]
  [String]$SubPathAfterVolume,
  [String]$Title   = "ScanFilesAndFoldersForLastAccessedTime",
  [String]$LogFile = "$Title-$Cluster-$Vserver-$Volume.log",
  [Int]$DeleteFilesWithLastAccessTimeBeforeYear,
  [Switch]$YesIreallyWantToDeleteFiles
)

## ===== OUTPUT FUNCTION ===== ##
Function Wr{
  Param([String]$Echo = "",[String]$Ink = "WHITE",[Switch]$EXIT,[Switch]$N)
  $Echo >> $LogFile
  If($EXIT){ Write-Host $Echo -ForegroundColor RED; Write-Host; EXIT }
  If($N){ Write-Host $Echo -ForegroundColor $Ink -NoNewLine }
  else{ Write-Host $Echo -ForegroundColor $Ink }
};Wr

## ===== TITLE ===== ##
Wr ("+++++ $Title (Runtime: " + (Get-Date).DateTime + ") +++++");Wr
Wr "IMPORTANT: If possible, run this against your DR system since it will generate substantial disk, CPU, and logging (command-history.log) load on the destination system."
Wr "NOTE: This tool does not use protocol access, it does not require NFS or CIFS access to the file system to be scanned."
Wr "NOTE: You do not need to suspend mirrors if scanning DR."
Wr "PERFORMANCE TIP 1: Point this at the node-management LIF of the node that has the volume you're scanning."
Wr "PERFORMANCE TIP 2: The server you're running this script on should ideally be local to the cDOT system ..."
Wr "PERFORMANCE TIP 3: ... and of a reasonable spec (i.e. 4GB RAM minimum, 2 cores or better recommended.)"
Wr "WARNING: Deleting files only works against RW volumes!";Wr

## ===== YEAR DETECTION ===== ##
If($YesIreallyWantToDeleteFiles){
  If($DeleteFilesWithLastAccessTimeBeforeYear -gt (Get-Date).Year){
    $YesIreallyWantToDeleteFiles = $FALSE
    Wr "$DeleteFilesWithLastAccessTimeBeforeYear is in the future - disabling file deletion!" YELLOW;Wr
  }
}

## ===== TRAP DETECTION ===== ##
Trap{
  Wr;Wr "TRAP DETECTED!" RED
  Wr ("Time of Error : " + (Get-Date).DateTime) RED
  Wr ("Line Number   : " + $_.InvocationInfo.ScriptLineNumber) RED
  Wr ("Offset in Line: " + $_.InvocationInfo.OffsetInLine) RED
  Wr ("Error Message : " + $_.Exception.Message) RED;Wr
}

## ===== LOAD THE DATA ONTAP PSTK ===== ##
If(!(Get-Module DataONTAP)){ [Void](Import-Module DataONTAP -ErrorAction SilentlyContinue) }
If(!(Get-Module DataONTAP)){ Wr "Failed to load DataONTAP PSTK!" -EXIT }
else{ Wr "Loaded DataONTAP PSTK" GREEN }

## ===== CREDENTIALS & CONNECT TO CONTROLLER DETECTION ===== ##
If(!(Get-NcCredential $Cluster)){ Wr "Failed to Get-NcCredential $Cluster, please Add-NcCredential $Cluster!" -EXIT }
else{ Wr ("Using credential " + ((Get-NcCredential $Cluster).Credential.UserName) + " for $Cluster") GREEN }
If(!(Connect-NcController $Cluster -ErrorAction SilentlyContinue)){ Wr "Failed to connect to cluster $Cluster!" -EXIT }
else{ Wr "Successfully tested connection to $Cluster" GREEN }  

## ===== DATA SVM DETECTION ===== ##
If( !(Get-NcVserver | where { ($_.VserverType -eq "data") -and ($_.VserverName -eq $Vserver) }) ){ Wr "Cluster $Cluster has no running data SVM called $Vserver!" -EXIT }
else{ Wr "Found data SVM called $Vserver" GREEN }

## ===== VOLUME DETECT ===== ##
$VolAttrs = Get-NcVol -Template
Initialize-NcObjectProperty -Object $VolAttrs -Name VolumeIdAttributes
$VolAttrs.VolumeIdAttributes.Type = ""
$GetNcVol = Get-NcVol -Attributes $VolAttrs -VserverContext $Vserver | where{ $_.Name -eq $Volume }
If(!$GetNcVol){ Wr "Volume $Volume does not exist in SVM $Vserver!" -EXIT }
else{ Wr "Found volume $Volume in SVM $Vserver" GREEN }
$GetNcVol = $GetNcVol | where{ $_.Name -eq $Volume }

## ===== RW VOLUME DETECT ===== ##
If($YesIreallyWantToDeleteFiles){
  If($GetNcVol.VolumeIdAttributes.Type -ne "rw"){
    $YesIreallyWantToDeleteFiles = $FALSE
    Wr "Can only delete from RW type volumes!" YELLOW
  }
}

## ===== READ-NCDIRECTORY ATTRIBUTES FILTER ===== ##
$NcDirAttrs = Read-NcDirectory -Template
$NcDirAttrs.Name = ""
$NcDirAttrs.Empty = ""
$NcDirAttrs.FileType = ""
$NcDirAttrs.FileSize = ""
$NcDirAttrs.AccessedTimestamp = ""
$NcDirAttrs.ModifiedTimestamp = ""
$NcDirAttrs.CreationTimestamp = ""

## ===== PATH DETECT ===== ## 
[String]$StartReadNcDirPath = "/vol/$Volume" + $SubPathAfterVolume
If( !(Read-NcDirectory -Path $StartReadNcDirPath -Attributes $NcDirAttrs -VserverContext $Vserver)){ Wr "Path $StartReadNcDirPath does not exist!" -EXIT }
else{ Wr "Verified path $StartReadNcDirPath" GREEN };Wr

##############
## SCANNING ##
##############

## ===== START ===== ##
$StartDate = (Get-Date).DateTime
Wr "<<<<< Starting scanning @ $StartDate >>>>>";Wr

## ===== INITIALIZE SOME VARIABLES ===== ##
[Int64]$Global:FileCount = 0
[Int64]$Global:KBdeleted = 0
[Int64]$Global:FolderCount = 0
[Int64]$Global:FilesDeleted = 0
[System.Array]$Global:FilesAndFolders = @()

## ===== RECURSIVE SCANNING FUNCTION ===== ##
Function GetDirInfoRecursive {
  Param([String]$PathToReadRecursive)   
  Wr "." -N
  $GetDirInfo = Read-NcDirectory -Path $PathToReadRecursive -Attributes $NcDirAttrs -VserverContext $Vserver
  Foreach ($line in $GetDirInfo){
    $PathToInspect = ($PathToReadRecursive + "/" + $line.Name)
    $Object = New-Object PSObject
    [Void](Add-Member -InputObject $Object -MemberType NoteProperty -Name "Type" -Value $line.FileType)
    [Void](Add-Member -InputObject $Object -MemberType NoteProperty -Name "Path" -Value $PathToInspect)
    [Void](Add-Member -InputObject $Object -MemberType NoteProperty -Name "Size KB" -Value $line.FileSize)
    [Void](Add-Member -InputObject $Object -MemberType NoteProperty -Name "Accessed Time"  -Value $line.AccessedTimestampDT)
    [Void](Add-Member -InputObject $Object -MemberType NoteProperty -Name "Modified Time"  -Value $line.ModifiedTimestampDT)       
    [Void](Add-Member -InputObject $Object -MemberType NoteProperty -Name "Creation Time"  -Value $line.CreationTimestampDT)
    If($line.FileType -eq "directory"){
      If(($line.Name -ne ".") -and ($line.Name -ne "..") -and ($line.Name -ne ".snapshot")) {
        $Global:FolderCount ++
        $Global:FilesAndFolders += $Object
        If ($line.Empty -ne "False"){GetDirInfoRecursive $PathToInspect}
      }                   
    } elseif($line.FileType -eq "file"){
        If($YesIreallyWantToDeleteFiles){
          If($line.AccessedTimeStampDT.Year -lt $DeleteFilesWithLastAccessTimeBeforeYear){
            [Void](Remove-NcFile -Path $PathToInspect -VserverContext $Vserver -Confirm:$FALSE)
            Wr " DELETED FILE -> $PathToInspect " RED -N
            $Global:KBdeleted += $line.FileSize
            $Global:FilesDeleted ++
          }
        }
        $Global:FileCount ++
        $Global:FilesAndFolders += $Object
    }
  }
}

## ===== MAIN PROGRAM (Calls GetDirInfoRecursive) ===== ##
GetDirInfoRecursive $StartReadNcDirPath

## ===== FINISH ===== ##
$FinishDate = (Get-Date).DateTime
Wr;Wr;Wr ">>>>> Finished scanning @ $FinishDate <<<<<";Wr

############
## OUTPUT ##
############

$Date = Get-Date -uformat "%Y%m%d%H%M"
$OutputFileName = "$Title-$Cluster-$Vserver-$Volume-$Date.CSV"
Wr "<<<<< Output >>>>>";Wr
Wr "Started scanning at: $StartDate"
Wr "Finished scanning at: $FinishDate";Wr
Wr ("Processed folders  : " + [String]$Global:FolderCount)
Wr ("Processed files    : " + [String]$Global:FileCount)
If($YesIreallyWantToDeleteFiles){
  Wr ("Files deleted      : " + [String]$Global:FilesDeleted)
  Wr ("KB deleted         : " + [String]$Global:KBDeleted)
};Wr
Wr "Log file           : $LogFile"
Wr "Output CSV         : $OutputFileName";Wr
$Global:FilesAndFolders | Export-CSV $OutputFileName -NoType
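
Before deleting anything, it can help to see how much data sits under each last-accessed year. The sketch below is one way to summarize rows shaped like the script's output. The rows here are made up and built in memory; with a real CSV you would load them via Import-Csv and convert the text columns back to numbers and dates first. It assumes PowerShell 3.0 or later for [PSCustomObject].

```powershell
# Made-up rows using the same column names as the script's CSV output.
$Rows = @(
  [PSCustomObject]@{ Type = "directory"; Path = "/vol/WDATA001/TEST FOLDER/old";       "Size KB" = 4;   "Accessed Time" = [DateTime]"2012-01-15" }
  [PSCustomObject]@{ Type = "file";      Path = "/vol/WDATA001/TEST FOLDER/old/a.txt"; "Size KB" = 100; "Accessed Time" = [DateTime]"2012-05-01" }
  [PSCustomObject]@{ Type = "file";      Path = "/vol/WDATA001/TEST FOLDER/old/b.txt"; "Size KB" = 50;  "Accessed Time" = [DateTime]"2012-11-20" }
  [PSCustomObject]@{ Type = "file";      Path = "/vol/WDATA001/TEST FOLDER/c.txt";     "Size KB" = 10;  "Accessed Time" = [DateTime]"2016-03-30" }
)

# Group files (not folders) by last-accessed year, then count and total them.
$Summary = $Rows |
  Where-Object { $_.Type -eq "file" } |
  Group-Object { $_."Accessed Time".Year } |
  ForEach-Object {
    $Total = 0
    Foreach ($Row in $_.Group) { $Total += $Row."Size KB" }
    [PSCustomObject]@{ Year = [Int]$_.Name; Files = $_.Count; Size = $Total }
  } |
  Sort-Object Year

$Summary | Format-Table -AutoSize
```

The Size column simply sums whatever the "Size KB" column holds, so it carries the same unit as the script's output.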

3 comments:

  1. Thank you for this post! This is exactly what I need to help automatically maintain a CIFS share that people use as a "temp" file dump.

  2. I've been thinking about this script and one thing here that isn't clear to me: if I ran this script as-is on January 2, 2017... wouldn't everything previous to January 1, 2017 be deleted? And then the volume would be practically empty?

    Also, if my assumption is wrong, how could I be more aggressive and delete files with an access time of more than 6 months previous? I don't see where you're calculating "DeleteFilesWithLastAccessTimeBeforeYear" except in the Year Detection section.

    1. Hello Kukhuvud,
      I only wrote this with a granularity of a year. Say your organization had a 2-year snapshot retention, and you wanted to get rid of all files not accessed in the last 3 years: in 2017, you'd go for year 2014. This leaves all the files last accessed in 2014, 2015, 2016, and the current year (2017) completely untouched, but everything last accessed before 2014 would be deleted. It wouldn't delete files created in, say, 2012 but accessed in, say, 2016. To get it to work on months you'd need to modify the script.
      Cheers, VC

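
On the month-granularity question in the comments: one possible modification is to compare the full last-accessed timestamp against a cutoff date instead of comparing years. This is a minimal sketch of the comparison only. The helper name Test-AccessedBeforeCutoff and its parameters are hypothetical; in the script, a test like this would stand in for the year comparison on the file's accessed timestamp.

```powershell
# Hypothetical helper: true when the last-accessed time is more than
# $Months months before $Now. $Now is a parameter so the rule is easy to test.
Function Test-AccessedBeforeCutoff {
  Param([DateTime]$AccessedTime, [Int]$Months, [DateTime]$Now = (Get-Date))
  $Cutoff = $Now.AddMonths(-$Months)
  $AccessedTime -lt $Cutoff
}

# Made-up example: the run date is 2 January 2017 with a 6-month window,
# so the cutoff is 2 July 2016.
$RunDate = [DateTime]"2017-01-02"
Test-AccessedBeforeCutoff -AccessedTime ([DateTime]"2016-06-30") -Months 6 -Now $RunDate
Test-AccessedBeforeCutoff -AccessedTime ([DateTime]"2016-12-01") -Months 6 -Now $RunDate
```

The first call returns True (last accessed before the cutoff) and the second returns False, so only the file untouched for more than six months would qualify.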