In the previous post I mentioned I'd create an index for a load of downloaded PDFs. The PDFs are named TR-XXXX.PDF (where XXXX is a number), and the index maps each TR-XXXX to its document title. When I wrote that post, I wasn't sure how I'd create the index.
The problem is mapping each TR-XXXX.PDF to its document title, and I can't see any easy way to do this without opening each PDF and typing out the title. The task is one of manual labour, but the following tool helps a little. It uses PowerShell to open each PDF in turn and prompts you to type its title into the PowerShell window. Each title is then appended to an index file. With the PDF and PowerShell open side by side, it's just a case of: type out the title, press ENTER, type out the next title, and so on. It's a little automation, but it helps!
Image: PDF Indexer in Action!
Note: There probably is an index somewhere listing NetApp TRs and their document titles. I haven't found it, though, and I haven't asked. (I found it not un-useful to be aware of all the TR titles; there are some TRs I never knew existed.) If any reader knows of an official index, please share the knowledge.
The Script
Copy the script into a text editor, save it as, say, PDF_Indexer.ps1, and then run it in PowerShell. It needs to be run from the same folder as the PDFs.
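# TR numbers to index (inclusive)
[Int]$RangeStart = 4000
[Int]$RangeEnd = 5000

# Create an empty Index.txt if one doesn't already exist
If(-not (Test-Path "Index.txt")){
    New-Item "Index.txt" -Type File -Force | Out-Null
}

# List the folder contents once, up front
$FolderContents = Get-ChildItem

For($i = $RangeStart; $i -le $RangeEnd; $i++){
    $FolderContents | ForEach-Object {
        # Only consider PDFs whose filename contains the current TR number
        If(($_.Name -match ".+\.pdf$") -and $_.Name.Contains("$i")){
            # Open the PDF in the default viewer, then prompt for its title
            Start-Process $_.FullName
            $Title = Read-Host "Title TR-$i"
            # Append a line such as "TR-4000: <title>" to the index
            ("TR-$i" + ": " + $Title) | Out-File "Index.txt" -Append
        }
    }
}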
Comments
You can use the "title" metadata tag of a PDF to rename the file.
I've tried it with the tool "Advanced Renamer" (https://www.advancedrenamer.com/), which worked quite well. Unfortunately, many TRs have no title, or just the name of the original template:
https://abload.de/img/unbenanntwbsy0.png
There is also a Python script on GitHub that does the same:
https://github.com/jdmonaco/pdf-title-rename
Anyway, thanks for the index. I used that list to rename all the TRs with Advanced Renamer. Unfortunately, your PowerShell script now thinks the PDFs aren't there any more and re-downloads them.
It would be nice if it checked only the first seven characters of the filename (TR-XXXX) and then decided whether there is a newer version of that PDF.
Even more awesome would be if the newly downloaded version of a TR also got the filename of the older version, so that one wouldn't need to rename the newer version of a TR again.
Another suggestion: adjust the download part of the script so that it retries downloading a TR up to three times. I had two TRs in the 4000 to 5000 range that I could actually download on the second run.
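Oli's suggestions could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the download script from the previous post isn't reproduced here, so the function names `Get-ExistingTR` and `Invoke-DownloadWithRetry` and their parameters are hypothetical. The first finds an existing PDF by matching only the TR-XXXX prefix, so a renamed file such as "TR-4000 - Some Title.pdf" still counts; the second wraps `Invoke-WebRequest` in up to three attempts.

```powershell
# Hypothetical helper (not part of the original download script):
# find an existing PDF for a TR number by matching only the "TR-XXXX" prefix,
# so renamed files such as "TR-4000 - Some Title.pdf" are still recognised.
function Get-ExistingTR {
    param(
        [Parameter(Mandatory)][int]$Number,
        [string]$Folder = "."
    )
    # "^TR-4000(\D.*)?\.pdf$" matches "TR-4000.pdf" and "TR-4000 - Title.pdf"
    # but not "TR-40001.pdf"; -match is case-insensitive, so ".PDF" matches too.
    Get-ChildItem -Path $Folder -File |
        Where-Object { $_.Name -match "^TR-$Number(\D.*)?\.pdf$" } |
        Select-Object -First 1
}

# Hypothetical helper: try a download up to $MaxAttempts times before giving up.
# Returns $true on success and $false if every attempt failed.
function Invoke-DownloadWithRetry {
    param(
        [Parameter(Mandatory)][string]$Uri,
        [Parameter(Mandatory)][string]$OutFile,
        [int]$MaxAttempts = 3,
        [int]$DelaySeconds = 5
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # -ErrorAction Stop turns download failures into exceptions for catch
            Invoke-WebRequest -Uri $Uri -OutFile $OutFile -UseBasicParsing -ErrorAction Stop
            return $true
        }
        catch {
            Write-Warning "Attempt $attempt of $MaxAttempts failed for ${Uri}: $($_.Exception.Message)"
            if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
        }
    }
    return $false
}
```

To keep a renamed file's name, the download target could be the `FullName` of whatever `Get-ExistingTR` returns, falling back to TR-XXXX.pdf when it returns nothing. Deciding whether the server's copy is actually newer is not covered by this sketch.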
Thanks, Oli.
I was originally going to check only the TR-XXXX part of the filename, then got lazy. I'll revisit the script later (and also make it re-attempt failed downloads). Since there's now an index, I could make the PowerShell script read the index post and rename the files that way.
I'll put a reminder in my calendar to spend a few moments updating the index every month. Really, I should just ask NetApp to publish a page with links to all their TRs and their latest titles (I'm not sure why this doesn't exist, or, if it does, where it is).
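Renaming from the index might look something like the sketch below. It assumes the local Index.txt written by the indexing script (lines of the form "TR-XXXX: Title") rather than the index post itself, and the function name `Rename-TRFromIndex` and the "TR-XXXX - Title.pdf" naming pattern are hypothetical choices, not an existing convention.

```powershell
# Hypothetical sketch: rename TR-XXXX.pdf files using the "TR-XXXX: Title" lines
# written to Index.txt by the indexing script.
function Rename-TRFromIndex {
    param(
        [string]$IndexPath = "Index.txt",
        [string]$Folder = "."
    )
    # Characters the current OS does not allow in file names
    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    foreach ($line in Get-Content -Path $IndexPath) {
        # Skip anything that isn't a "TR-XXXX: Title" line
        if (-not ($line -match '^(TR-\d+):\s*(.+)$')) { continue }
        $tr = $Matches[1]
        # Remove invalid file-name characters from the title
        $title = -join ($Matches[2].Trim().ToCharArray() | Where-Object { $invalid -notcontains $_ })
        if ([string]::IsNullOrWhiteSpace($title)) { continue }
        # Find the un-renamed PDF for this TR (case-insensitive, so TR-4000.PDF works too)
        $file = Get-ChildItem -Path $Folder -File |
            Where-Object { $_.Name -match "^$tr\.pdf$" } |
            Select-Object -First 1
        if ($null -eq $file) { continue }
        Rename-Item -LiteralPath $file.FullName -NewName "$tr - $title.pdf"
    }
}
```

Run from the PDF folder, `Rename-TRFromIndex` skips any index entry whose TR-XXXX.pdf isn't present (for example, one that has already been renamed), so running it twice should be harmless.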