Search folder for files using a reference document with 1k a/c numbers

JoeBloggs · February 2, 2022, 9:55am

Hi guys,
New to PowerShell. I’ve been asked by legal/HR to search a folder for files that contain, either in the file name or in the contents of the Word files, any account number from a reference file that holds over 1,000 numbers. The folder contains over 60k documents.

How do I even start…help please?
I assume it will be at least a two-step process: one step for the file names and another for what’s in the files.

matt-bloomfield · February 2, 2022, 12:03pm

Hi, welcome to the forum.

Searching the filenames is trivial with PowerShell but you’re going to struggle to search the contents of the Word documents. It’s doable with COM objects but it will be a pain.

I would suggest looking at a tool like Everything instead.

Depending on why you’ve been asked to do this, I would also suggest pushing this back on HR/Legal to pay for a professional resource to gather this information. If this is evidence gathering for some sort of legal proceedings there will be processes that need to be followed and there will be professionals out there that do this investigative stuff for a living.

JoeBloggs · February 2, 2022, 12:41pm

Thanks Matt,

I’ve already pushed back to legal/HR but they are looking for an initial search.
I’m looking at “Everything”, looks interesting.

As I said, I’m new to PowerShell. How would I reference a document containing the a/c numbers and search for them in the file names?

Thanks again.

matt-bloomfield · February 2, 2022, 1:55pm

You can read a text file with Get-Content. The text file should contain your list of a/c numbers with each number on a separate line.
You can get files with Get-ChildItem.

You don’t want to run Get-ChildItem 1000 times so build a list of files once, and then look for matches within the list.

This is a very basic example to demonstrate the idea above. I suspect it will match too many files to be useful, but without knowing how the filenames are structured it’s hard to be more specific.

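# Read the a/c numbers, one per line, into an array
$searchTerms = Get-Content E:\Temp\Files\searchTerms.txt
# Build the list of files once, rather than once per search term
$fileList    = Get-ChildItem E:\Temp\ -Recurse

foreach ($term in $searchTerms) {
    # -match treats $term as a regular expression and tests it against the full path
    $fileList | Where-Object {$_.FullName -match $term} | Select-Object FullName
}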
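One possible way to cut down the over-matching is sketched below. It is only an illustration under assumptions: the account numbers and file names are made up, and it assumes the a/c numbers are plain digit strings. It escapes each term, joins all of them into a single regular expression, and uses lookarounds so that a number is not matched when it sits inside a longer run of digits.

```powershell
# Made-up sample data standing in for the real list and the real folder
$searchTerms = '10023', '48871'
$fileNames   = 'Statement_10023.docx', 'Invoice_110023.docx', '48871-letter.doc', 'notes.txt'

# Escape each term so any regex metacharacters are treated literally
$escaped = $searchTerms | ForEach-Object { [regex]::Escape($_) }

# One pattern for all terms; (?<!\d) and (?!\d) reject matches that are
# part of a longer number, such as 10023 inside 110023
$pattern = '(?<!\d)(' + ($escaped -join '|') + ')(?!\d)'

# Keep only the names that contain one of the account numbers
$matched = $fileNames | Where-Object { $_ -match $pattern }
$matched
```

Against real files the same pattern could be applied to the Name property instead of FullName, for example `Get-ChildItem E:\Temp\ -Recurse -File | Where-Object { $_.Name -match $pattern }`, so that folder names in the path do not produce hits.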

We don’t provide complete scripts on request so you should try to build on this example and come back to us if you get stuck.

JoeBloggs · February 2, 2022, 2:41pm

Thanks Matt.
I’ll have a go.

“We don’t provide complete scripts on request so you should try to build on this example and come back to us if you get stuck.”
Not a problem, I’m here to learn

tonyd · February 2, 2022, 4:14pm

What is the context of the folder where these files reside? Any chance the back end is a SharePoint site? I wrote some C# code once to search massive amounts of Office documents and was impressed at how little code that took. I would think one could do the same with PowerShell.

Just thinking outside the box. I also agree with Matt: it’s a somewhat daunting task. One other outside-the-box thought: if the documents are .DOCX, you can rename/copy them to .ZIP and process the underlying XML from that archive. That might be a ton of work though.
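A rough sketch of that .DOCX idea, for what it’s worth. The function name Get-DocxText is made up for illustration, and it assumes the .NET System.IO.Compression classes are available (they are in Windows PowerShell 5.1 and PowerShell 7). The ZipFile class can open a .docx directly, so no rename or copy is needed.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper (not from the thread): returns the body text of a .docx file.
# Pass a full path, because .NET does not follow PowerShell's current location.
function Get-DocxText {
    param([Parameter(Mandatory)][string] $Path)

    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        # The main document body lives in this part of the package
        $entry = $zip.GetEntry('word/document.xml')
        if (-not $entry) { return '' }

        $reader = New-Object System.IO.StreamReader($entry.Open())
        try { $xml = $reader.ReadToEnd() } finally { $reader.Dispose() }
    }
    finally {
        $zip.Dispose()
    }

    # Turn paragraph ends into newlines, then strip all remaining tags.
    # Removing the tags also rejoins text that Word split across several runs.
    return ($xml -replace '</w:p>', "`n" -replace '<[^>]+>', '')
}
```

The returned text can then be tested with -match against the account numbers. Note the limits of this sketch: it only reads the main body (headers and footers are stored in separate parts of the package), and it will not work on legacy .DOC files, which are not ZIP archives.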

My $.02
