Search folder for files using a reference document with 1k a/c numbers

Hi guys,
New to PowerShell. I’ve been asked by legal/HR to search a folder for files that contain, either in the file name or in the contents of the Word files, any of the account numbers listed in a reference file. The reference file has over 1,000 numbers and the folder contains over 60k documents.

How do I even start…help please?
I assume it will be at least a two-step process: one for the file names and the other for what’s in the files.

Hi, welcome to the forum :wave:

Searching the filenames is trivial with PowerShell but you’re going to struggle to search the contents of the Word documents. It’s doable with COM objects but it will be a pain.

I would suggest looking at a tool like Everything instead.

Depending on why you’ve been asked to do this, I would also suggest pushing this back on HR/Legal to pay for a professional resource to gather this information. If this is evidence gathering for some sort of legal proceedings there will be processes that need to be followed and there will be professionals out there that do this investigative stuff for a living.


Thanks Matt,

I’ve already pushed back to legal/HR but they are looking for an initial search.
I’m looking at “Everything”, looks interesting.

As I said, I’m new to PowerShell. How would I read in a document of a/c numbers and search for them in the file names?

Thanks again.

You can read a text file with Get-Content. The text file should contain your list of a/c numbers with each number on a separate line.
You can get files with Get-ChildItem.

You don’t want to run Get-ChildItem 1000 times so build a list of files once, and then look for matches within the list.

This is a very basic example to demonstrate the idea above. I suspect it will match too many files to be useful, but without knowing how the filenames are structured it’s hard to be more specific.

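# Read the account numbers, one per line, skipping blank lines
$searchTerms = Get-Content E:\Temp\Files\searchTerms.txt | Where-Object { $_.Trim() }
# Build the file list once (files only, no directories)
$fileList    = Get-ChildItem E:\Temp\ -Recurse -File

foreach ($term in $searchTerms) {
    # Escape the term so it is matched literally rather than as a regex
    $fileList | Where-Object {$_.FullName -match [regex]::Escape($term)} | Select-Object FullName
}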
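
Building on the same idea, one possible refinement is to match on the file name only (not the whole path) and to record which account number matched. This is a rough sketch rather than a finished script: the function name Find-FileByTerm is made up for illustration, and it assumes the account numbers are plain strings, one per line.

```powershell
# Hypothetical helper (not a built-in cmdlet): reports every file whose
# name contains one of the search terms, along with the term that matched.
function Find-FileByTerm {
    param(
        [string]   $Path,   # folder to search
        [string[]] $Terms   # account numbers to look for
    )

    # Build the file list once; -File skips directories.
    $fileList = Get-ChildItem -Path $Path -Recurse -File

    foreach ($file in $fileList) {
        foreach ($term in $Terms) {
            # String.Contains is a literal, case-sensitive match, so no
            # regex escaping is needed. Empty terms are skipped.
            if ($term -and $file.Name.Contains($term)) {
                [pscustomobject]@{
                    Term     = $term
                    FullName = $file.FullName
                }
            }
        }
    }
}
```

You would call it with something like Find-FileByTerm -Path E:\Temp -Terms (Get-Content E:\Temp\Files\searchTerms.txt) and pipe the result to Export-Csv if you need a report for legal/HR.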

We don’t provide complete scripts on request so you should try to build on this example and come back to us if you get stuck.


Thanks Matt.
I’ll have a go.

“We don’t provide complete scripts on request so you should try to build on this example and come back to us if you get stuck.”
Not a problem, I’m here to learn :slight_smile:

What is the context of the folder where these files reside? Any chance the back end is a SharePoint site? I wrote some C# code once to search massive amounts of Office documents and was impressed at how little code that took. I would think one could do the same with PowerShell.

Just thinking outside the box. I also agree with Matt, a somewhat daunting task. One other outside the box thought, if the documents are .DOCX, you can rename/copy them to .ZIP and process the underlying XML from that export. That might be a ton of work though.
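
To give a rough idea of the .DOCX route: a .docx file is a ZIP package, so .NET can open it directly without renaming or copying it. The sketch below is only an illustration under a few assumptions: Get-DocxText is a made-up function name, it reads only the main body (word/document.xml, not headers, footers or comments), and it needs a full path because .NET does not follow PowerShell's current location.

```powershell
# Makes the ZipFile class available in Windows PowerShell 5.1.
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: returns the body text of a .docx file as one string.
function Get-DocxText {
    param([string] $Path)   # full path to the .docx file

    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        # The main document body is stored in this entry of the package.
        $entry = $zip.GetEntry('word/document.xml')
        if (-not $entry) { return '' }

        $reader = New-Object System.IO.StreamReader($entry.Open())
        try     { $xml = $reader.ReadToEnd() }
        finally { $reader.Dispose() }

        # Strip the XML tags so that a number Word has split across
        # several runs is joined back together before searching.
        return ($xml -replace '<[^>]+>', '')
    }
    finally {
        $zip.Dispose()
    }
}
```

The returned text can then be checked against each account number the same way as the file names. Because the tags are simply removed, text from adjacent paragraphs runs together, so expect the occasional false positive.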

My $.02
