Why does this Where / Where-Object command NOT work?

I have a folder consisting of 16,497 subfolders containing over 72,000 files of all types, including thousands of image files with BMP, EMF, GIF, JPG, PDF, PNG, TIF, and WMF extensions. The image files are supposed to all have an accompanying DOC file that explains the image in detail. However, I have discovered that not all do. Therefore, I’m trying to write a PowerShell script to find all the image files that do not have DOC files with the same base name and report the FULLNAME to a file called NODOCS.TXT. I will then examine each reported image file to see if it has a misnamed DOC file or if the DOC file is truly missing. (NOTE: I’ve discovered that the person who created all these files didn’t always match the image and DOC files’ base names.)

As a start, I tried writing a script to identify all the image files in all these 16K subfolders. It didn’t work as planned, and I don’t understand what I’ve done wrong.

Here’s what I wrote to try to find all the image files:

$LookHere = "\\VFS1\CompanyShared\ENotebook Directory"
$ExtensionsToCheck = @(
	'.bmp'
	'.emf'
	'.gif'
	'.jpg'
	'.pdf'
	'.png'
	'.tif'
	'.wmf'
)
Get-ChildItem $LookHere -File -Recurse | Where ({$_.extension -eq $ExtensionsToCheck}) | Select -ExpandProperty FullName

I ran this and got nothing. When I say, “I got nothing,” I mean the script appeared to run, and it didn’t return any error messages, but it also returned no files. So, then I modified it as follows:

$LookHere = "\\VFS1\CompanyShared\ENotebook Directory"
$ExtensionsToCheck = @(
	'.bmp'
	'.emf'
	'.gif'
	'.jpg'
	'.pdf'
	'.png'
	'.tif'
	'.wmf'
)
Get-ChildItem $LookHere -File -Recurse | Where ({$_.extension -eq "*.pdf"}) | Select -ExpandProperty FullName

I had the same results as above. I made a further modification to my script.

$LookHere = "\\VFS1\CompanyShared\ENotebook Directory\*.pdf"
$ExtensionsToCheck = @(
	'.bmp'
	'.emf'
	'.gif'
	'.jpg'
	'.pdf'
	'.png'
	'.tif'
	'.wmf'
)
Get-ChildItem $LookHere -File -Recurse | Select -ExpandProperty FullName

This returned about 17,000 PDF files.

I then changed “Where” to “Where-Object” thinking that might make a difference. The script is as follows:

$LookHere = "\\VFS1\companyshared\ENotebook Directory"
$ExtensionsToCheck = @(
	'.bmp'
	'.emf'
	'.gif'
	'.jpg'
	'.pdf'
	'.png'
	'.tif'
	'.wmf'
)
Get-ChildItem $LookHere -File -Recurse | Where-Object {$_.extension -eq "*.pdf"} | Select -ExpandProperty FullName

Again, the script appeared to run, and I didn’t get any error messages, but it returned no files.

Can someone please explain to me why the scripts don’t work with Where or Where-Object in them?

Also, given how much trouble I’m having even finding the image files, I’m not sure at this point how I’m going to run a comparison of the base names to find matching base names with .doc extensions. Any suggestions or code would be appreciated.

Robert,
Welcome back to the forum. :wave:t4: … long time no see. :slight_smile:

I’d use a slightly different approach … we’ll come to that later … :wink:

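Get-ChildItem seems to act a little bit bitchy and counterintuitive sometimes. :wink: For example, -Include only has an effect when the -Path ends with a wildcard (like \*) or when you also use -Recurse.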

To speed up your query, you should filter as far left in the pipeline as possible. So the best option is to filter directly with the parameters of Get-ChildItem, if possible. To achieve that, you can use the parameter -Include and feed it the list of extensions you want. It could look something like this:

$Path = '\\VFS1\CompanyShared\ENotebook Directory\*'
$Include = @(
    '*.bmp'
    '*.emf'
    '*.gif'
)

$FileList = 
Get-ChildItem -Path $Path -Include $Include 

$FileList

(I just want to show how it works, so I shortened your extension list :wink: )

How does it look now?

Changing Where to Where-Object does not make any difference. Where is an alias for Where-Object. You can find all aliases with Get-Alias. But you should not use aliases in scripts, as they make your code harder to read. I’d recommend always reading the complete help for the cmdlets you’re about to use, including the examples, to learn how to use them.

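The issue with your Where-Object filter is the comparison in it. You cannot check whether a single value is one of the elements of an array with -eq: with a single value on the left, PowerShell converts the array on the right into one space-separated string, so the comparison is always false. Use -in instead (or -contains with the array on the left). Your later attempts fail for a different reason: -eq does not support wildcards, so "*.pdf" is compared literally. Use -eq '.pdf' or -like '*.pdf' instead. You can learn more about comparison operators in the about_Comparison_Operators help topic.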
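A minimal sketch of both fixes. So it runs anywhere, it uses a few hypothetical in-memory objects (`$SampleFiles`) as stand-ins for real Get-ChildItem output; only the properties the filters need are included. The -in operator requires PowerShell 3.0 or later.

```powershell
$ExtensionsToCheck = @('.bmp', '.emf', '.gif', '.jpg', '.pdf', '.png', '.tif', '.wmf')

# Hypothetical stand-ins for Get-ChildItem output (only the properties we need)
$SampleFiles = @(
    [pscustomobject]@{ Name = 'Plot1.PDF'; Extension = '.PDF' }
    [pscustomobject]@{ Name = 'Plot1.doc'; Extension = '.doc' }
    [pscustomobject]@{ Name = 'Scan7.tif'; Extension = '.tif' }
)

# The original filter: the array on the right becomes the single string
# '.bmp .emf .gif .jpg .pdf .png .tif .wmf', so this is always False
'.pdf' -eq $ExtensionsToCheck

# -in asks "is the left value one of the array elements?" (case-insensitive)
$Images = @($SampleFiles | Where-Object { $_.Extension -in $ExtensionsToCheck })

# -eq has no wildcards: compare to '.pdf' directly, or use -like with a pattern
$PdfOnly = @($SampleFiles | Where-Object { $_.Extension -eq '.pdf' })
$PdfLike = @($SampleFiles | Where-Object { $_.Name -like '*.pdf' })
```

Against the real share, the first script would then read `Get-ChildItem $LookHere -File -Recurse | Where-Object { $_.Extension -in $ExtensionsToCheck } | Select-Object -ExpandProperty FullName`.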

With Select -ExpandProperty FullName, you limit yourself to a plain text list of paths. I wouldn’t do that. Since PowerShell works with objects and properties, you should use that to your advantage.

Ok … my approach …

First of all: if possible, I’d recommend running your script locally on the computer where the files are stored. That can speed it up by orders of magnitude compared to running it over a UNC path.

Second: I’d collect all needed files at once, save them in an array, and use this array as often as needed. That way, it’s not necessary to run the slow file system query more than once.
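A sketch of the "query once, reuse the array" idea. It assumes a throwaway test folder under the temp directory; the folder and file names are hypothetical and only there to make the example runnable.

```powershell
# Hypothetical throwaway test folder with a few empty files
$TestRoot  = Join-Path ([System.IO.Path]::GetTempPath()) ('ENotebookTest_' + [guid]::NewGuid())
$SubFolder = Join-Path $TestRoot 'Sub1'
New-Item -Path $SubFolder -ItemType Directory -Force | Out-Null
foreach ($Name in 'Plot1.jpg', 'Plot1.doc', 'Scan7.tif') {
    New-Item -Path (Join-Path $SubFolder $Name) -ItemType File | Out-Null
}

# Query the (slow) file system exactly once and keep the FileInfo objects
$FileList = @(Get-ChildItem -Path $TestRoot -Recurse -File)

# Reuse the same in-memory array as often as needed without touching the disk again
$ImageExtensions = '.bmp', '.emf', '.gif', '.jpg', '.pdf', '.png', '.tif', '.wmf'
$Images = @($FileList | Where-Object { $_.Extension -in $ImageExtensions })
$Docs   = @($FileList | Where-Object { $_.Extension -eq '.doc' })

# Clean up the test folder; $FileList, $Images and $Docs stay usable in memory
Remove-Item -Path $TestRoot -Recurse -Force
```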

You’re basically looking for file basenames without any twin brother or sister, right? :wink: Nothing easier than that:

$Path = '\\VFS1\CompanyShared\ENotebook Directory\*'
$Include = @(
    '*.bmp'
    '*.emf'
    '*.gif'
    '*.jpg'
    '*.pdf'
    '*.png'
    '*.tif'
    '*.wmf'
    '*.doc'
    '*.docx'
)

$FileList = 
Get-ChildItem -Path $Path -Include $Include -Recurse -File

$FileList |
    Group-Object -Property BaseName |
        Where-Object -Property Count -LT -Value 2

I’d recommend testing it with local test folders first! :wink:

First you collect ALL files with their properties (no -ExpandProperty!!).
Then you group this list by the BaseName property, so all files with the same name but different extensions end up in one group.
Now you filter for groups with fewer than 2 members. And there you have your orphaned image or doc files. :wink:
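To see what the grouping produces, here is a sketch on hypothetical in-memory objects instead of real files. It also shows one caveat with filtering on Count alone: if Plot2.jpg and Plot2.png exist but Plot2.doc does not, that group has two members and is not reported. One possible variant, assuming you only want images that lack a .doc, keeps every group without a .doc member.

```powershell
# Hypothetical stand-ins for $FileList
$SampleFiles = @(
    [pscustomobject]@{ BaseName = 'Plot1'; Extension = '.jpg' }
    [pscustomobject]@{ BaseName = 'Plot1'; Extension = '.doc' }
    [pscustomobject]@{ BaseName = 'Plot2'; Extension = '.jpg' }
    [pscustomobject]@{ BaseName = 'Plot2'; Extension = '.png' }
    [pscustomobject]@{ BaseName = 'Scan7'; Extension = '.tif' }
)

$Groups = $SampleFiles | Group-Object -Property BaseName

# Count-based filter: finds Scan7 only, although Plot2 has no .doc either
$ByCount = @($Groups | Where-Object -Property Count -LT -Value 2)

# Variant: keep every group whose members contain no .doc extension
$NoDoc = @($Groups | Where-Object { $_.Group.Extension -notcontains '.doc' })
```

As a side effect, the variant no longer reports groups that consist of a lone .doc file, which the Count-based filter does.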

At the end, you can export the results in any way you like … for example, to a CSV file for further steps.
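A sketch of a CSV export. Group-Object returns group objects, so this unrolls each group's files first to get one row per file. The sample objects and the output path are hypothetical.

```powershell
# Hypothetical stand-ins for $FileList
$SampleFiles = @(
    [pscustomobject]@{ BaseName = 'Plot1'; Extension = '.jpg'; FullName = 'C:\Test\Plot1.jpg' }
    [pscustomobject]@{ BaseName = 'Plot1'; Extension = '.doc'; FullName = 'C:\Test\Plot1.doc' }
    [pscustomobject]@{ BaseName = 'Scan7'; Extension = '.tif'; FullName = 'C:\Test\Scan7.tif' }
)

$Orphans = $SampleFiles |
    Group-Object -Property BaseName |
        Where-Object -Property Count -LT -Value 2

# Hypothetical output path
$CsvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Orphans.csv'

# Unroll the groups so every orphaned file gets its own CSV row
$Orphans |
    Select-Object -ExpandProperty Group |
        Select-Object -Property FullName, BaseName, Extension |
            Export-Csv -Path $CsvPath -NoTypeInformation
```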

You should pay special attention to one edge case in this scenario: if you have files with the same base name in different folders or subfolders, you have to adapt your grouping and filtering.
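One way to adapt the grouping, assuming each image and its .doc are supposed to sit in the same folder, is to group on DirectoryName and BaseName together. The sample objects below are hypothetical.

```powershell
# Hypothetical stand-ins: Plot1 exists in two different folders
$SampleFiles = @(
    [pscustomobject]@{ DirectoryName = 'C:\Test\A'; BaseName = 'Plot1'; Extension = '.jpg' }
    [pscustomobject]@{ DirectoryName = 'C:\Test\B'; BaseName = 'Plot1'; Extension = '.doc' }
    [pscustomobject]@{ DirectoryName = 'C:\Test\B'; BaseName = 'Plot2'; Extension = '.png' }
    [pscustomobject]@{ DirectoryName = 'C:\Test\B'; BaseName = 'Plot2'; Extension = '.doc' }
)

# BaseName alone wrongly pairs A\Plot1.jpg with B\Plot1.doc, so nothing is reported
$ByName = @($SampleFiles |
    Group-Object -Property BaseName |
        Where-Object -Property Count -LT -Value 2)

# Folder AND base name: only files that sit side by side end up in one group
$ByFolderAndName = @($SampleFiles |
    Group-Object -Property DirectoryName, BaseName |
        Where-Object -Property Count -LT -Value 2)
```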


Hi, Olaf,

Thank you! This looks very promising, and I appreciate the help.

I avoided using -Include because everything I’ve read says -Include is slow, and with >72K files, I thought it might take an exceedingly long time to generate the list. I was also working with the network location for expediency in testing my code. With the implementation, I plan to map my top-level folder as a drive and then delete the mapped drive when I finish getting the data.
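A sketch of mapping and removing a drive for the duration of the query with New-PSDrive and Remove-PSDrive. Without -Persist, the drive exists only in the current PowerShell session. To keep it runnable, a hypothetical local temp folder stands in for '\\VFS1\CompanyShared\ENotebook Directory', and the drive name ENB is made up.

```powershell
# Hypothetical local stand-in for the UNC share
$Root = Join-Path ([System.IO.Path]::GetTempPath()) ('ENotebook_' + [guid]::NewGuid())
New-Item -Path $Root -ItemType Directory | Out-Null
New-Item -Path (Join-Path $Root 'Plot1.jpg') -ItemType File | Out-Null

# Session-only PowerShell drive (no -Persist, so it does not appear in Explorer)
New-PSDrive -Name ENB -PSProvider FileSystem -Root $Root | Out-Null
try {
    $FileList = @(Get-ChildItem -Path 'ENB:\' -Recurse -File)
}
finally {
    # Remove the mapping again, even if the query throws
    Remove-PSDrive -Name ENB
}

# Clean up the stand-in folder
Remove-Item -Path $Root -Recurse -Force
```

A mapped drive mainly shortens the paths you type; the files are still read over the network.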

I will test your solution to verify that it works, and I’ll mark it as a solution once I’ve verified it.

Thanks again for your help!

Robert

Even if -Include is slow, it will at least be faster than using Where-Object.

Depending on the general content of the folders: if these folders are dedicated to these file types and there are basically no others, you can omit -Include completely and simply collect ALL files. :man_shrugging:t4:

Again … you can speed up your script by running it locally on the file server. Running the query against a network drive, whether it’s mapped or not, will take much more time.


Hi, Olaf,

Your method worked, and I marked it as the solution. I had to add a list of exclusions to weed out .doc files that were irrelevant to my search, and I removed the ‘.docx’ include because it gave me only irrelevant files. My final code is shown below. Thanks so much for your help!

$Path = '\\VFS1\CompanyShared\ENotebook Directory\*'
$SendTxt = 'E:\Company\Automation-Projects\PowerShell\NODOCS.TXT'
$Include = @(
    '*.bmp'
    '*.emf'
    '*.gif'
    '*.jpg'
    '*.pdf'
    '*.png'
    '*.tif'
    '*.wmf'
    '*.doc'
)
$Exclude =@(
	'*_Reaction*.doc'
	'*Table of Contents.doc'
	'_Flash*.doc'
)
$FileList = 
Get-ChildItem -Path $Path -Include $Include -Exclude $Exclude -Recurse -File

$FileList |
    Group-Object -Property BaseName |
        Where-Object -Property Count -LT -Value 2 | Out-File -FilePath $SendTxt

Best Regards,

Robert
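One possible refinement to the final script: piping the group objects straight to Out-File writes a formatted table (Count, Name, Group), where the Group column may be truncated, rather than one FULLNAME per line as planned for NODOCS.TXT. A sketch that unrolls the groups and writes only the full paths, using hypothetical sample objects and a temp output path:

```powershell
# Hypothetical stand-ins for $FileList from the script above
$FileList = @(
    [pscustomobject]@{ BaseName = 'Plot1'; FullName = 'C:\Test\Plot1.jpg' }
    [pscustomobject]@{ BaseName = 'Plot1'; FullName = 'C:\Test\Plot1.doc' }
    [pscustomobject]@{ BaseName = 'Scan7'; FullName = 'C:\Test\Scan7.tif' }
)

# Hypothetical output path; the script above uses E:\Company\Automation-Projects\PowerShell\NODOCS.TXT
$SendTxt = Join-Path ([System.IO.Path]::GetTempPath()) 'NODOCS.TXT'

# Unroll each group to its files, then keep only the full path of each file
$FileList |
    Group-Object -Property BaseName |
        Where-Object -Property Count -LT -Value 2 |
            Select-Object -ExpandProperty Group |
                Select-Object -ExpandProperty FullName |
                    Out-File -FilePath $SendTxt
```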

Great to hear. :+1:t4:

And thanks for sharing. :love_you_gesture:t4:
