Find the part of a folder name that is duplicated

I have thousands of folders in a directory, and each time a set of files is processed, a new version of that folder is created. For example:
3-CC-TEST-v1
3-CC-TEST-v3
14061-TISB-v1
14061-TISB-v8
14061-TISB-v20

How do I look at everything before the first dash in the folder name and find duplicates? The number can be any length, so I need to compare everything before the first dash. I just need to write the list to the screen and view it.

Sean,
Welcome to the forum. :wave:t4:

If the separator is always a dash, you can split the folder name on the dashes and store the first element of the resulting array as a property of a custom object for further processing:

$InputFolderNameList = 
'3-CC-TEST-v1',
'3-CC-TEST-v3',
'14061-TISB-v1',
'14061-TISB-v8',
'14061-TISB-v20'

foreach ($InputFolderName in $InputFolderNameList) {
    [PSCustomObject]@{
        Original  = $InputFolderName
        FirstPart = ($InputFolderName -split '-')[0]
    }
}

BTW: When you post code, sample data, console output or error messages please format it as code using the preformatted text button ( </> ). Simply place your cursor on an empty line, click the button and paste your code.

Thanks in advance



So based on that I came up with this…

$InputFolderNameList = Get-ChildItem "D:\VODContent" -Recurse -Directory

foreach ($InputFolderName in $InputFolderNameList) {
    [PSCustomObject]@{
        # Use .Name so the split works on the folder name, not the full path
        Original  = $InputFolderName.Name
        FirstPart = ($InputFolderName.Name -split '-')[0]
    }
}

The output is:

Original       FirstPart
--------       ---------
14061-TISB-v1  14061
14061-TISB-v20 14061
14061-TISB-v8  14061
3-CC-TEST-v1   3
3-CC-TEST-V3   3
4-TEST2-v1     4

Now I only want to see the duplicates from the list. 4-TEST2-v1 (FirstPart 4) is unique and shouldn’t show up in the list.

Great. :+1:t4:

You can use Group-Object to group the output by the FirstPart property and then keep only the groups with a count greater than one. :wink:

So then I suspect I need to include

Group-Object -InputObject $FirstPart | Where-Object {$_.Count -gt 1} 

Somewhere but I don’t know where?

I followed Olaf’s instructions. Try something like the code below. I’m also learning. :slight_smile:

$InputFolderNameList = 
'3-CC-TEST-v1',
'3-CC-TEST-v3',
'14061-TISB-v1',
'14062-TISB-v8',
'14061-TISB-v20'

$names = 
foreach ($InputFolderName in $InputFolderNameList) {
    [PSCustomObject]@{
        Original  = $InputFolderName
        FirstPart = ($InputFolderName -split '-')[0]
    }
}

$names | Group-Object -Property FirstPart | Where-Object {$_.count -gt 1} | 
Select-Object -ExpandProperty Group
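
A more compact variant, as a minimal sketch only: Group-Object also accepts a script block for -Property, so the intermediate objects can be skipped. The variable names and the extra '4-TEST2-v1' entry (taken from the output earlier in the thread) are just for illustration, and `-split '-', 2` is an assumption on my side to stop splitting after the first dash.

```powershell
# Sample names from the thread; '4-TEST2-v1' is the only unique prefix.
$folderNames = '3-CC-TEST-v1', '3-CC-TEST-v3', '14061-TISB-v1',
               '14061-TISB-v8', '14061-TISB-v20', '4-TEST2-v1'

# Group on the text before the first dash and keep groups with more than one member.
$duplicateGroups = $folderNames |
    Group-Object -Property { ($_ -split '-', 2)[0] } |
    Where-Object { $_.Count -gt 1 }

# One row per duplicated prefix, listing the names that share it.
$duplicateGroups |
    Select-Object Name, Count, @{ Name = 'Folders'; Expression = { $_.Group -join ', ' } }
```

This shows one line per duplicated prefix instead of one line per folder, which may be easier to scan when there are thousands of folders.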

Thanks both of you.

I ended up with these scripts.

Folders:

$InputFolderNameList = Get-ChildItem "E:\VODContent" -Recurse -Directory

$names = 
foreach ($InputFolderName in $InputFolderNameList) {
    [PSCustomObject]@{
        # Use .Name so the split works on the folder name, not the full path
        Original  = $InputFolderName.Name
        FirstPart = ($InputFolderName.Name -split '-')[0]
    }
}

$names | Group-Object -Property FirstPart | Where-Object {$_.count -gt 1} | 
Select-Object -ExpandProperty Group

Files: (filtering .mp4 files only)

$InputFileNameList = Get-ChildItem -Path "E:\VODContent" -Filter *.mp4

$names =
foreach ($InputFileName in $InputFileNameList) {
    [PSCustomObject]@{
        # Use .Name so the split works on the file name, not the full path
        Original  = $InputFileName.Name
        FirstPart = ($InputFileName.Name -split '-')[0]
    }
}

$names | Group-Object -Property FirstPart | Where-Object {$_.count -gt 1} | 
Select-Object -ExpandProperty Group

Removing the filter combines files and folders in one list, which is also helpful.
With it I found more old, unnecessary duplicate files in a folder containing over 6000 items.


Just a little more background on the purpose if you are curious…
(All of the files and folders are generated by a system creating Video On Demand Files. The software has changed so now everything new is contained within folders but older content is just in the root as mp4 files until retranscoded. The retranscoding process is what adds the v and number to the end of the file or folder.)


I’m glad it helped. And thanks for sharing. :+1:t4:

This may be nitpicking, and it depends on the number of files and folders you have to process and on the speed of your infrastructure, but if you need to run this task regularly, you could use a single query that collects the files AND folders and process them in one pass.

File system operations are usually among the most expensive ones in terms of run time, so minimizing them can save a lot of time. :wink:
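
Something along these lines could work as a starting point. It is a rough sketch, not a tested solution for your environment: the function name Get-DuplicatePrefix and the IsFolder column are made up for illustration, and it assumes every file and folder name contains a dash, as in your examples.

```powershell
# Hypothetical helper: lists files and folders whose name prefix
# (the text before the first dash) occurs more than once.
function Get-DuplicatePrefix {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # One file system query: without -Directory or -File,
    # Get-ChildItem returns files and folders together.
    Get-ChildItem -Path $Path |
        Group-Object -Property { ($_.Name -split '-', 2)[0] } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group } |
        Select-Object @{ Name = 'FirstPart'; Expression = { ($_.Name -split '-', 2)[0] } },
                      Name,
                      @{ Name = 'IsFolder'; Expression = { $_.PSIsContainer } }
}

# Usage (path from the thread):
# Get-DuplicatePrefix -Path 'E:\VODContent'
```

The sketch only looks at the top level of the given path. Add -Recurse to the Get-ChildItem call if the subfolders should be included as well.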