Find Duplicate in array of objects

by fazlook at 2012-10-29 06:34:01

Hi there,
I am trying to find duplicates in my array $ArrayOfRecords on one of my object properties, and I don't really know what to use. Is it two for loops?

Here is my code:


#Variables
$Location ="H:\PowerShell\Final Scripts\Validate Delimiter File\bp_vendors.txt"
$File = Get-Content $Location
$PipesNeeded = 13
$Date =(Get-Date).ToString('yyyyMMdd')
$LineIndex = 1
$InvalidLines=0
$ArrayOfRecords = @()
$countModifiedRecords = 0
$linesCount = $File.Length
Write-host "START***"



Write-Host "This File Contain:" $linesCount "Lines."
foreach($line in $File)
{
$a = ([char[]]$line -match '\|').count
if ($a -ne $PipesNeeded)
{
Write-Host "Pipe Error: Line" $LineIndex "Does Not Contain" $NumOfPipes "The # Of Pipes (13) Requested"
$InvalidLines++
}
$LineIndex++
#***********
#Arr is an array of records (12 records each line in this script)
$ar = $Line.Split('|')
#Create new Object
$record = New-Object System.Object

#Assign a name to each record to work with it as an object
$record | Add-Member -type NoteProperty -name ID -value $ar[0]
$record | Add-Member -type NoteProperty -name Vendor -Value $ar[1]
$record | Add-Member -type NoteProperty -name VendorType -value $ar[2]
$record | Add-Member -type NoteProperty -name Generator -Value $ar[3]
$record | Add-Member -type NoteProperty -name Settings -value $ar[4]
$record | Add-Member -type NoteProperty -name Description -value $ar[5]
$record | Add-Member -type NoteProperty -name BusinessDay -Value $ar[6]
$record | Add-Member -type NoteProperty -name BusinessType -value $ar[7]
$record | Add-Member -type NoteProperty -name DateTime -Value $ar[8]
$record | Add-Member -type NoteProperty -name Field9 -value $ar[9]
$record | Add-Member -type NoteProperty -name Field10 -value $ar[10]
$record | Add-Member -type NoteProperty -name Field11 -value $ar[11]
$record | Add-Member -type NoteProperty -name Field12 -value $ar[12]
$record | Add-Member -type NoteProperty -name Field13 -value $ar[13]
#ArrayOfRecords contains all records for each line
$ArrayOfRecords += $record
}

if ($InvalidLines -eq 0)
{
write-host "The File Is Valid And Ready For Processing…"
}
else
{
Write-Host "Then The File Is Not Valid. It Contains:" $InvalidLines "Invalid Lines.Contact Someone."
# Activebatch exit 16002
}

#All File Modifications Go Here#
#
#
#Write-Output "The File Is Now Converting…"
foreach ($i in $ArrayOfRecords)
{

# I want to find duplicates on the $i.Vendor property here
}
by Jason_Yoder_MCT at 2012-10-29 16:26:40
Hi Fazlook,

Take a look at the example code below. What I did was create a set of 11 objects, one of which has a duplicate value in Prop1. I display two sets of data. The first is the raw data, showing all 11 objects (duplicate included). The second uses the Sort-Object cmdlet and its -Unique parameter to filter out the duplicate value. You will be most interested in line 34, the last line of the script. I've included the output as well.

I hope this helps,
Jason

# Create a dynamic array to hold the test objects.
$Array = @()

# Create 10 objects in the dynamic array.
For ($X = 0;$X -lt 10;$X++)
{
$Obj = New-Object PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Prop1" -Value $X
$Obj | Add-Member -MemberType NoteProperty -Name "Prop2" -Value $True
$Obj | Add-Member -MemberType NoteProperty -Name "Prop3" -Value $False

# Add the object to the array
$Array += $Obj
}

# Add a duplicate to the array. The duplication will be on the property
# Prop1 with a value of 3
$Obj = New-Object PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Prop1" -Value 3
$Obj | Add-Member -MemberType NoteProperty -Name "Prop2" -Value $True
$Obj | Add-Member -MemberType NoteProperty -Name "Prop3" -Value $False

# Add the object to the array
$Array += $Obj

# View the contents of the array. Notice the duplicate value in Prop 1
$Array

# Divider between the two sets of data.
Write-host "-----------------------------------------------------------"

# Use the -Unique parameter of the Sort-Object cmdlet.
# Note the duplicate object is gone.
$Array | Sort-Object -Property Prop1 -Unique


Prop1 Prop2 Prop3
----- ----- -----
0 True False
1 True False
2 True False
3 True False
4 True False
5 True False
6 True False
7 True False
8 True False
9 True False
3 True False
-----------------------------------------------------------
0 True False
1 True False
2 True False
3 True False
4 True False
5 True False
6 True False
7 True False
8 True False
9 True False
by fazlook at 2012-10-31 05:39:41
Seems cool, I will test this out for sure. But how would I leave only one copy in my txt file so there is no duplicate? Should I use delete?
by Jason_Yoder_MCT at 2012-10-31 08:10:14
Fazlook,

Just so I do not steer you the wrong way.
Will your output be:
1. - All objects with duplicate values in the Vendor property?
2. - All objects that do not have duplicate values in the Vendor property?
3. - All objects, removing any object that has a value in the Vendor property that is previously in the array (Keeps the first object but removes any duplicates)

Jason
by fazlook at 2012-10-31 08:20:55
Yes.
I want #1, to display all those objects in a report afterwards.
I want #3, to remove any object that has a value in the Vendor property that is previously in the array.

And by the way, how do I quickly check whether I have a Null value in my array of objects, instead of checking each record?
by Jason_Yoder_MCT at 2012-10-31 08:37:35
I only have a few minutes, so let me quickly address the first item.

Take a look at the code that I sent you above. Line 34 pipes the array containing the objects to Sort-Object using the -Unique parameter. Select-Object also has a -Unique parameter, so try this and see if it removes any duplicates.

$FilteredArray = $ArrayOfRecords | Select-Object -Property Vendor -Unique

Take a look and see if the objects in $FilteredArray are all unique. Throw some test data in there to make sure at least one of each duplicated record is retained in the data set.
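
One thing to keep in mind: Select-Object -Property Vendor -Unique returns objects that carry only the Vendor property. If the full records are needed, a minimal sketch like the one below could be used instead. The sample records are made up for illustration, and it assumes the first occurrence is the one to keep and that Vendor is never $null (a hashtable key cannot be $null).

```powershell
# Hypothetical sample records; only the Vendor property matters here.
$ArrayOfRecords = @(
    New-Object PSObject -Property @{ ID = '1'; Vendor = 'BC Hydro' }
    New-Object PSObject -Property @{ ID = '2'; Vendor = 'Telus' }
    New-Object PSObject -Property @{ ID = '3'; Vendor = 'BC Hydro' }
)

# Remember every Vendor already seen; keep a record only the first time
# its Vendor shows up, so later repeats are dropped.
$seen = @{}
$FilteredArray = @($ArrayOfRecords | Where-Object {
    if ($seen.ContainsKey($_.Vendor)) {
        $false
    }
    else {
        $seen[$_.Vendor] = $true
        $true
    }
})

# The surviving records still have all of their properties.
$FilteredArray | Format-Table ID, Vendor
```

Because the hashtable is created with @{}, the Vendor comparison is case-insensitive, which matches the default behaviour of most PowerShell comparisons.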
by fazlook at 2012-10-31 10:09:16
That works :) but how do I display only the objects that are duplicates?
by Jason_Yoder_MCT at 2012-10-31 11:26:00
OK, we have some miscommunication here. I'm trying to remove duplicates. You want to know which ones they are. Let me put some thought into that one and I'll get back to you.
by Jason_Yoder_MCT at 2012-10-31 11:36:29
OK, I believe I have what you want. I modified my original code to provide 2 different duplicate entries.

# Create a dynamic array to hold the test objects.
$Array = @()

# Create 10 objects in the dynamic array.
For ($X = 0;$X -lt 10;$X++)
{
$Obj = New-Object PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Prop1" -Value $X
$Obj | Add-Member -MemberType NoteProperty -Name "Prop2" -Value $True
$Obj | Add-Member -MemberType NoteProperty -Name "Prop3" -Value $False

# Add the object to the array
$Array += $Obj
}

# Add a duplicate to the array. The duplication will be on the property
# Prop1 with a value of 3
$Obj = New-Object PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Prop1" -Value 3
$Obj | Add-Member -MemberType NoteProperty -Name "Prop2" -Value $True
$Obj | Add-Member -MemberType NoteProperty -Name "Prop3" -Value $False

# Add the object to the array
$Array += $Obj

# Add a second duplicate to the array. The duplication will be on the property
# Prop1 with a value of 6
$Obj = New-Object PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Prop1" -Value 6
$Obj | Add-Member -MemberType NoteProperty -Name "Prop2" -Value $True
$Obj | Add-Member -MemberType NoteProperty -Name "Prop3" -Value $False

# Add the object to the array
$Array += $Obj

# View the contents of the array. Notice the duplicate value in Prop 1
#$Array

# Divider between the two sets of data.
Write-host "-----------------------------------------------------------"

# Use the -Unique parameter of the Sort-Object cmdlet.
# Note the duplicate object is gone.
$Array2 = $Array | Sort-Object -Property Prop1 -Unique
(Compare-Object -ReferenceObject $Array2 -DifferenceObject $Array).InputObject


Take a look at lines 44 and 45. Line 44 creates a second collection that has any duplicates filtered out. Line 45 compares the two sets and returns only objects that have duplicate values.

Hope this helps,
Jason
by Jason_Yoder_MCT at 2012-10-31 11:49:05
One more thing. If you have an object that is duplicated more than once, then the list of duplicate objects will list it more than once. If you only want the object listed once, change the last line to:

(Compare-Object -ReferenceObject $Array2 -DifferenceObject $Array).InputObject | Select-Object -Property * -Unique
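
As an alternative to comparing two collections, here is a rough sketch that uses Group-Object to find the duplicated values directly. It is not the approach from the post above, just another way to get a similar report. The test objects are rebuilt here so the snippet runs on its own, and $DuplicateGroups and $DuplicateObjects are names made up for this example.

```powershell
# Hypothetical test objects: Prop1 values 0..9 plus repeats of 3 and 6.
$values = @(0..9) + 3 + 6
$Array = foreach ($v in $values) {
    New-Object PSObject -Property @{ Prop1 = $v; Prop2 = $true; Prop3 = $false }
}

# Group on Prop1; any group with more than one member is a duplicated value.
$DuplicateGroups = @($Array |
    Group-Object -Property Prop1 |
    Where-Object { $_.Count -gt 1 })

# Report each duplicated value and how many times it occurs.
$DuplicateGroups | Select-Object Name, Count

# Expand the groups to get the duplicate objects themselves.
$DuplicateObjects = @($DuplicateGroups | ForEach-Object { $_.Group })
```

Each group keeps its original objects in the Group property, so nothing is lost by grouping first and filtering afterwards.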
by fazlook at 2012-10-31 13:24:47
Thank you for taking the time to make everything clear.

Those two lines would do the trick, but I cannot display line 45!
by Jason_Yoder_MCT at 2012-10-31 14:48:22
OK, this one got messy.

I could not find a way to ask an array of objects if any of the properties of one of the objects is $NULL without looping. I also did not know what your properties were. So, I played around a bit.

$Object = @()

$Obj = New-Object -TypeName PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Name" -Value "One"
$Obj | Add-Member -MemberType NoteProperty -Name "Value" -Value "Green"
$Obj | Add-Member -MemberType NoteProperty -Name "Number" -Value 1
$Object += $Obj

$Obj = New-Object -TypeName PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Name" -Value "Two"
$Obj | Add-Member -MemberType NoteProperty -Name "Value" -Value $Null
#$Obj | Add-Member -MemberType NoteProperty -Name "Value" -Value "Red"
$Obj | Add-Member -MemberType NoteProperty -Name "Number" -Value 2
$Object += $Obj

$Obj = New-Object -TypeName PSObject
$Obj | Add-Member -MemberType NoteProperty -Name "Name" -Value "Three"
$Obj | Add-Member -MemberType NoteProperty -Name "Value" -Value "Blue"
$Obj | Add-Member -MemberType NoteProperty -Name "Number" -Value 3
$Object += $Obj

# Check to see if an object has any NULL values in
# its properties.

# Extract the names of the properties of the object.
$PropNames = (($Object | GM) |
Where {$_.MemberType -eq "NoteProperty"} |
Select-Object -Property Name).Name

#Cycle through each object
ForEach ($Obj in $Object)
{
# Set to $True if a $Null value
# is found in a property.
$NullFound = $False
ForEach ($Prop in $PropNames)
{
If ($Obj.$Prop -contains $Null)
{
$NullFound = $True
}
}
If ($NullFound)
{
# Do what you want here if $NULL is found.
$NullFound
}
}


The good stuff starts on line 22. The rest just sets up 3 objects with 3 properties. The second object has a value set to $NULL in one of its properties. Lines 26-28 extract the names of your objects' properties. Line 31 starts looping through each object in the array. Line 36 loops through each property in the individual object. Line 38 tests to see if the property is $NULL. If it is, then $NullFound is set to $True. In line 43, you put what you want inside the IF statement for when you find an object with $NULL as one of its properties.

That was an interesting one. Have fun!
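
For comparison, here is a shorter sketch of the same check that reads the property list from each object's PSObject.Properties collection instead of Get-Member. It still loops, just inside the pipeline. The sample objects mirror the ones above, and $WithNulls is a name made up for this example.

```powershell
# Hypothetical sample data: the second object has a $null property.
$Object = @(
    New-Object PSObject -Property @{ Name = 'One';   Value = 'Green'; Number = 1 }
    New-Object PSObject -Property @{ Name = 'Two';   Value = $null;   Number = 2 }
    New-Object PSObject -Property @{ Name = 'Three'; Value = 'Blue';  Number = 3 }
)

# Keep the objects that have at least one property whose value is $null.
$WithNulls = @($Object | Where-Object {
    @($_.PSObject.Properties | Where-Object { $null -eq $_.Value }).Count -gt 0
})

$WithNulls | Select-Object Name, Number
```

Note that fields produced by String.Split are empty strings rather than $null, so for records read from a delimited file a test such as [string]::IsNullOrEmpty($_.Value) is probably closer to what is wanted.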
by Jason_Yoder_MCT at 2012-10-31 17:37:18
Ok, while I'm stuck in a hotel this evening with little to do, I thought that I would answer the second question that you asked. I'm teaching PowerShell this week and decided to keep the geek persona up for a while longer. I could not find a built-in way to search a collection of objects for a property with a NULL value. The function below returns the index number in the collection of each object with a NULL value. I included comment-based help with this function. Just send it an array of objects and it will return the index number in the array of any object with a NoteProperty that is NULL.

Function Find-NullProperty
{
Param (
[Parameter(Mandatory=$True)]$Object
)
# Check to see if an object has any NULL values in
# its properties.

# Create an object to output to the calling statement.
$Output = @()

# Index number returned to denote the instance of an object
# (should multiple instances be sent) in which the Null value is
# found.
$Index = -1

# Extract the names of the properties of the object.
$PropNames = (($Object | GM) |
Where {$_.MemberType -eq "NoteProperty"} |
Select-Object -Property Name).Name

#Cycle through each object
ForEach ($Obj in $Object)
{
# Increment the Index.
$Index++

# Set to $True if a $Null value
# is found in a property.
$NullFound = $False

ForEach ($Prop in $PropNames)
{
If ($Obj.$Prop -contains $Null)
{
$NullFound = $True
}
}
If ($NullFound)
{
# Create the object to be returned.
$Obj = New-Object -TypeName PSObject

# Add object members
$Obj | Add-Member -MemberType NoteProperty -Name "Index" -Value $Index

# Send the object to the output array.
$Output += $Obj
}
}
Write-Output $Output
<#
.SYNOPSIS
Indicates if an object has a property that is $NULL

.DESCRIPTION
Search through a collection of custom objects and returns the index
number of any instance that has a $NULL value in a NoteProperty

.PARAMETER Object
The Object, or collection of objects that may contain a NULL value
in a note property.

.EXAMPLE
Find-NullProperty $Object

 Index
 -----
 1
 2

Returns the index number of instance of a collection of object in which
a NoteProperty has a NULL value.

.NOTES
Provided without warranty or support.
Author: Jason Yoder.
#>
}
by nohandle at 2012-11-01 03:08:17
fazlook wrote: "That works :) but how I display only the objects that they are duplicate ?"

Hello,
from your example (reading a list of vendors from a file) I assume you have an array of strings and are trying to find duplicates.
You can use this snippet.

$array = "one","two","two","three","Three","Three"

$array | ForEach-Object -Begin {
#create a hashtable
$duplicates = @{}
} -Process {
#here the magic happens, if you query
#a key in the hashtable that does not exist
#it is created
$duplicates.$_++
} -End {
$duplicates.GetEnumerator() |
Where-Object {$_.value -gt 1} |
Sort-Object -Property Value -Descending
}
<#output
Name  Value
----  -----
three 3
two   2
#>
by fazlook at 2012-11-02 19:40:50
My array is an array of objects. I need to look for any duplicate in the vendor name property and of course display how many and what their vendor number is, if possible. I will attach my code in my next post.

Vendor Number Vendor Name                    New Vendor Name
------------- -----------                    ---------------
1             BC Hydrohahhaahahhahahah       BC
2             Telus Barahahahahhaahaahahaha
3             Rogers Barahahahahhaahaahahaha Rogesr man
4             Telus Barahahahahhaahaahahaha  Telus Bar2
1             BC Hydrohahhaahahhahahah       BC
1             BC Hydrohahhaahahhahahah       BC
1             BC Hydrohahhaahahhahahah       BC
by fazlook at 2012-11-02 19:43:16
This is an example of the file:
Line 1 I don't count. All remaining lines must have 2 delimiters. I also check for null values, such as in line 2 (the new vendor name property is null).

C1 Biller Fix Version|23
1|BC Hydrohahhaahahhahahah|BC
2|Telus Barahahahahhaahaahahaha|
3|Rogers Barahahahahhaahaahahaha|Rogesr man
4|Telus Barahahahahhaahaahahaha|Telus Bar2
1|BC Hydrohahhaahahhahahah|BC
1|BC Hydrohahhaahahhahahah|BC
1|BC Hydrohahhaahahhahahah|BC
by fazlook at 2012-11-02 19:44:50
#Variables
$inFile ="c:\Validate Delimiter File\C1_Biller_Fix_File.txt"
$inFileName = Get-ChildItem $inFile
$Firstline,$inFileContentRemaining = Get-Content $inFile
$PipesNeeded = 2
$RecordIndex = 1 # 1 because we dont need first line
$InvalidLines=0
$ArrayOfRecordsRemaining = @()
$countModifiedRecords = 0
$linesCount = $inFileContentRemaining.Length
$currentDate = Get-Date

Write-host "*******************************************"
Write-host "******** C1 FILE INFORMATION START ********"
Write-host "*******************************************"
write-host "C1 Biller List Filename: " $inFileName.name
Write-Host "C1 File Contains : " $linesCount "Records"

if($inFileName.CreationTime -gt $currentDate.AddDays(-14))

{
Write-Host "File Date Status: Valid Date,the file was produced in the last 2 weeks"
}
else
{
Write-Host "File Date Status: Invalid Date,the file was produced more than 2 weeks ago"
#exit 16001
}

foreach($Record in $inFileContentRemaining)
{
$a = ([char[]]$Record -match '\|').count
if ($a -ne $PipesNeeded)
{
Write-Host "Pipe Error: Record" $RecordIndex "Does Not Contain" $NumOfPipes "The # Of Pipes (2) Requested"
$InvalidLines++
}
$RecordIndex++
}

if ($InvalidLines -eq 0)
{
write-host "The File Is Valid And Ready For Processing...."
}
else
{
Write-Host "Then The File Is Not Valid. It Contains:" $InvalidLines "Invalid Lines.Contact Someone."
# exit 16002
}
foreach($line in $inFileContentRemaining)
{
$ar = $Line.Split('|')
$record = New-Object System.Object
$record | Add-Member -type NoteProperty -name VENDORNUM -value $ar[0]
$record | Add-Member -type NoteProperty -name VENDORNAME -Value $ar[1]
$record | Add-Member -type NoteProperty -name NEWVENDORNAME -value $ar[2]
$ArrayOfRecordsRemaining += $record
}
$a = @{Expression={$_.VENDORNUM};Label="Vendor Number";width=20},
@{Expression={$_.VENDORNAME};Label="Vendor Name";width=60},
@{Expression={$_.NEWVENDORNAME};Label="New Vendor Name";width=40}
$ArrayOfRecordsRemaining | Format-table $a
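
For the report asked about earlier (which vendor names are duplicated, how many times, and under which vendor numbers), one possible sketch is shown below. It uses the sample lines from the example file in this thread, builds the records inline so it runs on its own, and the $report variable and its column names are made up for illustration.

```powershell
# Sample lines copied from the example file in this thread (header line omitted).
$lines = @(
    '1|BC Hydrohahhaahahhahahah|BC'
    '2|Telus Barahahahahhaahaahahaha|'
    '3|Rogers Barahahahahhaahaahahaha|Rogesr man'
    '4|Telus Barahahahahhaahaahahaha|Telus Bar2'
    '1|BC Hydrohahhaahahhahahah|BC'
    '1|BC Hydrohahhaahahhahahah|BC'
    '1|BC Hydrohahhaahahhahahah|BC'
)

# Turn each line into an object with the same property names as the script above.
$records = foreach ($line in $lines) {
    $ar = $line.Split('|')
    New-Object PSObject -Property @{
        VENDORNUM     = $ar[0]
        VENDORNAME    = $ar[1]
        NEWVENDORNAME = $ar[2]
    }
}

# One row per duplicated vendor name: the name, how often it occurs,
# and the distinct vendor numbers it occurs under.
$report = @($records |
    Group-Object -Property VENDORNAME |
    Where-Object { $_.Count -gt 1 } |
    Select-Object @{ Name = 'Vendor Name'; Expression = { $_.Name } },
                  Count,
                  @{ Name = 'Vendor Numbers'; Expression = {
                      ($_.Group | ForEach-Object { $_.VENDORNUM } | Select-Object -Unique) -join ','
                  } })

$report | Format-Table -AutoSize
```

With this sample data the Telus name is reported under two different vendor numbers, which is the kind of case a plain line-by-line comparison would miss.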
by fazlook at 2012-11-02 19:52:27
I just posted the part of the code that's working so far :) I thought it would be better to work with my own code instead of examples.

The whole idea behind this is:

What I have: 2 files which both contain the same VendorName
1- Big file with 9 delimiters
2- Small file with 2 delimiters (current)

First: remove all duplicates from 1 and 2 (part of validation) on VendorName
Second: check that property 1 is numeric in both files and check dates (I have DONE it)
Third: check file number 2 for Null values (it should not have a null value; if there is one I will report it, so the 3 objects on each line must NOT be Null)
Fourth: read the first file on the VendorName property, and if it is more than 60 chars, replace it with the New Vendor Name from file 2 (less than 60 chars)

So basically it's a match and replace process, and the more I report in my output the better, to display what's happening
by fazlook at 2012-11-02 22:22:46
#Here I am trying to get the VendorNum of each duplicate and loop through my txt file to delete the entire record, but I am afraid that the same numbers appear somewhere else. Maybe I could match Number + Delimiter (example 4| or 7| ) to delete the record. My script needs a fix around $_ -notmatch $dup + '|'
#Write-Host "Duplicates:"
$c = Compare-Object -ReferenceObject $FilteredArray -DifferenceObject $ArrayOfRecordsRemaining -PassThru
#$c
ForEach ($duplicate in $c )
{
$dup = $duplicate.VENDORNUM

$inFileContentRemaining | where-Object {$_ -notmatch $dup + '|' } | out-file 'c:\Validate Delimiter File\new_non_dup.txt'
}
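
A few things in the loop above are likely to cause trouble. -notmatch takes a regular expression, where an unescaped | means "or", so the pattern built from $dup + '|' matches every line. Out-File also runs once per duplicate and overwrites the file each time. A sketch that avoids regex altogether is shown below: it splits each line, keeps the first line seen for each vendor name, and writes the result once. The sample lines come from the example file in this thread, and the output path in the temp folder is a placeholder for this example.

```powershell
# Sample lines copied from the example file in this thread (header line omitted).
$inFileContentRemaining = @(
    '1|BC Hydrohahhaahahhahahah|BC'
    '2|Telus Barahahahahhaahaahahaha|'
    '3|Rogers Barahahahahhaahaahahaha|Rogesr man'
    '4|Telus Barahahahahhaahaahahaha|Telus Bar2'
    '1|BC Hydrohahhaahahhahahah|BC'
    '1|BC Hydrohahhaahahhahahah|BC'
    '1|BC Hydrohahhaahahhahahah|BC'
)

# Keep the first line seen for each vendor name (second field), drop the rest.
$seen = @{}
$uniqueLines = @(foreach ($line in $inFileContentRemaining) {
    $vendorName = $line.Split('|')[1]
    if (-not $seen.ContainsKey($vendorName)) {
        $seen[$vendorName] = $true
        $line
    }
})

# Write the result once, after the loop. The output path is hypothetical.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'new_non_dup.txt'
$uniqueLines | Set-Content -Path $outFile
```

If matching on "number plus delimiter" is still preferred, the pattern would need escaping and anchoring, for example '^' + [regex]::Escape($dup) + '\|', so that 4| does not also match 14|.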
by fazlook at 2012-11-08 11:44:36
any help on this ?
by DonJ at 2012-11-08 11:50:02
I’m having trouble catching up on the thread here - can you kind of summarize where you’re at now and what’s not working for you?
by fazlook at 2012-11-10 16:26:41
Ok, well:
I have 2 txt files.
The first txt file has records with 10 delimiters each (I want to work on the string of each record), ex : A|record1|C|GHH|hHHHHH|etc…
The second txt file has records with 2 delimiters: A|record2|NewRecord

Our concern here is record1 from file 1, record2 (= record1 if record1's length is > 60 chars) and newrecord from file 2.
After validating the files (date, number of delimiters etc…) I want to delete duplicates on record1 from file 1 and on record2
from file 2.

I have already created the objects in my code and done the validation. The remaining part is how to delete duplicates based on record1 from file 1 and record2 from file 2.
After deleting all duplicates, I want to do match and replace: if record1's length is > 60 chars, then we go to the second file and replace record1 (which in this case = record2) with newrecord (which is less than 60 chars).
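
The match and replace step described above could be sketched roughly as follows. It assumes the vendor name is the second field in both files and uses made-up sample lines. $file1, $file2, $newNames and $updated are names chosen for this example only.

```powershell
# Hypothetical vendor name that is longer than 60 characters.
$longName = 'X' * 61

# Hypothetical contents of file 1 (many fields) and file 2 (three fields).
$file1 = @(
    "A|$longName|C|D"
    'B|Short Name|C|D'
)
$file2 = @(
    "1|$longName|New Short Name"
)

# Build a lookup from the old vendor name (record2) to the new one (newrecord).
$newNames = @{}
foreach ($line in $file2) {
    $f = $line.Split('|')
    $newNames[$f[1]] = $f[2]
}

# Replace record1 only when it is longer than 60 chars and file 2 has a match.
$updated = @(foreach ($line in $file1) {
    $f = $line.Split('|')
    if ($f[1].Length -gt 60 -and $newNames.ContainsKey($f[1])) {
        $f[1] = $newNames[$f[1]]
    }
    $f -join '|'
})

$updated
```

Building the lookup table once means file 2 is read a single time, rather than being searched again for every line of file 1.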
by fazlook at 2012-11-15 05:28:14
Hi Guys,

Any help doing this would be appreciated.
by fazlook at 2012-11-19 07:20:51
Guys, any help would be appreciated. Whenever you have time of course.

Thank you so much
by Infradeploy at 2012-11-19 08:28:46
I didn’t watch the whole thread, and maybe it’s already said, but: I think it’s easier to create 2 new files based on your query, and replace the old files with the new ones
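
A minimal sketch of that idea, assuming a small demo file in the temp folder stands in for the real input file: write the cleaned records to a new file first, then replace the old file once the new one is complete. Here exact duplicate lines are dropped with Select-Object -Unique, which is only a stand-in for whatever filtering is actually needed.

```powershell
# Hypothetical demo file in the temp folder, standing in for the real input file.
$inFile = Join-Path ([System.IO.Path]::GetTempPath()) 'C1_demo.txt'
Set-Content -Path $inFile -Value '1|A|B', '1|A|B', '2|C|D'

# Write the cleaned records to a new file first...
$newFile = "$inFile.new"
Get-Content -Path $inFile | Select-Object -Unique | Set-Content -Path $newFile

# ...then replace the old file only after the new one has been written.
Copy-Item -Path $newFile -Destination $inFile -Force
Remove-Item -Path $newFile
```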