I’m trying to come up with the most efficient way to compare two arrays where there could be any number of columns in each array, but we are matching on a specified column from each array. I want to retain all data from each array. Some arrays could contain duplicates. I am finding that when I compare arrays with more than a few thousand lines, the compare can start to take a long time, over an hour in some cases.
Is my Compare Function the most efficient way to do this?
$List1 = @(
[PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
[PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
[PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
[PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
[PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
[PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
[PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
)
$List2 = @(
[PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
[PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
[PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
[PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
[PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
[PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
[PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
[PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
)
$Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name
Dan, your approach is very inefficient, because you combine both arrays at the beginning (for what reason?):
$List = $List1 | %{[PSCustomObject]@{L1Data = $_; L2Data = 'NA'}}
$List += $List2 | %{[PSCustomObject]@{L1Data = 'NA'; L2Data = $_}}
and later never use it in combined form, but filter it several times to get back to the original arrays:
$Object in $List.L1Data.$L1Match + $List.L2Data.$L2Match
foreach ($Object1 in $List.L1Data -ne 'NA') { if ($Object1.$L1Match -eq $Object) {$Match1 += $Object1}
and so on.
The code below is about 3 times faster even on your sample, and should be faster still on large arrays. By the way, tell me how much…
#Requires -Version 4.0
Function RJ-CombinedCompare() {
[CmdletBinding()]
PARAM(
#Every parameter must be mandatory
[Parameter(Mandatory=$True)]$List1,
[Parameter(Mandatory=$True)]$L1Match,
[Parameter(Mandatory=$True)]$List2,
[Parameter(Mandatory=$True)]$L2Match)
# Fill the hash with arrays of data from both lists; each hash key is a value to compare
$hash = @{}
foreach ($data in $List1) {
$hash[$data.$L1Match] += ,$data
}
foreach ($data in $List2) {
$hash[$data.$L2Match] += ,$data
}
# Split every hash value by the existence of the $L1Match field in the data.
# {$_.$L1Match} is subject to change if the $L1Match property exists in both $List1 and $List2,
# or if its value may be $null
foreach ($kv in $hash.GetEnumerator()) {
$m1, $m2 = $kv.Value.where( {$_.$L1Match}, 'Split')
[PSCustomObject]@{
MatchValue = $kv.Key
L1Matches = $m1.Count
L2Matches = $m2.Count
List1 = $m1
List2 = $m2
}
}
}
$List1 = @(
[PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
[PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
[PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
[PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
[PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
[PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
[PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
)
$List2 = @(
[PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
[PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
[PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
[PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
[PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
[PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
[PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
[PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
)
RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name
#Speed measurement
measure-command { 1..10000 | %{ $Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name } }
Thanks so much for your assistance. In my tests, your function is extremely fast. The only issue I am having is that when I match on columns that have the same name in both lists, the result combines everything into List1. I'm trying to see how I can modify your function to handle that case.
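A likely explanation, shown as a minimal sketch rather than a diagnosis of the real data: the filter {$_.$L1Match} only tests whether the named property has a truthy value, so when both lists carry a column with that name, objects from either list pass the test. The sketch assumes the shared column is Place, borrowed from the sample data above; $group is a hypothetical stand-in for one hash value.

```powershell
# Two objects, one from each sample list, that both have a 'Place' column.
$a = [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
$b = [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}

# Assumption: we match on 'Place' in both lists, so they share one hash bucket.
$L1Match = 'Place'
$group = @($a, $b)

# Same split as in the function: the filter is truthy for both objects,
# so both land in $m1 and $m2 stays empty.
$m1, $m2 = $group.Where({ $_.$L1Match }, 'Split')
"List1 bucket: $($m1.Count), List2 bucket: $($m2.Count)"
```

The same filter would also misplace a List1 object whose match value is 0, $null, or an empty string, since those are falsy. Tagging each object with the list it came from avoids both problems.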
Here's the final function. It's a lot more than 3 times faster: my large compare went from 45 minutes to 5 seconds!
Function RJ-CombinedCompare() {
[CmdletBinding()]
PARAM(
#Every parameter must be mandatory
[Parameter(Mandatory=$True)]$List1,
[Parameter(Mandatory=$True)]$L1Match,
[Parameter(Mandatory=$True)]$List2,
[Parameter(Mandatory=$True)]$L2Match
)
# Fill the hash with arrays of data from both lists; each hash key is a value to compare
$hash = @{}
foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='l1';Value=$($data)}}
foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner='l2';Value=$($data)}}
# Split every hash value by its Owner tag: entries tagged 'l1' go to $m1,
# everything else (tagged 'l2') goes to $m2. This works even when both lists
# use the same name for the match column.
foreach ($kv in $hash.GetEnumerator()) {
$m1, $m2 = $kv.Value.where({$_.Owner -eq 'l1'}, 'Split')
[PSCustomObject]@{
MatchValue = $kv.Key
L1Matches = $m1.Count
L2Matches = $m2.Count
L1MatchObject = $L1Match
L2MatchObject = $L2Match
List1 = $m1.Value
List2 = $m2.Value
}
}
}
You should replace the quoted Owner values with numbers, e.g. Owner='l1' with Owner=1, without quotes, to change the Owner field from [string] to [int],
and of course $_.Owner -eq 'l1' with $_.Owner -eq 1.
And if you want it even faster, use an ArrayList instead of an array, as in Join-Object.
That modification takes more time to write, but it will teach you a lot, if you need it,
because each += ,obj assignment creates a new array.
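The ArrayList suggestion could look roughly like the sketch below. It is not the code from Join-Object, only an illustration of the idea: create one System.Collections.ArrayList per key and append to it in place, instead of rebuilding an array with += for every row. The three sample rows and the $key variable are made up for the example.

```powershell
# Sample rows; 'Alias' is the match column, as in $List1 above.
$List1 = @(
    [PSCustomObject]@{Alias = 1; Place = 1}
    [PSCustomObject]@{Alias = 2; Place = 3}
    [PSCustomObject]@{Alias = 1; Place = 6}
)

$hash = @{}
foreach ($data in $List1) {
    $key = $data.Alias
    # Create the per-key ArrayList once; later rows with the same key reuse it.
    if (-not $hash.ContainsKey($key)) {
        $hash[$key] = New-Object System.Collections.ArrayList
    }
    # Add() returns the new index; [void] keeps it out of the output stream.
    [void]$hash[$key].Add($data)
}

# Number of rows whose Alias is 1.
$hash[1].Count
```

The .Where() method used in the function also works on an ArrayList, so the split step should not need to change. The gain mostly shows up when many rows share the same key, since that is when += copies the largest arrays.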