Comparing two multi-dimensional arrays

I’m trying to come up with the most efficient way to compare two arrays where there could be any number of columns in each array, but we are matching on a specified column from each array. I want to retain all data from each array. Some arrays could contain duplicates. I am finding that when I compare arrays with more than a few thousand lines, the compare can start to take a long time, over an hour in some cases.

Is my Compare Function the most efficient way to do this?

Here is my sample code:

Function RJ-CombinedCompare() {
[CmdletBinding()]
PARAM([Parameter(Mandatory=$True)]$List1,$L1Match,$List2,$L2Match)
$List = $List1 | %{[PSCustomObject]@{L1Data = $; L2Data = ‘NA’}}
$List += $List2 | %{[PSCustomObject]@{L1Data = ‘NA’; L2Data = $
}}
foreach ($Object in $List.L1Data.$L1Match + $List.L2Data.$L2Match | select -Unique) {
$Match1 = @()
$Match2 = @()
foreach ($Object1 in $List.L1Data -ne ‘NA’) {
if ($Object1.$L1Match -eq $Object) {$Match1 += $Object1}
}
foreach ($Object2 in $List.L2Data -ne ‘NA’) {
if ($Object2.$L2Match -eq $Object) {$Match2 += $Object2}
}
[PSCustomObject]@{MatchValue = $Object; L1Matches = $Match1.count; L2Matches = $Match2.count; List1 = $Match1; List2 = $Match2}
}
}

$List1 = @(
[PSCustomObject]@{Alias = 1; Place = 1; Extra = ‘c’}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = ‘a’}
[PSCustomObject]@{Alias = 3; Place = 2; Extra = ‘c’}
[PSCustomObject]@{Alias = 4; Place = 1; Extra = ‘a’}
[PSCustomObject]@{Alias = 22; Place = 3; Extra = ‘g’}
[PSCustomObject]@{Alias = 2; Place = 3; Extra = ‘a’}
[PSCustomObject]@{Alias = 5; Place = 6; Extra = ‘e’}
[PSCustomObject]@{Alias = 4; Place = 2; Extra = ‘c’}
[PSCustomObject]@{Alias = 1; Place = 6; Extra = ‘b’}
)

$List2 = @(
[PSCustomObject]@{Name = 1; Place = 5; Somthing = ‘a1’}
[PSCustomObject]@{Name = 1; Place = 1; Somthing = ‘b6’}
[PSCustomObject]@{Name = 5; Place = 1; Somthing = ‘c3’}
[PSCustomObject]@{Name = 2; Place = 4; Somthing = ‘a3’}
[PSCustomObject]@{Name = 12; Place = 6; Somthing = ‘a1’}
[PSCustomObject]@{Name = 1; Place = 2; Somthing = ‘b1’}
[PSCustomObject]@{Name = 2; Place = 7; Somthing = ‘d4’}
[PSCustomObject]@{Name = 44; Place = 2; Somthing = ‘a5’}
)

$Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name

Have you looked at Compare-Object?

$List1 = @(
 [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
 [PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
 [PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
 )
 
$List2 = @(
 [PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
 [PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
 [PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
 [PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
 [PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
 [PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
 )

 Compare-Object -ReferenceObject $List1 -DifferenceObject $List2 -Property Place

Thanks for the response,

That works to some degree, but it doesn’t seem to maintain all the columns from each array. Can that be done with Compare-Object?

E.g:
Compare-Object -ReferenceObject $List1 -DifferenceObject $List2 -Property Place -IncludeEqual

Place SideIndicator


1 ==
1 ==
2 ==
6 ==
2 ==
5 =>
4 =>
7 =>
3 <=
3 <=
3 <=
6 <=

With my Function, each Array's content is preserved in the result:
RJ-CombinedCompare -List1 $List1 -L1Match Place -List2 $List2 -L2Match Place | ft

MatchValue L1Matches L2Matches List1 List2


1 2 2 {@{Alias=1; Place=1; … {@{Name=1; Place=1; S…
3 3 0 {@{Alias=2; Place=3; … {}
2 2 2 {@{Alias=3; Place=2; … {@{Name=1; Place=2; S…
6 2 1 {@{Alias=5; Place=6; … {@{Name=12; Place=6; …
5 0 1 {} {@{Name=1; Place=5; S…
4 0 1 {} {@{Name=2; Place=4; S…
7 0 1 {} {@{Name=2; Place=7; S…

Dan, your way absolutely ineffective, because you combine both arrays in beginning (for what reason? ) $List = $List1 | %{[PSCustomObject]@{L1Data = $; L2Data = ‘NA’}}
$List += $List2 | %{[PSCustomObject]@{L1Data = ‘NA’; L2Data = $
}}

and later never use it in combined form, but several times filter it for original arrays:
$Object in $List.L1Data.$L1Match +$List.L2Data.$L2Match
foreach ($Object1 in $List.L1Data -ne ‘NA’) {
if ($Object1.$L1Match -eq $Object) {$Match1 += $Object1}
and so on.

The code below 3 times more effective even on your sample. and should be more effective on large arrays. Btw, tell me how much…

#Require -Version 4.0
Function RJ-CombinedCompare() {
 [CmdletBinding()]
 PARAM(
	#Every parameter must be mandatory
	[Parameter(Mandatory=$True)]$List1,
	[Parameter(Mandatory=$True)]$L1Match,
	[Parameter(Mandatory=$True)]$List2,
	[Parameter(Mandatory=$True)]$L2Match)
 #Fill HASH with arrays of data from both arrays, hash keys is value to compare
 $hash = @{}
 foreach ($data in $List1) {
    $hash[$data.$L1Match] += ,$data
 }
 foreach ($data in $List2) {
    $hash[$data.$L2Match] += ,$data
 }
 # filter every hash value by existance of $L1Match field in data.
 # {$_.$L1Match} - subject to change if $L1Match property exists in both $List1 and $List2
 # or may be $null
 foreach ($kv in $hash.GetEnumerator()) {
	$m1, $m2 = $kv.Value.where( {$_.$L1Match}, 'Split')
    [PSCustomObject]@{
		MatchValue = $kv.Key
		L1Matches = $m1.Count
		L2Matches = $m2.Count
		List1 = $m1
		List2 = $m2
	}
 }
}

$List1 = @(
 [PSCustomObject]@{Alias = 1; Place = 1; Extra = 'c'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 3; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 4; Place = 1; Extra = 'a'}
 [PSCustomObject]@{Alias = 22; Place = 3; Extra = 'g'}
 [PSCustomObject]@{Alias = 2; Place = 3; Extra = 'a'}
 [PSCustomObject]@{Alias = 5; Place = 6; Extra = 'e'}
 [PSCustomObject]@{Alias = 4; Place = 2; Extra = 'c'}
 [PSCustomObject]@{Alias = 1; Place = 6; Extra = 'b'}
 )

$List2 = @(
 [PSCustomObject]@{Name = 1; Place = 5; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 1; Somthing = 'b6'}
 [PSCustomObject]@{Name = 5; Place = 1; Somthing = 'c3'}
 [PSCustomObject]@{Name = 2; Place = 4; Somthing = 'a3'}
 [PSCustomObject]@{Name = 12; Place = 6; Somthing = 'a1'}
 [PSCustomObject]@{Name = 1; Place = 2; Somthing = 'b1'}
 [PSCustomObject]@{Name = 2; Place = 7; Somthing = 'd4'}
 [PSCustomObject]@{Name = 44; Place = 2; Somthing = 'a5'}
 )

RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name
#Speed measurement
measure-command { 1..10000 | %{ $Result = RJ-CombinedCompare -List1 $List1 -L1Match Alias -List2 $List2 -L2Match Name } }

should warn You that in my code List1 and List2 in result objects is not arrays but collections. But may be You do not even notice the difference :slight_smile:

Hi Max,

Thanks so much for your assistance. In my tests, your Function is extremely fast. The only issue I am having is when I run the compare where I am Matching on columns that have the same name, in the result its combining the result into list 1. I’m trying to see how I can modify your Function to handle that case.

E.g.:

RJ-CombinedCompare -List1 $List1 -L1Match Place -List2 $List2 -L2Match Place | ft

                         MatchValue                           L1Matches                           L2Matches List1                               List2                             
                         ----------                           ---------                           --------- -----                               -----                             
                                  7                                   1                                   0 {@{Name=2; Place=7; Somthing=d4}}   {}                                
                                  6                                   3                                   0 {@{Alias=5; Place=6; Extra=e}, @... {}                                
                                  5                                   1                                   0 {@{Name=1; Place=5; Somthing=a1}}   {}                                
                                  4                                   1                                   0 {@{Name=2; Place=4; Somthing=a3}}   {}                                
                                  3                                   3                                   0 {@{Alias=2; Place=3; Extra=a}, @... {}                                
                                  2                                   4                                   0 {@{Alias=3; Place=2; Extra=c}, @... {}                                
                                  1                                   4                                   0 {@{Alias=1; Place=1; Extra=c}, @... {}                    

You can save data owner as

$hash[$data.$L1Match] += ,[pscustomobject]@{Owner=‘l1’;Value=$data }
and for List2 too

use it here
$m1, $m2 = $kv.Value.where( {$_.Owner -eq ‘l1’}, ‘Split’)

but then need to modify result assinging:
List1 = $m1.Value

I do not test it, correct any erors if i make it :slight_smile:

for owner better use [bool] or [int] identifier, not a [string], I use it only for information purposes

Thanks for all you help Max.

Here’s the final Function. It’s a lot more than 3 times faster, my large compare went from 45min to 5 seconds!

Function RJ-CombinedCompare() {
    [CmdletBinding()]
    PARAM(
        #Every parameter must be mandatory
	    [Parameter(Mandatory=$True)]$List1,
	    [Parameter(Mandatory=$True)]$L1Match,
	    [Parameter(Mandatory=$True)]$List2,
	    [Parameter(Mandatory=$True)]$L2Match
    )
    #Fill HASH with arrays of data from both arrays, hash keys is value to compare
    $hash = @{}
    foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='l1';Value=$($data)}}
    foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner='l2';Value=$($data)}}
    # filter every hash value by existance of $L1Match field in data.
    # {$_.$L1Match} - subject to change if $L1Match property exists in both $List1 and $List2
    # or may be $null
    foreach ($kv in $hash.GetEnumerator()) {
        $m1, $m2 = $kv.Value.where({$_.Owner -eq 'l1'}, 'Split')
        [PSCustomObject]@{
            MatchValue = $kv.Key
		    L1Matches = $m1.Count
		    L2Matches = $m2.Count
            L1MatchObject = $L1Match
            L2MatchObject = $L2Match
            List1 = $m1.Value
            List2 = $m2.Value
        }
    }
}

Congrats! :slight_smile:

but change everywhere ‘l1’ and ‘l2’ to 1 and 2, it save you some more execution time and memory eliminating string allocation.

It’s already memory hungry.

Thank you Sir! Its done:

Function RJ-CombinedCompare() {
    [CmdletBinding()]
    PARAM(
	    [Parameter(Mandatory=$True)]$List1,
	    [Parameter(Mandatory=$True)]$L1Match,
	    [Parameter(Mandatory=$True)]$List2,
	    [Parameter(Mandatory=$True)]$L2Match
    )
    $hash = @{}
    foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner='1';Value=$($data)}}
    foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner='2';Value=$($data)}}
    foreach ($kv in $hash.GetEnumerator()) {
        $m1, $m2 = $kv.Value.where({$_.Owner -eq '1'}, 'Split')
        [PSCustomObject]@{
            MatchValue = $kv.Key
            L1Matches = $m1.Count
            L2Matches = $m2.Count
            L1MatchObject = $L1Match
            L2MatchObject = $L2Match
            List1 = $m1.Value
            List2 = $m2.Value
        }
    }
}

Hello there!

A short while back, I borrowed some code from Lucio Silveira (MSFT) and Dave Wyatt (superstar) and duct taped together a modified Join-Object.

That link walks through using it in a number of scenarios. It has quickly becomer one of the ‘most-used’ functions in my PowerShell toolbox.

Cheers!

Dan, be more watchful :slight_smile:

you sould replace Owner=‘1’; to Owner=1; Without quotes! to convert owner field from [string] to [int]
and of course $.Owner -eq ‘1’ to $.Owner -eq 1

and if you gonna more faster use arraylist, instead of array as in Join-Object.
that modification get more time, but lit learn you well, if you need :slight_smile:
because each = ,obj assignment recreates new array

The code works perfectly on v5. When running it on v3 I get the following exception:

Method invocation failed because [System.Management.Automation.PSCustomObject] doesn't contain a method named 'where'.
+         $m1, $m2 = $kv.Value.where({$_.Owner -eq 1}, 'Split')

How can I re-write the line

$m1, $m2 = $kv.Value.where({$_.List -eq 1}, 'Split')

to run it on v3 without changing the functionality? (Installing v5 ist not possible.)