Picking out a record in a DataTable

Hello, can someone please help me with the following? (I had a similar thread, but I started a new one for clarity.)

I want to find an alternative to using Where-Object in a pipeline. I have two large CSV files (where some of the fields are the same and some are different). At the moment I have a foreach loop with Where-Object inside it, but it is too slow.

I found the following function which takes a CSV and converts it to a DataTable

e.g.

$D = Out-DataTable -InputObject (Import-Csv c:\data\abc.csv)

If I then look at Get-Member for $D:

$D | Get-Member

Name                 MemberType            Definition
----                 ----------            ----------
AcceptChanges        Method                void AcceptChanges()
BeginEdit            Method                void BeginEdit()
CancelEdit           Method                void CancelEdit()
ClearErrors          Method                void ClearErrors()
Delete               Method                void Delete()
EndEdit              Method                void EndEdit()
Equals               Method                bool Equals(System.Object obj)
GetChildRows         Method                System.Data.DataRow GetChildRows(string relationName), System.Data.DataRow GetChildRows(string…
GetColumnError       Method                string GetColumnError(int columnIndex), string GetColumnError(string columnName), string GetColumn…
GetColumnsInError    Method                System.Data.DataColumn GetColumnsInError()
GetHashCode          Method                int GetHashCode()
GetParentRow         Method                System.Data.DataRow GetParentRow(string relationName), System.Data.DataRow GetParentRow(string rel…
GetParentRows        Method                System.Data.DataRow GetParentRows(string relationName), System.Data.DataRow GetParentRows(stri…
GetType              Method                type GetType()
HasVersion           Method                bool HasVersion(System.Data.DataRowVersion version)
IsNull               Method                bool IsNull(int columnIndex), bool IsNull(string columnName), bool IsNull(System.Data.DataColumn c…
RejectChanges        Method                void RejectChanges()
SetAdded             Method                void SetAdded()
SetColumnError       Method                void SetColumnError(int columnIndex, string error), void SetColumnError(string columnName, string …
SetModified          Method                void SetModified()
SetParentRow         Method                void SetParentRow(System.Data.DataRow parentRow), void SetParentRow(System.Data.DataRow parentRow,…
ToString             Method                string ToString()
Item                 ParameterizedProperty System.Object Item(int columnIndex) {get;set;}, System.Object Item(string columnName) {get;set;}, …
AccountExpires       Property              string AccountExpires {get;set;}
Description          Property              string Description {get;set;}
DisplayName          Property              string DisplayName {get;set;}
DistinguishedName    Property              string DistinguishedName {get;set;}
Domain               Property              string Domain {get;set;}
Enabled              Property              string Enabled {get;set;}
LastLogonDate        Property              string LastLogonDate {get;set;}
PasswordLastSet      Property              string PasswordLastSet {get;set;}
PasswordNeverExpires Property              string PasswordNeverExpires {get;set;}
PasswordNotRequired  Property              string PasswordNotRequired {get;set;}
sAMAccountName       Property              string sAMAccountName {get;set;}
UserAccountControl   Property              string UserAccountControl {get;set;}
WhenChanged          Property              string WhenChanged {get;set;}
WhenCreated          Property              string WhenCreated {get;set;}

The properties at the end are the original column headings from my CSV file, e.g. AccountExpires etc. (see above).

I want to find a record 'directly', for example where the property sAMAccountName is equal to 'Fred'.

I could do this: $D | Where-Object {$_.sAMAccountName -eq 'Fred'}

However, this is not direct, and therefore slow when you have to do it over and over for each record. I want to find the record by 'indexing', as it were, to go straight to the record I want.

I thought the GetChildRows method might help, but I cannot figure it out.
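
For context, this is roughly the kind of direct lookup I am after (a sketch only, assuming $table holds the System.Data.DataTable that Out-DataTable returns, and that sAMAccountName has unique, non-null values):

# Give the table a primary key so lookups use an index instead of a scan
$table.PrimaryKey = [System.Data.DataColumn[]]@($table.Columns['sAMAccountName'])
$row = $table.Rows.Find('Fred')        # returns the matching DataRow, or $null

# Alternative without a primary key: a filter expression over the table
$rows = $table.Select("sAMAccountName = 'Fred'")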

Can someone please assist, thanks very much in advance :slight_smile:

If it's the same issue, that doesn't make any sense. Didn't you want to compare two CSV files?

What is it you actually want to do? What's the "end goal"?


Hello

Thanks for the reply much appreciated

At the moment my script takes days to run, I am trying to speed it up.

I basically have two CSVs of data, so I import them both and save each as a variable:

$A = Import-Csv C:\Data\file1.csv
$B = Import-Csv C:\Data\file2.csv

foreach ($item in $A) {
    # scans all of $B for every item in $A, which is what makes it slow
    $B | Where-Object {$_.sAMAccountName -eq $item.UniqueID}
    etc…
}

The above is slow. I want to remove the 'where' and find the item in $B directly where the sAMAccountName matches a string (e.g. $item.UniqueID), without going through all the records each time in the loop.

It's the kind of thing you can do with a hash table, but I believe you can also do it with a DataTable, so I imported the CSV into a DataTable using the function I found.
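
For example, something along these lines (a rough sketch of the hash-table idea, assuming sAMAccountName is unique in $B):

# Build the index once; every lookup afterwards is direct, with no Where-Object scan
$index = @{}
foreach ($row in $B) { $index[$row.sAMAccountName] = $row }

foreach ($item in $A) {
    $match = $index[$item.UniqueID]    # straight to the matching record (or $null)
}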

Thanks very much
Charlie

You can make a hash table; then your lookups will be quicker. There are several ways to accomplish this, one of which is using Group-Object:

$A = Import-Csv C:\Data\file1.csv
$B = Import-Csv C:\Data\file2.csv | Group-Object -Property SamAccountName -AsHashTable

foreach($item in $A){
    $B[$item.uniqueID]
}
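
One caveat: Group-Object builds groups, so each value in the hashtable is a collection of all rows sharing that key, not a single object (and on Windows PowerShell 5.1 you may also want -AsString so that indexing by string behaves). A sketch of unwrapping it, assuming SamAccountName is unique:

foreach ($item in $A) {
    # each -AsHashTable value is a collection; take the first row if the key is unique
    $match = $B[$item.uniqueID] | Select-Object -First 1
}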

Did you try the code suggestion I gave in the other thread?

Thanks very much for your replies. I like the idea of Group-Object, and I tried it, but unfortunately the data takes too long to load (it just sits there); there are about 63,000 rows.

I guess I could split the file up into smaller files first, but then I will have to add the CSV header back to each file; not the end of the world, I guess.

Thanks
Charlie

Hmm, are you running this on a 1 CPU / 2 GB RAM virtual machine? 63k rows is a drop in the bucket.

Let us set up a ~63K line CSV with 3 properties.

$date = Get-Date

$tempfile = New-TemporaryFile

# 62KB = 63,488 objects
1..62KB | ForEach-Object {
    [PSCustomObject]@{
        String = "$([char]$_)" * 10                    # one character repeated 10 times
        Number = $_ * 1KB
        Date   = $date.AddDays((Get-Random (1..100)))  # random offset of 1-100 days
    }
} | Export-Csv $tempfile -NoTypeInformation

Check the count

Import-Csv $tempfile | Measure-Object


Count    : 63488
Average  : 
Sum      : 
Maximum  : 
Minimum  : 
Property : 

Now let’s create that hashtable with Group-Object

Measure-Command -Expression {
    $grouped = Import-Csv $tempfile | Group-Object String -AsHashTable
}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 358
Ticks             : 13580720
TotalDays         : 1.57184259259259E-05
TotalHours        : 0.000377242222222222
TotalMinutes      : 0.0226345333333333
TotalSeconds      : 1.358072
TotalMilliseconds : 1358.072

1.35 seconds, not bad at all. Perhaps the issue isn’t with the size of the CSV?


I was thinking almost the same. At least I suspect it's not just the number of lines.

It's one file with about 63,000 rows and one with about 21,000, right? How many columns do the files have?

OK, you tried one suggested approach. Why don't you try the other suggested approach as well? It's the only way to know if it fits your requirement! :wink:

Hello everyone, thank you very much for your help so far, much appreciated

I have placed a copy of the script and two sample CSV files at the following location on my OneDrive:

https://1drv.ms/u/s!AqL5zUwOWToZgdg8Kd0p0JwDxQ9RIA?e=PBl0X4

If you try it with the files (see the parameters section at the top of the script), you will see how slow it is, and I have many more records.

I would be extremely grateful if someone can take a look and hopefully tweak to make it run at normal speed (if you see what I mean).

Thanks very much
Charlie

$Start = Get-Date
$ADData = Import-Csv -Path D:\sample\ADData.csv
# add a sAMAccountName alias column so both files share the comparison property
$Data = Import-Csv -Path D:\sample\Data.csv | Select-Object -Property *,@{Name = 'sAMAccountName'; Expression = {$_.UniqueID}}

$Result = Compare-Object -ReferenceObject $Data -DifferenceObject $ADData -Property sAMAccountName -IncludeEqual -PassThru
$End = Get-Date
New-TimeSpan -Start $Start -End $End

And the console output is:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 474
Ticks             : 14741538
TotalDays         : 1,70619652777778E-05
TotalHours        : 0,000409487166666667
TotalMinutes      : 0,02456923
TotalSeconds      : 1,4741538
TotalMilliseconds : 1474,1538
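
If you need to split the result back out afterwards, the SideIndicator property tells you which file each record came from. A small follow-on sketch using the variables above:

# With -IncludeEqual: '==' means in both files, '<=' only in the reference ($Data),
# and '=>' only in the difference ($ADData)
$inBoth   = $Result | Where-Object SideIndicator -eq '=='
$dataOnly = $Result | Where-Object SideIndicator -eq '<='
$adOnly   = $Result | Where-Object SideIndicator -eq '=>'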

BTW: That’s the solution I recommended 3 days ago. :wink:


It’s probably the rest of your script that’s slow, because neither the Compare-Object nor the Group-Object portion is.

With Write-Host (so you can still see output while measuring the expression):

Measure-Command -Expression {

    $A = Import-Csv .\Data.csv
    $grouped = Import-Csv .\ADData.csv | Group-Object SamAccountName -AsHashTable

    foreach($item in $A){
        if($found = $grouped[$item.uniqueid]){
            Write-Host UniqueID $item.UniqueID found: $found -ForegroundColor Green
        }
        else{
            Write-Host UniqueID $item.UniqueID not found! -ForegroundColor Yellow
        }
    }
}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 13
Milliseconds      : 171
Ticks             : 131719351
TotalDays         : 0.000152452952546296
TotalHours        : 0.00365887086111111
TotalMinutes      : 0.219532251666667
TotalSeconds      : 13.1719351
TotalMilliseconds : 13171.9351

and without

Measure-Command -Expression {

    $A = Import-Csv .\Data.csv
    $grouped = Import-Csv .\ADData.csv | Group-Object SamAccountName -AsHashTable

    foreach($item in $A){
        if($found = $grouped[$item.uniqueid]){
            "UniqueID $($item.UniqueID) found: $found"
        }
        else{
            "UniqueID $($item.UniqueID) not found!"
        }
    }
}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 936
Ticks             : 9360056
TotalDays         : 1.08333981481481E-05
TotalHours        : 0.000260001555555556
TotalMinutes      : 0.0156000933333333
TotalSeconds      : 0.9360056
TotalMilliseconds : 936.0056

Less than a second without the output.
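
If you still want the per-record messages, a common pattern is to collect the loop output in a variable and write it once at the end, which keeps the speed of the second version (a sketch along the same lines; the output file name is hypothetical):

# Capture everything the loop emits, then write it in a single operation
$report = foreach ($item in $A) {
    if ($found = $grouped[$item.uniqueid]) {
        "UniqueID $($item.UniqueID) found: $found"
    }
    else {
        "UniqueID $($item.UniqueID) not found!"
    }
}
$report | Set-Content .\report.txt    # hypothetical file name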


Thank you so much Krzydoug and everyone else :slight_smile:

It turns out that running the code on my modern, powerful HP i7 work laptop takes 11 seconds just to import the first CSV, and grouping the second one was still running after 5 minutes.

However, on my clunky slow laptop (which is several years old), I get the same timings as you above. I have not checked the PowerShell versions, but I think it may be something weird like the AV on my work laptop.

Anyway, the point is I know where the issue is now, and thanks for all the tips like grouping, as this will help speed up and simplify the script too :slight_smile: :slight_smile: :slight_smile:

Charlie

@cxmelga, I would like to make a somewhat off-topic suggestion. I'm a rank noob and out of my d̶e̶p̶t̶h̶ mind trying to follow this discussion. But I'm trying and learning, and in the next few years maybe someone else will also try. Do you intend to keep these sample files available forever? Maybe you could edit your post and put a tiny sample of the data in it. (If the post gets removed, no problem, but if the post refers to a dead link, somebody might get sad.) Or maybe I'm "straining at gnats"…
