Comparing 2 CSV files

Hello,

I have 2 CSV files, let’s call them old.csv and new.csv - it’s an automated file spat out by our HRIS every hour or so and has about 1000 rows and about 20 columns, and I’m comparing the new one with the one that’s an hour old.

I need to create a third file called changes.csv which lists the entire row for any row where any of the fields in that row has a different value in the new.csv than in the old.csv, and also any new.csv row that isn’t in the old.csv

Both old.csv and new.csv have a header row, and there’s a unique field in the EmployeeID column.

What I’ve written so far doesn’t really bear posting here, as it’s tortuous and clunky, and I haven’t tried running it yet.

As with many scriplets I’m sure someone somewhere has had the same need I do and thrown something together that works and then gets streamlined and then gets simplified until it’s a work of art, haha. So I was wondering if anyone here has something approaching a one-liner I can incorporate into our custom reporting scripts that creates the changes.csv I’m looking for.

If not, any pointers/suggestions/gotchas would be most welcome.

Thanks!

So we have to start from scratch to create something for your, right? You could at least post a few sanitized but still representative rows of the source data you’re dealing with.

Did you try to search for it? I’m pretty sure you could have find something adaptable in StackOverflow or even here in this forum or in the PowershellGallery.

Have you tried the cmdlet Compare-Object?

Compare-Object was the way to go, thanks. I was comparing each entry in the hashtables in two foreach loops. Longwinded and tortuous, as I said.

grefeeman please walk through this link , might help you https://powershell.org/forums/topic/using-ps-to-compare-a-list-of-user-in-two-excel-workbook-of-different-months/

AMGPoker adalah situs Judi Online terpercaya se-Asia
Kami juga menyediakan bonus untuk anda seperti :

I’ve actually had similar requirements in the past trying to create reports that showed changes (e.g. deltas) from one HR sync to another. The goal was to get who and what the current and old value was. Compare-Object tells us something is different, but not exactly what changed, only the new value. Be curious if any other experts know a better way to get this information:

function Get-ObjectDelta {
    [CmdletBinding()]
    param (
        [psobject]$ReferenceObject,
        [psobject]$DifferenceObject,
        [string]$Key,
        [string]$KeyDesc,
        [switch]$ExcludeNewRecord
    )
    begin {
        $exclude = @()
    }
    process {
        $result = foreach ($prop in $ReferenceObject[0].PSObject.Properties.Name) {

            if ($prop -eq $Key) {
                $cmp = Compare-Object -ReferenceObject $ReferenceObject -DifferenceObject $DifferenceObject -Property $prop -PassThru | 
                       Select @{Name='Key';Expression={$_ | Select -ExpandProperty $Key}}, 
                              @{Name='KeyDesc';Expression={$_ | Select -ExpandProperty $KeyDesc}}, 
                              @{Name='Property';Expression={$prop}},
                              @{Name='NewValue';Expression={'New Record'}}, 
                              @{Name='OldValue';Expression={'New Record'}}

                $cmp

                $exclude += $cmp.Key

            }
            else {
                $cmp = Compare-Object -ReferenceObject $ReferenceObject -DifferenceObject $DifferenceObject -Property $prop -PassThru | 
                       Select @{Name='Key';Expression={$_ | Select -ExpandProperty $Key}},
                              @{Name='KeyDesc';Expression={$_ | Select -ExpandProperty $KeyDesc}},
                              @{Name='Property';Expression={$prop}},
                              @{Name='NewValue';Expression={$_ | Select -ExpandProperty $prop}}, 
                              @{Name='OldValue';Expression={$currentKey=$_.$Key;$old | Where{$_.$Key -eq $currentKey} | Select -ExpandProperty $prop}}
                
                if ($ExcludeNewRecord) {
                    $cmp | Where{$exclude -notcontains $_.Key}
                }
                else {
                    $cmp
                }
            }
        }
    }
    end {
        $result
    }
}


$old = @"
EmployeeId,Name,Department,Title,CostCenter
434242,Joe Smith,Marketing,Analyst I,50000003421
434245,Julie Johnson,Legal,Director,50000004444
434248,Terry Tripp,Security,Manager,50000001234
"@ | ConvertFrom-CSV


$new = @"
EmployeeId,Name,Department,Title,CostCenter
434242,Joe Smith,Marketing,Analyst I,50000003421
434245,Julie Johnson,Information,Director,50000004141
434248,Terry Tripp,Security,Manager,50000001234
434260,Sal Goodman,Legal,Analyst II,50000004444
"@ | ConvertFrom-CSV

$params = @{
    ReferenceObject  = $old 
    DifferenceObject = $new 
    Key              = 'EmployeeId'
    KeyDesc          = 'Name'
    ExcludeNewRecord = $true
}

Get-ObjectDelta @params

Output:

Key      : 434260
KeyDesc  : Sal Goodman
Property : EmployeeId
NewValue : New Record
OldValue : New Record

Key      : 434245
KeyDesc  : Julie Johnson
Property : Department
NewValue : Information
OldValue : Legal

Key      : 434245
KeyDesc  : Julie Johnson
Property : CostCenter
NewValue : 50000004141
OldValue : 50000004444

I would create PowerBi reports from this data, which would also include a Date to see what happened during each sync. The function could be made more efficient by possibly running a Compare-Object to get only changed rows and then process the properties, but curious how long this would take against 1000 rows * 20 properties. Another option is to use SQL as well:

https://codingsight.com/different-ways-to-compare-sql-server-tables-schema-and-data/