Check for duplicate value in CSV file

david-schmidtberger · June 11, 2015, 12:00am

I am looking for a quick way to check if a value is duplicated in the CSV after I import it.

The script I’m working with disables Active Directory accounts, and we receive data in a format similar to this:

status,networkid
update,e11111
terminate,e12345
update,e12345

As the script processes the information, I need to check whether the network ID is duplicated anywhere else in the file (essentially, do not process the terminate if the ID exists anywhere else in the column).

(I do receive a large amount of other information in the file, but I am really only concerned with the networkid and the status.)

I’m sure this is doable, I just can’t get my mind wrapped around it.

will-anderson · June 11, 2015, 12:54am

Hey there David,

Don Jones actually posted an article on this on Hey! Scripting Guy some time back. It’s a good read. Long and short, here’s an example:

$CSV = Import-Csv C:\scripts\Network.csv

$CSV.networkid | Group-Object | Where-Object {$_.Count -gt 1}

Give that a shot and see if it works for you. Here’s the article if you’d like to read it:
http://blogs.technet.com/b/heyscriptingguy/archive/2008/01/31/how-can-i-use-windows-powershell-to-retrieve-the-non-unique-items-in-a-list.aspx

david-schmidtberger · June 11, 2015, 2:14am

Well, that definitely tells me if objects are duplicated. What I really need is to be able to check that value as the script processes hundreds of records. I have managed to get something cobbled together that tells me if the ID is duplicated, but it is not really efficient to use as a check on each line to decide whether to proceed.

# Import every CSV that sits next to the script
$path    = Split-Path -Parent $MyInvocation.MyCommand.Definition
# $input is an automatic variable in PowerShell, so use a different name
$csvPath = Join-Path $path '*.csv'
$csv     = Import-Csv $csvPath

# Network IDs that appear on more than one row
$duplicate = $csv.networkid | Group-Object | Where-Object { $_.Count -gt 1 }

foreach ($user in $csv) {
    if ($user.status -like "*TERMINATION*") {
        foreach ($dupe in $duplicate) {
            if ($dupe.Name -like $user.networkid) {
                Write-Host "$($user.networkid) set for termination with $($dupe.Count) records"
            }
        }
    }
}

dan-potter · June 11, 2015, 2:49am

$csv |group networkid |select name,count | ? {$_.count -gt 1}

dan-potter · June 11, 2015, 3:00am

or this?

$csv |group networkid | ? {$_.count -gt 1} |select -ExpandProperty group

rob-simmers · June 11, 2015, 3:39am

In the scenario you have above, my assumption is that you would NOT terminate the user because there was an update, so take a look at something like this:

#Get your CSV and remove any exact duplicates
$csv = Import-CSV C:\Temp\test.csv | Select * -Unique

#Create a collection of terminations
$terminations = $csv | Where{$_.Status -eq "terminate"}
#Create a collection of updates
$updates = $csv | Where{$_.Status -ne "terminate"}

#Process your updates
foreach ($update in $updates){
    "Processing {0} for {1}" -f $update.Status, $update.NetworkID
    # - CODE FOR UPDATE - #
    # Check if there are any terminations for the same network ID
    $termCheck = $terminations | Where{$_.NetworkID -eq $update.NetworkID -and $_.Status -eq "terminate"}
    if ($termCheck) {
        #There is a termination record, so..
        "Found {0} terminate request employee {1}, removing from terminations" -f @($termCheck).Count, $update.networkid
        #Update terminations collection and remove the network ID with a terminate status
        $terminations = $terminations | Where{$_.NetworkID -ne $update.NetworkID -and $_.Status -eq "terminate"}
    }
}

#Now process terminations...
if ($terminations) {
    foreach ($term in $terminations) {
        "Processing {0} for {1}" -f $term.Status, $term.NetworkID
        # - CODE FOR TERMS - #
    }
}
else {
    "No terminations to process"
}

Output:

Processing update for e11111
Processing update for e12345
Found 1 terminate request employee e12345, removing from terminations
No terminations to process

david-schmidtberger · June 11, 2015, 3:49am

Thanks Rob, that is similar to how I was thinking I might have to do it.

What I was hoping for was a relatively simple true/false check on each ID being duplicated as the existing script processes (it was written by someone else years ago).
Due to changes in back-end data, we have to perform this new check.
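
For anyone after the same per-record true/false check, here is a minimal sketch of one way it could look: count each network ID once up front in a hashtable, then ask the hashtable while looping. The `Test-DuplicateId` helper, the `$idCount` variable, and the extra `e67890` sample row are assumptions made for illustration, not part of the original script.

```powershell
# Sample rows standing in for Import-Csv output.
# The e67890 row is hypothetical, added to show a terminate that has no other record.
$csv = @'
status,networkid
update,e11111
terminate,e12345
update,e12345
terminate,e67890
'@ | ConvertFrom-Csv

# Build the lookup once: network ID -> number of rows carrying that ID.
$idCount = @{}
foreach ($row in $csv) {
    $idCount[$row.networkid] = [int]$idCount[$row.networkid] + 1
}

# Hypothetical helper: returns $true when the ID appears on more than one row.
function Test-DuplicateId {
    param([string]$NetworkId, [hashtable]$Counts)
    return ($Counts[$NetworkId] -gt 1)
}

# Use the check inside the existing per-record loop.
$results = foreach ($user in $csv) {
    if ($user.status -eq 'terminate') {
        if (Test-DuplicateId $user.networkid $idCount) {
            "SKIP terminate for $($user.networkid)"
        }
        else {
            "TERMINATE $($user.networkid)"
        }
    }
}
$results
```

Because the counts are built in a single pass, each check inside the loop is a hashtable lookup instead of another scan of the duplicate list.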

rob-simmers · June 11, 2015, 3:58am

You could simplify it even more if you didn’t want to log that someone was removed from the term pool, but I would assume it would be logged somewhere. Update the post as resolved if this works for you.

dan-potter · June 11, 2015, 4:35am

Then this gets the IDs that are not duplicated and should be terminated:

$csv | group networkid | ? {($_.Count -eq 1) -and (($_.Group).Status -eq 'terminate')}
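
To feed that filter into a processing loop, something like the following sketch might work. It expands each surviving group back into its original row so the rest of the columns are still available. The sample data (including the hypothetical `e67890` row) and the `$toTerminate` variable name are assumptions for illustration.

```powershell
# Sample rows standing in for Import-Csv output; the e67890 row is hypothetical.
$csv = @'
status,networkid
update,e11111
terminate,e12345
update,e12345
terminate,e67890
'@ | ConvertFrom-Csv

# Keep only IDs that occur exactly once and whose single row is a terminate,
# then expand the group back into the original CSV row.
$toTerminate = $csv |
    Group-Object networkid |
    Where-Object { $_.Count -eq 1 -and $_.Group[0].status -eq 'terminate' } |
    Select-Object -ExpandProperty Group

foreach ($user in $toTerminate) {
    # - CODE FOR TERMS - #
    "Terminating $($user.networkid)"
}
```

With this sample data, only `e67890` should come through, because `e12345` also has an update row.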
