Check for duplicate value in CSV file

I am looking for a quick way to check if a value is duplicated in the csv after i import it.

the script i’m workin with is related with disabling active directory accounts and we recieve data in a similar format

status,networkid
update,e11111
terminate,e12345
update,e12345

as the script processes the information, i need to check if the network id is duplicated anywhere else in the file (essentially do not process the terminate if it exists anywhere else in the column)

(i do recieve a large amount of other information in the file, but really am only concerned with the networkid and the status)

i’m sure this is doable, just can’t get my mind wrapped around it

Hey there David,

Don Jones actually posted an article on this on Hey! Scripting Guy some time back. It’s a good read. Long and short, here’s an example:

$CSV = Import-Csv C:\scripts\Network.csv

$CSV.networkid | Group-Object | Where-Object {$_.Count -gt 1}

Give that a shot and see if it works for you. Here’s the article if you’d like to read it:
http://blogs.technet.com/b/heyscriptingguy/archive/2008/01/31/how-can-i-use-windows-powershell-to-retrieve-the-non-unique-items-in-a-list.aspx

well that definately tells me if objects are duplicated, what i really need is to be able to check that value as it processes hundreds of records. i have managed to get something cobbled together that tells me if the id is duplicated. but not really efficient to use as a check to see if to proceed on each line

$path         = Split-Path -parent $MyInvocation.MyCommand.Definition
$input  = $path + "\*.csv"
$csv = Import-Csv $input

$duplicate = $CSV.networkid | Group-Object | Where-Object {$_.Count -gt 1}

foreach ($user in $csv)
{
if ($user.status -like "*TERMINATION*")
{
	foreach ($dupe in $duplicate)
	{

	if ($dupe.NAME -like $user.networkid)

	{
	Write-Host "$($user.networkid) set for termination with 2 records"
	}
	else
	{
	}
	}
	
}
else
{
}
}

$csv |group networkid |select name,count | ? {$_.count -gt 1}

or this?

$csv |group networkid | ? {$_.count -gt 1} |select -ExpandProperty group

In this scenario you have above, my assumption is you would NOT terminate the user because there was an update, so take a look at something like this:

#Get your CSV and remove any exact duplicates
$csv = Import-CSV C:\Temp\test.csv | Select * -Unique

#Create a collection of all update1s
$terminations = $csv | Where{$_.Status -eq "terminate"}
#Create a collection of updates
$updates = $csv | Where{$_.Status -ne "terminate"}

#Process your updates
foreach ($update in $updates){
    "Processing {0} for {1}" -f $update.Status, $update.NetworkID
    # - CODE FOR UPDATE - #
    # Check if there are any other updates for the same network ID
    $termCheck = $terminations | Where{$_.NetworkID -eq $update.NetworkID -and $_.Status -eq "terminate"}
    if ($termCheck) {
        #There is a termination record, so..
        "Found {0} terminate request employee {1}, removing from terminations" -f @($termCheck).Count, $update.networkid
        #Update terminations collection and remove the network ID with a terminate status
        $terminations = $terminations | Where{$_.NetworkID -ne $update.NetworkID -and $_.Status -eq "terminate"}
    }
}

#Now process terminations...
if ($terminations) {
    foreach ($term in $termination) {
        "Processing {0} for {1}" -f $term.Status, $term.NetworkID
        # - CODE FOR TERMS - #
    }
}
else {
    "No terminations to process"
}

Output:

Processing update for e11111
Processing update for e12345
Found 1 terminate request employee e12345, removing from terminations
No terminations to process

Thanks rob, that is similar to how i was thinking i might have to do it.

what i was hoping for was a relatively simple true/false check on each id being duplicated as the existing script processes. (written by someone else years ago)
due to changes in back-end data we have to perform this new check.

You could simplify it even more if you didn’t want to log that someone was remove from the term pool, but I would assume it would be logged somewhere. Update the post as resolved if this works for you.

Then not duplicated and should be terminated.

$csv |group networkid | ? {($.count -eq 1) -and (($.group).status -eq ‘terminate’)}