Remove entire rows from csv with same value under specific column heading.

Hi folks,
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I hope someone will be able to help me with this.</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I have a csv file called ‘original.csv’. Within this file there are around 1000 rows of data. The issue I have is that there are many duplicate rows in this file whereby some rows are exactly the same as other rows except one column value being different – this column value (Col2) has a different time value but all other values in each of the columns are the same. Please see the example below:</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”> Col1,Col2,Col3,Col4,Col5,Col6</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>1,11:56,3,4,Smith,6</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>1,11:57,3,4, Smith,6</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>1,11:58,3,4, Smith,6</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I’d like to keep the most recent value (11.58) and export this row and only this row to a separate csv file based on the value of Col5 (Smith). So, I need to have the most recent row for each surname (Smith). At the moment I have multiple rows with the same values whereby only the time value for Col2 differs. I only want one record per surname and this record has to be the most recent. </p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I have looked into input.csv, export csv, compare, sort and select but cannot get to the bottom of this. Any help would be greatly appreciated. My current csv file also has a header row at the top of the file.</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”> </p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Thanks in advance</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Tess</p>

#region Input

$CSV = '.\CSV1.csv'
@'
Col1,Col2,Col3,Col4,Col5,Col6
1,11:56,3,4,Smith,6
1,11:58,3,4, Smith,6
1,11:57,3,4, Smith,6
1,10:57,3,4, Tom,6
1,10:58,3,4, Tom,6
1,10:56,3,4,Tom,6
'@ | Out-File $CSV

#endregion

#region read and View CSV

$myData = Import-Csv $CSV 
$myData | FT -a 
<#
Col1 Col2  Col3 Col4 Col5  Col6
---- ----  ---- ---- ----  ----
1    11:56 3    4    Smith 6   
1    11:57 3    4    Smith 6   
1    11:58 3    4    Smith 6   
1    10:56 3    4    Tom   6   
1    10:57 3    4    Tom   6   
1    10:58 3    4    Tom   6    
#>

#endregion

#region Process

$SurNameList = $myData.Col5 | select -Unique
$SurNameList
<#
Smith
Tom
#>

$myOutput = foreach ($SurName in $SurNameList) {
    ($myData | where Col5 -EQ $SurName | sort Col2)[-1] 
    # 'Where' selects the rows belonging to the current SurName in the loop
    # 'sort' sorts the resulting rows by Col2
    # [-1] selects the last member of this array
}

#endregion

#region Output

$myOutput | FT -a 
<#
Col1 Col2  Col3 Col4 Col5  Col6
---- ----  ---- ---- ----  ----
1    11:58 3    4    Smith 6   
1    10:58 3    4    Tom   6   
#>

#endregion

less verbose…

$CSV = '.\CSV1.csv'
@'
Col1,Col2,Col3,Col4,Col5,Col6
1,11:56,3,4,Smith,6
1,11:58,3,4, Smith,6
1,11:57,3,4, Smith,6
1,10:57,3,4, Tom,6
1,10:58,3,4, Tom,6
1,10:56,3,4,Tom,6
'@ | Out-File $CSV



$myData = Import-Csv $CSV 

$myOutput = foreach ($SurName in ($myData.Col5 | select -Unique)) {
    ($myData | where Col5 -EQ $SurName | sort Col2)[-1] 
}

$myOutput | FT -a 

Many thanks Sam for your speedy and very helpful reply. I already have a formatted csv file and I was therefore wondering how I would export the two lines in your filtered example to a new csv file. I’m trying to get to grips with PowerShell and my apologies if this seems very obvious. Also what does the | FT -a do?
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Thanks in advance.</p>

FT -a is short for Format-Table -Autosize
You would export the output to CSV by piping it to Export-Csv

okay thanks, this is what I have at present:
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>I have changed the variable names to better suit their purpose:</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>[pre]</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>$CleanLog= “C:_logs\CleanAppLog.csv”</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>$myData = Import-Csv $CleanLog</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>$myOutput = foreach ($recordRow in ($myData.HostIP | select -Unique)) {
($myData | where HostIP -EQ $recordRow | sort UserIdentifier)[-1] | Export-CSV $CleanLog

‘Where’ selects the rows belonging to the current HostIP in the loop

‘sort’ sorts the resulting rows by UserIdentifier column heading

[-1] selects the last member of this array

}</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>[/pre]</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Please note that I have now changed Surname to HostIP and Col5 to UserIdentifier.</p>
<p style=“outline-color: transparent; outline-style: none; outline-width: 0px; -webkit-text-stroke-width: 0px; cursor: text; word-spacing: 0px;”>Thanks again.</p>
 

I know that I’m currently overwriting the source file with the updated file but it would be better to create a completely new file using export csv rather than overwriting the source. Thanks

Feel free to change the code suggestion you got here according to your needs. There is no copyright on it. :wink:

thanks, do you have any ideas how to solve this?

Of course. Take a look at Sams last example code … instead of piping the result to FT (Format-Table) you could pipe it to Export-CSV.

BTW: He already recommended that earlier on. :wink:

Thanks for the reply. I have tried piping it using export-csv to variable path $CleanLog (the code is located a few messages up) instead of a table but the output file does not contain the expected filtered/parsed data. This got me thinking that perhaps the export-csv was not the best/only way to go.

If export-csv is giving you row counts or row numbers instead of your data you might need to add -NoTypeInformation, you could also tee-object or out-file depending on your script and it’s purpose.