stuid,lastname,firstname,gradelevel,status,entrydate
28372,ACEVEDO,KEYLIANIS,0,E,8/31/2015
28166,CAHILL,ALLANNA,0,N,8/31/2015
28166,CAHILL,ALLANNA,0,E,9/24/2015
I’ve got data in a CSV as above, with a lot more lines of course and a few more duplicates. I’m trying to drop the duplicates but keep the one of the two with the most recent “entrydate”, and send it to a new CSV along with the others that aren’t duplicates.
I’m having a difficult time figuring out how to go about this.
Well… hmm. Tough thing here is that, if the dates are different, then they’re not actually duplicates as far as the computer is concerned. Do I have that correct?
My bad… the “stuid” is what determines that it’s a duplicate, and keeping the row for each stuid with the most recent “entrydate” is what I’m trying to accomplish.
I might start by passing them to Group-Object, grouping on the “stuid” field. That’ll give you a group object for each student. I’d probably then pass them to a ForEach.
For each one whose group contained one item, I’d just output the item since there was no duplicate. For each one with more than one, sort them on entry date in descending order and output only the first object from the collection. That’ll give you the most recent.
That’s completely off the top of my head and will doubtless need some mangling to make it work, but that’s the general approach I’d probably start with. Like, I need to see if there’s a Count property on the collections output by Group-Object, but I bet there is. I’d need to see if that $_[0] syntax worked, but I bet it would. Or something similar.
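For what it’s worth, here is a minimal sketch of that approach. It assumes the dates are all in M/d/yyyy form so they can be cast to [datetime], and the inline sample rows and the output file name are hypothetical stand-ins for the real Import-Csv input and destination. Group-Object does expose a Count property on each group, and the grouped rows themselves live in the Group property, so the first item is $_.Group[0] rather than $_[0].

```powershell
# Sample rows standing in for Import-Csv output (hypothetical inline data)
$rows = @'
stuid,lastname,firstname,gradelevel,status,entrydate
28372,ACEVEDO,KEYLIANIS,0,E,8/31/2015
28166,CAHILL,ALLANNA,0,N,8/31/2015
28166,CAHILL,ALLANNA,0,E,9/24/2015
'@ | ConvertFrom-Csv

$deduped = $rows | Group-Object -Property stuid | ForEach-Object {
    if ($_.Count -eq 1) {
        # Only one row for this stuid: pass it through unchanged
        $_.Group[0]
    }
    else {
        # Several rows share this stuid: keep the one with the latest entrydate
        # (assumes entrydate strings can be cast to [datetime])
        $_.Group |
            Sort-Object -Property { [datetime]$_.entrydate } -Descending |
            Select-Object -First 1
    }
}

# Hypothetical output path; adjust to wherever the cleaned file should go
$deduped | Export-Csv -Path .\students_clean.csv -NoTypeInformation
```

With the three sample rows, this should leave one row per stuid, keeping the 9/24/2015 row for 28166.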
OK, so this is definitely working correctly for me. By that I mean it’s giving me the results I would expect, and it seems to work fairly quickly on almost 1800 rows. Would this be the best way?
Curtis,
This is actually another step I have to add to a much larger PowerShell script I’m working on. So yes… it has to be PowerShell due to the way the data is coming down initially. I have no control over how it’s coming to me. Everything is scripted so far, so I’m looking to script the “cleanup” of the original CSV.
Then I would say what you have is probably the best way, although you might be able to improve it a little by moving your Sort-Object. Put it before the Group-Object so that you sort the whole list once rather than having to sort multiple times on the individual groups. You can also typecast the property inside Sort-Object rather than converting it in a foreach loop first.
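As a rough sketch of that rearrangement (again with hypothetical inline sample rows in place of the real Import-Csv input, and assuming M/d/yyyy dates): sort the whole list once by entrydate, newest first, using a cast inside Sort-Object, then group on stuid and take the first row of each group. This relies on Group-Object keeping the rows inside each group in the order they arrived, which is worth verifying on your own data.

```powershell
# Sample rows standing in for Import-Csv output (hypothetical inline data)
$rows = @'
stuid,lastname,firstname,gradelevel,status,entrydate
28372,ACEVEDO,KEYLIANIS,0,E,8/31/2015
28166,CAHILL,ALLANNA,0,N,8/31/2015
28166,CAHILL,ALLANNA,0,E,9/24/2015
'@ | ConvertFrom-Csv

$latest = $rows |
    # One sort over the whole list, casting the string to [datetime] in place
    Sort-Object -Property { [datetime]$_.entrydate } -Descending |
    # Rows within each stuid group stay in sorted (newest first) order
    Group-Object -Property stuid |
    # The first row of each group is therefore the most recent one
    ForEach-Object { $_.Group[0] }
```

From there, $latest can be piped to Export-Csv the same way as before.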
So by your suggestion, I’m sorting by the “entrydate” column first, then piping to “Group-Object stuid” and running a ForEach against the groups to get rid of the duplicates?