Finding fuzzy duplicate items in rows of a CSV

archives · December 31, 2011, 6:30pm

by notarat at 2013-02-21 16:17:11

Bear with me, please… I’m still very new to PowerShell (I use PS 2.0 at work), and I’ve been using it to pull CSV lists of all users in my OU and the security/distribution groups they are in.

I see instances where users are assigned to what are basically the same distribution groups, with group names that are nearly identical.

Example:

tim.jones, Marketing_East, Marketing_West, Marketing_South, Marketing_North, Marketing_East2, Marketing_US

Marketing_East & Marketing_East2 are identical in all but name.

I’d like to find a script that would automate the search through each row of my CSV file to identify instances where there are "fuzzy" duplicate groups like the example.

I’m still getting the hang of PowerShell scripting, to be honest, and my efforts so far have been directed more towards "getting" the information than "processing" it.

I searched, but I’m having a tough time finding examples of searching through a CSV row by row for what I would call "fuzzy duplicates" (groups or items that are spelled nearly alike and differ only a little at the beginning or end of the field).

Are there any example scripts out there or does anyone have an example they can share?

by DonJ at 2013-02-22 04:34:28

Nothing that I’ve seen. This is a pretty tough task, because the shell doesn’t have any native functionality for fuzzy matching. You’ll essentially have to build a collection of every group name, then enumerate it and perform some kind of wildcard comparison against the other names. It might be easier to load the names into a SQL Server database, since you could then take advantage of SQL-side functions like SOUNDEX(), which is explicitly a fuzzy comparison. PowerShell doesn’t have anything native that’s quite like it.
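
For readers landing on this thread, the "collect the group names, then enumerate and compare" idea can be sketched roughly as follows. This is only a minimal illustration, not a tested solution for real exports: the function name Find-FuzzyDuplicate and the $MaxExtra threshold are made up for the example, and it assumes each row looks like the sample above (user name first, group names after it, no quoted fields containing commas). It treats two groups as "fuzzy duplicates" when the shorter name is a prefix or suffix of the longer one and only a couple of characters were added. It sticks to syntax that should work in PS 2.0.

```powershell
# Hypothetical helper: emits pairs of group names from one CSV row where the
# shorter name is a prefix or suffix of the longer one and at most $MaxExtra
# characters were added. Exact duplicates (0 extra characters) are also emitted.
function Find-FuzzyDuplicate {
    param(
        [string]$Line,
        [int]$MaxExtra = 2   # assumed threshold; tune for your naming scheme
    )

    # Naive split: assumes no quoted fields that contain commas.
    $fields = @($Line.Split(',') | ForEach-Object { $_.Trim() } | Where-Object { $_ -ne '' })

    # Need the user name plus at least two groups to have anything to compare.
    if ($fields.Count -lt 3) { return }

    $user   = $fields[0]
    $groups = $fields[1..($fields.Count - 1)]
    $ignoreCase = [System.StringComparison]::OrdinalIgnoreCase

    # Compare every pair of groups in the row exactly once.
    for ($i = 0; $i -lt $groups.Count; $i++) {
        for ($j = $i + 1; $j -lt $groups.Count; $j++) {
            # Order the pair so $short is never longer than $long.
            $short = $groups[$i]
            $long  = $groups[$j]
            if ($short.Length -gt $long.Length) { $short, $long = $long, $short }

            # Skip pairs whose lengths differ by more than the threshold.
            if (($long.Length - $short.Length) -gt $MaxExtra) { continue }

            if ($long.StartsWith($short, $ignoreCase) -or $long.EndsWith($short, $ignoreCase)) {
                New-Object PSObject -Property @{ User = $user; GroupA = $short; GroupB = $long }
            }
        }
    }
}

# Sample row from the original post.
$line = 'tim.jones, Marketing_East, Marketing_West, Marketing_South, Marketing_North, Marketing_East2, Marketing_US'
Find-FuzzyDuplicate -Line $line
```

On the sample row this should report only the Marketing_East / Marketing_East2 pair. To run it over a whole file, something like `Get-Content .\groups.csv | ForEach-Object { Find-FuzzyDuplicate -Line $_ }` would do, where groups.csv is a placeholder file name. Names that differ in the middle, or by a typo, would need a real edit-distance comparison instead of this prefix/suffix check.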

by notarat at 2013-02-22 05:08:54

Don,

Thanks for the response(!) even though it was a confirmation of my fears…

I’m even less adept at SQL than PowerShell, lol. I guess I’ll be buying some SQL books this weekend, haha.

Have a good one.
