I'm trying to quantify corruption and data loss in my ECM. I ran a SQL query to get the list of items that are SUPPOSED to be there; it returned 25 million rows, which saved as a CSV comes to almost 8 GB. I think there are multiple rows per file.
I want to run PowerShell against that CSV and produce a new CSV with only unique values. I then plan to feed the deduped CSV back through PowerShell (Test-Path) to get a list of what is/isn't actually on disk so I can try to remediate.
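For that second pass, I'm picturing something like this (just a sketch; unique.csv and missing.txt are placeholder names, and it assumes the deduped CSV keeps the filepath column):

    # Keep only the paths that Test-Path says are missing
    Import-Csv .\unique.csv |
        Where-Object { -not (Test-Path -LiteralPath $_.filepath) } |
        Select-Object -ExpandProperty filepath |
        Set-Content .\missing.txt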
For the dedupe step itself, I have tried gc .\large.csv | Get-Unique > new.csv, but I ended up with an 11 GB CSV.
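I suspect two things went wrong there: Get-Unique only drops adjacent duplicates, so unsorted input passes through mostly intact, and in Windows PowerShell the > redirect writes UTF-16, which roughly doubles the byte count. Something like this should at least dedupe correctly and avoid the encoding bloat, though it still buffers every line in memory:

    # Sort first so duplicates are adjacent; explicit UTF-8 avoids the UTF-16 doubling
    # (note: this treats the header row as just another line to sort)
    gc .\large.csv | Sort-Object -Unique | Set-Content .\new.csv -Encoding UTF8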
I am trying:
Import-Csv .\large.csv | Sort-Object filepath -Unique | Export-Csv .\new.csv -NoTypeInformation
As you can imagine, it is taking a huge amount of time and RAM. What am I doing wrong? How can I do this more efficiently?
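Would a streaming pass along these lines be the better direction? A minimal sketch, assuming the filepath is the first column and that a naive comma split is safe for these rows (no quoted commas in the paths):

    # Stream the file once, keeping only the first row seen for each path.
    # Only the distinct paths live in memory, not the whole file.
    $seen   = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
    $reader = [System.IO.StreamReader]::new('C:\temp\large.csv')   # placeholder paths
    $writer = [System.IO.StreamWriter]::new('C:\temp\new.csv')
    try {
        $writer.WriteLine($reader.ReadLine())                      # copy the header through
        while ($null -ne ($line = $reader.ReadLine())) {
            $path = ($line -split ',', 2)[0]                       # assumes filepath is column 1
            if ($seen.Add($path)) { $writer.WriteLine($line) }     # Add returns $false for duplicates
        }
    }
    finally {
        $reader.Close()
        $writer.Close()
    }

If that's the right track, I imagine the Test-Path check could even be folded into the same loop so the file only gets read once.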
Thank you for your help!