edit many CSV files

i need some guidance. i have hundreds of CSV files i need to work with–they have some bad data we have to manually clean up prior to an import–as we visually verify the bad data :frowning:

first, i wanted to get all of the bad data into one file so i wouldnt have to sort and edit hundreds of them. working based on the “date” column, i got everything with an empty date or a bad date of “00-00-0000” copied into a new file with this:

$csvPath = "\\server\share\CSV Files"
$outpath = "H:\Documents\Scripts\csv_admin_empty.csv"
$colHeader = 'Date'

Get-ChildItem $csvPath -Recurse -Include *.csv | ForEach-Object { import-csv $_.FullName | Where-Object {$_.$colHeader -eq '' -or $_.$colHeader -match "00-00-0000"}} | Export-Csv -Path $outpath -NoTypeInformation

second, i want to now remove all rows with bad date info from the existing CSV files–that way we wont import bad data/duplicates, and i can update the CSV file i just made with good data for import.

but…i am still learning powershell and cant get what i want. i have tried a few iterations of this but i get a blank file, or i get my file “updated” where every empty column looks like: ,"", instead of just , /facepalm

so im looking for some guidance on how im processing this wrong. ive done some reading and found some examples but im not understanding something still. i dont think the -ne and -notmatch in this are matching correctly–if i import a single file with that pattern, then export it to a new file, it hasnt ignored the “” or “00-00-0000” rows like i want.

$csvPath = "H:\Documents\Scripts\csv"
$outpath = "H:\Documents\Scripts"
$colHeader = 'Date'

Get-ChildItem $csvPath -Recurse -Include *.csv | ForEach-Object { import-csv $_.FullName | Where {$_.$colHeader -ne '' -or $_.$colHeader -notmatch "00-00-0000"} | Export-Csv $_.fullname -NoTypeInformation }

simplified sample CSV data

site,id,Date,Form Id,Doc ID,DOB,Pat Loc,FileType,apptla
DPL,1/26/2006, Records,100052,5,CRMI
DPL, Records,100052,5,CRMI
DPL,1/27/2006, Records,100006,5,CRMI
DPL,00-00-0000, Records,100028,5,CRMI
DPL,00-00-0000, Records,100028,5,CRMI

any help would be appreciated.

Powershell will quote fields with the csv cmdlets; even empty fields. If your application doesn’t like the quotes, then the only way around that is to write your own Export-Csv or post process the data to remove quotes.

Provided you’re confident there are no double quotes that should be in the file:

$colHeader = 'Date'

$data = @"
site,id,Date,Form Id,Doc ID,DOB,Pat Loc,FileType,apptla
DPL,,1/26/2006, Records,100052,,,5,CRMI
DPL,,, Records,100052,,,5,CRMI
DPL,,1/27/2006, Records,100006,,,5,CRMI
DPL,,00-00-0000, Records,100028,,,5,CRMI
DPL,,00-00-0000, Records,100028,,,5,CRMI

$data | 
ConvertFrom-Csv | #Up to this point simulate Import-Csv
Where {
    $_.$colHeader -ne '' -or 
    $_.$colHeader -notmatch "00-00-0000"
} |
ConvertTo-Csv -NoTypeInformation |
ForEach-Object {
} # | pipe to a cmdlet like Set-Content or Out-File

Also you can’t set-content to a file that is open in the pipeline, so you either have to use temporary files or import the data first then loop over that apart from the pipeline. That way the file handle gets closed for editing.

ForEach($file in (Get-ChildItem #...#) {
$data = $file | Import-Csv

$data | #where-object etc... | set-content

thank you, that is very helpful!

i decided i can just output the files in another folder with the same name-- it makes sense that it cant work on them as they are open.

in a test group [4 files] takes a moment, so i think its processing them all with:

$csvPath = "H:\Documents\Scripts\csv"
$colHeader = 'Date'

ForEach($file in (Get-ChildItem $csvPath -Recurse -Include *.csv)) 
$name = (gi $file).name
$outpath = "H:\Documents\Scripts\csv\updated\$name"
$data = Import-Csv $file}
$data | Where-object {
    $_.$colHeader -ne '' -or 
    $_.$colHeader -notmatch "00-00-0000"
} | ConvertTo-Csv -NoTypeInformation |
ForEach-Object {
} | set-content $outpath

i think that where-object bit is giving me a fit. i am only testing on 4 of the 800+ files. that block just outputs one of them. 3 of 4 files have criteria in ‘date’ that are blank or 00-00-0000 so something isnt right–id like to get them all back, and only have 3 of the come out different.

the file that has “” date entries gets output–but the rows with “” date entries are still in the file

files with 00-00-0000 date entries do not get output at all.

re: "Powershell will quote fields with the csv cmdlets; even empty fields. "

why doesnt the output of my OP command have quotes in it? is it just because it was basically copying rows and not doing anything else? not that important but i will keep in mind that it quotes everything in the future. thank you for pointing that out.

Try moving the closing brace } that is at the end of the Import-Csv line to after Set-Content. I think that the logic in the where-object ought to be -and instead of -or.

aha! i understand why moving the } gave me all the files–i had tried enclosing the output similarly before; i should have tried again :slight_smile:

going from -or to -and worked for the match. im not sure i understand why, since -or worked fine for the first round where i was copying the data out i assumed it would work this way as well. ill have to read better about the operators /assumptions, grr

thanks so much!

There are two comparisons each. Your first set was positive, -eq and -match. The second is negative, -ne and -notmatch. That is where it changes.

Take an illustration trying to compare on your name. Assume $name is ‘Dave’.

$name -eq 'Dave' -or $name -eq 'John'
# becomes
$True -or $False
# becomes

So if your name is Dave or John it’ll be true. Now reverse it for trying to find if the name isn’t Dave or John, but try to use -or

$name -ne 'Dave' -or $name -ne 'John'
# becomes
$False -or $True
# because 'Dave' -ne 'Dave' is false
# but 'Dave' -ne 'John' is True
# so the whole statement is

So in the case of multiple negative comparisons you need -and, like:

$name -ne 'Dave' -and $name -ne 'John'
$false -and $true
# since both have to be $true for a result of $true
# it becomes:

that is a good explanation, and something i am likely to forget. i’ll definitely make a note of it.

thanks for your help