Powershell - Merging csv files, depending on filename.

Hi

I have tried to find a solution to my problem, but can´t find the solution I am looking for, so hopefully, someone in here can help me.

I have millions of .csv files that I need to merge by a wildcards match on the file name, the merged .csv file needs to be divided by some of the filenames and only contain one header.

Let´s say I have the following files.

0006041651_00017771_20190820.csv
0006041651_00017771_20190819.csv
0006041651_00017771_20190818.csv
0006041651_00017771_20190817.csv
0006041651_00017771_20190816.csv

0006041651_00013333_20190820.csv
0006041651_00013333_20190819.csv
0006041651_00013333_20190818.csv
0006041651_00013333_20190817.csv
0006041651_00013333_20190816.csv

What I need. Is something that can match some of the filename 00017771 merge it with all the other files in the folder that contains 00017771 and only keep one header for all the merged data.
When done with 00017771 it should move on to the next 00013333 and so on.

I think I need to set something that knows what I am looking for in the file name, is the second _ in the filename, and ends with the third _

xxxxxxxxxxx_00017771_xxxxxxxx.csv

Hope this is not too confusing :slight_smile:

 

You can have a grouped object using calculated properties and Group-Object cmdlet then merge with the help of ForEach-Object cmdlet.

$FileGroup = Get-ChildItem -Path <csv files located folder> | Select-Object -Property FullName,@{E={($_ -split '_')[1]};L='Pattern'} | Group-Object -Property Pattern

$FileGroup | ForEach-Object -Process {Import-Csv -Path $_.Group.FullName | Export-Csv -Path <NewMergedCsvPath> -NoTypeInformation}
#Requires -Version 5
#Requires -Modules AZSBTools

# Merge CSV files based on common 2nd block in file name


# Read metadata/file names
$FolderPath = 'C:\scripts'
$FileList = Get-ChildItem -Path $FolderPath -File -Filter *.CSV
Write-Log 'Identified',$FileList.Count,'CSV files in folder',$FolderPath Green,Cyan,Green,Cyan

# Merge files
$SecondBlockList = $FileList.Name | foreach { $_.Split('_')[1] } | select -Unique
Write-Log 'Identified',$SecondBlockList.Count,'CSV file groups' Green,Cyan,Green

foreach ($FileGroup in $SecondBlockList) {
    foreach ($FileName in ($FileList.FullName -match $FileGroup)) {
        Import-Csv $FileName | Export-Csv "$FileGroup.CSV" -Append
    }
}

Write-Log 'done' Darkyellow

Hi Sam

Thank you for the help

I have version 5, and module AZSBTools, but get the following error for each file.

Export-Csv : Cannot append CSV content to the following file: 50748463.CSV. The appended object does not have a property that corresponds to the following column: 0006041651;50748463;2019-06-27 03:00:00;00;5074846
3;2019-06-27 00:13:00;263. To continue with mismatched properties, add the -Force parameter, and then retry the command.
At C:\VK_Powershell\combine_csv_files.ps1:18 char:32
+ Import-Csv $FileName | Export-Csv "$FileGroup.CSV" -Append
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (0006041651;5074...27 00:13:00;263:String) [Export-Csv], InvalidOperationException
+ FullyQualifiedErrorId : CannotAppendCsvWithMismatchedPropertyNames,Microsoft.PowerShell.Commands.ExportCsvCommand

I have added the -force parameter at the location suggested. -> At C:\VK_Powershell\combine_csv_files.ps1:18 char:32

foreach ($FileGroup in $SecondBlockList) {
foreach ($FileName in ($FileList.FullName -match $FileGroup)) {
Import-Csv $FileName | Export-Csv "$FileGroup.CSV" -Force -Append
}
}

It now works as intended.

PS C:\Windows\system32> C:\VK_Powershell\combine_csv_files.ps1
Identified 173 CSV files in folder \\inputfolder  
Identified 16 CSV file groups 
done

But I am unable to find the merged files anywhere?

current folder C:\Windows\system32

Hi Sam

Thanks once again.

Is there any way to determine where the output should save the merged files after processing all of them

Also it looks like the header of the orginal file has not been preserved, instead I have a new header, and added “” to all the ,

, is now “,”

#TYPE System.Management.Automation.PSCustomObject
"0006041651;50748469;2019-08-15 03:15:00;00;50748469;2019-08-15 00:07:00;155","09;2019-12-31 00:00:00;132","63;2018-12-31 00:00:00;99","77;2017-12-31 00:00:00;1511;2015-02-10 00:00:00;0;14"
"0006041651;50748469;2019-08-15 03:30:00;00;50748469;2019-08-15 00:22:00;155","09;2019-12-31 00:00:00;132","63;2018-12-31 00:00:00;99","77;2017-12-31 00:00:00;1511;2015-02-10 00:00:00;0;14"

Original

#serial-number;device-identification;created;value-data-count;fabrication-no,,inst-value,0,0,0;datetime,,inst-value,0,0,0;volume,m3,inst-value,0,0,0;date,,inst-value,0,0,0;volume,m3,inst-value,0,0,1;date,,inst-value,0,0,1;volume,m3,inst-value,0,0,2;date,,inst-value,0,0,2;on-time,day(s),inst-value,0,0,0;date,,inst-value,0,0,5;error-flags-dev-spec,,inst-value,0,0,0;fw-version,,inst-value,0,0,0
0006041651;50748469;2019-08-18 03:15:00;00;50748469;2019-08-18 00:07:00;155,44;2019-12-31 00:00:00;132,63;2018-12-31 00:00:00;99,77;2017-12-31 00:00:00;1514;2015-02-10 00:00:00;0;14

 

 

#Requires -Version 5
#Requires -Modules AZSBTools

# Merge CSV files based on common 2nd block in file name


# Read metadata/file names
$FolderPath = 'C:\scripts'
$FileList = Get-ChildItem -Path $FolderPath -File -Filter *.CSV
Write-Log 'Identified',$FileList.Count,'CSV files in folder',$FolderPath Green,Cyan,Green,Cyan

# Merge files
$SecondBlockList = $FileList.Name | foreach { $_.Split('_')[1] } | select -Unique
Write-Log 'Identified',$SecondBlockList.Count,'CSV file groups' Green,Cyan,Green

foreach ($FileGroup in $SecondBlockList) {
    foreach ($FileName in ($FileList.FullName -match $FileGroup)) {
        Import-Csv $FileName | Export-Csv "$FolderPath\$FileGroup.CSV" -Append -NoTypeInformation
    }
}

Write-Log 'done' Darkyellow

Updated line 18 puts the output files in the same folder as the input files line 8 and removes 'type information line 1 in output files.

Thx once again, I now have to option to redirect them.

Do you know how to get it to stop adding “” to my commas?

, becomes “,”

00:07:00;155“,”09;2019-12-31 00:00:00;132“,”63;2018-12-31 00:00:00;99“,”77;

"0006041651;50748469;2019-08-15 03:15:00;00;50748469;2019-08-15 00:07:00;155","09;2019-12-31 00:00:00;132","63;2018-12-31 00:00:00;99","77;2017-12-31 00:00:00;1511;2015-02-10 00:00:00;0;14"
"0006041651;50748469;2019-08-15 03:30:00;00;50748469;2019-08-15 00:22:00;155","09;2019-12-31 00:00:00;132","63;2018-12-31 00:00:00;99","77;2017-12-31 00:00:00;1511;2015-02-10 00:00:00;0;14"

Look into the -Delimiter parameter of the Export-Csv cmdlet