Code Efficiency - CSV Trim and Replace Operations


My code below finds all CSVs recursively in a directory, loads located CSVs, converts the CSVs into TSVs, trims first and end double quotes and then removes all double quotes between a tab space.

My Question:

Is there a way to make the code more efficient? I can’t seem to find a way to remove the ForEach-Object loop. Please help. Thanks. (For some reason the accent symbol does not work to display the code below)

$content = "C:\somedir"

$files = Get-ChildItem -Path $content -Recurse -Filter *.csv | 
	Select-Object -Property *, @{
		expression ={
			Write-Host $_.FullName
			$source = $_.FullName
			$dest = $_.DirectoryName + "\" + $_.BaseName + ".tsv"
			Import-Csv -Path $source | 
			convertto-csv -NoTypeInformation -Delimiter "`t" | 
			% { 
				ForEach-Object {
					$_.Trim('"') |
					% { $_ -replace (""t"" ,"t") }
			} | Set-Content "$dest"

Which ForEach-Object loop are you trying to eliminate? I count 3 of them in your code (line 12, 13, 15). % is an alias for foreach-object. Without samples of your csv, I think you might be better working with the raw data that multiple conversions (import-csv then convertto-csv). I’m not sure if my example below is more efficient, but you could use measure-command to see for sure. Either way I find yours a little hard to read with nested loops, but that is just a preference.

$content = ".\documents"
$files = Get-ChildItem -Path $content -Recurse -Filter *.csv

foreach ($item in $files)
   (Get-Content $item.fullname -Raw) -replace ',',"`t" -replace '"','' |
       Set-Content -Path "$($item.fullname).tsv"

Loops are traditionally avoided if possible, but here are a couple of options. This is the CSV to mess with:

$csv = @"

This would parse using Import-CSV, so if there are “rogue” double qoutes in the content, this is parsing it as a CSV and then gluing it together with string builder. While this is using for loops, there is no Replace operations, it’s just parse and glue it together:

$csv = Import-Csv -Path 'C:\Scripts\temp.csv'

$myNewCSV = [System.Text.StringBuilder]::new()

#Header Row
foreach ($row in $csv) {
    [void]$myNewCSV.Append("{0}rn" -f ($row.PSObject.Properties.Name -join ';'))

#Data Row
foreach ($row in $csv) {
    [void]$myNewCSV.Append("{0}rn" -f ($row.PSObject.Properties.Value -join ';'))

$myNewCSV.ToString() #Set Content



#Measure Command
Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 15
Ticks             : 152401
TotalDays         : 1.76390046296296E-07
TotalHours        : 4.23336111111111E-06
TotalMinutes      : 0.000254001666666667
TotalSeconds      : 0.0152401
TotalMilliseconds : 15.2401

Another approach is using Get-Content -Raw, which should bring the content in as raw content and not parse lines. This is available Powershell 4+.

$content = Get-Content -Path 'C:\Scripts\temp.csv' -Raw


PS C:\Users\rasim> $content.Count

PS C:\Users\rasim> $content = Get-Content -Path 'C:\Scripts\temp.csv' -Raw

PS C:\Users\rasim> Measure-Command {
$content = Get-Content -Path 'C:\Scripts\temp.csv' -Raw

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 12
Ticks             : 127013
TotalDays         : 1.47005787037037E-07
TotalHours        : 3.52813888888889E-06
TotalMinutes      : 0.000211688333333333
TotalSeconds      : 0.0127013
TotalMilliseconds : 12.7013