Hi, I have some CSV files that I need to edit, the files have only 2 fields: ID wich is a 18 digit number and Code wich is a pretty long string.
In theory the Code field should always begin with the letter “F”, but I have some istances in which this isn’t true and I have either blanks or different characters, I need to remove those lines.
Here is my code so far, it seems to work, but it is a bit slow (on my first file I have 700000 rows) and I was wondering if there was a way to improve it
$Path1 = Read-Host "Inserisci file origine" #Folder with origin and edited files
$File = Read-Host "Inserisci il nome del file senza estensione" #file name I need to edit
$Path2 = ($Path1 + '\' + $File +'.csv') #file fullname
$PathNew = $Path1 + "\" + $File +"edited.csv" #edited file fullname
$newCSV = @() @this one wil be filled with the ok lines of my file
$Data = Import-Csv -Path $Path2
$RowIndex = 0
foreach ($row in $Data)
{
if( $Data[$RowIndex].Code -ne "") #if Code is not empty
{
if ($Data[$RowIndex].Code.Substring(0,1) -eq "F") #check if first character is equal to "F"
{
Write-Output $RowIndex #I'm printing the rows to keep track of advancement
$newCSV += $row
}
}
$RowIndex ++
}
$newCSV | Export-Csv $PathNew -NoTypeInformation
(Get-Content $PathNew).Replace('"', '') | Set-Content $PathNew #I need to remove the " from the CSV
Thank you in advance,
Alice
I think something like this should work:
$Data = Import-Csv -Path $Path2
$newCSV = $Data | Where-Object { $_.Code.StartsWith("F") }
$newCSV | Export-Csv $PathNew -NoTypeInformation -UseQuotes Never
Filtering should be much quicker than iterating through a loop.
1 Like
Thanks, I’ll define try this!
Here’s a shortened version of PSDarwin’s code. This avoids using a temporary array in memory, especially if the input file is very large.
remark about the initial code :
it is not a good idea to use $newcsv=@() and in the loop $newcsv += $row because each time, a new array is created and $row is added to new new array. Repeating this action 700 000 times is very slow and use a lot of memory. A faster solution is to use a collection arraylist like that :
and if you want to see the progress :
I concur with this statement, but I would say the best recommendation if possible is to just let the output fill the variable
$newCSV = foreach ($row in $Data){
if ( $row.startswith("F") )
{
$row
}
}
No need for a list/arraylist or anything. Now there are times when you can’t just collect all the output or you want to tightly control what gets added, I would shy away from the arraylist as it’s quite old and has the side effect that Christophe clearly knows about, but almost every new person gets tripped up by; it outputs the element number when you add so you have to $null or [void] it out. (as Christophe did.)
The generic list is performant and does not output junk
$newCSV = [System.Collections.Generic.List[object]]::new()
foreach ($row in $Data){
if ( $row.startswith("F") )
{
$newCsv.add($row)
}
}
You can also add Using Namespace to make it easier to type and read
using namespace System.Collections.Generic
$newCSV = [List[object]]::new()
foreach ($row in $Data){
if ( $row.startswith("F") )
{
$newCsv.add($row)
}
}
Just keep in mind the using statements need to be at the top of the script file.
2 Likes
Hello doug,
OK, i agree theses versions are “cleaner”.
But in my defense, I am system administrator in a company and manage about 500 servers with old OS (2003, 2008, 2008R2, 2012, 2012R2, 2016 and 2022 !!!) I have no choice and have to do with this context, so when i write powershell scripts, i am using old syntax that are compatible with powershell before 5.1 (your sample with using “syntax” or just with “$newCSV = [System.Collections.Generic.List[object]]::new()” is not working with powershell 4.0 and older).
I would like to use the newer syntaxes but I have to work within the constraints of my technical environment. I know I could install version 5.1 with WMF 5.1 but this version requires Windows Server 2008 R2 minimum (so not available on 2003 and 2008) and sometimes, il is not supported with some applications (for example, WMF 5.1 was not supported on Exchange 2013 servers).
No worries, just use New-Object
$newCSV = New-Object System.Collections.Generic.List[object]
foreach ($row in $Data){
if ( $row.startswith("F") )
{
$newCsv.add($row)
}
}
It works all the way back to powershell 2.0, I just tested.
1 Like
thanks.
Just try on a Windows Server 2003 and it’s OK with New-object