Make PowerShell script run faster

I need to evaluate the lines in a file. Each line is made up of loan data: price, credit score, etc. I need to filter on these values and, if a line passes the filters, write it to an output file.

The problem is the file has 400,000+ lines.
Each line has 20+ values to filter on,
and this part of the script cannot take more than 60 seconds.
Any ideas how I can make the code faster?

Here is the code:

$File_NotesDownload = "C:\temp\NotesDownload.csv"
$File_NotesGood = "C:\temp\NotesGood.txt"
if (Test-Path $File_NotesGood) { remove-item $File_NotesGood }
$Start = Get-Date

$Notes = Get-Content $File_NotesDownload
foreach($Note in $Notes) 
   {
    $Note = $Note -replace '"',''
    $Fields = $Note -split ','      # split the line once instead of once per field
    $Loan_Id = $Fields[0].Trim() -as [int]
    if ($null -eq $Loan_Id) {Continue}

    # cast the numeric fields so the comparisons below are numeric, not string, comparisons
    $Loan_Status          = $Fields[5].Trim()
    $Ask_price            = [double]$Fields[6].Trim()
    $Markup_Discount      = [double]$Fields[7].Trim()
    $YTM                  = [double]$Fields[8].Trim()
    $DaysSinceLastPayment = $Fields[9].Trim()
    $CreditScoreTrend     = $Fields[10].Trim()
    $FicoEndRange         = $Fields[11].Trim()
    $NeverLate            = $Fields[13].Trim()
    $Loan_Class           = $Fields[14].Trim()
    $Loan_Maturity        = [int]$Fields[15].Trim()
    $Interest_Rate        = [double]$Fields[17].Trim()
    $RemainingPayments    = $Fields[18].Trim()

    if ($Loan_Maturity -ne 36) {Continue}
    if (($Interest_Rate -lt 4) -Or ($Interest_Rate -gt 25)) {Continue}
    if ($Ask_price -gt 20) {Continue}
    if ($Markup_Discount -gt -0.01) {Continue}
    if ($YTM -lt 10) {Continue}
    if ($CreditScoreTrend -ne "UP") {Continue}

    $NotesList += $Note + "`r`n"
    }
$NotesList | out-file $File_NotesGood

$End = Get-Date
$TotalTime = New-Timespan -Start $Start -End $End
$TotalTimeSecs = $TotalTime.TotalSeconds
$TotalTimeSecs

Notes on this topic —

There are things (commands, functions, APIs, modules, constructs, etc.) in any language that are just slow, and then there are things you introduce into your code that make it slow.

The former means things like native cmdlets/functions/operations. Loops are just slow by design, especially if you are reading in large data sets. Some cmdlets are just slow by design, for example the web cmdlets.
Improvements in the core language occur over time. Take a look at PowerShell Core (PowerShell v6) as an example. PS Core is not a drop-in replacement for Windows PowerShell; a lot of things are missing from it in order to make it cross-platform capable.

The latter is all about you and what you are doing. Large files are going to be slow, period. Loops over large files are going to be slow.
Get-Content loads the entire file into memory before acting on it, and if you don't have enough memory you will run into issues: poor performance and outright failures.

If you have large files, don't use Get-Content; use the .NET libraries (StreamReader) instead. There are lots of articles on the web about how to use the .NET libraries and/or Get-Content to read large files in chunks, in order not to tax memory and to improve performance.

So, by example…

The Get-Content cmdlet does not perform as well as a StreamReader when dealing with very large files. You can read a file line by line or in batches using Get-Content or a StreamReader; the .NET StreamReader is still faster.

System.IO.File.ReadLines()
It returns all the lines of a file, but lets you begin iterating over the lines immediately which means it does not have to store the entire contents in memory.

https://msdn.microsoft.com/en-us/library/dd383503.aspx
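For example, a minimal sketch of using ReadLines to walk a large file and write out only the lines you want to keep (the paths and the filter are just placeholders, and the ::new() syntax assumes PowerShell 5 or later):

$inFile  = 'C:\temp\NotesDownload.csv'
$outFile = 'C:\temp\NotesGood.txt'

# ReadLines streams the file one line at a time; only the current line is held in memory
$writer = [System.IO.StreamWriter]::new($outFile)
foreach ($line in [System.IO.File]::ReadLines($inFile)) {
    # placeholder filter: keep only lines that start with a digit (i.e. a loan id)
    if ($line -match '^\d') { $writer.WriteLine($line) }
}
$writer.Dispose()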

$path = 'C:\A-Very-Large-File.txt'
$r = [IO.File]::OpenText($path)
while ($r.Peek() -ge 0) {
    $line = $r.ReadLine()
    # Process $line here...
}
$r.Dispose()

Some performance comparisons:

Measure-Command {Get-Content .\512MB.txt > $null}

Total Seconds: 49.4742533

Measure-Command {
    $r = [IO.File]::OpenText('512MB.txt')
    while ($r.Peek() -ge 0) {
        $r.ReadLine() > $null
    }
    $r.Dispose()
}

Total Seconds: 27.666803

Using Get-Content leveraging batching

$path = "E:\Documents\Projects\ESPS\Dev\DataFiles\DimProductionOrderOperation"
$infile = "14SEP11_ProdOrderOperations.txt"
$outfile = "PROCESSED_14SEP11_ProdOrderOperations.txt"

$batch = 1000
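That snippet only sets up the variables; a minimal sketch of how the batching itself typically continues with Get-Content -ReadCount (the pattern in the filter is just a placeholder):

# Read the file in batches of $batch lines instead of one line (or the whole file) at a time
Get-Content (Join-Path $path $infile) -ReadCount $batch |
    ForEach-Object {
        # $_ is an array of up to $batch lines; -match returns only the lines that match
        $_ -match 'some pattern'
    } |
    Out-File (Join-Path $path $outfile)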

Here is one of the better articles on the topic.
http://www.happysysadm.com/2014/10/reading-large-text-files-with-powershell.html

Have a read of some Chrissy LeMaire articles; she has blogged on techniques for increasing performance when working with CSV files.

Specifically, I think the "Streaming a 32-bit OleDbConnection" section in the second article, which describes performing a SELECT on a CSV file using a WHERE clause, will be of use to you. You'll just need to modify it to write to a file instead of a SQL database.
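Not her exact code, but a rough sketch of the general idea, assuming the Microsoft ACE OLE DB text provider is installed (the column names in the WHERE clause are hypothetical):

# Query the CSV with SQL through OleDb, then write the matching rows back out to a file.
# If only the 32-bit provider is installed, this needs to run in 32-bit PowerShell.
$dir        = 'C:\temp'
$connString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$dir;Extended Properties='text;HDR=Yes;FMT=Delimited'"

$conn = New-Object System.Data.OleDb.OleDbConnection $connString
$cmd  = $conn.CreateCommand()
$cmd.CommandText = "SELECT * FROM [NotesDownload.csv] WHERE AskPrice <= 20 AND YTM >= 10"

$conn.Open()
$adapter = New-Object System.Data.OleDb.OleDbDataAdapter $cmd
$table   = New-Object System.Data.DataTable
[void]$adapter.Fill($table)
$conn.Close()

# Select only the real columns so the DataRow bookkeeping properties don't end up in the output
$table | Select-Object $table.Columns.ColumnName | Export-Csv 'C:\temp\NotesGood.csv' -NoTypeInformation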

Please post a dozen or so lines as a representative sample of what a line in this file looks like. Make sure to remove/replace any PII.

Thanks Postanote,
Your script worked great. Originally it was taking anywhere from several minutes to maybe an hour.
With your script I managed to get it down to 9-12 seconds!!!

Thanks Matt. Great tip

No worries, glad it helped. I have to deal with large log files daily, so I get your pain on this one, especially when trying to get results on such large files to folks who have little to no patience.

I have another problem with the speed.
As I add more lines to $NotesList the script takes longer and longer. I believe .NET is duplicating the entire array in memory, adding the new item, and then deleting the old copy: https://blogs.technet.microsoft.com/ashleymcglone/2017/07/12/slow-code-top-5-ways-to-make-your-powershell-scripts-run-faster/. Is there a way around it?

$File_Path = "C:\Temp\"
$File_NotesDownload = $File_Path  + "SecondaryMarketAllNotes.csv"
$File_NotesEval     = $File_Path  + "SecNotesEval.txt"

$r = [IO.File]::OpenText($File_NotesDownload)
while ($r.Peek() -ge 0) 
    {
     $Note = $r.ReadLine()
     $NotesList += $Note
    }
$NotesList | out-file $File_NotesEval
$r.Dispose()
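For what it's worth, the usual way around the += problem is to skip the intermediate collection entirely: either gather the lines in a System.Collections.Generic.List[string] (which grows in place) or, simpler still, write each line you want to keep straight to a StreamWriter as you read. A minimal sketch, reusing the same paths:

$r = [IO.File]::OpenText($File_NotesDownload)
$w = New-Object System.IO.StreamWriter $File_NotesEval
while ($r.Peek() -ge 0) 
    {
     $Note = $r.ReadLine()
     # apply the filtering logic here, then write only the lines you want to keep
     $w.WriteLine($Note)
    }
$w.Dispose()
$r.Dispose()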