Script to Find/Replace Multiple Strings in Multiple Text Files Using PowerShell

I am new to scripting and PowerShell. I have been studying lately and am trying to build a script to find/replace text in a batch of text files (150 to 200 files, each containing code and no more than 10000 lines; sample DOG0001.g attached). However, I would like to keep the FindString and ReplaceString as variables, since there are multiple values, which can in turn be read from a separate CSV file.

I have come up with the code below, which is functional, but I would like to know whether it is the optimal solution for the requirement. I would also like to keep FindString and ReplaceString regular-expression compatible in the script, as I would also like to find/replace patterns.

Sample contents of Input.csv (the number of rows in the CSV may vary from 50 to 1500) and a sample text file in which text needs to be replaced are attached.
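For illustration, the CSV has two columns, along these lines (the values shown here are made up; only the FindString/ReplaceString headers match what the script expects):

    FindString,ReplaceString
    DOG0001,CAT0001
    PUMP_([0-9]{3}),VALVE_$1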

The Code

    $Iteration = 0
    $FDPATH = 'D:\opt\HMI\Gfilefind_rep'
    #& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf

    # Collect full paths of the .g files and the find/replace pairs
    $GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
    $FindReplaceList = Import-Csv -Path $FDPATH\Input.csv

    foreach ($Graphic in $GraphicsList) {
        Write-Host "Processing Find/Replace on: $Graphic"
        foreach ($item in $FindReplaceList) {
            # Rewrite the whole file once per find/replace pair
            Get-Content $Graphic |
                ForEach-Object { $_ -replace $item.FindString, $item.ReplaceString } |
                Set-Content ($Graphic + ".tmp")
            Remove-Item $Graphic
            Rename-Item ($Graphic + ".tmp") $Graphic
            $Iteration = $Iteration + 1
            Write-Host "String Replace Completed for $($item.ReplaceString)"
        }
    }

I have gone through posts here and on other forums such as Stack Overflow, and gathered valuable input that I used to build this code.

To summarize:

1. I would like to know if the above code can be optimized for execution, since I feel it takes a long time to run.
2. I would like to report the number of iterations carried out in the loop. I was able to write the current iteration number to the console, but couldn't figure out how to capture the output of Measure-Command in a variable that could be used in a Write-Host command (see the sketch after this list for the pattern I am after).
3. I would like to know if it is possible to report the number of replacements made at the end of execution.
4. I would also like to display the time taken for the run on completion.
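To illustrate what I am after with Measure-Command, here is a rough sketch (the script block body is just a placeholder):

    # Measure-Command returns a System.TimeSpan object, which can be
    # captured in a variable and formatted for display
    $elapsed = Measure-Command {
        # placeholder for the actual find/replace work
        Start-Sleep -Milliseconds 100
    }
    Write-Host "Completed in $($elapsed.TotalSeconds) seconds"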

Thanks for taking the time to read this query. I much appreciate your support!

Write-Host is rarely optimal :).

Something about the file names you used prevented the attachments from working, sorry.

You can possibly do everything you want, although counting the number of replacements will make this a lot more complex. It’s usually easier to just tackle one problem at a time… Is there one thing you’d like to start with?
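If you do get to the counting part, one rough approach, sketched here with a hypothetical $text variable holding a file's contents, is to count regex matches before each replacement:

    # Count how many times each pattern occurs before replacing it
    $totalReplacements = 0
    foreach ($item in $FindReplaceList) {
        $totalReplacements += [regex]::Matches($text, $item.FindString).Count
        $text = $text -replace $item.FindString, $item.ReplaceString
    }
    Write-Host "Made $totalReplacements replacements in total"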

Thanks for reply, Mr. Jones.

From what you said, I gather I would need to drop the Write-Host from my loop. Noted. But with such a large volume of processing, how do you suggest I monitor progress?

My primary objective is to optimize the loop for efficient (fast and functional) execution. The others, like reporting time taken, iterations completed, and replacements made, would be good to have, but they are secondary in nature.

Hope I am making my requirement clear. Thanks again for taking the time to help me out with my query.

Well, for a huge file, there’s not a ton of optimization you can really do. A ForEach loop is often faster than a ForEach-Object statement, but will require more memory as the entire input has to be in RAM. In terms of just the find and replace loop, it’s probably memory-optimized already.
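As a sketch of the trade-off (with a hypothetical file path): the pipeline streams lines one at a time, while the foreach loop needs the whole array of lines in memory first:

    # Streaming: each line flows through the pipeline (low memory use)
    Get-Content 'D:\data\sample.g' | ForEach-Object { $_ -replace 'old', 'new' }

    # In-memory: all lines are loaded into an array first (faster, more RAM)
    $lines = Get-Content 'D:\data\sample.g'
    foreach ($line in $lines) { $line -replace 'old', 'new' }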

How large is the input file?

Typically, the code in each text file runs between 2500 and 10000 lines (sample attached in the original post). I will be handling 100 to 150 text files at a time on average. In freak instances it might go up to 300 or more.

The Input.csv file, which contains the find/replace strings, will have anywhere between 50 and 1500 rows (sample attached in the original post).

Glad to hear from you that the loop is already memory-optimized! That's a relief.

That’s a good-sized file. I don’t expect you’re going to be able to get it to run much faster without getting extremely complex. Although if anyone else has suggestions, I’m sure they’ll jump in.

Thanks for the insight Mr. Jones! Much appreciate your support.

As suggested, I'll remove the Write-Host from the nested loop and try the Write-Progress cmdlet instead. I hope Write-Progress does not drag down the loop's performance the way Write-Host does.

In the meantime, I’ll be sure to keep checking this space, if there is any better way to go about it.

Hey Sriram,

There's quite a bit of disk activity going on that slows things down, and that can be cut back.

I did some testing, and even with a dummy file of 10000 rows of 500 characters each, PowerShell is able to read the text file fully into a string variable. Piping through Out-String lets us treat the content as a single string while maintaining its format, removing the need for the extra Get-Content -> Set-Content pass for every search and replace.

I took your maximum parameters (number of files, and lines in the CSV and graphic files) and created dummy files from the ones you had provided. After some alterations, the revised script on my T440s is able to process your maximums in just over 8 minutes, including the use of a progress meter. Of course, timescales can vary based on content and the number of read/write operations in memory.

Hope this helps.


    $fdPath = 'c:\data\test'
    # Collect full paths of the target files and the find/replace pairs
    $graphicsFiles = Get-ChildItem -Path $fdPath\*.txt | ForEach-Object FullName
    $findReplaceList = Import-Csv -Path $fdPath\Input.csv

    $totalItems = $graphicsFiles.Count
    $currentRow = 0
    foreach ($graphicFile in $graphicsFiles)
    {
        $currentRow += 1
        Write-Progress -Activity "Processing record $currentRow of $totalItems" -Status "Progress:" -PercentComplete (($currentRow / $totalItems) * 100)

        # Read the whole file into a single string so each replacement
        # is an in-memory operation rather than a pass over the disk
        [string] $txtGraphicFile = Get-Content $graphicFile | Out-String

        foreach ($findReplaceItem in $findReplaceList)
        {
            $txtGraphicFile = $txtGraphicFile -replace $findReplaceItem.FindString, $findReplaceItem.ReplaceString
        }

        # Write the updated content back over the original file
        $txtGraphicFile | Set-Content $graphicFile
    }


This was precisely the solution I was looking for, Mr. Pringle!

Nothing more, nothing less. Works perfectly fine.

There is a minor glitch, though: at the end of every replaced file, I find an extra newline (\r\n) being added. (I use the Compare plugin in Notepad++ to compare the two text files.) This is an inconvenience, and I have to go and delete this extra added line in all the files.

The code works perfectly, and performance is much better than I had anticipated. But I am not able to understand why this newline is being added at the end of the file. Kindly help me understand.

Thanks again for the support Mr. Pringle, you made my day!!!

Hi,
Thanks for the script. I have a similar need, but with a couple more restrictions. I need to search for a string taken from the first column of a CSV file across approximately 5000 text files and replace it with the second column of the CSV. I would also need to make sure that I only perform the replacement if the line starts with "*", and that no line goes over 100 characters. Something along the lines of the sketch below is what I am picturing.
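A rough sketch of the kind of check I mean ($file, $find, and $replace are hypothetical placeholders for the CSV-driven values):

    # Replace only on lines starting with '*', and keep the result only
    # if it stays within the 100-character limit (sketch only)
    $updated = Get-Content $file | ForEach-Object {
        if ($_.StartsWith('*')) {
            $candidate = $_ -replace $find, $replace
            if ($candidate.Length -le 100) { return $candidate }
        }
        $_  # emit the line unchanged
    }
    $updated | Set-Content $file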

The extra line break is added by Set-Content: it always adds a trailing newline when -Value is a string, unless the -NoNewline parameter is used.
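For example, a minimal sketch of that fix applied to the last line of the script above (note that -NoNewline requires PowerShell 5.0 or later):

    # -NoNewline stops Set-Content from appending a trailing newline
    Set-Content -Path $graphicFile -Value $txtGraphicFile -NoNewline

Out-String can also append a trailing newline of its own, so it is worth comparing the output against the original file after making this change.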