I have a 300 MB file. I need to read the file, deduplicate the content, and also edit some of the lines based on some simple criteria.
My issue is that I cannot seem to get this to complete in a sensible amount of time.
I have written some code that does the job, but it is relatively slow even with small files, so it is not usable once the file reaches its full size.
I have tried various approaches, but none are performant, so I am looking for code that can:

- read a file
- deduplicate the content (full lines)
- edit the content of a line based on some simple criteria
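A minimal sketch of those three steps, assuming PowerShell 5 or later and that deduplication means exact, case-sensitive matching of full lines. The function name `Invoke-DedupeAndQuote` is hypothetical, and the edit it applies (skip lines that are only a double quote, append a missing closing quote) mirrors the author's snippet. It replaces the hashtable and `ForEach-Object` pipeline with a `HashSet[string]` and a `foreach` statement over `[System.IO.File]::ReadLines`, which streams the file line by line instead of loading it all at once.

```powershell
function Invoke-DedupeAndQuote {
    # Hypothetical helper; the name and parameters are illustrative only.
    param(
        [Parameter(Mandatory)][string]$InputPath,
        [Parameter(Mandatory)][string]$OutputPath
    )

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so convert both to full paths first.
    $inPath  = (Resolve-Path -LiteralPath $InputPath).ProviderPath
    $outPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)

    # HashSet.Add returns $false when the line is already present.
    # The ordinal comparer makes matching exact and case-sensitive.
    $seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)

    $writer = [System.IO.StreamWriter]::new($outPath)
    try {
        # A foreach statement avoids the per-item script block overhead of ForEach-Object.
        foreach ($line in [System.IO.File]::ReadLines($inPath)) {
            if ($line -eq '"') { continue }          # skip lines that are only a quote
            if (-not $seen.Add($line)) { continue }  # skip duplicate lines
            if (-not $line.EndsWith('"', [System.StringComparison]::Ordinal)) {
                $line += '"'                          # append a missing closing quote
            }
            $writer.WriteLine($line)
        }
    }
    finally {
        # Flush buffered output and release the file handle.
        $writer.Dispose()
    }
}
```

Usage: `Invoke-DedupeAndQuote -InputPath .\input.txt -OutputPath .\output.txt`. Like the original snippet, this deduplicates on the raw line before the edit; moving the `Add` check after the quote is appended would deduplicate on the edited line instead. Memory still grows with the number of unique lines, since each one is held in the set.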
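The code that I have goes along the lines of:

```powershell
$hash = @{}
# .NET resolves a relative $newfile against the process directory, not the PowerShell location
$stream = [System.IO.StreamWriter]::new($newfile)
try {
    [System.IO.File]::ReadLines($file.FullName) | ForEach-Object {
        # Skip lines already seen and lines that consist only of a double quote
        if (-not $hash.ContainsKey($_) -and $_ -ne '"') {
            # Append a closing double quote if the line does not already end with one
            if (-not $_.EndsWith('"')) {
                $a = $_ + '"'
            }
            else {
                $a = $_
            }
            # Write every kept line, whichever branch produced it
            $stream.WriteLine($a)
            $hash[$_] = 1
        }
    }
}
finally {
    # Flush buffered output and release the file handle
    $stream.Dispose()
}
```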
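One detail worth checking in this approach: a PowerShell `@{}` hashtable compares keys case-insensitively, so lines that differ only in letter case are treated as duplicates. If full lines should match exactly (an assumption, since the requirements don't say), a `HashSet[string]` with an ordinal comparer behaves differently, as this small comparison shows:

```powershell
# Hashtable literal: keys are compared case-insensitively by default
$hash = @{}
$hash['Line One'] = 1
$hash.ContainsKey('LINE ONE')   # True

# HashSet with an ordinal comparer: exact, case-sensitive matching
$set = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)
$set.Add('Line One')   # True  (added)
$set.Add('LINE ONE')   # True  (different line, also added)
$set.Add('Line One')   # False (exact duplicate, not added)
```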
This is only a snippet of the code. I am looking for approaches or guidance on how to make it more efficient. Does anyone have any suggestions?
Your code seems to be broken/incomplete. That happens when you don’t use the proper formatting. Could you please edit your post and correct the formatting of the code? Simply click the preformatted text button ( </> ) and paste your code where indicated.
Thanks in advance.
It might also help if you post a small part of the input data (formatted as code as well, please).
Thanks for your comments. I have updated the post, but the formatting doesn’t seem to come out well using the preformatted text option.
I have tried a variety of methods to get an efficient read-edit-write; the above is just one of those attempts and is purely a guide to the kind of thing I am doing. I was hoping someone might point me in a direction I could dig into that would make my task complete more quickly.
For reference, this code is estimated to take 3.33 hours to process the file. I can carry out the same tasks manually in a tool like Notepad++ in a few minutes, so I was hoping there might be an alternative approach that gives some improvement. 30 minutes would be an acceptable time for a coded solution.
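Before running the full file, `Measure-Command` can compare alternatives on the same data. A minimal, self-contained sketch with hypothetical in-memory test data: it times identical `HashSet` lookups through a `ForEach-Object` pipeline and through a `foreach` statement. The pipeline invokes a script block for every item, which is typically much slower over millions of lines, but the actual numbers depend on the machine and PowerShell version.

```powershell
# Hypothetical test data: 200,000 lines in which every value appears twice
$lines = foreach ($i in 1..200000) { "row $($i % 100000)" }

# Approach 1: ForEach-Object pipeline (script block invoked per item)
$pipelineTime = Measure-Command {
    $seen = [System.Collections.Generic.HashSet[string]]::new()
    $lines | ForEach-Object { [void]$seen.Add($_) }
}

# Approach 2: foreach statement over the same data
$loopTime = Measure-Command {
    $seen = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($line in $lines) { [void]$seen.Add($line) }
}

"ForEach-Object: {0:N0} ms" -f $pipelineTime.TotalMilliseconds
"foreach:        {0:N0} ms" -f $loopTime.TotalMilliseconds
```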
I have just copied the code that you provided above and pasted it, but the formatting seems to be lost as soon as it goes between the backticks. I am obviously doing something wrong but have no idea what, sorry.
I did not provide any code at all. I just tried to show you how to post code. It is YOUR CODE. I just tried to reformat it and posted it just like it is!!!
OK, but I don’t want to type out a text file big enough to experiment with just to be able to help you with your problem. Can you understand this? So it would be nice if you could post your code and some sample data correctly, so we can copy it and try it.
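Since the real input was never posted, a generator like this one can produce a large test file to experiment with. The line format, the file name `sample-input.txt`, and the sizes are all made up for testing; raising `$lineCount` brings the file closer to the 300 MB of the real data.

```powershell
# Hypothetical line format; the real input data was never posted.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-input.txt'
$lineCount = 100000                  # raise this to approach the real file size
$random = [System.Random]::new(42)   # fixed seed so runs are repeatable

$writer = [System.IO.StreamWriter]::new($path)
try {
    for ($i = 0; $i -lt $lineCount; $i++) {
        # Drawing ids from half the range guarantees repeated (duplicate) lines
        $id = $random.Next(0, [int]($lineCount / 2))
        $line = '"{0}","value {0}' -f $id
        # Only even ids get a closing quote, so about half the lines lack one
        if ($id % 2 -eq 0) { $line += '"' }
        $writer.WriteLine($line)
    }
}
finally {
    # Flush buffered output and release the file handle
    $writer.Dispose()
}
```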