big files

by zim at 2013-03-15 13:51:29

I need to search really large (5gb) log files for operations that happen within a 15 sec time period. Right now I do something like get-content, then use a foreach loop on each line until I find the 15 secs of log that I want. This is taking a very long time. Does anyone know of a faster way to process large files?
Thanks
by DonJ at 2013-03-15 13:54:33
Depending on what you are looking for in the log file, Select-String is your other choice.
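For example, something along these lines (the path and the timestamp pattern are just placeholders for whatever your log format looks like):

# pull every line whose timestamp falls in a known 15-second window
# assumes lines begin with "yyyy-MM-dd HH:mm:ss" -- adjust the pattern to your format
Select-String -Path C:\logs\big.log -Pattern '^2013-03-15 13:51:(1[5-9]|2\d)' |
    ForEach-Object { $_.Line } |
    Set-Content C:\logs\window.log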
by mjolinor at 2013-03-15 14:28:56
I’ve had good luck handling large files using get-content with -readcount, and then running the resulting arrays through foreach-object with a -match operation. Typically the optimum -readcount number is 10-20K records at a time.
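Something along these lines (the path, the -readcount value and the timestamp regex are all just placeholders):

# -ReadCount hands the pipeline arrays of 10000 lines at a time; -match against
# each array returns just the lines that match the pattern
Get-Content C:\logs\big.log -ReadCount 10000 |
    ForEach-Object { $_ -match '^2013-03-15 13:51:(1[5-9]|2\d)' } |
    Set-Content C:\logs\matches.log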
by MasterOfTheHat at 2013-03-17 11:03:26
What kind of log files? You may want to look at using something like LogParser instead of trying to run it through Powershell.

mjolinor may want to comment on this one, since he put it out on stackoverflow:
http://stackoverflow.com/questions/9439210/how-can-i-make-this-powershell-script-parse-large-files-faster
by mjolinor at 2013-03-17 19:35:21
Depending on the circumstances, a logparser solution might be faster. You’d need to test both in different scenarios.

In this application the readcount/foreach -match solution has the advantage of letting you add a simple test so that it stops reading as soon as it stops finding log entries in the specified time range. If the records you’re looking for are near the beginning of the file it will run relatively quickly, since it only has to read through a small portion of the file. If they’re near the end it will have to read most of the file to find them.

It’s been a while since I used logparser, but I believe it’s going to read through the entire file.
by MasterOfTheHat at 2013-03-18 06:45:36
I’m not sure if it does or not… Since it treats the entire log file or files as a single data entity, I guess it would have to read the entire file, though.

BTW, if you don’t want the learning curve and text-based interface, (wait… you’re using PowerShell, right??), then take a look at Log Parser Studio. I haven’t used it personally, but I’ve heard several good things about it.
by mjolinor at 2013-03-18 06:47:41
[quote](wait… you’re using PowerShell, right??), [/quote]

I like to think so.
by MasterOfTheHat at 2013-03-18 10:21:14
[quote="mjolinor"][quote](wait… you’re using PowerShell, right??), [/quote]

I like to think so.[/quote]
Was supposed to be funny because PowerShell is text-based, so anyone using posh shouldn’t be scared of/shouldn’t dislike a text-based interface… Apparently I needed another cup of coffee when I was writing that post! :wink:
by mjolinor at 2013-03-18 10:57:43
[quote]Was supposed to be funny because PowerShell is text-based, so anyone using posh shouldn’t be scared of/shouldn’t dislike a text-based interface… Apparently I needed another cup of coffee when I was writing that post! ;)[/quote]
I should have added the :wink: at the end of that reply.
It was understood as intended, just making fun of myself :).
by zim at 2013-03-19 13:07:29
Thanks for the suggestions. One of the problems that I have is that the 15 sec interval can be anywhere within the 5gb file, so I am not sure how I could use get-content with -readcount since I don’t know where in the file I need to search.
Something that I don’t understand is what get-content really returns. If I do something like $tempFileContent = get-content $fileName, then use something like $tempFileContent[n], this returns the nth line in the file. But if I do $tempFileContent this returns some big number, not the number of lines in the file.
by mjolinor at 2013-03-19 13:47:09
You won’t have to know where in the file it is. You just have to write the match so that it matches any record in that 15-second range. Then write the script to run until it starts finding matches, keep running as long as it’s still finding them, and quit when it stops finding them.

When you use get-content with readcount = n, it starts handing you chunks of n records. You can do a single -match operation against that entire "chunk" at once.
Start with a test variable set to $false.
Run the chunks of records through foreach-object, and check for matches.
If it finds any matches, store them in another variable or write them to a file, and set that test variable to $true.
If it didn’t find any matches, and the test variable is $true, it means you’ve gotten past that 15 second window. There aren’t going to be any more matches, so there’s no point in reading any more of the file, and you can quit there.
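Roughly like this (the path and the regex are placeholders, and the labeled dummy loop is just one way to give break something to stop the pipeline with):

$pattern = '^2013-03-15 13:51:(1[5-9]|2\d)'   # matches any record in the 15-second window
$found = $false

:scan do {
    Get-Content C:\logs\big.log -ReadCount 10000 |
        ForEach-Object {
            $hits = $_ -match $pattern        # filter the chunk down to matching lines
            if ($hits) {
                $hits | Add-Content C:\logs\window.log
                $found = $true
            }
            elseif ($found) {
                break scan                    # past the window - stop reading the rest of the file
            }
        }
} while ($false)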
by MasterOfTheHat at 2013-03-19 13:50:35
Get-Content returns an array of objects where each element represents a single line. So that’s why $tempFileContent[n] returns the nth line in the file. And $tempFileContent should return every line of text in the file, (basically the same as using "cat <filename>" in Windows or Linux). I’ve never come across what you’re describing. The file you’re looking at is an ASCII text file, right?
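You can see that for yourself with something like this (the path is just a placeholder):

$tempFileContent = Get-Content C:\logs\big.log
$tempFileContent.Count     # number of lines in the file
$tempFileContent[0]        # first line (the array is zero-based)
$tempFileContent[41]       # 42nd line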
by notarat at 2013-03-21 06:52:12
[quote="zim"]Thanks for the suggestions one of the problems that I have is the 15 sec interval can be anywhere with in the 5gb file. So I am not sure how I could use the get-content with -readcount since I don’t know where in the file that I need to search.
Something that I don’t under stand is what does get-content really return. If I so something like: $tempFileContent = get-content $fileName, then use something like $tempFileContent[n], this returns the nth line in the file. But if I do $tempFileContent this returns some big number, it is not the number of lines in the file.[/quote]

Is it possible for you to update your Event Viewer to log that particular event to a custom log? If so, you can review that log for the InstanceID.

Something similar to:
Get-EventLog Application | Where-Object {$_.InstanceID -eq "XXX"} | Export-Csv c:\outputfiles\EventHappened.txt