Log parsing for specific error, counting occurrences, and showing date

[quote quote=238016]Yeah, PowerShell isn’t the fastest with large text files. You may need another tool for faster speeds. The .NET methods aren’t much different from Get-Content -Raw. Also, the Get-Date seemed redundant and was called for each record. This should speed it up some.

$script = {
    Get-Content .\sample.log* -raw |
        ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
            Select-Object -Property @{
                Name = 'DateTime'
                Expression = { '{0} {1}' -f $_.Date,$_.Time }
                }, ID, Info,
                @{
                Name = 'DisplayName'
                Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                },
                @{
                Name = 'ExAddress'
                Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                },
                @{
                Name = 'SmtpAddress'
                Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                } | Group-Object -Property displayname
}
foreach($group in (& $script))
{
    $oldest = $group.group | sort datetime | select -First 1
    $newest = $group.group | sort datetime | select -Last 1
    [pscustomobject]@{
        "First Occurrence" = "{0} {1}" -f $oldest.datetime,$oldest.id
        "Last Occurrence" = "{0} {1}" -f $newest.datetime,$newest.id
        Displayname = $group.name
        Count = $group.Count
    }
}
[/quote]

Yes, this is a bit faster.

[quote quote=238025]What came to my mind just now: do you run this code locally on the computer where the log files are, or remotely on a network share? What version of PowerShell do you use? I noticed a distinct difference in performance between v5.1 and the current v7.0.2.
[/quote]

v5.1. I will test with the new version; the logs would be local. Speed is a requirement as it would be run manually for multiple sets of data, on logs that are not always accessible.

The majority of the time is spent reading and grouping the content, which, of course, requires all the data to be available. I had fun playing with this, and I thought I’d share the code I used to create the test files as well as a class for filtering/sorting the group data. I’m not sure it’s any faster than my previous suggestion; it’s just yet another approach. You could also take each file and run them in background jobs/runspaces and then group on the collection of their output (see the sketch below). Like Olaf said though, if it’s running on a schedule, would it matter if it took some time? I was getting about 5 minutes for 4 x 11MB files with my previous suggestion. With the following code I got 48 seconds for the 4 x 5.75MB files it creates.
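Here is a rough, untested sketch of that background-job idea, using the sample file paths from the generator below. Keep in mind Start-Job serializes its output, so the overhead may eat into the gains on smaller files.

$jobs = foreach ($file in Get-ChildItem C:\Temp\Sample*.csv) {
    Start-Job -ArgumentList $file.FullName -ScriptBlock {
        param($path)
        # Each job parses one file; the same Select-Object cleanup used in
        # the main pipeline could run here too before the results return
        Get-Content $path -Raw |
            ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress
    }
}
# Gather the combined output and group it across all files
$jobs | Wait-Job | Receive-Job | Group-Object -Property DisplayName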

Create the test files

function get-randomdate {
    # Random timestamp up to roughly a year in the past
    $num = Get-Random (-3..-365)
    (get-date).AddDays($num).AddHours($num).AddMinutes($num).AddSeconds($num).ToString('yyyy-MM-dd HH:mm:ss')
}

function get-randomnum {
    # Random record ID
    Get-Random (3000..15000)
}

function get-randomemail {
    # One of twelve addresses, so grouping has something to chew on
    "Emailaddress$(get-random (1..12))@domain.com"
}

function generateline {
@"
$(get-randomdate) [$(get-randomnum)] INFO DisplayName=$(get-randomemail), ExAddress=, SmtpAddress=Emailaddress1@domain.com
"@
}

# Four files of 50,000 lines each
foreach($num in 1..4)
{
    1..50000|foreach{
        generateline
    }| Set-Content "c:\temp\Sample$num.csv"
}

Sorting class

class SortGroup
{
    [string]$FirstOccurrence
    [string]$LastOccurrence
    [string]$DisplayName
    [int]$Count

    # Select-Object -First 1 -Last 1 emits the first item, then the last,
    # so after the sort the oldest record comes out ahead of the newest
    [void]Sort([object]$subgroup)
    {
        $oldest,$newest = $subgroup | sort datetime | select -First 1 -Last 1
        $this.FirstOccurrence = '{0} {1}' -f $oldest.datetime,$oldest.id
        $this.LastOccurrence = '{0} {1}' -f $newest.datetime,$newest.id
    }

    SortGroup([object]$group)
    {
        $this.Count = $group.count
        $this.DisplayName = $Group.name
        $this.Sort($group.group)
    }
}

Read data in, group, and sort/arrange/output.

get-content c:\temp\sample*.csv -raw |
    ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
            Select-Object -Property @{
                Name = 'DateTime'
                Expression = { '{0} {1}' -f $_.Date,$_.Time }
                }, ID, Info,
                @{
                Name = 'DisplayName'
                Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                },
                @{
                Name = 'ExAddress'
                Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                },
                @{
                Name = 'SmtpAddress'
                Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                }  | Group-Object -Property displayname | Foreach{[SortGroup]::new($_)}

I must say I am really impressed with classes in PowerShell. This code would process the sample files I made in 1.1 - 2 minutes. The original suggestion I provided took 16 minutes and my last suggestion took ~6 minutes. That is a pretty significant improvement if you ask me. There is still more speed to gain by running these in parallel/background jobs if needed, but this is very reasonable.

Class LogParser {

    static [object]ProcessFile($file)
    {
        # ReadAllText() reads the whole file in one shot, much like Get-Content -Raw
        return [System.IO.File]::ReadAllText($file) |
            ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                foreach{[FormatLine]::new($_)}
    }
}

Class FormatLine {

    [datetime]$Datetime
    [string]$Displayname
    [string]$ExAddress
    [string]$SmtpAddress
    [string]$ID

    # Expects every field after INFO to look like 'Key=Value,'
    FormatLine($line)
    {
        $this.Datetime = '{0} {1}' -f $line.Date,$line.Time
        $this.ID = $line.id
        $this.Displayname = ($line.DisplayName -split '=')[1].trim(',')
        $this.ExAddress = ($line.ExAddress -split '=')[1].trim(',')
        $this.SmtpAddress = ($line.SmtpAddress -split '=')[1].trim(',')
    }
}

class SortGroup
{
    [string]$FirstOccurrence
    [string]$LastOccurrence
    [string]$DisplayName
    [int]$Count

    # Select-Object -First 1 -Last 1 emits the first item, then the last,
    # so after the sort the oldest record comes out ahead of the newest
    [void]Sort([object]$subgroup)
    {
        $oldest,$newest = $subgroup | sort datetime | select -First 1 -Last 1
        $this.FirstOccurrence = '{0} {1}' -f $oldest.datetime,$oldest.id
        $this.LastOccurrence = '{0} {1}' -f $newest.datetime,$newest.id
    }

    SortGroup([object]$group)
    {
        $this.Count = $group.count
        $this.DisplayName = $Group.name
        $this.Sort($group.group)
    }
}

Get-ChildItem -Path C:\Temp\Sample*.csv | %{ [LogParser]::ProcessFile($_) } | 
    Group-Object -Property displayname | %{ [SortGroup]::new($_) }

Just for the record, I also tested the LogParser class with Get-Content -Raw and it really was about the same as ReadAllText().
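If anyone wants to reproduce the comparison, Measure-Command is the simple way to time either variant. This just wraps the pipeline from above; swap ReadAllText() for Get-Content -Raw inside ProcessFile to compare the two read methods.

Measure-Command {
    Get-ChildItem -Path C:\Temp\Sample*.csv | %{ [LogParser]::ProcessFile($_) } |
        Group-Object -Property displayname | %{ [SortGroup]::new($_) }
} | Select-Object TotalSeconds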

Wow, nice work Doug, huge speed improvement. I’ve honestly never used classes at all; I’ll need to read up on this to use them in the future.

Something kind of odd: if there are any lines that don’t match the data structure outlined, the script fails to run. I had 2 entries that my pared-down log missed.

Example: 2019-07-17 14:40:13 [4303] INFO DisplayName=Emailaddress4@domain.com, ExAddress=, SmtpAddress=Emailaddress1@domain.com, Type=Sender

foreach : You cannot call a method on a null-valued expression.
At line:8 char:17
+ foreach{[FormatLine]::new($_)}
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [ForEach-Object], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull,Microsoft.PowerShell.Commands.ForEachObjectCommand

I would check for spaces at the beginning of the line. Any spaces there caused errors for me.
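Building on that, here is one way to make the parser shrug off those lines instead of dying. It’s a rough sketch, not tested against real production logs: it trims each raw line before parsing so leading spaces can’t shift the columns, and null-checks the '=' split so a field without the expected Key=Value shape comes back as an empty string rather than triggering the null-method error above.

Class LogParser {

    static [object]ProcessFile($file)
    {
        # ReadAllLines() plus Trim() strips the leading spaces before
        # ConvertFrom-Csv ever sees them, and blank lines get dropped
        return [System.IO.File]::ReadAllLines($file) |
            foreach{ $_.Trim() } | Where-Object{ $_ } |
                ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
                    foreach{[FormatLine]::new($_)}
    }
}

Class FormatLine {

    [datetime]$Datetime
    [string]$Displayname
    [string]$ExAddress
    [string]$SmtpAddress
    [string]$ID

    FormatLine($line)
    {
        $this.Datetime = '{0} {1}' -f $line.Date,$line.Time
        $this.ID = $line.id
        $this.Displayname = [FormatLine]::Value($line.DisplayName)
        $this.ExAddress = [FormatLine]::Value($line.ExAddress)
        $this.SmtpAddress = [FormatLine]::Value($line.SmtpAddress)
    }

    # Returns the text after '=', or '' when the field doesn't have the
    # expected Key=Value shape, instead of calling .Trim() on $null
    hidden static [string]Value([string]$field)
    {
        $parts = $field -split '='
        if ($parts.Count -gt 1) { return $parts[1].Trim(',') }
        return ''
    }
}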