Log parsing for a specific error, counting occurrences, and showing dates

Hi All,

I need some help with some log parsing.

  1. (Done) I want to count the number of times an address occurs (one occurrence per line).
  2. (Need Help) I now want to add two columns to the output showing the first and last time the error occurred per user.

This is what the log format looks like:

2020-06-04 15:06:53 [12655] INFO DisplayName=Emailaddress1@domain.com, ExAddress=, SmtpAddress=Emailaddress1@domain.com
2020-06-04 15:06:54 [12653] INFO DisplayName=Emailaddress1@domain.com, ExAddress=, SmtpAddress=Emailaddress1@domain.com
2020-06-04 15:06:56 [12651] INFO DisplayName=Emailaddress1@domain.com, ExAddress=, SmtpAddress=Emailaddress1@domain.com
2020-06-04 15:10:14 [8276] INFO DisplayName=Emailaddress2@domain.com, ExAddress=, SmtpAddress=Emailaddress2@domain.com
2020-06-04 15:11:01 [6800] INFO DisplayName=Emailaddress3@domain.com, ExAddress=, SmtpAddress=Emailaddress3@domain.com
2020-06-04 15:16:58 [11340] INFO DisplayName=Emailaddress4@domain.com, ExAddress=, SmtpAddress=Emailaddress4@domain.com
2020-06-04 15:16:59 [11340] INFO DisplayName=Emailaddress5@domain.com, ExAddress=, SmtpAddress=Emailaddress5@domain.com

What I've written so far:

$regexA = 'DisplayName=[^,]+'
Select-String -Path .\logfile.log -Pattern $regexA -AllMatches |
    % { $_.Matches } | % { $_.Value } |
    Group-Object -NoElement | select Name, Count |
    sort-object Count -Descending > .\output.txt

Thank you, any help would be greatly appreciated.

logan, welcome to Powershell.org. Please take a moment to read the very first post at the top of this forum: Read Me Before Posting! You’ll be Glad You Did!

When you post code, error messages, sample data, or console output, please format it as code.
In the “Text” view you can use the code tags “PRE”; in the “Visual” view you can use the format template “Preformatted”. You can go back and edit your post to fix the formatting - you don’t have to create a new one.
Thanks in advance.

You actually do not need external tools like findstr. You should use native PowerShell cmdlets like Select-String instead, as you already did in your second pipeline step. You may (re-)read the help for Select-String, including the examples, to learn how to use it. You can provide input files to Select-String as well. Please share a few more lines from your log file to play with, your regex pattern, and a few examples of the desired output.
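For example, here is a minimal sketch of that (the wildcard path and the capture-group pattern are assumptions about your setup) that counts occurrences per address:

# Minimal sketch: the capture group pulls out just the address part,
# and -Path accepts wildcards so you can scan several files at once.
Select-String -Path .\logfile*.log -Pattern 'DisplayName=([^,]+)' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value } |
    Group-Object -NoElement |
    Sort-Object Count -Descending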

Hi Olaf,

Thank you for your response; I updated my example as you suggested. Any ideas you may have for adding the first and last occurrences would be appreciated.

It may be over the top, but your data look like structured data… like CSV data. So you may treat them as such, like this:

# Treat the space-delimited log as CSV and split the key=value fields afterwards
$ImportedData = 
Import-Csv -Path D:\sample\sample.log -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
Select-Object -Property @{
    Name       = 'DateTime'
    Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
}, ID, Info,
@{
    Name       = 'DisplayName'
    Expression = { ($_.DisplayName -split '=')[1].trim(',') }
},
@{
    Name       = 'ExAddress'
    Expression = { ($_.ExAddress -split '=')[1].trim(',') }
},
@{
    Name       = 'SmtpAddress'
    Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
}

Now that you have the data in a variable, you can much more easily sort, group, or count it however you like:

$ImportedData |
Format-Table -AutoSize

Output would look like this:

DateTime            ID      Info DisplayName              ExAddress SmtpAddress
--------            --      ---- -----------              --------- -----------
2020-06-04 15:06:53 [12655] INFO Emailaddress1@domain.com           Emailaddress1@domain.com
2020-06-04 15:06:54 [12653] INFO Emailaddress1@domain.com           Emailaddress1@domain.com       
2020-06-04 15:06:56 [12651] INFO Emailaddress1@domain.com           Emailaddress1@domain.com       
2020-06-04 15:10:14 [8276]  INFO Emailaddress2@domain.com           Emailaddress2@domain.com
2020-06-04 15:11:01 [6800]  INFO Emailaddress3@domain.com           Emailaddress3@domain.com       
2020-06-04 15:16:58 [11340] INFO Emailaddress4@domain.com           Emailaddress4@domain.com       
2020-06-04 15:16:59 [11340] INFO Emailaddress5@domain.com           Emailaddress5@domain.com

I’d recommend saving the output as CSV as well.
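For example (the output path here is just a placeholder):

# Placeholder output path; -NoTypeInformation suppresses the type header line
$ImportedData | Export-Csv -Path D:\sample\parsed.csv -NoTypeInformation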

Olaf, bravo. I wasn’t aware Import-Csv was so powerful. This is an excellent answer.

Hi Olaf,

I am trying to grab the first time it occurred, the last time, and a count for each address. The idea is to see how big of an issue that user is causing, when it started, and whether it’s still happening. I was using Group-Object to get counts, but that collapses the rows down to unique values, so I lose the timestamps.

The desired output would be something like this:

First Occurrence            Last Occurrence             Display name             Count
2020-06-04 15:06:53 [12655] 2020-06-04 15:06:56 [12651] Emailaddress1@domain.com 3
2020-06-04 15:10:14 [8276]  2020-06-04 15:10:14 [8276]  Emailaddress2@domain.com 1
2020-06-04 15:11:01 [6800]  2020-06-04 15:11:01 [6800]  Emailaddress3@domain.com 1
2020-06-04 15:16:58 [11340] 2020-06-04 15:16:58 [11340] Emailaddress4@domain.com 1
2020-06-04 15:16:59 [11340] 2020-06-04 15:16:59 [11340] Emailaddress5@domain.com 1

Thanks for your help

Olaf has provided your data as an object that you can now filter/group/calculate to your heart’s content.

# The 'yyyy-MM-dd HH:mm:ss' strings sort correctly as plain text
$ImportedData | Group-Object -Property displayname | 
    Select-Object -Property @{N="First Occurrence"; E={$_.group.datetime | sort | select -First 1}},
                            @{N="Last Occurrence" ; E={$_.group.datetime | sort | select -Last 1}},
                            Name,
                            Count

Output

First Occurrence    Last Occurrence     Name                     Count
----------------    ---------------     ----                     -----
2020-06-04 15:06:53 2020-06-04 15:06:56 Emailaddress1@domain.com     3
2020-06-04 15:10:14 2020-06-04 15:10:14 Emailaddress2@domain.com     1
2020-06-04 15:11:01 2020-06-04 15:11:01 Emailaddress3@domain.com     1
2020-06-04 15:16:58 2020-06-04 15:16:58 Emailaddress4@domain.com     1
2020-06-04 15:16:59 2020-06-04 15:16:59 Emailaddress5@domain.com     1

Got it, thank you both for your time. This is exactly what I was looking for.

I was bored…

New sample data

2020-06-04 15:06:53 [1255] INFO DisplayName=Emailaddress1@domain.com, ExAddress=, SmtpAddress=Emailaddress1@domain.com
2020-06-04 15:06:54 [1653] INFO DisplayName=Emailaddress1@domain.com, ExAddress=, SmtpAddress=Emailaddress1@domain.com
2020-06-04 15:06:56 [12651] INFO DisplayName=Emailaddress1@domain.com, ExAddress=, SmtpAddress=Emailaddress1@domain.com
2020-06-04 15:10:14 [8276] INFO DisplayName=Emailaddress2@domain.com, ExAddress=, SmtpAddress=Emailaddress2@domain.com
2020-06-04 15:11:01 [680] INFO DisplayName=Emailaddress3@domain.com, ExAddress=, SmtpAddress=Emailaddress3@domain.com
2020-06-04 15:16:58 [1340] INFO DisplayName=Emailaddress4@domain.com, ExAddress=, SmtpAddress=Emailaddress4@domain.com
2020-06-04 15:16:59 [11340] INFO DisplayName=Emailaddress5@domain.com, ExAddress=, SmtpAddress=Emailaddress5@domain.com
2020-06-04 15:16:57 [1308] INFO DisplayName=Emailaddress4@domain.com, ExAddress=, SmtpAddress=Emailaddress4@domain.com
2020-06-04 15:16:57 [19971] INFO DisplayName=Emailaddress5@domain.com, ExAddress=, SmtpAddress=Emailaddress5@domain.com
2020-06-04 11:08:01 [6800] INFO DisplayName=Emailaddress3@domain.com, ExAddress=, SmtpAddress=Emailaddress3@domain.com
2020-06-04 19:08:01 [6033] INFO DisplayName=Emailaddress3@domain.com, ExAddress=, SmtpAddress=Emailaddress3@domain.com

New script

$ImportedData = Import-Csv -Path D:\sample\sample.log -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
    Select-Object -Property @{
    Name = 'DateTime'
    Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
    }, ID, Info,
    @{
    Name = 'DisplayName'
    Expression = { ($_.DisplayName -split '=')[1].trim(',') }
    },
    @{
    Name = 'ExAddress'
    Expression = { ($_.ExAddress -split '=')[1].trim(',') }
    },
    @{
    Name = 'SmtpAddress'
    Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
    }

$lookup = $ImportedData | Group-Object -Property displayname -AsHashTable -AsString

$lookup.Keys | foreach {
    $oldest = $lookup[$_].datetime | sort | select -First 1
    $newest = $lookup[$_].datetime | sort | select -Last 1
    $newestID = $lookup[$_] | ? datetime -eq $newest | select -ExpandProperty ID
    $oldestID = $lookup[$_] | ? datetime -eq $oldest | select -ExpandProperty ID

    [pscustomobject]@{
        "First Occurrence" = "{0} {1}" -f $oldest,$oldestID
        "Last Occurrence"  = "{0} {1}" -f $newest,$newestID
        Displayname        = $_
        Count              = $lookup[$_].Count
    }
} | sort -Property count -Descending

New output

First Occurrence            Last Occurrence             Displayname              Count
----------------            ---------------             -----------              -----
2020-06-04 11:08:01 [6800]  2020-06-04 19:08:01 [6033]  Emailaddress3@domain.com     3
2020-06-04 15:06:53 [1255]  2020-06-04 15:06:56 [12651] Emailaddress1@domain.com     3
2020-06-04 15:16:57 [19971] 2020-06-04 15:16:59 [11340] Emailaddress5@domain.com     2
2020-06-04 15:16:57 [1308]  2020-06-04 15:16:58 [1340]  Emailaddress4@domain.com     2
2020-06-04 15:10:14 [8276]  2020-06-04 15:10:14 [8276]  Emailaddress2@domain.com     1

This does work great; however, I have an average of 120 MB of log files to scan, and it takes about 5 minutes per 20 MB of logs. Any ideas on how to optimize?

I also switched to the below to be able to use multiple input files.

$ImportedData = Get-Content .\sample.log* | ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |

If you remove the sort at the end, does that speed it up?

You can provide more than one file at a time to Import-Csv, either with an array you filled in advance or directly like this:

Import-Csv -Path (Get-ChildItem -Path .\sample*.log) 

PowerShell is quite slow when it comes to file system operations. If the running time really matters, you could use a [System.IO.StreamReader] to speed up the import of the data. But that would be less comfortable than pure PowerShell, of course. :wink:

Here is something to start reading: https://stackoverflow.com/questions/44462561/system-io-streamreader-vs-get-content-vs-system-io-file
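A rough sketch of that approach might look like this (the path is a placeholder and the field handling simply mirrors the Import-Csv version above):

# Rough sketch with System.IO.StreamReader instead of Import-Csv.
# Assumes the fixed field layout shown in the sample log lines.
$reader = [System.IO.StreamReader]::new('D:\sample\sample.log')
$rows = while (-not $reader.EndOfStream) {
    $fields = $reader.ReadLine() -split ' '
    [pscustomobject]@{
        DateTime    = '{0} {1}' -f $fields[0], $fields[1]
        ID          = $fields[2]
        DisplayName = ($fields[4] -split '=')[1].Trim(',')
    }
}
$reader.Dispose()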

Here, see if this helps.

$lookup = Get-Content .\sample.log* |
    ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
        Select-Object -Property @{
            Name = 'DateTime'
            Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
            }, ID, Info,
            @{
            Name = 'DisplayName'
            Expression = { ($_.DisplayName -split '=')[1].trim(',') }
            },
            @{
            Name = 'ExAddress'
            Expression = { ($_.ExAddress -split '=')[1].trim(',') }
            },
            @{
            Name = 'SmtpAddress'
            Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
            } | Group-Object -Property displayname -AsHashTable -AsString

$lookup.keys | Foreach-Object {
    $rows = $lookup[$_]

    $oldest = $rows.datetime | sort | select -First 1
    $newest = $rows.datetime | sort | select -Last 1
    $newestID = $rows | ? datetime -eq $newest | select -first 1 -ExpandProperty ID
    $oldestID = $rows | ? datetime -eq $oldest | select -first 1 -ExpandProperty ID

    [pscustomobject]@{
        "First Occurrence" = "{0} {1}" -f $oldest,$oldestID
        "Last Occurrence"  = "{0} {1}" -f $newest,$newestID
        Displayname        = $_
        Count              = $rows.Count
    }
} | sort -Property count -Descending

If you can afford to put it all in memory, you can try it this way.

$script = {
    Get-Content .\sample.log* |
        ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
            Select-Object -Property @{
                Name = 'DateTime'
                Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
                }, ID, Info,
                @{
                Name = 'DisplayName'
                Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                },
                @{
                Name = 'ExAddress'
                Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                },
                @{
                Name = 'SmtpAddress'
                Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                } | Group-Object -Property displayname
 }
foreach($group in (& $script))
{
    $oldest = $group.group.datetime | sort | select -First 1
    $newest = $group.group.datetime | sort | select -Last 1
    $newestID = $group.group | ? datetime -eq $newest | select -first 1 -ExpandProperty ID
    $oldestID = $group.group | ? datetime -eq $oldest | select -first 1 -ExpandProperty ID

    [pscustomobject]@{
        "First Occurrence" = "{0} {1}" -f $oldest,$oldestID
        "Last Occurrence"  = "{0} {1}" -f $newest,$newestID
        Displayname        = $group.name
        Count              = $group.Count
    }
}

Take care.

Or try this slightly simplified version of the previous one:

$script = {
    Get-Content .\sample.log* |
        ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
            Select-Object -Property @{
                Name = 'DateTime'
                Expression = { Get-Date ($_.Date, $_.Time -join ' ') -Format 'yyyy-MM-dd HH:mm:ss' }
                }, ID, Info,
                @{
                Name = 'DisplayName'
                Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                },
                @{
                Name = 'ExAddress'
                Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                },
                @{
                Name = 'SmtpAddress'
                Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                } | Group-Object -Property displayname
 }
foreach($group in (& $script))
{
    $oldest = $group.group | sort datetime | select -First 1
    $newest = $group.group | sort datetime | select -Last 1

    [pscustomobject]@{
        "First Occurrence" = "{0} {1}" -f $oldest.datetime,$oldest.id
        "Last Occurrence"  = "{0} {1}" -f $newest.datetime,$newest.id
        Displayname        = $group.name
        Count              = $group.Count
    }
}

This did not seem to work for me, Olaf.

Unfortunately it’s still too slow for that particular set of logs. It does work for another set where I pare the data down first.

Checking other forums, a common suggestion is to use Microsoft.VisualBasic.FileIO.TextFieldParser, though I lack the VB background to do this. If I find a way to get it working or otherwise improve the performance, I’ll post an update.
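Going by the documentation, a rough sketch from PowerShell might look like this (untested; the path is a placeholder):

# Rough sketch with Microsoft.VisualBasic.FileIO.TextFieldParser (untested).
Add-Type -AssemblyName Microsoft.VisualBasic
$parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new('D:\sample\sample.log')
$parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
$parser.SetDelimiters(' ')
while (-not $parser.EndOfData) {
    $fields = $parser.ReadFields()   # string[] of the space-delimited fields
    # $fields[0] = date, $fields[1] = time, $fields[2] = ID, $fields[4] = DisplayName=...
}
$parser.Close()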

Yeah, PowerShell isn’t the fastest with large text files; you may need another tool for faster speeds. The .NET methods aren’t much different from Get-Content -Raw. Also, the Get-Date call seemed redundant and was invoked for every record. This should speed it up some.

$script = {
    Get-Content .\sample.log* -raw |
        ConvertFrom-Csv -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress |
            Select-Object -Property @{
                Name = 'DateTime'
                Expression = { '{0} {1}' -f $_.Date,$_.Time }
                }, ID, Info,
                @{
                Name = 'DisplayName'
                Expression = { ($_.DisplayName -split '=')[1].trim(',') }
                },
                @{
                Name = 'ExAddress'
                Expression = { ($_.ExAddress -split '=')[1].trim(',') }
                },
                @{
                Name = 'SmtpAddress'
                Expression = { ($_.SmtpAddress -split '=')[1].trim(',') }
                } | Group-Object -Property displayname
 }
foreach($group in (& $script))
{
    $oldest = $group.group | sort datetime | select -First 1
    $newest = $group.group | sort datetime | select -Last 1

    [pscustomobject]@{
        "First Occurrence" = "{0} {1}" -f $oldest.datetime,$oldest.id
        "Last Occurrence"  = "{0} {1}" -f $newest.datetime,$newest.id
        Displayname        = $group.name
        Count              = $group.Count
    }
}

Hmmm … does performance really matter? In my experience it matters less often than you might think, especially when the task runs automatically on a schedule in the background. :wink:

If performance matters you should measure it to know where the time is actually spent. Otherwise you might optimize an already speedy part of your code. You can use Measure-Command to measure the individual parts of your script.
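For example, to time just the import step in isolation (the path is a placeholder):

# Measure one part of the script on its own
Measure-Command {
    Import-Csv -Path D:\sample\sample.log -Delimiter ' ' -Header Date, Time, ID, Info, DisplayName, ExAddress, SmtpAddress
} | Select-Object -Property TotalSeconds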

Please have a look at my last answer. Quite often PowerShell’s file system operations consume a lot of time, and there are some ways to circumvent this.

I don’t know what to say. It just works for me as intended. Did you notice - it’s Get-ChildItem inside the parentheses, not Get-Content!! :wink:

What came to my mind just now - do you run this code locally on the computer where the log files are, or remotely against a network share? What version of PowerShell do you use? I noticed a distinct difference in performance between v5.1 and the current v7.0.2.
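You can check your version like this:

# Shows the engine version of the current session
$PSVersionTable.PSVersion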