Regex for extracting Data

Hello friends,

I am trying to write a PowerShell script that captures some event log entries and extracts the number of times each domain was recycled.

Sample event log:

https://gist.github.com/anonymous/09354b3595baa85532c89fc47ccfebfc

I have stored this log in a variable, say $Data.
Now I am looking for output like the following, produced by parsing the $Data variable:

Domain Name    No. of Times Recycled
-----------    ---------------------
a.org          2
b.com          1
c.co.in        3
d.com          1
s.org          1
se.com         1
b.ac.in        1

Kindly provide your suggestion on how this can be achieved.

Here’s a lazy attempt.

$counts = @{}

echo a.org,b.com,c.co.in,d.com,s.org,se.com,b.ac.in | 
foreach { 
  $counts[$_] = (select-string $_ log).count
}
    
$counts


Name                           Value
----                           -----
s.org                          1
a.org                          2
b.com                          1
c.co.in                        3
se.com                         1
d.com                          1
b.ac.in                        1

Thanks for the suggestion. The example above is just a sample with 7 domain names; in the actual scenario there may be more than 300 domains, so it will be difficult to pass each domain name by hand.

Kindly suggest.

Can you read the domains from Active Directory?

I was trying to use ConvertFrom-String, but I can’t get it to work.

$template = @'
{Domain*:a.org}
{Domain*:b.com}
'@

$testText = @'
A worker process serving application pool 'a.org(domain)(4.0)(pool)' has requested a recycle because it reached its private bytes memory limit.
A worker process serving application pool 'b.com(domain)(2.0)(pool)' has requested a recycle because it reached its private bytes memory limit.
'@

$testText | convertfrom-string -templatecontent $template 


Domain
------
A worker process serving application pool 'a.org(domain)(4.0)(pool)' has requested a recycle because it reached its private bytes memory limit.
A worker process serving application pool 'b.com(domain)(2.0)(pool)' has requested a recycle because it reached its private bytes memory limit.

Will Anderson - These domains are actually hosted on IIS, and not all of them will show an application pool recycle error, so fetching them from AD will not be possible.

Assuming the format is the same:

Class domain{
    $name
    $counted = $false
}
Class recycleCount{
    $name
    $count
}
$log = Get-Content C:\temp\temp.log
$arrayDom = @()
$arrayFinal = @()

$log | %{if($_.contains("has requested a recycle")){
                $start = $_.indexof("'");
                $end = $_.indexof("(");
                $objDom = New-Object domain;
                $objDom.name = $_.Substring($start+1,$end-$start-1);
                $arrayDom += $objDom;
                }
           }
$arrayDom | %{$count = 0;
            if($_.counted -eq $false){
                $name = $_.name;
                $_.counted = $true;
                $count += 1;
                $arrayDom | %{if(($_.name -eq $name) -and ($_.counted -eq $false)){
                            $_.counted = $true;
                            $count+=1;
                            }
                        };
                        $objCnt = New-Object recycleCount;
                        $objCnt.name = $name
                        $objCnt.count = $count
                        $arrayFinal += $objCnt
                }
          }
$arrayFinal

Amar,

Try something like this:

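# -like needs wildcards to match text in the middle of a line
# [char[]] makes Split treat ' and ( as two separate delimiters
$Data |
Where-Object { $_ -like '*has requested a recycle*' } |
ForEach-Object { $_.Split( [char[]]"'(" )[1] } |
Group-Object |
Select-Object -Property Name, Count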
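
One thing worth checking with this approach is whether $Data holds an array of lines or a single multi-line string (for example, from Get-Content -Raw). Here is a minimal sketch, assuming the latter; the three sample lines are made up in the same format as the log excerpt quoted in this thread, and $result is just an illustrative name.

```powershell
# Assumption: $Data is ONE multi-line string, so it is split into lines first
$Data = @"
A worker process serving application pool 'a.org(domain)(4.0)(pool)' has requested a recycle because it reached its private bytes memory limit.
A worker process serving application pool 'b.com(domain)(2.0)(pool)' has requested a recycle because it reached its private bytes memory limit.
A worker process serving application pool 'a.org(domain)(4.0)(pool)' has requested a recycle because it reached its private bytes memory limit.
"@

# Keep only recycle lines, take the text between ' and the first (, then count
$result = $Data -split '\r?\n' |
    Where-Object { $_ -like '*has requested a recycle*' } |
    ForEach-Object { $_.Split([char[]]"'(")[1] } |
    Group-Object |
    Select-Object -Property Name, Count

$result
```

If $Data is already an array of lines (the default for Get-Content), the -split step can simply be dropped.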

Nice! I tried that with -split.

$counts = @{} # associative array
select-string 'has requested a recycle' log |
foreach {
  $org = ($_ -split {$_ -in "'",'(' })[1]
  $counts[$org]++
}
$counts


Name                           Value                                           
----                           -----                                           
b.com                          1                                               
s.org                          1                                               
c.co.in                        3                                               
d.com                          1                                               
b.ac.in                        1                                               
a.org                          2                                               
se.com                         1                                               

If you are not sure how large this log file will be, I recommend using a switch statement, which reads the file line by line.

# Match each recycle line and add its domain name to a list
# Resolve-Path gives switch -File an unambiguous full path
$log = (Resolve-Path .\event.log).Path
$nobj = New-Object System.Collections.ArrayList

switch -regex -File $log {
    "pool '(?'dn'.*)\(domain\).*requested a recycle" 
    {[void]($nobj.add($Matches['dn']))}
}

# Display how many times each domain name was captured
$nobj | Group-Object | Select-Object Name,Count

What does this part mean? Somehow .* becomes $Matches['dn']?

(?'dn'.*)

(?'dn'.*) is a named capture group. It captures the domain name, which in this case is all the text between "pool '" and "(domain)". When the match succeeds, the automatic $Matches hashtable is populated, and the captured text is available as $Matches['dn'].

() = text matched inside the parentheses is captured
.* = any number of characters
?'dn' = gives the capture group the name 'dn'
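
To make that concrete, here is a minimal sketch that runs the same pattern with -match against one sample line in the format quoted earlier in this thread. The variable name $line is only for illustration.

```powershell
# One sample line in the format shown earlier in the thread
$line = "A worker process serving application pool 'a.org(domain)(4.0)(pool)' has requested a recycle because it reached its private bytes memory limit."

# -match returns $true on success and fills the automatic $Matches hashtable
if ($line -match "pool '(?'dn'.*)\(domain\).*requested a recycle") {
    $Matches['dn']   # text captured by the group named 'dn' (a.org for this line)
    $Matches[0]      # key 0 always holds the whole matched text
}
```

The switch -regex statement fills $Matches the same way for each line it matches.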

I was reading about capture groups here, but the format used greater than and less than signs (this forum can’t show them) instead of single quotes. https://ss64.com/ps/syntax-regex.html

Here’s a way to convert that $counts hashtable I made to an object:

foreach ( $key in $counts.Keys ) {
  # Emit one object per hashtable entry
  [pscustomobject]@{ domain = $key; times = $counts[$key] }
}


domain  times
------  -----
s.org       1
a.org       2
b.com       1
c.co.in     3
se.com      1
d.com       1
b.ac.in     1

Both formats are valid: the name can be wrapped in angle brackets or in single quotes.

For example:

$inputval = 'abcdefghijklmnopqrstuvwxyz'

$inputval -match "(?<beforeE>.*)e.*v(?'afterV'.*)$"

$Matches

Results:

True

Name                           Value
----                           -----                                                                                                                                                                                                                             
afterV                         wxyz
beforeE                        abcd
0                              abcdefghijklmnopqrstuvwxyz

There is also another format, (?P<name>.*), but it is not supported by .NET, and therefore not supported in PowerShell.
Ref: http://www.regular-expressions.info/refext.html
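
A quick way to check this yourself is to try constructing each pattern. This is a minimal sketch, assuming PowerShell 5 or later for the [regex]::new() syntax; the group name "word" and the variable names are made up for illustration.

```powershell
# Both .NET named-group forms construct without error
$angle = [regex]::new("(?<word>\w+)")
$quote = [regex]::new("(?'word'\w+)")

# The Python-style (?P<name>...) form is expected to be rejected by the .NET parser
$pythonStyleAccepted = $true
try {
    [void][regex]::new("(?P<word>\w+)")
}
catch {
    $pythonStyleAccepted = $false
    "Rejected: $($_.Exception.Message)"
}

# The two accepted forms expose the capture under the same group name
$angle.Match('hello').Groups['word'].Value
$quote.Match('hello').Groups['word'].Value
```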