Count Number of Lines Between String Matches

Hi,

I’m trying to count the number of lines after matching a string. The output is like this:

Schedule 1
Task 1
Task 2
Schedule 2
Task 1
Task 2
Task 3
Schedule 3
Task 1
Schedule 4
Task 1
Task 2

I’m searching for the string “schedule” and, once found, counting the number of lines after it and before the next match. The end result would be something like this:

Name Count
“Schedule 1” 2
“Schedule 2” 3
“Schedule 3” 1
“Schedule 4” 2

That way, I can sort the above array by Count and select the schedule with the lowest number. Any pointers in the right direction would be much appreciated.

I’ve cobbled something together but it’s butt ugly:

$Lines = <from Output>

$Report = @()
$RepName = @()
$RepCount = @()
$Count = 0

ForEach ($Line in $Lines)
{
    If ($Line -match "Schedule")
    {
        $RepName += New-Object PSObject -Property @{
            Name = $Line
        }
        $RepCount += New-Object PSObject -Property @{
            Count = $Count
        }
        $Count = 0
    }
    Else
    {
        $Count++
    }
}

# Appends the final $Count value
$RepCount += New-Object PSObject -Property @{
    Count = $Count
}

# Skips the first $Count value
$RepCount = $RepCount | Select-Object -Skip 1

# Merge both arrays
For ($i = 0; $i -lt $RepName.count; $i++)
{
    $Report += New-Object PSObject -Property @{
        Name = $RepName[$i].Name
        Count = $RepCount[$i].Count
    }
}

# Sort ascending by Count
$ReportSorted = $Report | Sort-Object Count

# Select Schedule with lowest count
$ReportSorted[0].Name

Seems to work but it’s quite an eyesore. Hoping for a better alternative.

Thanks.

There are going to be several ways to do it. If you can fit all the text in memory I would suggest this approach.

You bring all the content into memory as a single string with the -Raw parameter. I’ve commented the code to explain each step.

$content = Get-Content -Path \path\to\content.txt -Raw

# split at every position that is followed by the literal string 'Schedule' (zero-width lookahead), sending each section down the pipeline as a single multi-line string
$content -split '(?=Schedule)' | ForEach-Object {
    # split the section at the newline (with optional carriage return), turning it into an array of lines
    foreach($line in $_ -split '\r?\n'){
        # if the current line matches 'schedule<space><0 or more numeric digits>' create the object
        if($line -match '(Schedule \d*)'){
            $current = [PSCustomObject]@{
                Name  = $matches.1
                Count = 0
            }
        }
        # otherwise if the line isn't just an empty line, increment the count
        elseif($line -match '\w.+'){
            $current.Count++
        }
    }

    # implicit output since we're done with this section
    $current

    # clear variable for next chunk
    $current = $null
}

Output is

Name       Count
----       -----
Schedule 1     2
Schedule 2     3
Schedule 3     1
Schedule 4     2

To capture it in a variable, just put the variable assignment in front of the pipeline:

$content = Get-Content -Path \path\to\content.txt -Raw

# split at every position that is followed by the literal string 'Schedule' (zero-width lookahead), sending each section down the pipeline as a single multi-line string
$output = $content -split '(?=Schedule)' | ForEach-Object {
    # split the section at the newline (with optional carriage return), turning it into an array of lines
    foreach($line in $_ -split '\r?\n'){
        # if the current line matches 'schedule<space><0 or more numeric digits>' create the object
        if($line -match '(Schedule \d*)'){
            $current = [PSCustomObject]@{
                Name  = $matches.1
                Count = 0
            }
        }
        # otherwise if the line isn't just an empty line, increment the count
        elseif($line -match '\w.+'){
            $current.Count++
        }
    }

    # implicit output since we're done with this section
    $current

    # clear variable for next chunk
    $current = $null
}

# implicit output of the collected objects
$output
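
To finish the original task (picking the schedule with the lowest count), the collected objects can be sorted by Count. This is only a minimal sketch: the sample data is hard-coded here as a stand-in for $output, and the Where-Object filter is an assumption on my part, added to drop any empty entry the split might emit for the text before the first 'Schedule'.

```powershell
# Hypothetical sample data standing in for $output from the script above
$output = @(
    [PSCustomObject]@{ Name = 'Schedule 1'; Count = 2 }
    [PSCustomObject]@{ Name = 'Schedule 2'; Count = 3 }
    [PSCustomObject]@{ Name = 'Schedule 3'; Count = 1 }
    [PSCustomObject]@{ Name = 'Schedule 4'; Count = 2 }
)

# Drop any null entries, sort ascending by Count, keep the first (lowest) one
$lowest = $output |
    Where-Object { $_ } |
    Sort-Object -Property Count |
    Select-Object -First 1

$lowest.Name   # Schedule 3
```

If two schedules tie for the lowest count, this returns only one of them.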

Very nice. Never thought about splitting into chunks. I tweaked the “match” criteria to suit my needs and your script works a treat.

As my regex knowledge is rudimentary at best, could I trouble you to help me understand the following:

  • ‘\r?\n’ - What’s the question mark for? I tried without it and it worked just as well to cater for the carriage return and new line.

  • ‘\w.+’ - What’s the dot for?

Thanks for the help, much appreciated.

If that’s not the best chance to build some knowledge …

https://www.regular-expressions.info/refquick.html


The question mark right after a pattern/literal makes it optional. This is the best approach that I know of to match both line-ending styles: a bare new line (\n) as well as a carriage return followed by a new line (\r\n).

The question mark as part of (?=…) is a look ahead pattern.

The dot means match any character; combined with + it means match one or more of any character. So it’s a word character \w (a letter, digit, or underscore) followed by one or more of any character .+
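
To see these three pieces in action, here is a small sketch you can paste into a console. The sample strings and variable names are made up purely for illustration; only the patterns come from the script above.

```powershell
# 1) '\r?\n': the ? makes the carriage return optional
$windows = "Task 1`r`nTask 2"   # CRLF line ending
$unix    = "Task 1`nTask 2"     # LF line ending

$crlfOnlyOnUnix = ($unix -split '\r\n').Count        # 1: '\r\n' finds nothing to split on
$optionalOnUnix = ($unix -split '\r?\n').Count       # 2
$optionalOnWin  = ($windows -split '\r?\n').Count    # 2
$lfOnlyFirst    = ($windows -split '\n')[0].Length   # 7: "Task 1" plus a leftover carriage return

# 2) '(?=Schedule)': the lookahead splits in front of the word without consuming it
$text     = "Schedule 1`nTask 1`nSchedule 2`nTask 1"
$consumed = $text -split 'Schedule'     | Where-Object { $_ }   # 'Schedule' is removed from each chunk
$kept     = $text -split '(?=Schedule)' | Where-Object { $_ }   # each chunk still starts with 'Schedule'

# 3) '\w.+': a word character followed by at least one more character
$results = '', '   ', 'a', 'Task 1' | ForEach-Object { $_ -match '\w.+' }
$results   # False, False, False, True
```

Note that a single-character line such as 'a' does not match '\w.+', because the .+ part needs at least one more character after the word character.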

@Olaf - bookmarked, thanks.

@krzydoug - again, much appreciated.

Cheers.

I also recommend checking out https://regex101.com
It’s a nice site to construct and understand regex.