Count Number of Lines Between String Matches

tlauwk · August 15, 2022, 12:25am

Hi,

I’m trying to count the number of lines after matching a string. The output looks like this:

Schedule 1
Task 1
Task 2
Schedule 2
Task 1
Task 2
Task 3
Schedule 3
Task 1
Schedule 4
Task 1
Task 2

I’m searching for the string “Schedule” and, once it’s found, counting the number of lines after it and before the next match. The end result would be something like this:

Name          Count
“Schedule 1”  2
“Schedule 2”  3
“Schedule 3”  1
“Schedule 4”  2

That way, I can sort the above array by Count and select the schedule with the lowest number. Any pointers in the right direction would be much appreciated.
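The counting described above can also be done in a single pass over the lines, without reading the whole file as one string. This is a minimal sketch. It assumes the lines are already in an array, which is what `Get-Content` returns without `-Raw`. `$lines` and `$report` are hypothetical names. It also assumes a header is any line starting with "Schedule" and that only non-blank lines count.

```powershell
# Hypothetical sample lines from the question; Get-Content without -Raw would give a similar array
$lines = @(
    'Schedule 1', 'Task 1', 'Task 2',
    'Schedule 2', 'Task 1', 'Task 2', 'Task 3',
    'Schedule 3', 'Task 1',
    'Schedule 4', 'Task 1', 'Task 2'
)

# Single pass: each new header emits the finished object for the previous header
$report = & {
    $current = $null
    foreach ($line in $lines) {
        if ($line -match '^Schedule\b') {
            # Output the previous header's object, if there is one
            if ($current) { $current }
            $current = [PSCustomObject]@{ Name = $line; Count = 0 }
        }
        elseif ($current -and $line.Trim()) {
            # A non-blank line after a header counts towards that header
            $current.Count++
        }
    }
    # The last header has no following header to trigger its output
    if ($current) { $current }
}

$report

# Sort ascending by Count and take the schedule with the fewest lines
$lowest = $report | Sort-Object Count | Select-Object -First 1
$lowest.Name   # → Schedule 3
```

Because this approach only ever holds one object in progress, it also works when the lines are streamed rather than loaded all at once.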

tlauwk · August 15, 2022, 3:47am

I’ve cobbled something together, but it’s butt ugly:

$Lines = <from Output>

$Report = @()
$RepName = @()
$RepCount = @()
$Count = 0

ForEach ($Line in $Lines)
{
    If ($Line -match "Schedule")
    {
        $RepName += New-Object PSObject -Property @{
            Name = $Line
        }
        $RepCount += New-Object PSObject -Property @{
            Count = $Count
        }
        $Count = 0
    }
    Else
    {
        $Count++
    }
}

# Appends the final $Count value
$RepCount += New-Object PSObject -Property @{
    Count = $Count
}

# Skips the first $Count value (the lines before the first "Schedule" header)
$RepCount = $RepCount | Select-Object -Skip 1

# Merge both arrays
For ($i = 0; $i -lt $RepName.count; $i++)
{
    $Report += New-Object PSObject -Property @{
        Name = $RepName[$i].Name
        Count = $RepCount[$i].Count
    }
}

# Sort ascending by Count
$ReportSorted = $Report | Sort-Object Count

# Select Schedule with lowest count
$ReportSorted[0].Name

It seems to work, but it’s quite an eyesore. I’m hoping for a better alternative.

Thanks.
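The two parallel arrays and the `Skip 1` step can be replaced by remembering where each header line sits. The count for a header is then the index of the next header (or the end of the array) minus its own index, minus one. The sketch below keeps the original's unanchored `-match "Schedule"` and counts every line in between, blank lines included, as the original does. `$lines`, `$headerIndexes` and `$report` are hypothetical names, and `$lines` holds made-up sample data.

```powershell
# Hypothetical sample lines; replace with the real output
$lines = @(
    'Schedule 1', 'Task 1', 'Task 2',
    'Schedule 2', 'Task 1', 'Task 2', 'Task 3',
    'Schedule 3', 'Task 1',
    'Schedule 4', 'Task 1', 'Task 2'
)

# Record the index of every header line
$headerIndexes = @(for ($i = 0; $i -lt $lines.Count; $i++) {
    if ($lines[$i] -match 'Schedule') { $i }
})

# Count for header k = (next header index, or end of array) - header index - 1
$report = for ($k = 0; $k -lt $headerIndexes.Count; $k++) {
    $start = $headerIndexes[$k]
    $end   = if ($k + 1 -lt $headerIndexes.Count) { $headerIndexes[$k + 1] } else { $lines.Count }
    [PSCustomObject]@{
        Name  = $lines[$start]
        Count = $end - $start - 1
    }
}

# Same final step as the original script: lowest count first
($report | Sort-Object Count)[0].Name
```

Lines before the first header are ignored automatically, which is what the `Skip 1` achieved in the original.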

krzydoug · August 15, 2022, 6:14am

There are several ways to do it. If all the text fits in memory, I would suggest this approach.

You bring all the content into memory as a single string with the -Raw parameter. I’ve commented the code to explain each step.

$content = Get-Content -Path \path\to\content.txt -Raw

# split at a zero-width lookahead for the literal string 'Schedule', so each section keeps its header and is passed on as one multi-line string
$content -split '(?=Schedule)' | ForEach-Object {
    # split the section at each newline (with an optional carriage return), turning it into an array of lines
    foreach($line in $_ -split '\r?\n'){
        # if the current line matches 'Schedule<space><0 or more numeric digits>', create the object
        if($line -match '(Schedule \d*)'){
            $current = [PSCustomObject]@{
                Name  = $matches.1
                Count = 0
            }
        }
        # otherwise, if the line isn't blank (a word character followed by at least one more character), increment the count
        elseif($line -match '\w.+'){
            $current.Count++
        }
    }

    # output the object now that we're done with this section; the guard skips the empty chunk the split can produce before the first 'Schedule'
    if($current){ $current }

    # clear the variable for the next chunk
    $current = $null
}

The output is:

Name       Count
----       -----
Schedule 1     2
Schedule 2     3
Schedule 3     1
Schedule 4     2
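To see what the first step hands to `ForEach-Object`, it can help to print the chunks. The sketch below uses made-up sample text, and `$content` and `$chunks` are hypothetical names. Because the lookahead matches at position 0, the result can start with an empty string. That chunk contains no header, so the loop has to cope with it.

```powershell
# Hypothetical sample text as one string, like Get-Content -Raw would return
$content = "Schedule 1`nTask 1`nTask 2`nSchedule 2`nTask 1`nSchedule 3"

# Zero-width lookahead: split just before each 'Schedule' without consuming it
$chunks = $content -split '(?=Schedule)'

# Print each chunk with its line breaks made visible
for ($i = 0; $i -lt $chunks.Count; $i++) {
    '[{0}] "{1}"' -f $i, ($chunks[$i] -replace "`n", '\n')
}
```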

To capture it in a variable, just put the variable assignment in front of the pipeline:

$content = Get-Content -Path \path\to\content.txt -Raw

# split at a zero-width lookahead for the literal string 'Schedule', so each section keeps its header and is passed on as one multi-line string
$output = $content -split '(?=Schedule)' | ForEach-Object {
    # split the section at each newline (with an optional carriage return), turning it into an array of lines
    foreach($line in $_ -split '\r?\n'){
        # if the current line matches 'Schedule<space><0 or more numeric digits>', create the object
        if($line -match '(Schedule \d*)'){
            $current = [PSCustomObject]@{
                Name  = $matches.1
                Count = 0
            }
        }
        # otherwise, if the line isn't blank (a word character followed by at least one more character), increment the count
        elseif($line -match '\w.+'){
            $current.Count++
        }
    }

    # output the object now that we're done with this section; the guard skips the empty chunk the split can produce before the first 'Schedule'
    if($current){ $current }

    # clear the variable for the next chunk
    $current = $null
}

# implicit output of the collected objects
$output
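The original goal was to sort by Count and pick the schedule with the lowest number. Once the objects are collected, that takes a short pipeline. The sketch below uses stand-in objects matching the output shown above, and `$lowestName`, `$min` and `$ties` are hypothetical names. The `Measure-Object` variant is an added alternative for the case where several schedules tie for the lowest count.

```powershell
# Stand-in for the collected $output (hypothetical sample objects)
$output = @(
    [PSCustomObject]@{ Name = 'Schedule 1'; Count = 2 }
    [PSCustomObject]@{ Name = 'Schedule 2'; Count = 3 }
    [PSCustomObject]@{ Name = 'Schedule 3'; Count = 1 }
    [PSCustomObject]@{ Name = 'Schedule 4'; Count = 2 }
)

# Sort ascending by Count and keep only the name of the first (lowest) entry
$lowestName = $output | Sort-Object -Property Count | Select-Object -First 1 -ExpandProperty Name
$lowestName

# Alternative: find the minimum, then return every schedule that has it
$min  = ($output | Measure-Object -Property Count -Minimum).Minimum
$ties = $output | Where-Object Count -eq $min
$ties
```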

tlauwk · August 16, 2022, 12:20am

Very nice. Never thought about splitting into chunks. I tweaked the “match” criteria to suit my needs and your script works a treat.

As my regex knowledge is rudimentary at best, could I trouble you to help me understand the following:

‘\r?\n’ - What’s the question mark for? I tried without it and it works just as well to cater for the carriage return and new line.
‘\w.+’ - What’s the dot for?

Thanks for the help, much appreciated.

Olaf · August 16, 2022, 12:26am

If that’s not the best chance to build some knowledge …

https://www.regular-expressions.info/refquick.html

krzydoug · August 16, 2022, 2:03am

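The question mark right after a pattern/literal makes it optional, so \r?\n matches either a newline on its own (\n, Unix style) or a carriage return followed by a newline (\r\n, Windows style). This is the best approach that I know of to handle both. Without the question mark, \r\n only matches Windows-style line endings, which is presumably why it still worked on your text; it would not split text that uses \n alone.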
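Splitting the same two lines with both kinds of line endings makes the difference visible. This sketch uses made-up strings, and all the variable names are hypothetical.

```powershell
# The same two lines with Unix-style (LF) and Windows-style (CRLF) endings
$lf   = "Task 1`nTask 2"
$crlf = "Task 1`r`nTask 2"

# Optional \r: both strings split into the same two clean lines
$lfLines   = $lf   -split '\r?\n'
$crlfLines = $crlf -split '\r?\n'

# Required \r: only the CRLF string is split; the LF string stays one element
$lfStrict   = $lf   -split '\r\n'
$crlfStrict = $crlf -split '\r\n'

# Splitting on \n alone leaves a stray carriage return at the end of each CRLF line
$crlfNoCr = $crlf -split '\n'

'{0} / {1} / {2} / {3}' -f $lfLines.Count, $crlfLines.Count, $lfStrict.Count, $crlfStrict.Count   # → 2 / 2 / 1 / 2
```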

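The question mark as part of (?=…) means something else: it makes the group a lookahead, which matches the position just before “Schedule” without consuming any characters, so each chunk keeps its “Schedule” header.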
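Comparing a plain split with the lookahead version shows why the lookahead matters here. This sketch uses made-up text, and the variable names are hypothetical.

```powershell
$text = "Schedule 1`nTask 1`nSchedule 2`nTask 1"

# Plain pattern: 'Schedule' is consumed as the delimiter, so the chunks lose that word
$consumed = $text -split 'Schedule'

# Lookahead: the split happens just before 'Schedule', so the word stays in each chunk
$kept = $text -split '(?=Schedule)'

$consumed
$kept
```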

The dot means match any character (except a newline); combined with + it means match one or more of any character. So it’s a word character \w (a letter, digit, or underscore) followed by one or more of any character .+
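Trying the pattern on a few sample lines shows which lines it counts. This sketch uses made-up lines, and `$samples` and `$results` are hypothetical names. Because `-match` is not anchored, the word character can appear anywhere in the line. A line with only a single character is not counted.

```powershell
# Hypothetical sample lines: a task, blank lines, a single character, and edge cases
$samples = @('Task 1', '', '   ', 'x', '_a', '- item')

# \w needs one word character (letter, digit or underscore); .+ then needs at least one more character.
# -match is not anchored, so the pair may appear anywhere in the line.
$results = foreach ($s in $samples) {
    [PSCustomObject]@{
        Line    = $s
        Counted = $s -match '\w.+'
    }
}

$results
```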

tlauwk · August 17, 2022, 12:11am

@Olaf - bookmarked, thanks.

@krzydoug - again, much appreciated.

Cheers.

sureshkrishnan · August 17, 2022, 1:02pm

I also recommend checking out https://regex101.com
It’s a nice site for constructing and understanding regexes.
