Trying to match regex between HTML strings

Hi folks. I’m trying to match everything between the following two patterns

first pattern - <div id="ctl00_PlaceHolderMain_Content_label" style='display:none'

second pattern - </em></p></div>

I’ve tried the following in PowerShell, but I’m not getting anything returned. What am I doing wrong?

$file = c:\folder\HTMLoutput.txt

{[Regex]::Matches($file, "(?div id=""ctl00_PlaceHolderMain_Content_label"" style=''display:none'')((.|\n)*?)(?</em></p></div>)")}

Please go back and edit your post to correct the formatting of your code. Without that the forum software tries to interpret it and removes some characters.

Simply place your cursor on an empty line, click the preformatted text button ( </> ) and then paste your code.

Thanks in advance

Here it is properly formatted, thank you Olaf.

$file = c:\folder\HTMLoutput.txt

{[Regex]::Matches($file, "(?div id=""ctl00_PlaceHolderMain_Content_label"" style=''display:none'')((.|\n)*?)(?</em></p></div>)")} 

Please edit your existing post (the first one) and format ALL code as code.

You may post some sample data from your “input” file as well.

Edited first post as well. I can’t post sample data (work policy), but each of the two patterns occurs only once in the entire .txt file. Thanks

Hmmm … I don’t know why you again formatted only one line of your code, and I don’t know how to help you without seeing the input file.
I’d try it with Select-String and provide two separate patterns. That way you get the lines of both matches, and you can extract all the lines between them.
Regardless of that, I’d recommend using the -match operator instead of the .NET method. You could also use the [regex]::Escape() method to escape all potentially tricky characters in your patterns.
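
The Select-String idea could look roughly like the sketch below. The sample lines and the temp file name are made up for illustration (the real input file was never posted), and the sketch assumes each marker occurs once and that at least one full line sits between them. -SimpleMatch makes Select-String treat each pattern as literal text, so no escaping is needed here.

```powershell
# Hypothetical sample data standing in for the real HTML output file
$lines = @(
    '<html><body>'
    '<div id="ctl00_PlaceHolderMain_Content_label" style=''display:none''>'
    '<p>first line</p>'
    '<p><em>second line'
    '</em></p></div>'
    '</body></html>'
)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-sample.txt'
Set-Content -Path $path -Value $lines

$startText = '<div id="ctl00_PlaceHolderMain_Content_label" style=''display:none'''
$endText   = '</em></p></div>'

# Find the first line containing each literal marker
$start = Select-String -Path $path -Pattern $startText -SimpleMatch | Select-Object -First 1
$end   = Select-String -Path $path -Pattern $endText -SimpleMatch | Select-Object -First 1

# LineNumber is 1-based and array indexes are 0-based, so the line after the
# start marker has index $start.LineNumber, and the line before the end marker
# has index $end.LineNumber - 2.
$all     = Get-Content -Path $path
$between = $all[$start.LineNumber..($end.LineNumber - 2)]

$between
Remove-Item -Path $path
```

This only returns whole lines strictly between the two marker lines. If the wanted text starts or ends on the same line as a marker, a single regex over the whole file content is probably the better fit.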

Edit:

Another tip to develop regex patterns is to use

There you can edit, experiment, and tweak your patterns until they work as expected.

I agree with Olaf: it would help to see some sample data (just make some up in the same format; you don’t need to share anything work-related) so the regex can be fitted to the pattern. You could do it with capture groups, I suppose:

$tempFile = New-TemporaryFile

$html = @"
<div id="ctl00_PlaceHolderMain_Content_label" style='display:none'asdfsjklfasfasldkf
sjdfaskjldf
asjkfnkjasndfjk<//.asfsadfas></em></p></div>
"@

Add-Content -Path $tempFile -Value $html

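# Without -Raw, Get-Content returns an array of lines. Passing that array to a
# [string] parameter joins the lines with spaces, which is why .* can span them here.
$content = Get-Content $tempFile

# Groups[2] is the text captured between the opening <div ...> and the closing tags
[regex]::Match($content,"(<div id=`"ctl00_PlaceHolderMain_Content_label`" style='display:none')(.*)(<\/em><\/p><\/div>)").Groups[2].Value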

Output:

asdfsjklfasfasldkf sjdfaskjldf asjkfnkjasndfjk<//.asfsadfas>

I couldn’t get it working with the -match operator, so I opted for the .NET method.
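
For what it's worth, one likely reason -match misbehaves here is that Get-Content without -Raw returns an array of lines: with a collection on the left, -match filters the elements and does not populate $Matches. A minimal sketch that works on a single string, using made-up sample text in a here-string (equivalent to reading the file with Get-Content -Raw), might look like this. The (?s) option lets . match newlines, and [regex]::Escape() handles the literal markers.

```powershell
# Hypothetical sample text; with a real file this would be:
#   $html = Get-Content -Path <your file> -Raw
$html = @"
<div id="ctl00_PlaceHolderMain_Content_label" style='display:none'>
<p>made-up text
<em>more text</em></p></div>
"@

# Escape the literal markers so regex metacharacters in them are harmless
$startPattern = [regex]::Escape('<div id="ctl00_PlaceHolderMain_Content_label" style=''display:none''')
$endPattern   = [regex]::Escape('</em></p></div>')

# (?s) = single-line mode, so . also matches line breaks; .*? stops at the first end marker
$pattern = '(?s)' + $startPattern + '(.*?)' + $endPattern

$between = $null
if ($html -match $pattern) {
    # $Matches is only populated because $html is a single string, not an array
    $between = $Matches[1]
}

$between
```

Because the start marker from the original post stops before the closing > of the div tag, the captured text begins with that > character; append it to the start marker if it should be excluded.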

Thank you so much Matt Bloomfield. You’re a life saver!

Thanks Olaf for the help, much appreciated.