Regex help

I have a multi-line string that I’m trying to grab a portion of, such as:

$body

[html]
whatever
whatever
whatever
[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
...
...
...
[/table]
whatever
[/html]

I’ve tried

$body -match '[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"].*[/table]'

Which just returns false. I imagine it’s only returning one line and not reading until EOF. How can I get it to read everything between [table…[/table]?

Edited to remove < > and replace with [ ]

-match is supposed to return True/False, but it also creates the $matches collection, which is what you’d look at to see what it matched. Whether it matches the first instance or continues to look for additional instances depends on whether your regular expression was written to do that. And honestly, for this purpose, you might find Select-String to be a bit more useful than -match.

But to go further, -match is only designed to tell you if it found a match or not. If you want to capture what it matched, you need to write a capturing (group) subexpression in your regex. That will populate $matches with what it captured. You can even give your capture group a name in your regex, and $matches will use that name, making it easier to reference what it found.
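To make the capture-group idea concrete, here is a minimal sketch. The one-line `$sample` string and the group name `inner` are made up for illustration; they are not from the original post.

```powershell
# Hypothetical single-line sample; the real $body spans many lines.
$sample = 'before [table class="MsoNormalTable"]row data[/table] after'

# (?<inner>...) is a named capturing group; \[ and \] match literal brackets.
if ($sample -match '\[table[^\]]*\](?<inner>.*?)\[/table\]') {
    $Matches[0]         # the whole match, from [table ... through [/table]
    $Matches['inner']   # only the text captured by the named group
}
```

-match itself still returns True/False; the captured text is read from $Matches afterwards.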

Not sure how to use select-string here to grab and return my match, this below returns false…

Select-String -InputObject $body -simplematch "[table class=`"MsoNormalTable`" border=`"1`" cellspacing=`"0`" cellpadding=`"0`" width=`"900`" style=`"width:675.0pt;border:solid black 1.0pt`"]*[/table]"

Well, a couple of things. -SimpleMatch isn’t a regular expression; it’s just a literal string match, so the * in your pattern is treated as an ordinary character rather than a wildcard. And, by default, letting you know you have a match is all the cmdlet is supposed to do.

Also, if you delimit your pattern in single quotes, you can use double quotes within and not have to escape them ;).

You should also know a bit about how regular expressions and patterns work. They’re fairly literal - meaning if the attributes in that TABLE tag are in a different order, it won’t match them. I’m assuming you already thought of that, and that the HTML you’re using is consistent. But a -SimpleMatch isn’t intended to capture anything. As I wrote earlier, you need a capturing subexpression in a regex.
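One more point about patterns being literal: in a regex, square brackets define a character class, so a tag written with [ ] has to be escaped before it matches as text. A small sketch, using a shortened, hypothetical tag rather than the full one from the post:

```powershell
# Unescaped, [table] is a character class: any ONE of t, a, b, l, e.
$classHit = 'b' -match '[table]'          # True, although 'b' is not a tag

# Hypothetical shortened tag, used here only for illustration.
$tag  = '[table class="MsoNormalTable" border="1"]'
$line = 'whatever ' + $tag + ' whatever'

# [regex]::Escape backslash-escapes regex metacharacters in the literal text.
$pattern    = [regex]::Escape($tag)
$literalHit = $line -match $pattern       # True
$Matches[0]                               # the tag, matched literally
```

Single quotes around the tag also mean the double quotes inside it need no backtick escaping.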

That means using -Pattern to specify your pattern. And, instead of "*" to match the inside of the TABLE, you’re probably going to want to use something like (.+). Keep in mind that . only matches a single character; .+ means match one or more. The (parentheses) create a capturing subexpression. However, that example is a greedy subexpression. That means, if your HTML contains more than one TABLE, it’ll match from the beginning of the first one to the end of the last one, and everything in between. I’m not sure what your HTML looks like, or what your goal is, but you may need to modify it to be a non-greedy subexpression.

You probably want to use the -AllMatches switch, also.
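A rough illustration of greedy versus non-greedy with -AllMatches, assuming a made-up one-line `$sample` that contains two tables:

```powershell
# Hypothetical input with two tables on one line.
$sample = '[table]one[/table] filler [table]two[/table]'

# Greedy: .+ takes as much as it can, so both tables come back as ONE match.
$greedy = ($sample | Select-String -Pattern '\[table\](.+)\[/table\]' -AllMatches).Matches
$greedy.Count

# Non-greedy: .+? stops at the first [/table] it reaches, one match per table.
$lazy = ($sample | Select-String -Pattern '\[table\](.+?)\[/table\]' -AllMatches).Matches
$lazy | ForEach-Object { $_.Groups[1].Value }
```

Groups[1] holds what the first set of parentheses captured; Groups[0] would be the whole match.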

What you’re trying to do is certainly straightforward, I think, but regular expressions aren’t as straightforward as I wish they were ;). It’d be worth some time to read up on capturing subexpressions and greedy vs. non-greedy subexpressions, so you can figure out what the right technique is to meet your goal.

Here is an example of the HTML: http://pastebin.com/MtSa06ue

Basically I just want to grab the pertinent table and analyze the data in it

The table will always start

[table class=`"MsoNormalTable`" border=`"1`" cellspacing=`"0`" cellpadding=`"0`" width=`"900`" style=`"width:675.0pt;border:solid black 1.0pt`"]

I can get it to match on

Select-String -InputObject $body -pattern "[table class=`"MsoNormalTable`" border=`"1`" cellspacing=`"0`" cellpadding=`"0`" width=`"900`" style=`"width:675.0pt;border:solid black 1.0pt`"].*"

But I can’t get it to return everything until it hits [/table].

But I’ve wasted more than enough of your time and I’ll do some more research on my own, I’m sure experienced users are saying ‘HE TOLD YOU WHAT TO DO ALREADY!!’ :wink:

Thanks Don!

What do you aim to do with that string? Would it be easier to work with objects?

$web = Invoke-WebRequest -Uri 'http://www.w3schools.com/html/html_tables.asp'
$web.ParsedHtml.getElementsByTagName("TABLE") | select -First 1

I just need to grab the table starting on line 747 and ending on 868.

I thought I could just use regex since it will always start (and should be unique) with:

[table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt"]

and all the text between it to include the [/table].

So at the end I would have the complete [table]…[/table] which I could create reports/alerts for and send in email form

You know, if it’s consistently at those line numbers, it’s easy:

Get-Content filename.html | Select -skip 746 -first 122

:wink:
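The -Skip/-First numbers are easy to get off by one, so here is the arithmetic on a tiny made-up list (the `$first` and `$last` variable names are only for illustration): to get lines M through N, skip M - 1 and take N - M + 1.

```powershell
# Ten hypothetical lines standing in for the file contents.
$lines = 1..10 | ForEach-Object { "line $_" }

$first = 4   # first line wanted (1-based)
$last  = 7   # last line wanted (1-based, inclusive)

# Skip everything before $first, then take the inclusive count.
$slice = $lines | Select-Object -Skip ($first - 1) -First ($last - $first + 1)
$slice
```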

I wish it was that easy :wink:

The entire HTML will actually be the body of an email that was retrieved through powershell, never makes it to a file. And I’m not sure if it always starts on 747, but the table header should be unique.

If $body is the entire HTML, and I do a

$body -match '.*' it only matches the first line. How would I make it so it matches the entire string?

$body = @'
[html]
whatever
whatever
whatever
[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
...
...
...
[/table]
whatever
[/html]
'@


($body -split 'table class' | ? {$_ -like "=*"}).trimstart('=')

$body = '
[html]
whatever
whatever
whatever
[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
random text 1
[/table]
whatever
[/html]
'
$body -match "table(?'table'.*)\[/table" ; $Matches.table

I guess if you just want that string and not what follows it.

$body -split "`n" | ? {$_ -match 'table class'}

Random Comandline, when I run your example, it comes back false

Import-Module -Name "C:\Program Files\Microsoft\Exchange\Web Services\2.0\Microsoft.Exchange.WebServices.dll"

$s = New-Object Microsoft.Exchange.WebServices.Data.ExchangeService([Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2010_SP1)

$s.Credentials = New-Object Microsoft.Exchange.WebServices.Data.WebCredentials('me', 'Password', 'domain')

$s.AutodiscoverUrl('me@domain.com', { $true })

$inbox = [Microsoft.Exchange.WebServices.Data.Folder]::Bind($s, [Microsoft.Exchange.WebServices.Data.WellKnownFolderName]::Inbox)

$emails = $inbox.FindItems(1)

$emails.load()

$emails.body.text |ConvertTo-Html | Select-String -Pattern 'head' -Context 0,3

Make sure you run it in the consolehost not ISE.

From console

>> $body -match "table(?'table'.*)\[/table" ; $Matches.table
False
PS C:\Users\user>
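One likely reason for the False, offered as a guess rather than a diagnosis: by default . does not match a newline, so .* cannot get from the opening tag on one line to [/table on another. The inline (?s) option changes that. A sketch with a shortened, made-up `$sample`:

```powershell
# Hypothetical three-line string; `n is a newline in a double-quoted string.
$sample = "[table class=x]`nrandom text 1`n[/table]"

# Without (?s), . stops at each newline, so the pattern cannot span lines.
$plain = $sample -match "table(?'table'.*)\[/table"

# With (?s) (single-line mode), . also matches newlines.
$dotAll = $sample -match "(?s)table(?'table'.*)\[/table"
$Matches.table
```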

Ok, I took a different approach to this. Not sure why my previous post didn’t work for you, but try this.

$body = '
[html]
whatever
whatever
whatever
[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
random text 1
[/table]
whatever
[/html]
' 
$newbody = ($body -split "\[table")[1] 
($newbody -split "\[/table]")[0]

See if this doesn’t work for you:

$matchstring = '[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]'
$matchstring = [regex]::Escape($matchstring)

$regex = '(?ms)\[html\].+?' + $matchstring + '(.+?)\[/table\]'

if ($body -match $regex)
{$lines = $Matches[1].Split("`n")}

$lines

Hey @aaron-miller, this is the deal. Based on the description of your results, it appears that $body is of type System.String[] rather than System.String. Meaning it is an Array of strings, not a single string. RegEx does not process against an array like it would a string. You have two options here.

Note: Below is tested using provided sample input:

$body = @'
[html]
whatever
whatever
whatever
[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
random text 1
[/table]
whatever
[/html]
'@ -split "`n"
  1. If you don’t care about the content being on separate lines, make the body a single string using -join
$body = $body -join ""

($body | Select-String "\[table class=.*\[/table\]").Matches.Value

Results

[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]random text 1[/table]
  2. If you do need to maintain the line uniqueness, loop through the body to find the start and stop of your table, then pull that section.
for ($i=0; $i -lt $body.count; $i++) {
    If ($body[$i] -match "\[table class=|\[/table]") {
        Switch ($body[$i].Substring(0,8)) {
            "[table c" {$tablestart=$i}
            "[/table]" {$tablefinish=$i}
        }
    }
}

$body | Select-Object -Skip $tablestart -First ($tablefinish-$tablestart+1)

Results

[table class="MsoNormalTable" border="1" cellspacing="0" cellpadding="0" width="900" style="width:675.0pt;border:solid black 1.0pt"]
random text 1
[/table]
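If you’re not sure which of the two cases you have, a quick check along these lines may help. The sample values are made up; only the -is and -match behaviour is the point.

```powershell
# The same hypothetical content as one string and as an array of lines.
$asString = "line one`n[table]`nline three"
$asArray  = $asString -split "`n"

$isString = $asString -is [string]   # True
$isArray  = $asArray  -is [array]    # True

# Against a single string, -match returns a Boolean and fills $Matches.
$stringResult = $asString -match '\[table\]'

# Against an array, -match acts as a filter and returns the matching elements.
$arrayResult = $asArray -match '\[table\]'

# Rejoin with newlines to get back to a single string.
$rejoined = $asArray -join "`n"
```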