Need regEx help

OK, I admit, I am NOT good with regEx. Hoping someone here can help. Here is what I have:

$str = '<th>HotFixID</th>'

Here is my current replace code:

$str -Replace '<th>', $("<th onclick=""sortTable(this.cellIndex,'"+$TableName +"')""> <input type=""text"" onfocus=""filterTable(this.parentElement.cellIndex,'"+$TableName+"')"" class=""filterInput"" />")

This almost works. It gives me this:

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')"> <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" />HotFixID</th>

What I need is this:

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">HotFixID <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>

I suspect some RegEX guru can guide me with the proper replace method? Thanks in advance.

Could do it a couple of ways. I’d just do

'<th>(?=Hotfixid)',$replacement

But I think you don’t want that slash at the end in your replacement. It’s in the following </th>

Oh wait you’re trying to get rid of Hotfixid?

No sir. I need to keep the HotFixID. My replacement puts it at the end of the <input … > tag, I need it to precede it with one space at the end. I posted what I need it to be … sorry if I am not very clear.

Make sense?

What I need is this:

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">HotFixID <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>

Not sure if this matters but I am basically massaging data returned from ConvertTo-HTML where I need to add a text input and two JavaScript function calls in the TH tags.

Here is what is in the data returned from ConvertTo-HTML

'<th>HotFixID</th>'

Something like this?

$str = '<th>HotFixID</th>'

$leftSide = "<th onclick=""sortTable(this.cellIndex,'HotFixes_HOSTNAME')"">"

$rightSide = " <input type=""text"" onfocus=""filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')"" class=""filterInput"" /></th>"

$str = $str -replace '(<th>)(HotFixID)(</th>)',"$leftSide`$2$rightSide"

My approach would be:

$str = "<th>HotFixID</th>"

First step take hotfix value out of string:

[regex] $HotfixRegex = "<th>(?<hotfix>.+)</th>"
$HotfixEntry = [regex]::Match($str, $HotfixRegex)

if ($HotfixEntry.Success)
{
       $Hotfix = $HotfixEntry.Groups["hotfix"].Value
       Write-Information "Hotfix is: $Hotfix"
}

Second step, define replacement pattern using same method:

[regex] $ReplacementRegex = "(?<replacement><th>.+)</th>"
$ReplacementEntry = [regex]::Match($str, $ReplacementRegex)

if ($ReplacementEntry.Success)
{
      $ReplacePattern = $ReplacementEntry.Groups["replacement"].Value
      Write-Information "Replacement pattern is: $ReplacePattern"
}

Next step is to define replacement:

$Replacement = $("<th onclick=""sortTable(this.cellIndex,'" + $TableName + "')"">$Hotfix <input type=""text"" onfocus=""filterTable(this.parentElement.cellIndex,'" + $TableName + "')"" class=""filterInput"" />")

And then simply do the replacement:

$str -Replace $ReplacePattern, $Replacement

EDIT:
Or for consistency you could also:

[regex]::Replace($str, $ReplacePattern, $Replacement)

OK, my bad again. Please refrain from getting mad, even though I deserve it :frowning:

I should clarify that the output of ConvertTo-HTML can have multiple TH tags on a single line, hence the reason I was trying to do the replace on the tag itself. Something like this:

<th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th>

Needs to produce this:

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">HotFixID <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">Description <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">InstalledBy <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">InstalledOn <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>

One JavaScript function sorts the table based on the table header selected; the text box does an Excel-like filter on the table/column data. The result can be a single line; I put the headers on separate lines for readability.

It should still work fine, you’re only replacing the tags and keeping whatever is between:

$str = '<th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th>'

$leftSide = "<th onclick=""sortTable(this.cellIndex,'HotFixes_HOSTNAME')"">"

$rightSide = " <input type=""text"" onfocus=""filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')"" class=""filterInput"" /></th>"

$str = $str -replace '(<th>)(HotFixID|Description|InstalledBy|InstalledOn)(</th>)',"$leftSide`$2$rightSide"

$str

You could probably even use a wildcard for the second match although I didn’t test that. You can add `r`n after $rightSide if you want readability in the output.
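The untested wildcard idea can be sketched roughly as follows. This is only a sketch: it assumes header text never contains a `<`, and the `[^<]+` character class and the `$result` variable are illustrative choices rather than anything from the posts above.

```powershell
$str = '<th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th>'

$leftSide = "<th onclick=""sortTable(this.cellIndex,'HotFixes_HOSTNAME')"">"
$rightSide = " <input type=""text"" onfocus=""filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')"" class=""filterInput"" /></th>"

# [^<]+ matches any run of characters up to the next '<', so no header names are hard-coded.
# `$1 keeps a literal $1 in the double-quoted string so the regex engine fills it in.
# `r`n puts each rewritten header on its own line.
$result = $str -replace '<th>([^<]+)</th>', "${leftSide}`$1${rightSide}`r`n"

$result
```

Because the pattern no longer lists specific names, the same line should work for tables with different headers.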

Did you try my suggestion? It will only replace th that are followed by Hotfixid
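To make the lookahead behavior concrete, here is a minimal sketch. The `$replacement` text and the `class="demo"` attribute are made up purely to make the effect visible.

```powershell
$str = '<th>HotFixID</th><th>Description</th>'

# Hypothetical replacement text, used only for illustration.
$replacement = '<th class="demo">'

# (?=Hotfixid) checks what follows <th> without consuming it,
# so only the <th> itself is replaced. -replace ignores case by default.
$result = $str -replace '<th>(?=Hotfixid)', $replacement

$result
```

Only the first tag changes; the `<th>` in front of Description is left alone because the lookahead does not match there.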

If you want to process multiple tags on one line like you described, here is an implementation:

	$str = "<th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th>"
	[regex] $SplitRegex = "<th>(.(?!</th>))+.</th>"
	$Tags = [regex]::Matches($str, $SplitRegex)

	foreach ($Tag in $Tags)
	{
		Write-Information "Processing tag: $($Tag.Value)" -INFA "Continue"

		[regex] $DataRegex = "<th>(?<data>.+)</th>"
		$DataMatch = [regex]::Match($Tag.Value, $DataRegex)

		if ($DataMatch.Success)
		{
			$DataEntry = $DataMatch.Groups["data"].Value
			Write-Information "Tag data is: $DataEntry" -INFA "Continue"
		}

		Write-Information "" -INFA "Continue"
	}

Output:

Processing tag: <th>HotFixID</th>
Tag data is: HotFixID

Processing tag: <th>Description</th>
Tag data is: Description

Processing tag: <th>InstalledBy</th>
Tag data is: InstalledBy

Processing tag: <th>InstalledOn</th>
Tag data is: InstalledOn

Simply insert code from my previous example to perform replacement per tag and store result into a new string.

Krzydoug, I did not try yours yet … will give that a whack and report back.

Matt, I did try yours, but the values you show are static. That will work and did work for the one single query. This replace method is in a function that massages the contents of many different tables, and each table will have different table headers.

So basically, that is the reason I am trying to just massage the TH tags. If someone can give me the RegEX to grab all the entries between the <TH></TH> tags, I can take it from there.

So, if I have this:

'<th>value1</th><th>value2</th><th>value3</th><th>value4</th>'

I need an array that contains:

@('value1', 'value2', 'value3', 'value4')

I can take it from there … and thanks everyone for the very responsive help. Very much appreciated. This seems to work but returns an array with a count of 37, with lots of blank lines.

$str.Split("<th>(.*?)</th>")

OK, so much for regex … found this …

If regex is not solution to your problem, then what is?

I’m not saying Stack Overflow answers are wrong, but keep in mind you are dealing with PowerShell, not C++.
PowerShell is limited in many ways, but do you really want to write your own PowerShell-compatible HTML parser to solve a very simple problem?

It’s very simple because you’re dealing with <th></th> only, unless there is something more you want to do that goes beyond this tag only?

OK Metablaster, my intent is not to write my own parser. I have just not found a solution yet to what I want to do. You say it is a “very simple problem”, but there are yet no answers that work? Am I missing something?

I admit, I may not be describing my issue all that well. I just need a one line string that contains DIFFERENT values between the TH tags to go from this:

'<th>value1</th><th>value2</th><th>value3</th><th>value4</th>'

To this:

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value1 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value2 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value3 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value4 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>

The table header line can also have a different number of headers, anywhere from 3 to 7 or 8. I simply need to inject my function calls in the table header and have the header title precede the input box. The replace string that I have now puts the header title AFTER the input box, which I don't prefer.

Make sense? You have a solution to this “very simple problem”, then I am all ears and will be forever in your debt :slight_smile:

The reason I tend to believe the SO post is this. Trying to do a simple regex on the string produces odd results:

PS C:\> $str = '<th>value1</th><th>value2</th><th>value3</th><th>value4</th>'
PS C:\> $str.Split("<th>(.*?)</th>") | foreach{$_.getType()}

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object


PS C:\>
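A note on the odd result above: the 37 strings are consistent with .Split() treating its argument as a set of single-character separators rather than as a regex, which is what Windows PowerShell does when handed a plain string. The sketch below makes that explicit with a [char[]] cast and then shows one regex-aware way to get the array that was asked for. The variable names `$pieces` and `$values` are illustrative.

```powershell
$str = '<th>value1</th><th>value2</th><th>value3</th><th>value4</th>'

# String.Split() knows nothing about regex. With a char array, every listed
# character ('<', 't', 'h', '>', '/', ...) is a separator on its own.
$pieces = $str.Split([char[]]'<th>(.*?)</th>')
$pieces.Count

# Regex-aware alternative: collect capture group 1 from every match.
$values = [regex]::Matches($str, '<th>(.*?)</th>') |
    ForEach-Object { $_.Groups[1].Value }
$values
```

The input holds 36 separator characters, which accounts for 37 pieces, most of them empty. The second approach yields just value1 through value4.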

Now, if there was a way to embed JavaScript in my PS, I could use getElementsByTagName and parse the innerHTML and be done with it.

I have not tried your suggestion

No answers that work

These two statements conflict. And now I see you edit both before and after the value.

You said:

'<th>value1</th><th>value2</th><th>value3</th><th>value4</th>'

To this:

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value1 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value2 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value3 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">value4 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>

The following code does exactly that: (combined version of my previous 2 posts)

	$str = '<th>value1</th><th>value2</th><th>value3</th><th>value4</th>'

	[regex] $DataRegex = "<th>(?<data>.+)</th>"
	[regex] $SplitRegex = "<th>(.(?!</th>))+.</th>"
	[regex] $ReplacementRegex = "(?<replacement><th>.+)</th>"

	foreach ($Tag in [regex]::Matches($str, $SplitRegex))
	{
		$ReplacementEntry = [regex]::Match($Tag.Value, $ReplacementRegex)
		$DataMatch = [regex]::Match($Tag.Value, $DataRegex)
		$DataEntry = $DataMatch.Groups["data"].Value

		$ReplacePattern = $ReplacementEntry.Groups["replacement"].Value
		$Replacement = $("<th onclick=""sortTable(this.cellIndex,'" + $TableName + "')"">$DataEntry <input type=""text"" onfocus=""filterTable(this.parentElement.cellIndex,'" + $TableName + "')"" class=""filterInput"" />")

		[regex]::Replace($Tag.Value, $ReplacePattern, $Replacement)
	}

Output:

<th onclick="sortTable(this.cellIndex,'')">value1 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'')">value2 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'')">value3 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'')" class="filterInput" /></th>
<th onclick="sortTable(this.cellIndex,'')">value4 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,'')" class="filterInput" /></th>

(the only thing missing above is $TableName which is unknown)
Does this work for you as wanted, and if not why not?

Looks good Metablaster … I will give it a whack tomorrow and post the results.

Thanks to everyone that contributed :slight_smile:

Now that you’ve clarified, here is how I’d craft the regex pattern.

$str = '<th>HotFixID</th>'
$TableName = 'HotFixes_HOSTNAME'
$pattern = '<th>(\S+)(?=<)'
$replacement = & {'<th onclick="sortTable(this.cellIndex,''{0}'')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $args[0],'$1'} $TableName

For the replacement you invoke a scriptblock and pass in the table name. I chose $args[0] but you could also make a named Param(). If you inspect $replacement you should see

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" />

With the table name and the regex match $1 embedded. Now when you do your replacement

$str -Replace $pattern, $replacement

Your end result should be the desired

<th onclick="sortTable(this.cellIndex,'HotFixes_HOSTNAME')">HotFixID<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'HotFixes_HOSTNAME')" class="filterInput" /></th>

Now let’s take a second and consider could there ever be a space before or after the >Value< such as > Value< ? If so, throw a couple of “ZERO or more” regex space patterns \s* around it

$str = '<th>HotFixID</th>'
$TableName = 'HotFixes_HOSTNAME'
$pattern = '<th>\s*(\S+)\s*(?=<)'
$replacement = & {
    Param($table)

    '<th onclick="sortTable(this.cellIndex,''{0}'')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $table,'$1'

} $TableName

$str -Replace $pattern, $replacement

And if you wanted to do it inline/in one line

$str -replace '<th>\s*(\S+)\s*(?=<)',(& {'<th onclick="sortTable(this.cellIndex,''{0}'')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $args[0],'$1'} $TableName)

Edit

Actually it can be simpler. :stuck_out_tongue:

$str -replace '<th>\s*(\S+)\s*(?=<)',('<th onclick="sortTable(this.cellIndex,''{0}'')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $TableName,'$1')

Thanks guys, both solutions get me almost there.

Doug, your one liner only seems to create the first TH tag, not 4 of them?? Did I miss something?

PS C:\> $TableName = 'TESTING123'
PS C:\> $str = '<tr><th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th></tr>'
PS C:\> $str -replace '<th>\s*(\S+)\s*(?=<)',(& {'<th onclick="sortTable(this.cellIndex,''{0}'')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $args[0],'$1'} $TableName)
<tr><th onclick="sortTable(this.cellIndex,'TESTING123')">HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th><input type="text" onfocus="filterTable(this.parentElement.cellIndex,'TESTING123')" class="filterInput" /></tr>
PS C:\>

Very much appreciated :slight_smile:

Try making the capture group match non greedy.

'<th>\s*(\S+?)\s*(?=<)'

The pattern \S+ says match one or more (+) characters that are not whitespace (\S). This is a greedy match, meaning it captures everything it can, right up to the final <. Adding the question mark after it makes it non-greedy, meaning it matches as few characters as possible until it runs into a < (the (?=<) lookahead).
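The difference is easy to see by counting matches. The sketch below compares the two patterns on a shortened header row. The last part is an extra assumption on my side: if a header could ever contain a space (say "Installed By"), \S would not get past it, and a [^<] class is one possible alternative.

```powershell
$str = '<tr><th>HotFixID</th><th>Description</th></tr>'

$greedy = [regex]::Matches($str, '<th>\s*(\S+)\s*(?=<)')
$lazy   = [regex]::Matches($str, '<th>\s*(\S+?)\s*(?=<)')

$greedy.Count                  # a single match that swallows both headers
$greedy[0].Groups[1].Value     # everything up to the last '<' in the string
$lazy.Count                    # one match per header
$lazy | ForEach-Object { $_.Groups[1].Value }

# Assumption: header text may contain spaces. [^<]+? stops at the next '<'
# but, unlike \S+?, is allowed to cross whitespace.
$spaced = '<th>Installed By</th>' -replace '<th>\s*([^<]+?)\s*(?=<)', '<th>[$1]'
$spaced
```

The square brackets in the last replacement are only there to show what the capture group picked up.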

Looking at your code, I get the feeling the TableName will be dynamic based on the value matched. If that’s the case I’d probably change it some. I’m just spitballing here, this could be completely off base from what you need to accomplish.

Let’s say you have a hash table of tablenames with the keys as the 4 different values you’ve shown. This will allow dynamically providing the table based on the matched value. Maybe it can also help.

$TableList = @{
    HotFixID    = 'Hotfixes_Table'
    Description = 'Description_Table'
    InstalledBy = 'InstalledBy_Table'
    InstalledOn = 'InstalledOn_Table'
}

$str = '<tr><th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th></tr>'

[regex]$pattern = '(<th>\s*(\S+?)\s*)(?=<)'


$pattern.Matches($str) | ForEach-Object{
    $str = $str -replace $_.groups[1].value,('<th onclick="sortTable(this.cellIndex,''{0}'')">{1}<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $TableList[$_.groups[2].value],$_.groups[2].value)
}

$str

Output

<tr><th onclick="sortTable(this.cellIndex,'Hotfixes_Table')">HotFixID<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'Hotfixes_Table')" class="filterInput" /></th><th onclick="sortTable(this.cellIndex,'Description_Table')">Description<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'Description_Table')" class="filterInput" /></th><th onclick="sortTable(this.cellIndex,'InstalledBy_Table')">InstalledBy<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'InstalledBy_Table')" class="filterInput" /></th><th onclick="sortTable(this.cellIndex,'InstalledOn_Table')">InstalledOn<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'InstalledOn_Table')" class="filterInput" /></th></tr>
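A possible single-pass variant of the same idea uses [regex]::Replace with a script block acting as the match evaluator, so each header is rewritten as it is found instead of re-running -replace per match. Treat it as a sketch: the `$template` and `$result` variables, the `[^<]+?` class, and the space before `<input` (taken from the original request) are my own choices, not part of the code above.

```powershell
$TableList = @{
    HotFixID    = 'Hotfixes_Table'
    Description = 'Description_Table'
    InstalledBy = 'InstalledBy_Table'
    InstalledOn = 'InstalledOn_Table'
}

$str = '<tr><th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th></tr>'

# {0} = table name looked up for the header, {1} = header text.
$template = '<th onclick="sortTable(this.cellIndex,''{0}'')">{1} <input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" /></th>'

# The script block runs once per match and returns the replacement text.
$result = [regex]::Replace($str, '<th>\s*([^<]+?)\s*</th>', {
    param($match)
    $name = $match.Groups[1].Value
    $template -f $TableList[$name], $name
})

$result
```

A header that is missing from $TableList would end up with an empty table name here, so a real function would probably want a fallback.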

KrzyDoug, you are correct. $tableName is passed as an argument to the function. This worked PERFECTLY !!! THANK YOU !!!

$str -replace '<th>\s*(\S+?)\s*(?=<)',(& {'<th onclick="sortTable(this.cellIndex,''{0}'')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $args[0],'$1'} $TableName)
PS C:\> $tablename = 'TESTING1234'
PS C:\> $str = '<tr><th>HotFixID</th><th>Description</th><th>InstalledBy</th><th>InstalledOn</th></tr>'
PS C:\> $str -replace '<th>\s*(\S+?)\s*(?=<)',(& {'<th onclick="sortTable(this.cellIndex,''{0}'')">$1<input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $args[0],'$1'} $TableName)
<tr><th onclick="sortTable(this.cellIndex,'TESTING1234')">HotFixID<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'TESTING1234')" class="filterInput" /></th><th onclick="sortTable(this.cellIndex,'TESTING1234')">Description<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'TESTING1234')" class="filterInput" /></th><th onclick="sortTable(this.cellIndex,'TESTING1234')">InstalledBy<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'TESTING1234')" class="filterInput" /></th><th onclick="sortTable(this.cellIndex,'TESTING1234')">InstalledOn<input type="text" onfocus="filterTable(this.parentElement.cellIndex,'TESTING1234')" class="filterInput" /></th></tr>
PS C:\>

Got any links/pointers to learn RegEX? I really do need to learn it.

Kudos to Metablaster as well, very much appreciated.
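For anyone landing here later, the accepted one-liner could be wrapped in a small pipeline function along these lines. This is a sketch under assumptions: the function name Add-HeaderControls, its parameters, and the sample object are hypothetical. It also uses [^<]+? instead of \S+? so headers with spaces still match, and it adds the space before the input box that the original question asked for.

```powershell
# Hypothetical helper: rewrites every <th> in each incoming line of HTML.
function Add-HeaderControls {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $Line,

        [Parameter(Mandatory)]
        [string] $TableName
    )
    begin {
        # Build the replacement once; $1 is left for the regex engine to fill in.
        $replacement = '<th onclick="sortTable(this.cellIndex,''{0}'')">$1 <input type="text" onfocus="filterTable(this.parentElement.cellIndex,''{0}'')" class="filterInput" />' -f $TableName
    }
    process {
        $Line -replace '<th>\s*([^<]+?)\s*(?=<)', $replacement
    }
}

# Placeholder data, only to produce a small table fragment.
$html = [pscustomobject]@{ HotFixID = 'KB0000001'; Description = 'Update' } |
    ConvertTo-Html -Fragment |
    Add-HeaderControls -TableName 'HotFixes_HOSTNAME'

$html
```

Lines without a `<th>` pass through unchanged, so the whole ConvertTo-Html output can be piped in.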
