POWERSHELL HTML FILE MANIPULATION - REGEX

Hi All,

Below is part of an HTML file.


To: amar.helloween@email.com
Subject: AG Sanity Status - Morning Sanity Check On AG Node 1: 08/22/16 06:25
From: Amarnath.Mahato@cgi.com>
Reply-to: Amarnath.mahato@cgi.com
Content-Type: text/html; charset=us-ascii

/
....some content

My task is to remove everything from "To: amar.helloween…" through "us-ascii" and keep only the HTML content.

Kindly provide a regex to remove the above content.

I tried this, but it's not working. Also, the date in the subject changes daily, so please help me with this.

$regex1 = 'To: amar.helloween@email.com
Subject: AG Sanity Status - Morning Sanity Check On AG Node 1: 08/22/16 06:25
From: Amarnath.Mahato@cgi.com>
Reply-to: Amarnath.mahato@cgi.com
Content-Type: text/html; charset=us-ascii'

$new_html = @()
gc 'D:\Report.html' -raw |
foreach {
    if ($_ -match $regex1)
    {
        $new_html += ($_ -replace $regex1,'')
        $new_html | Out-File "D:\Report1.html"
    }
    else
    { "did not match" }
}

Here is the link : Powershell HTML File Manipulation · GitHub
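
The attempt above hard-codes the date and the exact line breaks, so it stops matching as soon as either differs. A minimal sketch of a pattern that ignores both is shown here. It assumes the header block always starts on a line beginning with "To:" and ends at "charset=us-ascii"; the sample text, the variable names $raw, $headerPattern and $body, and the tags in the sample are made up for illustration.

```powershell
# Made-up sample standing in for the raw contents of D:\Report.html.
$raw = @'
To: amar.helloween@email.com
Subject: AG Sanity Status - Morning Sanity Check On AG Node 1: 08/22/16 06:25
From: Amarnath.Mahato@cgi.com
Reply-to: Amarnath.mahato@cgi.com
Content-Type: text/html; charset=us-ascii

<html><body>....some content</body></html>
'@

# (?m) lets ^ match at the start of any line; (?s) lets . span line breaks.
# The lazy .*? stops at the first "charset=us-ascii", whatever the date is.
# The trailing \s* also removes the line breaks that follow the header block.
$headerPattern = '(?ms)^To:.*?charset=us-ascii\s*'

$body = $raw -replace $headerPattern, ''
$body
```

Against the real file, the same pattern could be applied to the output of Get-Content 'D:\Report.html' -Raw and the result piped to Out-File, as in the original attempt.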

It appears you want to match what is between the HTML tags rather than exclude what you don't want:

$test = @"
blah
blah
blah


    
        Some HTML content
    
    
        blah blah blah
    


"@

#http://stackoverflow.com/questions/7167279/regex-select-all-text-between-tags
$pattern = "(.|\n)*?"
[regex]::matches($test,$pattern).Value

Output:

    
        Some HTML content
    
    
        blah blah blah
    

Thanks Rob, that worked!
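
For later readers, the approach of matching between tags can be sketched as follows. This is only an illustration under assumptions: the html and body tag names, the sample text, and the variable names $sample, $htmlPattern and $match are hypothetical and are not taken from the original report.

```powershell
# Hypothetical sample: mail-style header lines followed by an HTML document.
$sample = @'
To: someone@example.com
Content-Type: text/html; charset=us-ascii

<html>
    <body>
        Some HTML content
    </body>
</html>
'@

# (?s) makes . match line breaks as well, so the (.|\n) alternation is not needed.
# .*? is lazy, so the match ends at the first closing </html> tag.
$htmlPattern = '(?s)<html>.*?</html>'

$match = [regex]::Match($sample, $htmlPattern)
if ($match.Success) { $match.Value } else { 'did not match' }
```

Keeping what matches, rather than deleting the header, means the date in the subject line never has to appear in the pattern.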