PowerShell: why does the same script leave one file intact but change another completely?

Hello. In my previous post, @matt-bloomfield helped me parse an HTML file. With the code below, I want to copy the href value of the <link rel="canonical"> tag into other tags, such as the <meta property="og:url"> tag and the "@id": " entry.

$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"
 
<# Replace canonical tag with <meta property="og:url"              #> 

Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
    $content = $content -replace '(?<=<meta property="og:url" content=").+(?="/>)',$replaceValue
    Set-Content -Path $resultsdir\$($_.name) $content

    $content = $content -replace '(?<="@id": ").+(?=")',$replaceValue
    Set-Content -Path $resultsdir\$($_.name) $content
}

The code works fine, but there is a problem. As you can see from the two HTML pages linked below, in the short version it replaces only the tags I want, and nothing else in the HTML changes. Great!

But in the complete version, it duplicates some lines, deletes others, and so on. The same PowerShell code makes a mess. Why is that? I need to change only the HTML tags I want, without modifying any other lines in the file.

Short version:

https://pastebin.com/2BBSn830

Complete version:

https://pastebin.com/qLzSZyS8

Have you tried running Set-Content just once, at the end?

Hello. I don’t understand, I already have Set-Content -Path $resultsdir\$($_.name) $content at the end of the code.

I was just wondering if having Set-Content in your script twice was messing up your output.

It’s not clear from the links you posted whether that’s the input or the output. It would help greatly if you could show what the input is, what the expected output should be, and the actual output that you’re getting.
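One way to show exactly what changed is to compare the file line by line before and after the script runs. The sketch below uses made-up in-memory lines (the example.com URLs and the variable names $before, $after and $diff are assumptions, not taken from the pastebin links). With real files you would load each side with Get-Content without -Raw, and because $sourcedir and $resultsdir point to the same folder in the script above, you would need to keep a copy of the original first.

```powershell
# Hypothetical "before" and "after" lines, standing in for two Get-Content results.
$before = @(
    '<html>'
    '<meta property="og:url" content="https://example.com/old"/>'
    '</html>'
)
$after = @(
    '<html>'
    '<meta property="og:url" content="https://example.com/new"/>'
    '</html>'
)

# Compare-Object returns only the lines that differ.
# SideIndicator '<=' means the line exists only in $before, '=>' only in $after.
$diff = Compare-Object -ReferenceObject $before -DifferenceObject $after
$diff | Format-Table -AutoSize
```

If the script behaves as intended, only the og:url and "@id" lines should appear in the diff. Anything else in the list is a line the regex touched by accident.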

While regular expressions can be fine for simple HTML documents, they’re not suitable for parsing more complex HTML.
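As an illustration of how this can go wrong (not necessarily what happened with the files in this thread), here is a small sketch with a made-up line where two tags sit on the same line, as they often do in minified HTML. The sample markup, the example URLs and the variable names are assumptions. A greedy .+ runs to the last place on the line where the "stop" matches, while [^"]+ cannot cross a double quote.

```powershell
# Hypothetical sample: two meta tags on one line.
$line = '<meta property="og:url" content="https://old.example/page"/><meta name="x" content="keep"/>'

# Greedy: .+ extends to the LAST position followed by "/> and swallows the second tag.
$greedy = $line -replace '(?<=<meta property="og:url" content=").+(?="/>)', 'https://new.example/page'

# Bounded: [^"]+ stops at the first double quote, so only the attribute value is replaced.
$bounded = $line -replace '(?<=<meta property="og:url" content=")[^"]+(?="/>)', 'https://new.example/page'

$greedy    # the second <meta> tag is gone
$bounded   # both tags are still there, only the URL changed
```

On a file with one tag per line both patterns give the same result, which may be why a short page survives while a longer one does not.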

@matt-bloomfield figured it out. To do this kind of replacement with regex in PowerShell, every pattern must use the same “start” and the same “stop”, and you have to pay close attention to the regex itself:

$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"

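Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    # Read the whole file as a single string
    $content = Get-Content -Path $_.FullName -Raw
    # Take everything after href=" up to the last double quote on that line (the quote is included)
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").*(")').Matches.Value
    # Replace the same kind of span (value plus closing quote) in the other two places
    $content = $content -replace '(?<=<meta property="og:url" content=").*(")',$replaceValue
    $content = $content -replace '(?<="@id": ").*(")',$replaceValue
    # Write the file once, after both replacements
    Set-Content -Path $resultsdir\$($_.name) $content
}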
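To see what “same start, same stop” means in practice, the sketch below runs the three patterns from the script on a small in-memory sample instead of a file. The sample HTML and the example.com URLs are made up for illustration. Note that the captured value ends with the closing double quote, so the two replace patterns must also consume the closing quote, otherwise the quote would be doubled or lost.

```powershell
# Hypothetical three-line sample, joined into one string like Get-Content -Raw would return.
$content = @(
    '<link rel="canonical" href="https://example.com/new-page" />'
    '<meta property="og:url" content="https://example.com/old-page"/>'
    '    "@id": "https://example.com/old-page"'
) -join "`n"

# Same pattern as in the script: the match includes the closing double quote.
$replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").*(")').Matches.Value

# Both replacements consume "value plus closing quote", matching what was captured.
$content = $content -replace '(?<=<meta property="og:url" content=").*(")', $replaceValue
$content = $content -replace '(?<="@id": ").*(")', $replaceValue

$replaceValue
$content
```

This relies on the closing quote of the value being the last double quote on its line, because .* is greedy. If a line holds more attributes after the one being replaced, a bounded pattern such as [^"]* would be a safer choice.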