Hello. In my previous post, @matt-bloomfield helped me parse an HTML file. With the code below, I want to copy the content of the tag <link rel="canonical" into other tags, such as <meta property="og:url" and the "@id": " field.
$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"
<# Replace canonical tag with <meta property="og:url" #>
Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
  $content = Get-Content -Path $_.FullName -Raw
  $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
  $content = $content -replace '(?<=<meta property="og:url" content=").+(?="/>)',$replaceValue
  Set-Content -Path $resultsdir\$($_.name) $content
  $content = $content -replace '(?<="@id": ").+(?=")',$replaceValue
  Set-Content -Path $resultsdir\$($_.name) $content
}
The code works fine on some files, but there is a problem. On the two HTML pages linked below, the short version behaves as expected: only the tags I want are replaced, and nothing else in the HTML changes. Great!
**But in the complete version**, the same PowerShell code makes a mess: some lines are duplicated, other lines are deleted, and so on. Why is that? I need to change only the HTML tags I want and leave every other line in the file untouched.
Short version:
https://pastebin.com/2BBSn830
Complete version:
https://pastebin.com/qLzSZyS8
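
To narrow the problem down, here is a minimal sketch of what I suspect is happening. It is an assumption, not something I have confirmed against the two pages above: if several tags sit on the same line, the greedy `.+` runs to the last closing delimiter on that line instead of stopping at the end of the attribute. The `$sample` string and the example.com URLs are made up for illustration, and `[^"]+` is just one possible way to bound the match.

```powershell
# Hypothetical sample, not taken from the linked pages: several tags share one line.
$sample = '<link rel="canonical" href="https://example.com/new.html" /><meta property="og:url" content="https://example.com/old.html"/><script type="application/ld+json">{"@id": "https://example.com/old.html", "name": "Demo"}</script><meta name="robots" content="index" />'

# Greedy pattern from my script: .+ runs up to the LAST '" />' on the line,
# so the captured value swallows the tags that follow the canonical link.
$greedy = [regex]::Match($sample, '(?<=<link rel="canonical" href=").+(?=" />)').Value

# Bounded pattern: [^"]+ cannot cross the closing quote of the href attribute.
$url = [regex]::Match($sample, '(?<=<link rel="canonical" href=")[^"]+(?=")').Value

# Apply the same bounded idea to both replacement targets.
$fixed = $sample -replace '(?<=<meta property="og:url" content=")[^"]+(?=")', $url
$fixed = $fixed -replace '(?<="@id": ")[^"]+(?=")', $url

# Compare the two captures and inspect the result.
"Greedy capture length:  $($greedy.Length)"
"Bounded capture length: $($url.Length)"
$fixed
```

In this sketch the bounded version changes only the og:url content and the "@id" value, and the rest of the line stays as it was. Note that `-replace` treats `$` in the replacement string specially, so a URL containing `$` would need extra care.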