PowerShell: why does the same script keep all the original data in one file but change another one completely?

nicu_fantanaru · June 16, 2021, 8:14am

Hello. In my previous topic, @matt-bloomfield helped me parse an HTML file. With the code below, I wanted to copy the content of the `<link rel="canonical"` tag into other tags, such as `<meta property="og:url"` and the `"@id": "` field.

```powershell
$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"

<# Copy the canonical URL into <meta property="og:url" and "@id" #>

Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    # Take the href value of <link rel="canonical" ... />
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
    # Copy it into <meta property="og:url" content="..."/>
    $content = $content -replace '(?<=<meta property="og:url" content=").+(?="/>)',$replaceValue
    Set-Content -Path $resultsdir\$($_.name) $content

    # Copy it into the "@id": "..." values
    $content = $content -replace '(?<="@id": ").+(?=")',$replaceValue
    Set-Content -Path $resultsdir\$($_.name) $content
}
```

The code works fine, but there is a problem. As you can see in the two HTML pages linked below, in the Short version it replaces only the tags I want, and nothing else in the HTML changes. Great!

But in the Complete version, it doubles some lines, deletes others, and so on: the same PowerShell code makes a mess. Why is that? I need to change only the HTML tags I want, without modifying any other lines in the file.

Short version:

https://pastebin.com/2BBSn830

Complete version:

https://pastebin.com/qLzSZyS8
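The pastebin files aren't reproduced here, so the following is an assumption rather than a diagnosis of those exact pages: if the Complete version has several tags on one line (common in minified HTML), the greedy `.+` in the patterns above grabs everything up to the *last* `" />` or `"` on that line instead of the first one. The sketch uses a made-up sample with placeholder URLs to compare greedy `.+`, lazy `.+?`, and `[^"]*`, a character class that cannot cross a quote.

```powershell
# Hypothetical one-line sample (placeholder URLs), standing in for a page
# where several tags share a single line, as in minified HTML.
$sample = '<link rel="canonical" href="https://example.com/page.html" /><link rel="icon" href="/favicon.ico" /><meta property="og:url" content="https://old.example.com/"/>'

# Greedy .+ backtracks only as far as the LAST '" />' on the line,
# so the captured "URL" also swallows the next <link> tag.
$greedy = (Select-String -InputObject $sample -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value

# Lazy .+? stops at the FIRST '" />' after the href value.
$lazy = (Select-String -InputObject $sample -Pattern '(?<=<link rel="canonical" href=").+?(?=" />)').Matches.Value

# The same effect on the replace side: a JSON-LD line with two "@id" values.
$jsonLd = '{"@id": "https://old/#website", "name": "Site", "publisher": {"@id": "https://old/#org"}}'

# Greedy: one match runs from the first "@id" value to the last quote on the line,
# so everything in between is deleted.
$greedyId = $jsonLd -replace '(?<="@id": ").+(?=")', 'NEW'

# [^"]* cannot cross a quote, so each "@id" value is replaced on its own.
$tightId = $jsonLd -replace '(?<="@id": ")[^"]*(?=")', 'NEW'

"Greedy canonical: $greedy"
"Lazy canonical:   $lazy"
"Greedy @id:       $greedyId"
"Tight @id:        $tightId"
```

With this sample, `$greedy` comes back as `https://example.com/page.html" /><link rel="icon" href="/favicon.ico` and `$greedyId` collapses to `{"@id": "NEW"}}`. Pasting a value like the first into og:url and "@id" duplicates markup, and a match like the second deletes it, which fits the symptoms described. In a file where each tag sits on its own line, `.` stops at the line break, so greedy and lazy matching usually give the same result.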

matt-bloomfield · June 16, 2021, 9:52am

Have you tried running Set-Content just once, at the end?

nicu_fantanaru · June 16, 2021, 10:19am

Hello. I don’t understand; I already have `Set-Content -Path $resultsdir\$($_.name) $content` at the end of the code.

matt-bloomfield · June 16, 2021, 10:57am

I was just wondering if having Set-Content in your script twice was messing up your output.

It’s not clear from the links you posted whether that’s the input or the output. It would help greatly if you could show what the input is, what the expected output should be, and the actual output that you’re getting.

While regular expressions can be fine for simple HTML documents, they’re not suitable for parsing more complex HTML.

nicu_fantanaru · June 16, 2021, 4:20pm

@matt-bloomfield I figured it out. To parse with regex in PowerShell, the patterns must use the same “start” and the same “stop” (here, each one now ends at a closing `"`), and you must pay attention to the regex formula itself:

```powershell
$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"

Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    # Canonical URL plus its closing quote: .*(") runs to the last quote on that line
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").*(")').Matches.Value
    # Each target match also ends with a quote, which $replaceValue puts back
    $content = $content -replace '(?<=<meta property="og:url" content=").*(")',$replaceValue
    $content = $content -replace '(?<="@id": ").*(")',$replaceValue
    # Write the file once, after all replacements
    Set-Content -Path $resultsdir\$($_.name) $content
}
```
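The final script still uses greedy `.*(")`, which works as long as each tag sits on its own line. A slightly more defensive variant is sketched below, assuming the attribute and JSON values never contain a double quote: `[^"]*` stops each match at the first closing quote, files with no canonical link are skipped (the script above would replace the targets with an empty string), and `$` in the URL is escaped so `-replace` does not read it as a substitution token. The function names `Copy-CanonicalUrl` and `Update-HtmlFolder` are hypothetical, not from the thread.

```powershell
# Hypothetical helper: copies the canonical URL into og:url and every "@id".
# Assumes the values are plain double-quoted strings with no '"' inside them;
# [^"]* cannot run past the closing quote, even when many tags share a line.
function Copy-CanonicalUrl {
    param([string]$Html)

    $hit = Select-String -InputObject $Html -Pattern '(?<=<link rel="canonical" href=")[^"]*(?=")'
    if (-not $hit) {
        return $null   # no canonical link: the caller skips this file
    }

    # In a -replace replacement, '$' starts a substitution token; '$$' is a literal '$'.
    $replaceValue = $hit.Matches[0].Value.Replace('$', '$$')

    $Html = $Html -replace '(?<=<meta property="og:url" content=")[^"]*(?=")', $replaceValue
    $Html = $Html -replace '(?<="@id": ")[^"]*(?=")', $replaceValue
    return $Html
}

# Hypothetical wrapper around the loop from the thread; each file is written once.
function Update-HtmlFolder {
    param([string]$SourceDir, [string]$ResultsDir)

    Get-ChildItem -Path $SourceDir -Filter *.html | ForEach-Object {
        $updated = Copy-CanonicalUrl -Html (Get-Content -Path $_.FullName -Raw)
        if ($null -eq $updated) {
            Write-Warning "No canonical link in $($_.Name); file skipped."
            return   # inside ForEach-Object, 'return' moves on to the next file
        }
        # -NoNewline (PowerShell 5.0+): -Raw already kept the file's own line endings.
        Set-Content -Path (Join-Path $ResultsDir $_.Name) -Value $updated -NoNewline
    }
}

# Example call with the folders from the thread:
# Update-HtmlFolder -SourceDir 'C:\Folder1\' -ResultsDir 'C:\Folder1\'
```

Keeping the string logic in its own function makes it easy to try the patterns on a pasted snippet before touching any files. It is still a regex approach, so Matt's caveat about complex HTML applies.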
