Hi, I need to copy the link from the top line into the links further down, in several HTML files. Each HTML file has a unique “canonical” link. For example:
```html
<link rel="canonical" href="https://website.com/en/america.html" />
<html code>
<html code>
<div class="somers"><a href="https://website/darertss.html" class="flags bg" hreflang="bg" title="bk"></a>
<a href="https://website.com/pas-lofet.html" class="flags sk" hreflang="sk" title="sk"></a>
<a href="https://website.com/latinamer.html" class="flags uk" hreflang="uk" title="uk"></a>
<a href="https://website.com/sacrdo.html" class="flags uk" hreflang="uk" title="uk"></a>
```
The output should be:
```html
<div class="somers"><a href="https://website.com/en/america.html" class="flags bg" hreflang="bg" title="bk"></a>
<a href="https://website.com/en/america.html" class="flags sk" hreflang="sk" title="sk"></a>
<a href="https://website.com/en/america.html" class="flags uk" hreflang="uk" title="uk"></a>
<a href="https://website.com/en/america.html" class="flags uk" hreflang="uk" title="uk"></a>
```
My PowerShell code almost works, but it only replaces the first link (the one on the <div class… line). I need it to replace all of them:
```powershell
$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"
Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    $replacementValue = (Select-String -InputObject $content -Pattern '(?<=<a href=").+(?=</a>)').Matches.Value
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
    $content.Replace("$replacementValue", "$replaceValue") | Out-File -FilePath $resultsdir\$($_.name)
}
```
I also tried adding -AllMatches, but that doesn’t work either. Could anyone adjust my code so that it replaces all of the lines?
```powershell
$sourcedir = "E:\Temp\Folder1\"
$resultsdir = "E:\Temp\Folder2\"
Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    # Read the canonical URL once per file
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
    # -replace rewrites every https://....html match, not just the first one
    # (the canonical link itself is rewritten too, to the same value)
    $content = $content -replace 'https:\/\/.+.html',$replaceValue
    Set-Content -Path $resultsdir\$($_.name) $content
}
```
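Why the first attempt only changed one link, and why -AllMatches didn't help, shown as a small sketch on an in-memory string (the sample HTML and the pattern are illustrative, not taken from the real files). Select-String returns only the first match unless you pass -AllMatches. With -AllMatches, .Matches.Value becomes an array, and putting it in double quotes ("$replacementValue") joins the items with spaces, so .Replace() searches for a string that never occurs in the file. The -replace operator, by contrast, replaces every match by itself.

```powershell
# Made-up sample: two links on separate lines
$html = @'
<a href="https://website.com/darertss.html" class="flags bg"></a>
<a href="https://website.com/pas-lofet.html" class="flags sk"></a>
'@
$canonical = 'https://website.com/en/america.html'
# Matches a double-quoted https URL ending in .html
$pattern = 'https://[^"]+\.html'

# Without -AllMatches, Select-String returns only the first match
$first = (Select-String -InputObject $html -Pattern $pattern).Matches.Value

# With -AllMatches you get an array of every match...
$all = (Select-String -InputObject $html -Pattern $pattern -AllMatches).Matches.Value
# ...but "$all" joins it with spaces, and that text never occurs in $html,
# so $html.Replace("$all", $canonical) would change nothing
$html.Contains("$all")   # False

# The -replace operator rewrites every match in one pass
$updated = $html -replace $pattern, $canonical
$updated
```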
By the way, @matt-bloomfield, I have a similar case. For example:
```html
<html code>
<html code>
<link rel="canonical" href="https://website.com/en/laptop.html" />
<html code>
<html code>
<meta property="og:url" content="https://website.com/accente-pronunce.html"/>
<html code>
<html code>
"@id": "https://website.com/mom-and-dad.html"
```
Here I want to copy the canonical link into the other links further down: the og:url meta tag and the "@id" value.
I don’t know why my code isn’t working (I adapted your code a little):
```powershell
$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"
Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
    $content = $content -replace '(?<=<meta property="og:url" content=").+(?="/>)',$replaceValue
    $content = $content -replace '(?<="@id": ").+(?=")',$replaceValue
    Set-Content -Path $resultsdir\$($_.name) $content
```
If you look at my code, I simplified the regular expression to replace any https://<anything>.html URL, so it works regardless of what else is between the tags.
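One caveat with the broad pattern: .+ is greedy, and the unescaped . in .html matches any character. If a single line holds two URLs, 'https:\/\/.+.html' matches from the first https:// through the last .html, so the markup between them is replaced as well. Below is a sketch of a stricter pattern, assuming the URLs are always wrapped in double quotes (the sample line is made up):

```powershell
$canonical = 'https://website.com/en/america.html'
# Hypothetical line with two links on it
$line = '<a href="https://website.com/x.html" title="x"></a><a href="https://website.com/y.html" title="y"></a>'

# Greedy: one match spans both URLs, so the first </a><a ...> is lost
$greedy = $line -replace 'https:\/\/.+.html', $canonical

# [^"]+ cannot run past the closing quote, and \. matches a literal dot
$strict = $line -replace 'https://[^"]+\.html', $canonical

$greedy   # <a href="https://website.com/en/america.html" title="y"></a>
$strict   # both links now point to the canonical URL
```

The trade-off is that [^"]+ relies on the attribute quotes; unquoted URLs would need a different stop character.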
Done. Thanks, @matt-bloomfield! This is the version that works for me:
```powershell
$sourcedir = "C:\Folder1\"
$resultsdir = "C:\Folder1\"
Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    $content = Get-Content -Path $_.FullName -Raw
    # Read the URL from <link rel="canonical" href="..." />
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=").+(?=" />)').Matches.Value
    # Copy it into the og:url meta tag and the "@id" value
    $content = $content -replace '(?<=<meta property="og:url" content=").+(?="/>)',$replaceValue
    $content = $content -replace '(?<="@id": ").+(?=")',$replaceValue
    # Write the file once, after both replacements
    Set-Content -Path $resultsdir\$($_.name) $content
}
```
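If some files have no canonical link, $replaceValue is $null and both -replace lines blank out the og:url and "@id" values. A sketch that wraps the same two replacements in a function and returns the HTML unchanged in that case. The function name Update-CanonicalCopies is made up for illustration, and the [^"]+ patterns (instead of .+ with lookaheads) assume the URLs are double-quoted:

```powershell
# Hypothetical helper: copies the canonical URL into og:url and "@id"
function Update-CanonicalCopies {
    param([string]$Html)

    $canonical = (Select-String -InputObject $Html -Pattern '(?<=<link rel="canonical" href=")[^"]+').Matches.Value
    # No canonical link: return the HTML untouched instead of blanking the URLs
    if (-not $canonical) { return $Html }

    $Html = $Html -replace '(?<=<meta property="og:url" content=")[^"]+', $canonical
    $Html = $Html -replace '(?<="@id": ")[^"]+', $canonical
    return $Html
}

# Single-quoted here-string: a line starting with "@ would end a @"..."@ string
$sample = @'
<link rel="canonical" href="https://website.com/en/laptop.html" />
<meta property="og:url" content="https://website.com/accente-pronunce.html"/>
"@id": "https://website.com/mom-and-dad.html"
'@
Update-CanonicalCopies $sample
```

Inside the existing loop it would be called as Set-Content -Path (Join-Path $resultsdir $_.Name) -Value (Update-CanonicalCopies $content).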