PowerShell - Copy strings from some html tags to another html tags

I have this

<ul id="myNavigation">
    <li><a href="https://my-website.com/page-1.html" title="Page 1">Page 1 (34)</a></li>
</ul>

I must copy and replace the LINK, TITLE and NUMBER to this new html code:

<div class="categories-name">
   <a href="https://my-website.com/page-66.html" title="Page 66">
   <p class="font-16 color-grey text-capitalize"><i class="fa fa-angle-right font-14 color-blue mr-1"></i> Page 66 <span>27</span> </p>
  </a>
</div>

I wonder if something like this can be done in PowerShell.

The output should be:

<div class="categories-name">
   <a href="https://my-website.com/page-1.html" title="Page 1">
   <p class="font-16 color-grey text-capitalize"><i class="fa fa-angle-right font-14 color-blue mr-1"></i> Page 1 <span>34</span> </p>
  </a>
</div>

Probably it can. But this is not a free script shop. We do not write ready-to-use PowerShell code on request. You will have to write the actual code yourself. If you get stuck at a certain point, we can probably help you further.

Regardless of that: Have you tried to search for a solution? There are thousands of examples for tasks like this. Either here or on

or here
www.PowershellGallery.com


Indeed, sir @Olaf. But any help that someone offers also helps others across the internet. And at some point you were helped too, without being asked for anything in return. It's called progress, and we all need to contribute here.

And just a suggestion as it may be confusing to others, you mention:

" I copy and replace the LINK , TITLE and NUMBER to this new css:"

The code you show is not CSS format, it is HTML.

Yes, sorry. I reworded it: I must copy and replace the LINK, TITLE and NUMBER in this new HTML code:

Have you at least tried to search for a solution first before you came here to ask for help?

Here you have some inspiration to get you started:

or … this one : StackOverflow Powershell replace string html

$file1 = 'D:\temp\file1.html'
$file2 = 'D:\temp\file2.html'
$result = 'D:\temp\result.html'
Get-Content -Path $file1 | ForEach-Object {
    if ($_ -match '(?<=href=").+?(?=")') { $link = $Matches.Values }
    if ($_ -match '(?<=title=").+?(?=")') { $title = $Matches.Values }
    if ($_ -match '(?<=\()\d+(?=\))') { $number = $Matches.Values }
}
$content = Get-Content -Path $file2
$content | ForEach-Object {
    if ($_ -match '(?<=href=").+?(?=")') { $link2 = $Matches.Values }
    if ($_ -match '(?<=title=").+?(?=")') { $title2 = $Matches.Values }
    if ($_ -match '(?<=<span>)\d+(?=</span>)') { $number2 = $Matches.Values }
}
$content -replace $link2, $link -replace $title2, $title -replace $number2, $number | Out-File -FilePath $result
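For readers new to the technique, here is a minimal sketch of what the three patterns in the script above extract from a single line. The sample line comes from the question; the variable names `$line`, `$old` and `$new` are only illustrative. It also shows one assumption-level refinement: since `-replace` treats its search operand as a regular expression, escaping a literal search string with `[regex]::Escape` avoids surprises from characters such as `.`.

```powershell
# Sample line taken from the question above.
$line = '<li><a href="https://my-website.com/page-1.html" title="Page 1">Page 1 (34)</a></li>'

# Each pattern wraps the value in a lookbehind and a lookahead, so the
# match itself ($Matches[0]) holds only the text between the delimiters.
$link   = if ($line -match '(?<=href=").+?(?=")')  { $Matches[0] }
$title  = if ($line -match '(?<=title=").+?(?=")') { $Matches[0] }
$number = if ($line -match '(?<=\()\d+(?=\))')     { $Matches[0] }

# -replace interprets its search operand as a regex, so a literal search
# string is escaped before it is used (illustrative values only).
$old = 'https://my-website.com/page-66.html'
$new = '<a href="https://my-website.com/page-66.html">' -replace [regex]::Escape($old), $link

$link
$title
$number
$new
```

Using `$Matches[0]` instead of `$Matches.Values` returns a plain string rather than a collection, which is easier to reuse in later replacements.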

A friend of mine wrote this script for me, and it works very well. But for a complete answer, I have to consider another case.

Suppose both files contain more lines: the same number of lines with the same structure, differing only in the strings that need to be extracted. I believe a loop is needed!

<ul id="myNavigation">
<li><a href="https://my-website.com/page-1.html" title="Page 1">Page 1 (34)</a></li>
<li><a href="https://my-website.com/page-2.html" title="Page 2">Page 2 (29)</a></li>
<li><a href="https://my-website.com/page-3.html" title="Page-3">Page 3 (11)</a></li>
....
<li><a href="https://my-website.com/page-40.html" title="Page-4">Page 4 (54)</a></li>

</ul>

AND THE SECOND PART:

<div class="categories-name">
<a href="https://my-website.com/page-66.html" title="Page 66">
<p class="font-16 color-grey text-capitalize"><i class="fa fa-angle-right font-14 color-blue mr-1"></i> Page 66 <span>27</span> </p>
</a>
</div>
<div class="categories-name">
<a href="https://my-website.com/page-67.html" title="Page 67">
<p class="font-16 color-grey text-capitalize"><i class="fa fa-angle-right font-14 color-blue mr-1"></i> Page 67 <span>24</span> </p>
</a>
</div>
<div class="categories-name">
<a href="https://my-website.com/page-68.html" title="Page 68">
<p class="font-16 color-grey text-capitalize"><i class="fa fa-angle-right font-14 color-blue mr-1"></i> Page 68 <span>07</span> </p>
</a>
</div>
.....
<div class="categories-name">
<a href="https://my-website.com/page-100.html" title="Page 100">
<p class="font-16 color-grey text-capitalize"><i class="fa fa-angle-right font-14 color-blue mr-1"></i> Page 100 <span>67</span> </p>
</a>
</div>

Maybe someone can modify the script above so that it works for multiple lines, as in this example.

Do you actually do things by yourself sometimes? :thinking:

What do you mean? You already use 2 loops in this code?

And regardless of that - please format your code as code. Just like you did it with the html code.
Thanks.

Yes, I wrote the regex, because I only know HTML and regex. I just discovered PowerShell and haven't had the time to learn it.

Does anyone know how to solve the second part of my problem? It is not necessarily about helping me; it's about the thousands of visitors who will also use this information. It's about progress.

You’ve still not asked a clear and specific question. What is your problem?

There is only one problem. The PowerShell code I posted above copies the values from only one line of the first file into the second file. For example, take this line from File1.html:

<li><a href="https://my-website.com/page-1.html" title="Page 1">Page 1 (34)</a></li>

After running my PowerShell code, it becomes the following. So all values from this line in File1.html are copied into this <div> tag from File2.html:

<div class="categories-name">
<a href="https://my-website.com/page-1.html" title="Page 1">
<p class="font-16 color-grey text-capitalize"><i class="fa fa-angle-right font-14 color-blue mr-1"></i> Page 1 <span>34</span> </p>
</a>
</div>

The problem is that I have 40 lines in File1.html and 40 matching blocks in File2.html, with the same structure; only the values are different. And my PowerShell code can only copy one line from File1.html into File2.html.

I need to make the replacement for all lines, not just one.

Hmmm … I asked you to format your code as code. :wink:

OK, I see what you mean. Actually you’re almost there. You just need a nested loop to process your file2 content for each line of the file1 content … like this:

$file1 = 'D:\temp\file1.html'
$file2 = 'D:\temp\file2.html'
$result = 'D:\temp\result.html'
$content = Get-Content -Path $file2

Get-Content -Path $file1 | ForEach-Object {
    if ($_ -match '(?<=href=").+?(?=")') { $link = $Matches.Values }
    if ($_ -match '(?<=title=").+?(?=")') { $title = $Matches.Values }
    if ($_ -match '(?<=\()\d+(?=\))') { $number = $Matches.Values }
    $content | ForEach-Object {
        if ($_ -match '(?<=href=").+?(?=")') { $link2 = $Matches.Values }
        if ($_ -match '(?<=title=").+?(?=")') { $title2 = $Matches.Values }
        if ($_ -match '(?<=<span>)\d+(?=</span>)') { $number2 = $Matches.Values }
    }
    $content = $content -replace $link2, $link -replace $title2, $title -replace $number2, $number
}
$content | Out-File -FilePath $result

That should do it actually.


I think it is absolutely impossible to understand the actual task from reading your initial post. :face_with_raised_eyebrow:
What you actually want is to replace every occurrence of a string, or rather of a combination of strings, with the corresponding combination of strings from another file, matched by the index of the occurrence.
So occurrence 1 in file 2 is replaced with the values of occurrence 1 in file 1, and so on.

Your code and your regex are insufficient for this purpose.

At least with the samples you posted it worked with this snippet.

$file1 = 'D:\temp\file1.html'
$file2 = 'D:\temp\file2.html'
$result = 'D:\temp\result.html'

$NewContentList = 
Get-Content -Path $file1 |
ForEach-Object {
    If ($_ -match '(?<=href=")(?<URI>.+)(?="\s+title).*(?<=Page.+">)(?<TitleSpan>Page.+)(?=</a>)' ) {
        $URI = $Matches.URI
        $TitleSpan = $Matches.TitleSpan 
        $TitleSpan -match '(?<Title>Page\s+\d+)(?=\s+)' | Out-Null
        $Title = $Matches.Title
        $TitleSpan -match '(?<=\()(?<Span>\d+)(?=\))' | Out-Null
        $Span = $Matches.Span 
        [PSCustomObject]@{
            URI   = $URI
            Title = $Title
            Span  = $Span
        }
    }
}

$File2Content = Get-Content -Path $file2 -Raw 
$MatchGroups = ([regex]'<div((.|\n|\r)+?)<\/div').Matches($File2Content)
$NewContentListIndex = 0

foreach ($Group in $MatchGroups) {
    $GroupValueBefore = $Group.Value
    $GroupValueAfter = $GroupValueBefore -replace '(?<=href=").+(?="\s+title)', $($NewContentList[$NewContentListIndex].URI) -replace 'Page\s+\d+(?="|\n|\r)', $($NewContentList[$NewContentListIndex].title) -replace '(?<=<span>)\d+(?=</span>)', $($NewContentList[$NewContentListIndex].Span)
    $File2Content = $File2Content.Replace($GroupValueBefore, $GroupValueAfter)
    $NewContentListIndex++
}

$File2Content | Out-File -FilePath $result

I think it would be more reliable and more robust to provide the new content in a CSV file instead of cutting strings out of another HTML file. Then you could omit the first part of the script, where I create the $NewContentList.
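The CSV idea could look roughly like the sketch below. The column names `URI`, `Title` and `Span` mirror the properties of the objects in `$NewContentList`; the CSV text and the file path in the comment are hypothetical and only there to keep the example self-contained.

```powershell
# Hypothetical CSV content. In practice this would live in a file and be
# loaded with: $NewContentList = Import-Csv -Path 'D:\temp\newcontent.csv'
$csvText = @'
URI,Title,Span
https://my-website.com/page-1.html,Page 1,34
https://my-website.com/page-2.html,Page 2,29
'@

# ConvertFrom-Csv yields one object per row, with one property per column,
# so the result has the same shape as the list built from file1 above.
$NewContentList = @($csvText | ConvertFrom-Csv)

# The second half of the script can then index into the list as before.
$NewContentListIndex = 1
$NewContentList[$NewContentListIndex].URI
$NewContentList[$NewContentListIndex].Title
$NewContentList[$NewContentListIndex].Span
```

With this in place the regex-based extraction from the first HTML file is no longer needed; only the replacement loop over the second file remains.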

So, @Olaf, do you remember what you asked me in your first reply?

“Regardless of that: Have you tried to search for a solution? There are thousands of examples for tasks like this. Either here or on…”

OK, so you know PowerShell, you tried three solutions, and you still did not solve the problem. I want you to understand one thing: the internet is for helping people. I will help you, for free, the same way I was helped.

I made the same request on another forum, and they helped me without asking me for anything. Maybe this solution will help you understand PowerShell much better.

$file1 = 'D:\temp\file1.html'
$file2 = 'D:\temp\file2.html'
$result = 'D:\temp\result.html'
$link = @()
$title = @()
$number = @()
Get-Content -Path $file1 -Delimiter '</li>' | ForEach-Object {
    $_ | ForEach-Object {
        if ($_ -match '(?<=href=").+?(?=")') { $link += $Matches.Values }
        if ($_ -match '(?<=title=").+?(?=")') { $title += $Matches.Values }
        if ($_ -match '(?<=\()\d+(?=\))') { $number += $Matches.Values }
    }
}
$content = Get-Content -Path $file2 -Delimiter '</div>'
for ($i = 0; $i -lt $content.Count; $i++) {
    $content[$i] | ForEach-Object {
        if ($_ -match '(?<=href=").+?(?=")') { $link2 = $Matches.Values }
        if ($_ -match '(?<=title=").+?(?=")') { $title2 = $Matches.Values }
        if ($_ -match '(?<=<span>)\d+(?=</span>)') { $number2 = $Matches.Values }
    }
    $content[$i] -replace $link2, $link[$i] -replace $title2, $title[$i] -replace $number2, $number[$i] | Out-File -FilePath $result -Append
}

Source code: PowerShell - Copy strings from some html tags to another html tags - Microsoft Q&A
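As a cross-check of the index-based pairing discussed in this thread, here is a compact sketch that works on in-memory strings instead of files. It is not the code from either answer: the shortened sample HTML, the variable names and all regex patterns are assumptions chosen for illustration. It pairs the n-th `<li>` entry with the n-th `<div>` block and replaces values by position in the markup rather than by their old content.

```powershell
# Shortened sample data, modelled on the HTML in the question.
$source = @'
<li><a href="https://my-website.com/page-1.html" title="Page 1">Page 1 (34)</a></li>
<li><a href="https://my-website.com/page-2.html" title="Page 2">Page 2 (29)</a></li>
'@

$target = @'
<div class="categories-name">
<a href="https://my-website.com/page-66.html" title="Page 66">
<p class="font-16"><i class="fa"></i> Page 66 <span>27</span> </p>
</a>
</div>
<div class="categories-name">
<a href="https://my-website.com/page-67.html" title="Page 67">
<p class="font-16"><i class="fa"></i> Page 67 <span>24</span> </p>
</a>
</div>
'@

# One match per <li> entry, with named groups for the three values.
$items = [regex]::Matches($source, 'href="(?<link>[^"]+)"\s+title="(?<title>[^"]+)">[^<(]*\((?<number>\d+)\)')

# One match per <div>...</div> block; (?s) lets . match line breaks.
$blocks = [regex]::Matches($target, '(?s)<div\b.*?</div>')

$result = for ($i = 0; $i -lt $blocks.Count; $i++) {
    $block = $blocks[$i].Value
    if ($i -lt $items.Count) {
        $g = $items[$i].Groups
        # Assumes the new values contain no '$', which -replace would
        # interpret as a substitution token in the replacement string.
        $block = $block -replace '(?<=href=")[^"]+', $g['link'].Value
        $block = $block -replace '(?<=title=")[^"]+', $g['title'].Value
        $block = $block -replace '(?<=</i>\s+)\S[^<]*?(?=\s*<span>)', $g['title'].Value
        $block = $block -replace '(?<=<span>)\d+(?=</span>)', $g['number'].Value
    }
    $block   # blocks without a matching source entry pass through unchanged
}

$output = $result -join "`n"
$output
```

Note that this sketch emits only the `<div>` blocks; any markup between or around them in the target would be dropped, so a real file with surrounding content needs a different write-back step.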

You are a

… and you managed it to be on my very short list.

Of course the internet offers a lot of help, and most of it is free of charge. That does not mean that the people willing to help you do not expect you to do your homework. And if you depend on others, you should at least try not to make them do more work than needed, and you should follow some general rules.
To avoid making the people who are willing to help you do the same work twice or more, you should at least post the links to the crosspostings you create.

With the sample code you provided in your first post, my solution works. If it does not, you are doing something wrong.


Your last code was almost right. Almost.

I’ll repeat myself one last time:

Yes, thank you. Sorry, I was focused on the second case. Indeed, it works.

Can you make the last code you wrote work for the second case as well? You were so close. Only one value was not changed.
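The post does not say which value stayed the same. One plausible candidate, and this is only an assumption, is the visible text in front of the `<span>`: the pattern `Page\s+\d+(?="|\n|\r)` used in the script further up only matches a title that is directly followed by a quote or a line break. The sketch below compares that pattern with a hypothetical variant that also accepts a following `<span>`; the sample block is shortened from the question.

```powershell
$block = @'
<a href="https://my-website.com/page-66.html" title="Page 66">
<p class="font-16"><i class="fa"></i> Page 66 <span>27</span> </p>
'@

# Pattern from the script above: the title attribute is followed by a quote
# and is replaced, but the visible text is followed by a space and is not.
$narrow = $block -replace 'Page\s+\d+(?="|\n|\r)', 'Page 1'

# Hypothetical variant: additionally accept optional whitespace and <span>.
$wide = $block -replace 'Page\s+\d+(?="|\n|\r|\s*<span>)', 'Page 1'

$narrow
$wide
```

The href is untouched in both cases because `page-66` contains a hyphen rather than whitespace, so it still needs its own replacement step.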

Let me try to make this very clear: the purpose of this forum is to teach you to do your own work, not to get someone else to do your work for you.

No one here has any responsibility to provide a complete working solution for your specific problem. If another user does offer you even a partially working solution, you should be grateful and gracious.
