Automate saving multiple webpages?

Hello, first post so apologies if I get anything wrong.

Is it possible to automate the following?

  • open a webpage in Google Chrome (from a list)

  • save that webpage as mhtml

  • close Google Chrome

  • rinse and repeat for every webpage on the list (there are 9,500 of them)

I have tried to do this in Excel via VBA - I can save the webpages just fine but not as mhtml.

I need mhtml because pages saved that way function exactly the same locally as they do when I am connected to them.

Any thoughts appreciated and many thanks for reading.

chuckles1066,
Welcome to the forum.

You actually don’t need Chrome for that.

You may start by reading the following help topics to get you started:

I’d recommend reading the complete help for each cmdlet, including the examples, to learn how to use them.

In theory, Invoke-WebRequest should work for you. What is funny is that the alias “wget” has nowhere near the functionality of the Linux “wget” that it appears to emulate. Invoke-WebRequest does not have a recursive switch as the Linux “wget” does, so you have to play some tricks. Google “invoke-webrequest recursive” to get some examples of how to do that.

There was a bit of controversy about the use of those aliases a few years ago. Removing them was considered a breaking change, so they’re still in 5.1, but they’ve been removed from PowerShell Core, which I guess makes sense as you wouldn’t want those aliases on a Linux system.

Thanks for that Matt, good to know. I don’t currently have PS 6/7 at my disposal. Do you know if recursion is supported by Invoke-WebRequest in the open source versions? I did not see that on Microsoft Docs … just curious.

There’s no recursive parameter so I don’t think so. You’d have to roll your own by parsing the response and calling it again.
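To illustrate what “rolling your own” recursion means, here is a minimal sketch in Python rather than PowerShell: fetch a page, parse the links out of the response, and call the same routine again for each link on the same host. The names `LinkCollector`, `fetch_text` and `crawl` and the `max_depth` limit are made up for this example, and it assumes the pages are plain HTML that can be decoded as UTF-8.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_text(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=1, fetch=fetch_text):
    """Return {url: html} for start_url and same-host pages linked from it."""
    host = urlparse(start_url).netloc
    pages = {}

    def visit(url, depth):
        # Stop on pages already seen or deeper than the allowed depth.
        if url in pages or depth > max_depth:
            return
        pages[url] = fetch(url)
        parser = LinkCollector()
        parser.feed(pages[url])
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc == host:
                visit(target, depth + 1)

    visit(start_url, 0)
    return pages
```

Calling `crawl("https://example.com/", max_depth=2)` would return the start page plus anything on the same host reachable within two links. The `fetch` parameter exists only so the download step can be swapped out, for example for testing without a network connection.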

Probably easier to install wget for Windows or use it on WSL.

I had bash/WSL installed and it worked perfectly from there.

Thank you for replying, much appreciated.

I have to confess that I’ve never used PowerShell although I have programmed in several languages - I can’t see that any of those help topics would even remotely address what I am looking to achieve?

Take the first address from a list, connect to it, download the webpage as mhtml, close the webpage, go to the next address in the list, rinse and repeat.

Apologies, I didn’t make my intentions clear in my original post.

Depending on the file type in which you saved the list of webpages, you can use either Get-Content or Import-Csv to read the list from the file. You then iterate over the list and use Invoke-WebRequest to download each webpage. With Out-File you write the downloaded webpage to a file.
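The same read, iterate, download, write loop can be sketched in Python for comparison. This is only an illustration under some assumptions: the list is a plain text file with one URL per line, and the names `read_urls`, `fetch_bytes`, `download_all` and the `page_00001.html` naming scheme are invented for the example. Like Invoke-WebRequest plus Out-File, it saves the raw HTML only, not MHTML.

```python
from pathlib import Path
from urllib.request import urlopen


def read_urls(list_path):
    """Return the non-empty, stripped lines of a text file (one URL per line)."""
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_all(list_path, out_dir, fetch=fetch_bytes):
    """Save every listed page to out_dir and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(read_urls(list_path), start=1):
        target = out / f"page_{index:05d}.html"
        try:
            target.write_bytes(fetch(url))
        except OSError as error:  # urllib's URLError and HTTPError subclass OSError
            print(f"skipped {url}: {error}")
            continue
        saved.append(target)
    return saved
```

With a list of several thousand addresses, skipping a failed page and carrying on, as the `except` branch does here, is usually preferable to aborting the whole run.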

I’m not an expert in either MHTML or Invoke-WebRequest, but my gut feeling is that neither Invoke-WebRequest in PowerShell nor wget in WSL will produce the same output as MHTML. If someone can validate this, that would be great.

https://whatis.techtarget.com/fileformat/MHTML-MHTML-document-MIME

That’s my concern, I’ve played with various formats and only mhtml makes the locally saved webpage act the same as the online one, hence the need for it to be mhtml.

What’s the actual purpose of this task? You could use an external tool like wget, or something similar that is designed to be controlled from the command line.

It is my understanding that MHTML is pretty much a Microsoft IE (and Chrome, through APIs) capability, and wget won’t meet that exact requirement. If I were approaching this task, I would instantiate an instance of IE and manipulate the DOM to attempt to get this done. This link also has pointers to an HTML Agility Pack that may be what you need.
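For the Chrome route, one possible approach is the DevTools Protocol command `Page.captureSnapshot`, which returns the current page as MHTML text. The sketch below drives it from Python through Selenium. It assumes the third-party `selenium` package and a local Chrome installation; the helper names `make_driver`, `save_as_mhtml`, `save_all` and the `game_00001.mhtml` naming scheme are invented for the example. It does not handle logins or pages that finish rendering after the load event, so treat it as a starting point rather than a finished tool.

```python
from pathlib import Path


def make_driver():
    """Start headless Chrome (assumes `pip install selenium` and Chrome installed)."""
    from selenium import webdriver  # third-party, imported only when needed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def save_as_mhtml(driver, url, target):
    """Load url in the browser and write its MHTML snapshot to target."""
    driver.get(url)
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    # newline="" keeps the CRLF line endings of the MIME document untouched.
    with open(target, "w", encoding="utf-8", newline="") as handle:
        handle.write(snapshot["data"])


def save_all(urls, out_dir, driver):
    """Save each URL as a numbered .mhtml file and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = out / f"game_{index:05d}.mhtml"
        save_as_mhtml(driver, url, target)
        saved.append(target)
    return saved
```

A run would look like `driver = make_driver()`, then `save_all(urls, "games", driver)`, then `driver.quit()`. Reusing one browser session for the whole list avoids opening and closing Chrome 9,500 times.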

This is a backgammon website. I play there but there is no facility to export your games so that they can be analysed by something like GNU Backgammon (so I can see what moves were bad, how to improve my game etc).

I have over 9,000 games there. I want to automate downloading of each game (a different URL for each one) so that I can then use VBA or something to extract the dice rolls and moves.