Parse info from a website

As I have been unable to find a current RSS feed for Windows updates, I am trying to parse some data from the Microsoft support site. I can build the URL dynamically, as the only part that changes is the number of the knowledge base article. I am interested in the text below the Summary heading, but I can't find a way to extract this information with Invoke-WebRequest.

An example URL is below:

$web = Invoke-WebRequest "https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
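
Building the URL from the article number could look roughly like this. It is only a sketch: `$kbNumber` is a hypothetical variable name, and the URL pattern is simply copied from the example above.

```powershell
# $kbNumber is a hypothetical variable; in practice it would come from a list of KB IDs
$kbNumber = 4022887

# The article number appears twice in the URL pattern used in the example above
$web = "https://support.microsoft.com/en-us/help/$kbNumber/title#!/en-us/help/$kbNumber/title"
$web
```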

So there’s a couple of ways you could do it, but before I start jumping down the wrong rabbit hole, would this site get you the data you need?

https://support.microsoft.com/en-us/gp/selectrss?target=rss

Hi, thanks for the reply. I looked at the RSS feeds on that site, but it appears they are no longer being updated.

I am trying:

$web = "https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
$data = Invoke-WebRequest $web
$result = $data.ParsedHtml.body.getElementsByClassName('kb-summary-section section ng-scope.x-hidden-focus')

I only want the information in the summary, but nothing is being passed back to $result.

Invoke-RestMethod?

Hi Simon,

It seems to me that you are doing everything right, but when you output the complete raw result of the request, it doesn't seem to contain anything useful. So I tried using the Internet Explorer COM object through PowerShell and it worked. Not pretty, but it gets the result you are looking for:

   $ie = New-Object -ComObject "InternetExplorer.Application"
   $ie.Silent = $true
   $ie.Navigate($web)
   while ($ie.Busy) { Start-Sleep 1 }
   $result = $ie.Document.body.getElementsByClassName("kb-summary-section") | Select-Object -ExpandProperty innerText
   $ie.Quit()

Cheers
Wilm
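
One way to check whether the summary is missing from the static HTML is to search the raw response for the class name. If it is absent, that would be consistent with the page filling the section in through script after load, which is an assumption here and not something verified. A rough sketch: `Test-HtmlContainsText` is a hypothetical helper that does a plain, case-sensitive substring check, not real HTML parsing.

```powershell
# Hypothetical helper: returns $true if the raw HTML contains the given text
function Test-HtmlContainsText {
    param(
        [string]$Html,
        [string]$Text
    )
    return $Html.Contains($Text)
}

# Usage against the live page (needs network access, so left commented out):
# $data = Invoke-WebRequest -Uri $web -UseBasicParsing
# Test-HtmlContainsText -Html $data.Content -Text 'kb-summary-section'
```

If this returns `$false` for the downloaded content, the element is not in what Invoke-WebRequest received, and a browser-based approach such as the COM object above is one way around it.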

Thanks Wilm, that does the trick.

Just when I thought it was safe to go back into the water...
When I use the following:
$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Silent = $true
$web = "https://support.microsoft.com/en-us/help/4022887/title#!/en-us/help/4022887/title"
$ie.Navigate($web)
# Wait for the page to finish loading before reading the document
while ($ie.Busy) { Start-Sleep 1 }
$result = ""
$result = $ie.Document.body.getElementsByClassName("container section-body") | Select-Object -ExpandProperty innerText
$kbarticle = $result -split "Symptom" | Select-Object -First 1
$ws.Cells.Item($intRow,4) = $kbarticle
$ws.Cells.Item($intRow,5) = $web

It writes the contents of $kbarticle to the cell in Excel (I have not included the code to open Excel here), but there are two carriage returns at the top of the data, so to see the data you have to click into the cell. I spent hours thinking it wasn't writing the data before I spotted the two carriage returns. I have tried $kbarticle.Trim() but that does not seem to work. Any ideas?

I fixed the issue with:

$trimmedKbArticle = $kbarticle.ToString()
$ws.Cells.Item($intRow,4) = $trimmedKbArticle.Trim()
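
One possible reason the earlier Trim() call appeared to do nothing (a guess, since the original call is not shown) is that .NET strings are immutable: Trim() returns a new string and leaves the variable unchanged unless the result is assigned or used directly. A small sketch with made-up sample text standing in for the scraped summary:

```powershell
# Hypothetical sample text with two leading line breaks, like the scraped value
$kbarticle = "`r`n`r`nSummary text for the article"

# Calling Trim() on its own does not change $kbarticle; the result is discarded here
$null = $kbarticle.Trim()
$stillHasLeadingBreaks = $kbarticle.StartsWith("`r`n")

# Assigning the result keeps the trimmed text
$trimmed = $kbarticle.Trim()
$trimmed
```

The fix above works on the same principle, because the value returned by Trim() is what gets written to the cell.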

What about using Invoke-RestMethod with an RSS feed? It returns an array of [XmlElement] objects.

$a = Invoke-RestMethod "https://support.microsoft.com/en-us/rss?rssid=18165"
Show-Object $a  # from the PowershellCookbook module