GetElementsByClass help please

I am trying to write my first PowerShell script to replicate one I have already written in AppleScript on my Mac. It gets info from a website based on a class name. I’m just starting out and am now getting an error that I don’t understand.
The check-krpano function (which I copied from a previous post on here) works fine and returns a date. The check-cccbr function, which is exactly the same apart from the website and class name, gives me two errors.

function check-krpano {
    $geturl = Invoke-WebRequest -Uri "http://krpano.com/news/"
    $news = $geturl.ParsedHtml.body.getElementsByClassName("newsdate")
    Write-Host "$($news[0].innerHTML)"
}

function check-cccbr {
    $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
    $news = $geturl.ParsedHtml.body.getElementsByClassName("elementor-text-editor elementor-clearfix")
    Write-Host "$($news[0].innerHTML)"
}

#check-krpano
check-cccbr

Errors from check-cccbr

You cannot call a method on a null-valued expression.
At C:\Users\User\Desktop\TEST-elemementsByClass.ps1:12 char:5
+     $news = $geturl.ParsedHtml.body.getElementsByClassName("elementor ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

Cannot index into a null array.
At C:\Users\User\Desktop\TEST-elemementsByClass.ps1:13 char:19
+     Write-Host "$($news[0].innerHTML)"
+                   ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : NullArray

I assume that the 2nd error is just a result of the 1st one.

I know that the class name exists on the website because the AppleScript code below gets the text that I am after.

Can someone help me out here please?

Working AppleScript for info.

-- For this example, https://cccbr.org.uk/bellringing/what-is-bell-ringing/ is open in Safari tab 1

tell application "Safari"
	set theItem to "elementor-text-editor elementor-clearfix"
	set myWindow to current tab of first window
	tell myWindow
		set theName to do JavaScript "document.getElementsByClassName('" & theItem & "')[0].outerText"
		set theText to theName
	end tell
end tell

theText

Hi, welcome to the forum.

Are you using PowerShell on your Mac?

PowerShell Core does not have the ParsedHtml property; it was dependent on Internet Explorer.

It will work under Windows PowerShell 5.1.
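A quick way to confirm which edition a session is running is the built-in `$PSVersionTable`. This is a minimal sketch; `Test-ParsedHtmlSupported` is a hypothetical helper name, and it assumes that a `Desktop` edition (or a missing `PSEdition` entry, as on Windows PowerShell versions before 5.1) means the IE-based `ParsedHtml` can be populated.

```powershell
# Hypothetical helper: returns $true on Windows PowerShell ('Desktop' edition),
# where Invoke-WebRequest can fill ParsedHtml using the IE engine.
# PowerShell 6/7 reports 'Core' and never provides ParsedHtml.
function Test-ParsedHtmlSupported {
    $edition = $PSVersionTable.PSEdition
    # PSEdition was added in 5.1, so older Windows PowerShell versions lack it.
    return ($null -eq $edition -or $edition -eq 'Desktop')
}

"Edition: $($PSVersionTable.PSEdition)  Version: $($PSVersionTable.PSVersion)"
Test-ParsedHtmlSupported
```

Run in the same console or ISE session as the script, since a machine can have both editions installed side by side.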

Hi, thanks for your reply.
No, I’ve got an old Windows machine (Windows 10) to test the code on. It is running PowerShell 5.1.

Your function works fine for me under 5.1.

Please can you run:

$geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"

On its own, and post the output of

$geturl | Get-Member

Hi, I’ve done that.

   TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject

Name              MemberType Definition
----              ---------- ----------
Dispose           Method     void Dispose(), void IDisposable.Dispose()
Equals            Method     bool Equals(System.Object obj)
GetHashCode       Method     int GetHashCode()
GetType           Method     type GetType()
ToString          Method     string ToString()
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}
Content           Property   string Content {get;}
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers           Property   System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent        Property   string RawContent {get;set;}
RawContentLength  Property   long RawContentLength {get;}
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode        Property   int StatusCode {get;}
StatusDescription Property   string StatusDescription {get;}

So the good news is you have a ParsedHtml property. Does that have a Body property?

$geturl.ParsedHtml | Get-Member Body

I’ve got output, but I don’t see anything useful (to me) from the webpage. For example, the word ‘church’ appears 4 times on the page but isn’t in the output at all.
It’s quite long, but here it is:

PS C:\Users\User> $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
# $geturl = Invoke-WebRequest -Uri "http://krpano.com/news/" 
$geturl.ParsedHtml | Get-Member Body

$geturl


StatusCode        : 200
StatusDescription : OK
Content           : <!DOCTYPE html>
                    <html lang="en-US" prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# website: http://ogp.me/ns/website#">
                    <head>
                    <!-- Debug: Bootstrap Inserted by WordPress Twitter Bootstrap CS...
RawContent        : HTTP/1.1 200 OK
                    Pragma: no-cache
                    Link: <https://cccbr.org.uk/wp-json/>; rel="https://api.w.org/", <https://cccbr.org.uk/wp-json/wp/v2/pages/14773>; rel="alternate"; 
                    type="application/json", <http://...
Forms             : {}
Headers           : {[Pragma, no-cache], [Link, <https://cccbr.org.uk/wp-json/>; rel="https://api.w.org/", <https://cccbr.org.uk/wp-json/wp/v2/pages/14773>; 
                    rel="alternate"; type="application/json", <http://cccbr.info/0fqya>; rel=shortlink], [Vary, Accept-Encoding,User-Agent], 
                    [X-XSS-Protection, 1; mode=block]...}
Images            : {@{innerHTML=; innerText=; outerHTML=<img title="CCCBR" class="header-image" 
                    src="https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png">; outerText=; tagName=IMG; title=CCCBR; class=header-image; 
                    src=https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png}}
InputFields       : {@{innerHTML=; innerText=; outerHTML=<input name="s" title="Search for:" class="search-field" type="search" placeholder="Search …" 
                    value="">; outerText=; tagName=INPUT; name=s; title=Search for:; class=search-field; type=search; placeholder=Search …; value=}, 
                    @{innerHTML=; innerText=; outerHTML=<input class="search-submit" type="submit" value="Search">; outerText=; tagName=INPUT; 
                    class=search-submit; type=submit; value=Search}, @{innerHTML=; innerText=; outerHTML=<input class="cli-user-preference-checkbox" 
                    id="wt-cli-checkbox-necessary" type="checkbox" checked="checked" data-id="checkbox-necessary">; outerText=; tagName=INPUT; 
                    class=cli-user-preference-checkbox; id=wt-cli-checkbox-necessary; type=checkbox; checked=checked; data-id=checkbox-necessary}, 
                    @{innerHTML=; innerText=; outerHTML=<input class="cli-user-preference-checkbox" id="wt-cli-checkbox-non-necessary" type="checkbox" 
                    checked="checked" data-id="checkbox-non-necessary">; outerText=; tagName=INPUT; class=cli-user-preference-checkbox; 
                    id=wt-cli-checkbox-non-necessary; type=checkbox; checked=checked; data-id=checkbox-non-necessary}}
Links             : {@{innerHTML=Skip to content; innerText=Skip to content; outerHTML=<a title="Skip to content" class="screen-reader-text skip-link" 
                    href="#content">Skip to content</a>; outerText=Skip to content; tagName=A; title=Skip to content; class=screen-reader-text skip-link; 
                    href=#content}, @{innerHTML=
                    					<img title="CCCBR" class="header-image" src="https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png">
                    				; innerText= ; outerHTML=<a title="CCCBR" href="https://cccbr.org.uk/" rel="home">
                    					<img title="CCCBR" class="header-image" src="https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png">
                    				</a>; outerText= ; tagName=A; title=CCCBR; href=https://cccbr.org.uk/; rel=home}, @{innerHTML=
                    					<span class="screen-reader-text">Search</span>
                    				; innerText=Search ; outerHTML=<a href="#">
                    					<span class="screen-reader-text">Search</span>
                    				</a>; outerText=Search ; tagName=A; href=#}, @{innerHTML=Home; innerText=Home; outerHTML=<a href="https://cccbr.org.uk/">Home</a>; 
                    outerText=Home; tagName=A; href=https://cccbr.org.uk/}...}
ParsedHtml        : System.__ComObject
RawContentLength  : 110105




PS C:\Users\User> 

You’ve pasted the value of $geturl; the output I’m expecting looks like this:

PS E:\Temp> $geturl.ParsedHtml | Get-Member Body


   TypeName: mshtml.HTMLDocumentClass

Name MemberType Definition
---- ---------- ----------
body Property   mshtml.IHTMLElement, Microsoft.msht

Can you talk me through that please? If I input the following:

$geturl.ParsedHtml | Get-Member Body

It runs for a second, but I get no output (using PowerShell ISE).

For completeness, run these three lines in order. It should work fine in a 5.1 console, or in the ISE.

$geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
$geturl | Get-Member
$geturl.ParsedHtml | Get-Member Body

The first line assigns the response to the variable $geturl.
The second line shows the members of $geturl. This establishes that you’re getting a proper response and, more importantly for what you’re trying to do, that the ParsedHtml property is present.
The third line checks that the ParsedHtml object has a body property.

Your error message indicates that either body or ParsedHtml has no value so we’re working through checking those values to figure out where the problem is.
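The same step-by-step check can be built into the function itself, so it reports which link in the chain is missing instead of failing with "You cannot call a method on a null-valued expression". This is a minimal sketch; `Get-FirstElementByClass` is a hypothetical helper name, and it assumes the object passed in is what `Invoke-WebRequest` returns in Windows PowerShell 5.1.

```powershell
# Hypothetical helper: walks Response.ParsedHtml.body one step at a time and
# warns about the first missing piece instead of throwing a null-valued error.
function Get-FirstElementByClass {
    param(
        [Parameter(Mandatory)] $Response,           # result of Invoke-WebRequest
        [Parameter(Mandatory)] [string] $ClassName  # e.g. "newsdate"
    )

    $doc = $Response.ParsedHtml
    if ($null -eq $doc) {
        Write-Warning 'ParsedHtml is null (PowerShell 7+, or -UseBasicParsing was used).'
        return $null
    }

    $body = $doc.body
    if ($null -eq $body) {
        Write-Warning 'ParsedHtml exists but its body is null.'
        return $null
    }

    $found = $body.getElementsByClassName($ClassName)
    if ($null -eq $found -or $found.length -eq 0) {
        Write-Warning "No element with class '$ClassName' was found."
        return $null
    }

    # Same value the original functions print with Write-Host.
    return $found[0].innerHTML
}

# Example (Windows PowerShell 5.1, needs network access):
# $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
# Get-FirstElementByClass -Response $geturl -ClassName "elementor-text-editor elementor-clearfix"
```

Returning the value rather than calling `Write-Host` lets the caller decide whether to print it or use it further.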

I have, and it’s working fine for me. I’m trying to identify what is wrong on OP’s end.

PS C:\Users\User> $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"

PS C:\Users\User> $geturl | Get-Member

 TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject

Name              MemberType Definition                                                                 
----              ---------- ----------                                                                 
Dispose           Method     void Dispose(), void IDisposable.Dispose()                                 
Equals            Method     bool Equals(System.Object obj)                                             
GetHashCode       Method     int GetHashCode()                                                          
GetType           Method     type GetType()                                                             
ToString          Method     string ToString()                                                          
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}                             
Content           Property   string Content {get;}                                                      
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}            
Headers           Property   System.Collections.Generic.Dictionary[string,string] Headers {get;}        
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}     
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}      
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}                                    
RawContent        Property   string RawContent {get;set;}                                               
RawContentLength  Property   long RawContentLength {get;}                                               
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}                             
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}    
StatusCode        Property   int StatusCode {get;}                                                      
StatusDescription Property   string StatusDescription {get;}                                            

PS C:\Users\User> $geturl.ParsedHtml | Get-Member Body

PS C:\Users\User>

Now that’s odd, because I get this:

PS E:\Temp> $geturl.ParsedHtml | Get-Member body


   TypeName: mshtml.HTMLDocumentClass

Name MemberType Definition
---- ---------- ----------
body Property   mshtml.IHTMLElement, Microsoft.mshtml

That implies you don’t have the body property, hence your error.

How big is the response?

PS E:\Temp> $geturl.RawContent.Length
110752

I don’t know if it helps, but I ran the same commands against the krpano website.

PS C:\Users\User> $geturl = Invoke-WebRequest -Uri "http://krpano.com/news/"

PS C:\Users\User> $geturl | Get-Member

   TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject

Name              MemberType Definition                                                                 
----              ---------- ----------                                                                 
Dispose           Method     void Dispose(), void IDisposable.Dispose()                                 
Equals            Method     bool Equals(System.Object obj)                                             
GetHashCode       Method     int GetHashCode()                                                          
GetType           Method     type GetType()                                                             
ToString          Method     string ToString()                                                          
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}                             
Content           Property   string Content {get;}                                                      
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}            
Headers           Property   System.Collections.Generic.Dictionary[string,string] Headers {get;}        
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}     
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}      
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}                                    
RawContent        Property   string RawContent {get;set;}                                               
RawContentLength  Property   long RawContentLength {get;}                                               
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}                             
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}    
StatusCode        Property   int StatusCode {get;}                                                      
StatusDescription Property   string StatusDescription {get;}                                            

PS C:\Users\User> $geturl.ParsedHtml | Get-Member Body

   TypeName: System.__ComObject#{3050f55f-98b5-11cf-bb82-00aa00bdce0b}

Name MemberType Definition                 
---- ---------- ----------                 
body Property   IHTMLElement body () {get} 

PS C:\Users\User>

Looking at your earlier post and comparing it to the output I’m getting, the length seems OK, but the content is slightly different. Given that the parsing depends on IE, and you mentioned that this is an older Windows 10 machine, I’m wondering if you have a problem with IE. Is it fully patched?

Get-Item 'C:\Program Files\Internet Explorer\iexplore.exe' | Select-Object VersionInfo | Format-List

This is well outside my comfort zone, but I don’t think this is good!

PS C:\Users\User> Get-Item 'C:\Program Files\Internet Explorer\iexplore.exe' | Select-Object VersionInfo | Format-List


VersionInfo : File:             C:\Program Files\Internet Explorer\iexplore.exe
              InternalName:     iexplore
              OriginalFilename: IEXPLORE.EXE.MUI
              FileVersion:      11.00.19041.3691 (WinBuild.160101.0800)
              FileDescription:  Internet Explorer
              Product:          Internet Explorer
              ProductVersion:   11.00.19041.3691
              Debug:            False
              Patched:          False
              PreRelease:       False
              PrivateBuild:     False
              SpecialBuild:     False
              Language:         English (United Kingdom)

Sorry, I missed this question.

PS C:\Users\User> $geturl.RawContent.Length
110752

I’m curious whether the OP has queried that site programmatically so many times that they’ve been temporarily “blocked.” A lot of sites detect automation tools querying the front end, and they can (and do) block those requests, often pointing to their TOS/API pages instead. OP, perhaps try the code from a different internet connection and/or a different machine and see if it works there. If it does, you may just need to wait a while before it works from your current location.

As an FYI, websites typically “prohibit” scraping with programs or tools other than a normal browser. I’m not saying you are breaking any terms; just be aware of this possibility. You may also want to look into any API options for the site: it would certainly make retrieving the data easier, and it won’t break if the website front end changes.
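On the API point: the response headers OP posted earlier include a `Link` header advertising `https://cccbr.org.uk/wp-json/wp/v2/pages/14773` with `rel="alternate"; type="application/json"`, which looks like the standard WordPress REST API. Assuming that endpoint is public, still maps to this page, and its rendered content includes the text the AppleScript reads, a minimal sketch could work from the JSON instead of the IE-parsed HTML. `Get-WordPressPageText` is a hypothetical helper; it relies on the WordPress convention that a page object carries its HTML in `content.rendered`, and the tag-stripping regex is a rough approximation, not an HTML parser.

```powershell
# Hypothetical helper: turns a WordPress REST API page object into plain text.
# Assumes the standard WordPress shape, where the page HTML is in content.rendered.
function Get-WordPressPageText {
    param(
        [Parameter(Mandatory)] $Page   # object returned by Invoke-RestMethod
    )

    $html = [string]$Page.content.rendered
    # Strip tags with a rough regex, then decode entities such as &amp; or &#8217;.
    $text = $html -replace '<[^>]+>', ''
    return [System.Net.WebUtility]::HtmlDecode($text).Trim()
}

# Example (needs network access; assumes the endpoint from the Link header is
# public and still corresponds to this page):
# $page = Invoke-RestMethod -Uri "https://cccbr.org.uk/wp-json/wp/v2/pages/14773"
# Get-WordPressPageText -Page $page
```

Because `Invoke-RestMethod` converts the JSON itself, this route does not depend on `ParsedHtml` and should behave the same in Windows PowerShell 5.1 and PowerShell 7. Note that it returns the text of the whole page body, not just the one element the AppleScript targets.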

Sorry, lots of chatter here, but I want to go back to a more basic check: are we sure OP is running 5.1 and not 7? I can replicate the error in PS 7, and it seems to be expected: Invoke-Webrequest is missing some properties, like .ParsedHtml and .AllElements · Issue #2867 · PowerShell/PowerShell (github.com)

TL;DR: IE is involved in the parsing (surprise, it’s not really ‘gone’). Apparently there are PowerShell Gallery workarounds for PS 7.

PowerShell Core does not have the Internet Explorer component available, so Invoke-WebRequest always uses -UseBasicParsing mode there. OP’s Get-Member output confirms that the ParsedHtml property is present, which removes any doubt: they are not on PowerShell Core.
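Without the IE component there is no DOM to query, so under `-UseBasicParsing` (or in PowerShell 7) one rough fallback is to search the raw `Content` string. This is a minimal sketch under stated assumptions: the target is a `<div>` whose `class` attribute is exactly the string used in the question, and it contains no nested `<div>` (the lazy match stops at the first `</div>`). `Get-TextByClassFromHtml` is a hypothetical name, and a regex is no substitute for a real HTML parser.

```powershell
# Hypothetical helper: rough class-based lookup on raw HTML text, for sessions
# without ParsedHtml. Only handles a <div> whose class attribute is exactly
# $ClassName and which contains no nested <div>.
function Get-TextByClassFromHtml {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $ClassName
    )

    $pattern = '<div[^>]*\bclass="' + [regex]::Escape($ClassName) + '"[^>]*>(.*?)</div>'
    $options = [System.Text.RegularExpressions.RegexOptions]'Singleline, IgnoreCase'
    $m = [regex]::Match($Html, $pattern, $options)
    if (-not $m.Success) {
        return $null
    }

    # Drop the inner tags, decode entities such as &amp; or &#8217;, trim whitespace.
    $text = $m.Groups[1].Value -replace '<[^>]+>', ''
    return [System.Net.WebUtility]::HtmlDecode($text).Trim()
}

# Example (works with -UseBasicParsing, so also in PowerShell 7; needs network access):
# $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/" -UseBasicParsing
# Get-TextByClassFromHtml -Html $geturl.Content -ClassName "elementor-text-editor elementor-clearfix"
```

This trades correctness for portability: it will break if the site changes its markup or nests other `<div>` elements inside the target.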
