Niscors
February 18, 2024, 2:54pm
I am trying to write my first PowerShell script to replicate one that I have already written in AppleScript for my Mac. It gets info from a website based on a class name. I’m just starting out, and I’m now getting an error that I don’t understand.
The check-krpano function (which I copied from a previous post on here) works fine and returns a date. The check-cccbr function, which is exactly the same apart from the website and class name, gives me two errors.
function check-krpano {
    $geturl = Invoke-WebRequest -Uri "http://krpano.com/news/"
    $news = $geturl.ParsedHtml.body.getElementsByClassName("newsdate")
    Write-Host "$($news[0].innerHTML)"
}
function check-cccbr {
    $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
    $news = $geturl.ParsedHtml.body.getElementsByClassName("elementor-text-editor elementor-clearfix")
    Write-Host "$($news[0].innerHTML)"
}
#check-krpano
check-cccbr
Errors from check-cccbr
You cannot call a method on a null-valued expression.
At C:\Users\User\Desktop\TEST-elemementsByClass.ps1:12 char:5
Cannot index into a null array.
At C:\Users\User\Desktop\TEST-elemementsByClass.ps1:13 char:19
I assume that the 2nd error is just a result of the 1st one.
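A minimal sketch, with no web request involved, that reproduces the same two messages and supports that assumption: calling a method on a null value stops the whole statement, so the assignment to $news never happens, and indexing into $news then fails as well. Show-ErrorChain is a hypothetical name used only for this illustration, and $body simply stands in for a missing ParsedHtml.body.

```powershell
# Hypothetical demo: reproduces the two errors from check-cccbr without any web request
function Show-ErrorChain {
    $messages = @()
    $body = $null    # stands in for a missing ParsedHtml.body
    $news = $null    # $news keeps this value unless the assignment below succeeds

    # 1st error: calling a method on $null stops the statement, so $news is never assigned
    try { $news = $body.getElementsByClassName("elementor-text-editor elementor-clearfix") }
    catch { $messages += $_.Exception.Message }

    # 2nd error: indexing into $news, which is still $null
    try { $null = $news[0] }
    catch { $messages += $_.Exception.Message }

    $messages
}

Show-ErrorChain
```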
I know that the class name exists on the website because the AppleScript code below gets the text that I am after.
Can someone help me out here please?
Working AppleScript, for reference:
-- For this example, https://cccbr.org.uk/bellringing/what-is-bell-ringing/ is open in Safari (tab 1)
tell application "Safari"
    set theItem to "elementor-text-editor elementor-clearfix"
    set myWindow to current tab of first window
    tell myWindow
        set theName to do JavaScript "document.getElementsByClassName('" & theItem & "')[0].outerText"
        set theText to theName
    end tell
end tell
theText
Hi, welcome to the forum.
Are you using PowerShell on your Mac?
PowerShell Core (version 6 and later, which includes PowerShell 7 on macOS) does not have the ParsedHtml property; it depended on Internet Explorer.
It will work under Windows PowerShell 5.1.
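A quick way to confirm which edition a session is running, as a sketch (Test-ParsedHtmlSupported is a hypothetical helper name). $PSVersionTable.PSEdition reports Desktop on Windows PowerShell 5.1 and Core on PowerShell 6 and later; it does not exist before 5.1, where the session is Windows PowerShell anyway. A True result only rules out the PowerShell Core case; it does not prove that the IE-based parsing will succeed.

```powershell
# Hypothetical helper: ParsedHtml relies on the IE engine, which only Windows PowerShell ("Desktop") uses
function Test-ParsedHtmlSupported {
    # PSEdition is 'Desktop' on Windows PowerShell 5.1 and 'Core' on PowerShell 6+;
    # it is absent ($null) before 5.1, which is also Windows PowerShell
    return ($PSVersionTable.PSEdition -ne 'Core')
}

Write-Host "PowerShell $($PSVersionTable.PSVersion) ($($PSVersionTable.PSEdition))"
if (-not (Test-ParsedHtmlSupported)) {
    Write-Warning 'PowerShell Core/7 session: Invoke-WebRequest will not populate ParsedHtml here.'
}
```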
Niscors
February 19, 2024, 12:04am
Hi, thanks for your reply.
No, I’ve got an old Windows machine (Windows 10) to test the code on. It’s running PowerShell 5.1.
Your function works fine for me under 5.1.
Please can you run:
$geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
On its own, and post the output of
$geturl | Get-Member
Niscors
February 19, 2024, 1:06pm
Hi, I’ve done that.
TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject
Name MemberType Definition
Dispose Method void Dispose(), void IDisposable.Dispose()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
AllElements Property Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse Property System.Net.WebResponse BaseResponse {get;set;}
Content Property string Content {get;}
Forms Property Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers Property System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields Property Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml Property mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent Property string RawContent {get;set;}
RawContentLength Property long RawContentLength {get;}
RawContentStream Property System.IO.MemoryStream RawContentStream {get;}
Scripts Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode Property int StatusCode {get;}
StatusDescription Property string StatusDescription {get;}
When posting code, output, or data files, please can you use the </> button to format your post as code. It’s much more readable. If you can’t see the button in the toolbar, you’ll find it under the gear icon.
How to format code on PowerShell.org
So the good news is you have a ParsedHtml property. Does that have a body property?
$geturl.ParsedHtml | Get-Member Body
Niscors
February 19, 2024, 8:39pm
I’ve got output, but I don’t see anything useful (to me) from the webpage. For example, the word ‘church’ appears four times on the page but doesn’t appear in the output at all.
It’s quite long, but here it is:
PS C:\Users\User> $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
# $geturl = Invoke-WebRequest -Uri "http://krpano.com/news/"
$geturl.ParsedHtml | Get-Member Body
$geturl
StatusCode : 200
StatusDescription : OK
Content : <!DOCTYPE html>
<html lang="en-US" prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# website: http://ogp.me/ns/website#">
<head>
<!-- Debug: Bootstrap Inserted by WordPress Twitter Bootstrap CS...
RawContent : HTTP/1.1 200 OK
Pragma: no-cache
Link: <https://cccbr.org.uk/wp-json/>; rel="https://api.w.org/", <https://cccbr.org.uk/wp-json/wp/v2/pages/14773>; rel="alternate";
type="application/json", <http://...
Forms : {}
Headers : {[Pragma, no-cache], [Link, <https://cccbr.org.uk/wp-json/>; rel="https://api.w.org/", <https://cccbr.org.uk/wp-json/wp/v2/pages/14773>;
rel="alternate"; type="application/json", <http://cccbr.info/0fqya>; rel=shortlink], [Vary, Accept-Encoding,User-Agent],
[X-XSS-Protection, 1; mode=block]...}
Images : {@{innerHTML=; innerText=; outerHTML=<img title="CCCBR" class="header-image"
src="https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png">; outerText=; tagName=IMG; title=CCCBR; class=header-image;
src=https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png}}
InputFields : {@{innerHTML=; innerText=; outerHTML=<input name="s" title="Search for:" class="search-field" type="search" placeholder="Search …"
value="">; outerText=; tagName=INPUT; name=s; title=Search for:; class=search-field; type=search; placeholder=Search …; value=},
@{innerHTML=; innerText=; outerHTML=<input class="search-submit" type="submit" value="Search">; outerText=; tagName=INPUT;
class=search-submit; type=submit; value=Search}, @{innerHTML=; innerText=; outerHTML=<input class="cli-user-preference-checkbox"
id="wt-cli-checkbox-necessary" type="checkbox" checked="checked" data-id="checkbox-necessary">; outerText=; tagName=INPUT;
class=cli-user-preference-checkbox; id=wt-cli-checkbox-necessary; type=checkbox; checked=checked; data-id=checkbox-necessary},
@{innerHTML=; innerText=; outerHTML=<input class="cli-user-preference-checkbox" id="wt-cli-checkbox-non-necessary" type="checkbox"
checked="checked" data-id="checkbox-non-necessary">; outerText=; tagName=INPUT; class=cli-user-preference-checkbox;
id=wt-cli-checkbox-non-necessary; type=checkbox; checked=checked; data-id=checkbox-non-necessary}}
Links : {@{innerHTML=Skip to content; innerText=Skip to content; outerHTML=<a title="Skip to content" class="screen-reader-text skip-link"
href="#content">Skip to content</a>; outerText=Skip to content; tagName=A; title=Skip to content; class=screen-reader-text skip-link;
href=#content}, @{innerHTML=
<img title="CCCBR" class="header-image" src="https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png">
; innerText= ; outerHTML=<a title="CCCBR" href="https://cccbr.org.uk/" rel="home">
<img title="CCCBR" class="header-image" src="https://cccbr.org.uk/wp-content/uploads/2018/12/header_2_small.png">
</a>; outerText= ; tagName=A; title=CCCBR; href=https://cccbr.org.uk/; rel=home}, @{innerHTML=
<span class="screen-reader-text">Search</span>
; innerText=Search ; outerHTML=<a href="#">
<span class="screen-reader-text">Search</span>
</a>; outerText=Search ; tagName=A; href=#}, @{innerHTML=Home; innerText=Home; outerHTML=<a href="https://cccbr.org.uk/">Home</a>;
outerText=Home; tagName=A; href=https://cccbr.org.uk/}...}
ParsedHtml : System.__ComObject
RawContentLength : 110105
PS C:\Users\User>
You’ve pasted the value of $geturl; the output I’m expecting looks like this:
PS E:\Temp> $geturl.ParsedHtml | Get-Member Body
TypeName: mshtml.HTMLDocumentClass
Name MemberType Definition
---- ---------- ----------
body Property mshtml.IHTMLElement, Microsoft.msht
Niscors
February 19, 2024, 9:41pm
Can you talk me through that please? If I input the following:
$geturl.ParsedHtml | Get-Member Body
It runs for a second, but I get no output (using PowerShell ISE).
For completeness, run these three lines in order. It should work fine in a 5.1 console, or in the ISE.
$geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
$geturl | Get-Member
$geturl.ParsedHtml | Get-Member Body
The first line assigns the response to the variable $geturl.
The second line shows the members of $geturl. This establishes that you’re getting a proper response and, more importantly for what you’re trying to do, that the ParsedHtml property is present.
The third line checks that the ParsedHtml object has a body property.
Your error message indicates that either body or ParsedHtml has no value, so we’re working through checking those values to figure out where the problem is.
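The same checks can be rolled into a single call that names the first property in the chain that has no value. This is a sketch; Find-FirstNullProperty is a hypothetical helper, not a built-in cmdlet.

```powershell
# Hypothetical helper: walks a chain of property names and returns the first one whose value is $null
function Find-FirstNullProperty {
    param(
        [object]$InputObject,
        [string[]]$PropertyPath
    )
    $current = $InputObject
    foreach ($name in $PropertyPath) {
        $current = $current.$name          # works for .NET, PSCustomObject and COM objects alike
        if ($null -eq $current) { return $name }
    }
    return $null                           # every property in the chain had a value
}

# Usage (Windows PowerShell 5.1):
# $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
# Find-FirstNullProperty -InputObject $geturl -PropertyPath 'ParsedHtml', 'body'
```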
I have, and it’s working fine for me. I’m trying to identify what is wrong on the OP’s end.
Niscors
February 19, 2024, 10:04pm
PS C:\Users\User> $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
PS C:\Users\User> $geturl | Get-Member
TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject
Name MemberType Definition
---- ---------- ----------
Dispose Method void Dispose(), void IDisposable.Dispose()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
AllElements Property Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse Property System.Net.WebResponse BaseResponse {get;set;}
Content Property string Content {get;}
Forms Property Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers Property System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields Property Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml Property mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent Property string RawContent {get;set;}
RawContentLength Property long RawContentLength {get;}
RawContentStream Property System.IO.MemoryStream RawContentStream {get;}
Scripts Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode Property int StatusCode {get;}
StatusDescription Property string StatusDescription {get;}
PS C:\Users\User> $geturl.ParsedHtml | Get-Member Body
PS C:\Users\User>
Now that’s odd, because I get this:
PS E:\Temp> $geturl.ParsedHtml | Get-Member body
TypeName: mshtml.HTMLDocumentClass
Name MemberType Definition
---- ---------- ----------
body Property mshtml.IHTMLElement, Microsoft.mshtml
That implies you don’t have the body property, hence your error.
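With body missing, the call on it is what throws, and the indexing error follows. A defensive sketch of check-cccbr (Get-FirstElementByClass is a hypothetical helper name) checks ParsedHtml and body before calling getElementsByClassName and warns instead of throwing. It still needs Windows PowerShell 5.1 for ParsedHtml, and it does not explain why body is missing on that machine; it only makes the failure explicit.

```powershell
# Hypothetical helper: first element under the document body carrying the given class name(s),
# or $null (with a warning) when ParsedHtml, its body, or a match is missing
function Get-FirstElementByClass {
    param(
        [object]$Document,   # e.g. $response.ParsedHtml (Windows PowerShell 5.1 only)
        [string]$ClassName
    )
    if ($null -eq $Document) {
        Write-Warning 'ParsedHtml is null; nothing to search.'
        return $null
    }
    $body = $Document.body
    if ($null -eq $body) {
        Write-Warning 'The parsed document has no body, so getElementsByClassName cannot be called.'
        return $null
    }
    # Piping enumerates the returned element collection; keep only the first match
    $first = $body.getElementsByClassName($ClassName) | Select-Object -First 1
    if ($null -eq $first) {
        Write-Warning "No element with class '$ClassName' was found."
    }
    return $first
}

function check-cccbr {
    $geturl = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/"
    $element = Get-FirstElementByClass -Document $geturl.ParsedHtml -ClassName "elementor-text-editor elementor-clearfix"
    # innerText is closer to the AppleScript's outerText than innerHTML is
    if ($null -ne $element) { Write-Host $element.innerText }
}

# check-cccbr
```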
How big is the response?
PS E:\Temp> $geturl.RawContent.Length
110752
Niscors
February 19, 2024, 10:22pm
I don’t know if it helps, but I ran the same commands against the krpano website.
PS C:\Users\User> $geturl = Invoke-WebRequest -Uri "http://krpano.com/news/"
PS C:\Users\User> $geturl | Get-Member
TypeName: Microsoft.PowerShell.Commands.HtmlWebResponseObject
Name MemberType Definition
---- ---------- ----------
Dispose Method void Dispose(), void IDisposable.Dispose()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
AllElements Property Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse Property System.Net.WebResponse BaseResponse {get;set;}
Content Property string Content {get;}
Forms Property Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers Property System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields Property Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml Property mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent Property string RawContent {get;set;}
RawContentLength Property long RawContentLength {get;}
RawContentStream Property System.IO.MemoryStream RawContentStream {get;}
Scripts Property Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode Property int StatusCode {get;}
StatusDescription Property string StatusDescription {get;}
PS C:\Users\User> $geturl.ParsedHtml | Get-Member Body
TypeName: System.__ComObject#{3050f55f-98b5-11cf-bb82-00aa00bdce0b}
Name MemberType Definition
---- ---------- ----------
body Property IHTMLElement body () {get}
PS C:\Users\User>
Looking at your earlier post and comparing it to the output I’m getting, the length seems OK, but the content is slightly different. Given that the parsing depends on IE, and you mentioned that you have an old Windows 10 machine, I’m wondering if you have a problem with IE. Is it fully patched?
Get-Item 'C:\Program Files\Internet Explorer\iexplore.exe' | Select-Object VersionInfo | Format-List
Niscors
February 19, 2024, 10:57pm
This is well outside my comfort zone, but I don’t think this is good!
PS C:\Users\User> Get-Item 'C:\Program Files\Internet Explorer\iexplore.exe' | Select-Object VersionInfo | Format-List
VersionInfo : File: C:\Program Files\Internet Explorer\iexplore.exe
InternalName: iexplore
OriginalFilename: IEXPLORE.EXE.MUI
FileVersion: 11.00.19041.3691 (WinBuild.160101.0800)
FileDescription: Internet Explorer
Product: Internet Explorer
ProductVersion: 11.00.19041.3691
Debug: False
Patched: False
PreRelease: False
PrivateBuild: False
SpecialBuild: False
Language: English (United Kingdom)
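On the Patched: False line: that field is the IsPatched flag from the file’s version resource (System.Diagnostics.FileVersionInfo.IsPatched), which marks a file that has been modified and is not identical to the original shipping file of the same version number. It is not a Windows Update status, so False is not, by itself, a sign that updates are missing. A small sketch (Get-FileVersionSummary is a hypothetical name) that pulls out just the relevant fields:

```powershell
# Hypothetical helper: summarise a file's version resource.
# IsPatched mirrors the file's "patched" version flag, not whether Windows updates are installed.
function Get-FileVersionSummary {
    param([Parameter(Mandatory)][string]$Path)
    $info = [System.Diagnostics.FileVersionInfo]::GetVersionInfo($Path)
    [pscustomobject]@{
        File           = $info.FileName
        FileVersion    = $info.FileVersion
        ProductVersion = $info.ProductVersion
        IsPatched      = $info.IsPatched
    }
}

# Usage on the Windows 10 machine:
# Get-FileVersionSummary -Path 'C:\Program Files\Internet Explorer\iexplore.exe'
```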
Niscors
February 19, 2024, 11:28pm
Sorry, I missed this question.
PS C:\Users\User> $geturl.RawContent.Length
110752
I’m curious whether the OP, having queried that site programmatically so many times, has been temporarily “blocked.” A lot of sites detect automation tools querying the front end and can block those requests, or point to their terms of service or API pages instead. OP, perhaps try the code from a different internet connection and/or a different machine and see if it works there. That said, as an FYI, websites typically “prohibit” scraping with programs or tools other than a normal browser. I’m not saying you are breaking any terms; just be aware of this possibility. You could also look into any API options for the site: it would certainly make retrieving the data easier, and it won’t break if the website’s front end changes. If it works elsewhere, you may just need to wait a while before it works from your current location again.
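On the API idea: the response headers posted earlier already advertise a WordPress REST API endpoint for this page (the Link header pointing at https://cccbr.org.uk/wp-json/wp/v2/pages/14773). As a sketch, assuming that endpoint is publicly readable and returns the standard WordPress page object (with the rendered HTML body in content.rendered), the page content could be fetched as JSON instead of scraping the front end. Get-WpPageHtml is a hypothetical name, and whether the site permits or supports this use is an assumption to check first.

```powershell
# Hypothetical helper. Assumption: the site's WordPress REST API is publicly readable and
# returns the standard page object, whose HTML body is in content.rendered
function Get-WpPageHtml {
    param([Parameter(Mandatory)][string]$PageApiUri)
    # Invoke-RestMethod converts the JSON response into PowerShell objects
    $page = Invoke-RestMethod -Uri $PageApiUri
    return $page.content.rendered
}

# Usage (endpoint taken from the Link response header shown earlier in the thread):
# $html = Get-WpPageHtml -PageApiUri 'https://cccbr.org.uk/wp-json/wp/v2/pages/14773'
```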
dotnVo
February 20, 2024, 4:28pm
Sorry, lots of chatter here, but I want to go back to a more basic check: are we sure the OP is running 5.1 and not 7? I can replicate the error in PS 7. It seems to be expected: Invoke-Webrequest is missing some properties, like .ParsedHtml and .AllElements · Issue #2867 · PowerShell/PowerShell (github.com)
TL;DR: IE is involved in the parsing (surprise, it’s not really ‘gone’). Apparently there are PowerShell Gallery workarounds for PS 7.
PS Core does not have the Internet Explorer component available, so every Invoke-WebRequest call effectively uses the -UseBasicParsing mode. The confirmation that the ParsedHtml property is present removes any doubt: the user is not on PS Core.
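Where ParsedHtml is unavailable (PS Core, or 5.1 with -UseBasicParsing), one IE-independent fallback is to search the raw HTML in the response’s Content property. The sketch below swaps in a plain regular-expression search, which is neither a real HTML parser nor what getElementsByClassName does internally: it finds the first opening tag whose double-quoted class attribute contains all the requested class names, takes everything up to the first closing tag with the same name, strips the tags and decodes entities. Get-TextByClassFromHtml is a hypothetical name, and the result will be truncated if the element contains a nested element with the same tag name (for example a div inside the div).

```powershell
# Hypothetical regex-based fallback (not a real HTML parser; fragile by design)
function Get-TextByClassFromHtml {
    param(
        [Parameter(Mandatory)][string]$Html,
        [Parameter(Mandatory)][string]$ClassName
    )
    # Class names the element must carry, in any order (like getElementsByClassName)
    $wanted = $ClassName -split '\s+' | Where-Object { $_ }
    # Opening tags with a double-quoted class attribute (single-quoted attributes are not handled)
    $openTags = [regex]::Matches($Html, '<(?<tag>[a-zA-Z][\w-]*)[^>]*?\sclass\s*=\s*"(?<cls>[^"]*)"[^>]*>')
    foreach ($m in $openTags) {
        $have = $m.Groups['cls'].Value -split '\s+'
        if (@($wanted | Where-Object { $have -notcontains $_ }).Count -gt 0) { continue }
        # Everything up to the first closing tag of the same name (breaks on nested same-name tags)
        $start = $m.Index + $m.Length
        $end = $Html.IndexOf("</$($m.Groups['tag'].Value)", $start, [System.StringComparison]::OrdinalIgnoreCase)
        if ($end -lt 0) { continue }
        $inner = $Html.Substring($start, $end - $start)
        # Replace tags with spaces, decode entities such as &amp;, then collapse whitespace
        $text = [regex]::Replace($inner, '<[^>]+>', ' ')
        $text = [System.Net.WebUtility]::HtmlDecode($text)
        return ([regex]::Replace($text, '\s+', ' ')).Trim()
    }
    return $null
}

# Usage (works on 5.1 and 7):
# $r = Invoke-WebRequest -Uri "https://cccbr.org.uk/bellringing/what-is-bell-ringing/" -UseBasicParsing
# Get-TextByClassFromHtml -Html $r.Content -ClassName "elementor-text-editor elementor-clearfix"
```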