Getting information from web page.

by jerryp at 2012-08-30 14:16:29

I am trying to get some information from a network printer with a web interface. The web page runs JavaScript that sets a couple of cookies, and then the page refreshes to /system.xml. If I use WebClient it just gets the HTML of the page, and if I request system.xml directly it just gives me some XML about a direct connection error. I can get the information by using InternetExplorer.Application, but that leaves an Internet Explorer process running, and each run adds another process. I need to be able to refresh the page and then get the information, or send the cookies to the XML page; I assume that if I set the cookies and call the XML page it will work. Any suggestions on how to do this?
by DonJ at 2012-08-30 15:02:11
Without seeing the Web page, I can guess that it’s making an AJAX call from client-side JavaScript. That means it’ll only run properly in a Web browser capable of executing JavaScript. Your attempt with the IE COM object was a good one - I’d have tried that, too. The fact that IE is launching a new process each time is unfortunate… I’m not sure of a way around that, short of getting what you need, terminating IE, and re-launching it for the next try.

I would probably have next tried accessing the XML page directly and then parsing it (it’ll be XML, not HTML). If you’re getting an error doing that, then the printer’s firmware isn’t allowing the page to be called in that fashion. I’d have to dig into the client-side Javascript to see what was being done. It might be a matter of constructing an HTTP header to fool the printer’s Webserver.
by poshoholic at 2012-08-30 18:43:48
After opening a site with New-Object -Com InternetExplorer.Application, you need to make sure you invoke the Quit() method on the object that is returned from that. That is how you terminate the processes that are otherwise left open.

i.e.

[script=powershell]$ie = New-Object -Com InternetExplorer.Application
$ie.Navigate('www.poshoholic.com')
$ie.Visible = $true
Get-Process iexplore
$ie.Quit()
Get-Process iexplore[/script]
by jerryp at 2012-08-31 06:15:37
Here’s the web page code when webclient is run:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<HTML lang="en">
<HEAD>
<TITLE></TITLE>
<meta http-equiv="Expires" content="0">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Content-Type" content="test/html;charset=UTF-8">
<meta content="text/javascript" http-equiv="Content-Script-Type">
<link rel="stylesheet" type="text/css" href="default.css">
<noscript>
<meta http-equiv="refresh" content="0; URL=js_error.xml">
</noscript>
<meta http-equiv="refresh" content="0; URL=./system.xml">
<script type="text/javascript" src="init.js"></script>
</HEAD>
<BODY BGCOLOR="#ffffff" LINK="#000000" ALINK="#ff0000" VLINK="#000000" onload="init();">
</BODY>
</HTML>
Here’s the contents of init.js

function init() {
    //cookie
    if ( !window.navigator.cookieEnabled ){
        location.href = "js_error.xml";
        return;
    }

    langNo = getUserLang();

    document.cookie="selno=Auto;";
    document.cookie="lang="+langNo+";";

}

function getUserLang() {
    if (navigator.userLanguage) {
        userLangArray = navigator.userLanguage.split("-", 2);
    } else if(navigator.language){
        userLangArray = navigator.language.split("-", 2);
    } else {
        return "";
    }

    userLang1tmp = userLangArray[0];
    userLang1 = userLang1tmp.charAt(0).toUpperCase();
    userLang1 = userLang1 + userLang1tmp.substr(1, userLang1tmp.length);

    if (userLangArray[1] != null) {
        userLang2 = userLangArray[1].toUpperCase();
    } else {
        userLang2 = "";
    }

    if (userLang1 == "Zh") {
        userLangNo = userLang1 + "-" + userLang2;
    } else {
        userLangNo = userLang1;
    }

    return userLangNo;
}
Here’s what I get if I run webclient on the system.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="autherror.xsl" type="text/xsl"?>
<MFP>
<Function>common</Function>
<LangNo>En</LangNo>
<Message>
<Item Code="Err_2">DirectAccessError</Item>
</Message>
<Redirect>/index.html</Redirect>
</MFP>
by DonJ at 2012-08-31 06:32:16
Yeah, so, it’s using an HTTP refresh, but if you try to access that in anything but a Web browser, you’ll get the error. The error’s in a NOSCRIPT block - so anything that doesn’t understand script, basically. I suppose you could try directly querying system.xml and see what you get. That isn’t going to be an HTML page, necessarily - you may need to be prepared to parse the XML (or let PowerShell do so).
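To illustrate letting PowerShell parse the XML: a minimal sketch that casts a response string to [xml] and reads the fields with dotted property access. The sample input is the error document posted earlier in this thread; the variable names are only for illustration, and a successful response from the printer will presumably have different elements.

```powershell
# Sample input: the DirectAccessError response posted earlier in this thread.
# In practice this string would come from WebClient.DownloadString().
$response = @'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="autherror.xsl" type="text/xsl"?>
<MFP>
<Function>common</Function>
<LangNo>En</LangNo>
<Message>
<Item Code="Err_2">DirectAccessError</Item>
</Message>
<Redirect>/index.html</Redirect>
</MFP>
'@

# Casting to [xml] loads the text into an XmlDocument.
$doc = [xml]$response

# Elements and attributes are then reachable as properties.
$code     = $doc.MFP.Message.Item.Code       # attribute value
$reason   = $doc.MFP.Message.Item.InnerText  # element text
$redirect = $doc.MFP.Redirect

if ($reason -eq 'DirectAccessError') {
    Write-Warning "Printer refused the request ($code); it wants $redirect first."
}
```

Checking for the Message/Item element this way makes it easy to tell the refusal apart from a real data response before trying to read anything else.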
by jerryp at 2012-08-31 07:47:23
As I said this is what I get when I query the xml page directly.

Here’s what I get if I run webclient on the system.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="autherror.xsl" type="text/xsl"?>
<MFP>
<Function>common</Function>
<LangNo>En</LangNo>
<Message>
<Item Code="Err_2">DirectAccessError</Item>
</Message>
<Redirect>/index.html</Redirect>
</MFP>
by poshoholic at 2012-08-31 08:02:38
If you can get it to work as you said in the beginning when you use the InternetExplorer.Application COM object, then why not use that in a try/finally block, with a call to $ie.Quit() (assuming you store your COM object in a variable called “ie”) in the finally block so that the process doesn’t remain open when you’re finished with it?
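A rough sketch of that try/finally pattern follows. The function name Get-PrinterPage, the -BrowserFactory parameter (a seam so the cleanup logic can be exercised without IE), the two-step navigation, and reading Document.body.innerHTML are all assumptions for illustration; how the data is actually read out of the document depends on how the printer's page renders in IE.

```powershell
function Get-PrinterPage {
    param(
        [Parameter(Mandatory = $true)]
        [string] $BaseUrl,

        # Hypothetical seam: returns any object exposing Navigate(), Busy,
        # Document and Quit(). Defaults to the real IE COM object.
        [scriptblock] $BrowserFactory = { New-Object -ComObject InternetExplorer.Application }
    )

    $ie = & $BrowserFactory
    try {
        # Assumption: load the landing page first so init.js can set its
        # cookies, then request system.xml in the same browser session.
        foreach ($url in $BaseUrl, "$BaseUrl/system.xml") {
            $ie.Navigate($url)
            while ($ie.Busy) { Start-Sleep -Milliseconds 200 }
        }

        # Placeholder: adjust to wherever the rendered page holds the data.
        $ie.Document.body.innerHTML
    }
    finally {
        # Runs even if navigation or parsing throws, so no iexplore
        # process is left behind.
        $ie.Quit()
    }
}
```

Usage would look like `Get-PrinterPage -BaseUrl 'http://printer-address'`, with the address replaced by the printer's own.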
by jerryp at 2012-08-31 08:33:00
That’s the way I’m going to run it now that I have the command to get the process to quit. I was originally looking for a way to send the cookie information with the headers to see if I could get webclient to work with the xml file. But I don’t think it can be done without a lot of coding. Thanks for your help.
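For anyone who wants to try the cookie route anyway, here is an untested sketch. It mirrors getUserLang() from init.js to build the lang value, then sends selno and lang in a Cookie header with System.Net.WebClient. The function names and the -BaseUrl parameter are hypothetical, and it is only an assumption that these two cookies are all the printer checks; the firmware may look at other headers as well.

```powershell
# Mirrors getUserLang() in init.js: 'en-US' -> 'En', 'zh-cn' -> 'Zh-CN'.
function Get-PrinterLangCode {
    param([string] $Culture = (Get-Culture).Name)

    if ([string]::IsNullOrEmpty($Culture)) { return '' }

    $parts  = $Culture.Split('-')
    $lang   = $parts[0].Substring(0, 1).ToUpperInvariant() + $parts[0].Substring(1)
    $region = if ($parts.Length -gt 1) { $parts[1].ToUpperInvariant() } else { '' }

    # The script only keeps the region for Chinese (case-sensitive match).
    if ($lang -ceq 'Zh') { "$lang-$region" } else { $lang }
}

# Hypothetical helper: requests system.xml with the cookies init.js would set.
function Get-PrinterSystemXml {
    param(
        [Parameter(Mandatory = $true)]
        [string] $BaseUrl,
        [string] $LangCode = (Get-PrinterLangCode)
    )

    $wc = New-Object System.Net.WebClient
    try {
        $wc.Headers.Add('Cookie', "selno=Auto; lang=$LangCode")
        [xml]$wc.DownloadString($BaseUrl.TrimEnd('/') + '/system.xml')
    }
    finally {
        $wc.Dispose()
    }
}
```

If the result still contains the DirectAccessError item, the cookies alone are not enough and the IE approach remains the fallback.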
by poshoholic at 2012-08-31 09:15:26
No problem. I’m glad this resolves your issue.