Extracting text nodes and nulls from XML elements

by davidski at 2012-10-20 04:56:43

I’m trying to extract text values from XML objects and am currently stumped on how to account for the possibility of null values. A snippet of the XML is below.


<object-array>
<string>Foo</string>
<string>Bar</string>
<int>0</int>
<string>Baz</string>
<null/>
<sql-timestamp>2012-10-20 21:54:17.0</sql-timestamp>
<boolean>true</boolean>
<float>6.4</float>
</object-array>


The node names ("string", "int", etc.) can be in any order and are not predictable. Any given element(s) may be null during a single object-array stanza.

I’m currently doing a simple unwind of all text nodes via:
$values = $xmlObject |
Select-Xml "//descendant::text()" |
ForEach-Object { $.Node.Value }


Null nodes, not having text elements, don’t return anything here which breaks my parsing. I need something that returns the following for the provided sample snippet: ("Foo", "Bar", "0", "Baz", "", "2012-10-20 21:54:17.0", "true", "6.4") The code needs to be relatively efficient as well as I’m parsing through several hundred thousand of these object-array groupings per job run.

My XML and XPath foo is weak. Any thoughts on a solution?
by DonJ at 2012-10-22 08:22:57
I’m not an XML guru - but I’m going to move this to our Advanced forum. Kirk’s been quiet lately, but he’ll probably have a good suggestion when het gets back in.
by davidski at 2012-10-22 08:41:16
Thanks, Don. If it helps anyone reviewing this thread, I do have access to PowerShell V3. Just in case there’s some awesome-sauce in there that makes this easier…
by poshoholic at 2012-10-22 11:16:56
I’m not an XML guru either, but I usually manage to find my way around. I had to make some assumptions here because I didn’t know the exact format your file would be in.

Let’s assume you start with C:\test.xml, and that it contains this:

<objs>
<object-array>
<string>Foo</string>
<string>Bar</string>
<int>0</int>
<string>Baz</string>
<null/>
<sql-timestamp>2012-10-20 21:54:17.0</sql-timestamp>
<boolean>true</boolean>
<float>6.4</float>
</object-array>
<object-array>
<string>Scooby</string>
<string>Doo</string>
<null/>
<string>bark</string>
<int>101</int>
<sql-timestamp>2012-10-22 08:08:08.0</sql-timestamp>
<boolean>false</boolean>
<float>42.0</float>
</object-array>
</objs>

With a file like that, if you wanted to extract the two arrays of values, using PowerShell 3.0 you could do something like this:
$xml = [xml]${C:\test.xml}
$xml.objs.'object-array' | ForEach-Object {
$
.CreateNavigator().SelectChildren('Element').Value
}

The inside of that ForEach-Object script block is the logic that takes the array of XML elements and pulls out only the values of those elements, including the null value. If you run it as is, you’ll see all 16 values appear at once, but the ForEach-Object statement is processing them as two collections of 8 values each, so you would need to add whatever other logic you are going to use to do something with the individual collections of 8 values.
by davidski at 2012-10-22 15:37:10
Most excellent. The CreateNavigator().SelectChildren(‘Element’).Value call was just what was needed. Thank you!
by ArtB0514 at 2012-10-30 12:50:47
Something else to keep in mind when examining objects that might be strings…

[string]::IsNullOrEmpty(<object>)