by stocksp at 2012-08-21 10:46:00
I have a string consisting of <div> elements separated by CRLF.
Stored in $tmp.
If I just do
[script=powershell]$lines = $tmp.Split("`r`n")[/script]
$lines contains an extra blank line between each <div>.
$lines[0] is correct, $lines[1] is a blank line, and so on through the whole file.
To get what I want I’m using
[script=powershell]$lines = $tmp -replace("`r`n", "|")
$lines = $lines.split("|")[/script]
which I know is ugly…
How can I get a ‘clean’ array (no blank lines) without the -replace ‘hack’?
by willsteele at 2012-08-21 11:02:44
Due to the way the CR and LF characters are handled, it can be a little challenging. I have fought this before. An alternative may be to look at splitting on a binary character instead of an escaped character. CR is 0x0D and LF is 0x0A. Perhaps splitting on one or the other of those instead of both could help.
[script=powershell]$lines = $tmp -split 0x0D[/script]
Without knowing your exact data I am guessing a bit, but this will probably work.
by DonJ at 2012-08-21 11:15:05
I’m curious, how did you read in the string to begin with? I ask because Get-Content, when reading a text file, will normally handle this for you, putting each line into its own object. Did you maybe query this from a Web server or something?
by stocksp at 2012-08-21 11:28:42
My data is very simple. It looks like this in Notepad+:
<div style="position: absolute; top: 170px; left: 32px; width:7px; font:8pt Arial; color: #000000">1</div>
<div style="position: absolute; top: 170px; left: 54px; width:13px; font:8pt Arial; color: #000000">15</div>
<div style="position: absolute; top: 170px; left: 76px; width:13px; font:8pt Arial; color: #000000">14</div>
If I ‘show symbols’, the editor shows a ‘CR’ and ‘LF’ at the end of each line. Very standard stuff.
I tried
[script=powershell]$lines = $tmp -split 0x0D, 0x0A[/script]
and it ‘almost’ works … a couple of the lines are mangled (missing <div>’s).
I assume I’m not passing both characters correctly to -split.
by stocksp at 2012-08-21 11:36:28
DonJ,
The data I’m working with is really nasty HTML that a program is spitting out (it’s an image of a print file). I need it as a single string for removing large chunks of garbage. Once I’ve stripped it down to the area I’m after, then I can break it up into lines.
by poshoholic at 2012-08-21 11:38:50
This issue is easy to resolve once you understand what is happening behind the scenes.
When you use the System.String Split method and you pass it "`r`n", you’re calling the Char overload of this method. That method allows you to pass in an array of characters, and it will split the string on any character it finds in that array. By passing in "`r`n", it will split on "`r" and it will also split on "`n". That is why you end up with extra blank lines. To fix this you need to do one of the following:
Option A: Force it to split on entire strings, not characters.
[script=powershell]$lines = $tmp.Split([string]"`r`n",'None')[/script]
Option B: Use the regex -split operator instead.
[script=powershell]$lines = $tmp -split "`r`n"[/script]
I prefer option B, and personally I use a slightly modified version of it like this:
[script=powershell]$lines = $tmp -split "`r`n|`r|`n"[/script]
This version splits a string on the `r`n combo first, then it checks for `r by itself, and then `n by itself. I’ve dealt with strings with newline characters coming from enough sources to know that you don’t always get `r`n as a pair of characters for newlines, so I prefer the robustness of that last technique to make sure I get the results I want no matter what the source is.
by willsteele at 2012-08-21 11:45:49
Ah, that second option was one I recall having seen Mjolinor use. Thanks for pointing that one out, Kirk. Good approach.
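For anyone finding this thread later, here is a minimal sketch of the Split and -split behaviours discussed above. It uses a made-up three-element sample in place of the real HTML, and the variable names are purely illustrative. The [char[]] and [string[]] casts are an addition that was not in the original posts: they make the overload choice explicit, since the overload PowerShell picks for a bare string may differ depending on the .NET version underneath.

```powershell
# Hypothetical sample: three fragments joined with CRLF (stand-in for the <div> data)
$tmp = "<div>1</div>`r`n<div>15</div>`r`n<div>14</div>"

# Char-array overload: CR and LF are separate delimiters, so an empty
# string appears between them at every line break
$byChars = $tmp.Split([char[]]"`r`n")

# String-array overload: the whole "`r`n" pair is a single delimiter
$byString = $tmp.Split([string[]]@("`r`n"), [System.StringSplitOptions]::None)

# Regex -split operator: the pattern is the literal CRLF pair
$byRegex = $tmp -split "`r`n"

# Mixed line endings: try CRLF first, then a lone CR, then a lone LF
$mixed  = "one`r`ntwo`nthree`rfour"
$robust = $mixed -split "`r`n|`r|`n"

$byChars.Count    # 5 (every second element is empty)
$byString.Count   # 3
$byRegex.Count    # 3
$robust.Count     # 4
```

The order inside the alternation matters: with `r`n listed first, a CRLF pair is consumed as one delimiter instead of being split twice.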
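A note on the earlier -split 0x0D, 0x0A attempt. As far as I understand the operator, -split converts a numeric delimiter to its string form, so 0x0D is treated as the text "13" and not as a carriage return. A second comma-separated value is read as the maximum number of substrings, not as a second delimiter. That would explain why lines containing width:13px came out mangled. The sketch below uses a made-up two-line sample; the regex hex escapes are just one way to keep the 0x0D and 0x0A values, not something suggested in the thread.

```powershell
# Hypothetical sample: two lines joined with CRLF, one containing the digits "13"
$tmp = "width:13px`r`nwidth:7px"

# The integer 0x0D (13) becomes the string "13", so this splits on those digits
$wrong = $tmp -split 0x0D

# With two values, the second one (0x0A = 10) is taken as the substring limit
$alsoWrong = $tmp -split 0x0D, 0x0A
$sameAs    = $tmp -split '13', 10

# Regex hex escapes match the actual CR and LF characters
$lines = $tmp -split '\x0D\x0A'

$wrong[0]   # width:
$lines[0]   # width:13px
$lines[1]   # width:7px
```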