I stand corrected on the above… encoding is the issue. If I open the files in Notepad++, the encoding shows as UCS-2 Little Endian on the file that DOES NOT parse properly, and UCS-2 LE BOM on the file that DOES.
I have tested converting back and forth in Notepad++ and confirmed this is the issue. PowerShell does not have an option that I have found to convert a file to UCS-2 LE BOM; I have tried each of the listed encoding enumerators, and none of them converts the file so that it parses properly.
The fix for this was not to write back to the same file I was pulling from… if I did, it would purge the data. I had to run Out-File to a new file and encode it as Unicode.
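A minimal sketch of that workaround, assuming the input really is UTF-16 LE without a BOM (the file names here are placeholders, not from the original post):

```powershell
# Force the encoding on read so Get-Content doesn't interleave $null
# characters, then write to a *different* file as Unicode (UTF-16 LE
# with BOM). Writing back to the source file mid-pipeline would
# truncate it before the read completed.
Get-Content .\input.txt -Encoding Unicode |
    Out-File .\output.txt -Encoding unicode
```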
Hmm, I couldn’t see the encoding of a UCS-2 Little Endian (no BOM) file in Notepad++. It looks like with Get-Content there’s a $null between each letter. Notepad can handle Unicode with no BOM, but Get-Content can’t without specifying the encoding.
For future reference, you can convert the file like this and output to the same file; the parentheses make sure the first part (the read) finishes before the second part (the write) starts. The default for Out-File is Unicode anyway in PS 5. Set-Content defaults to ANSI, which should also be fine. PS 6 defaults to UTF-8 with no BOM for all commands. The code tags look like < pre > < /pre > with no spaces.
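The parenthesized round trip described above would look like this (a sketch; the file name is hypothetical):

```powershell
# The parentheses force Get-Content to read the whole file into memory
# and release the handle before Out-File reopens the same path for
# writing. In Windows PowerShell 5, -Encoding unicode (UTF-16 LE with
# BOM) is Out-File's default, but it is spelled out here for clarity.
(Get-Content .\hi.txt) | Out-File .\hi.txt -Encoding unicode
```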
Example of taking the initial 2-byte Unicode BOM out of a file (the UTF-8 BOM is 3 bytes). See “FAQ - UTF-8, UTF-16, UTF-32 & BOM”.
You can see the BOM (“FF FE”) in Emacs hexl-mode or some other binary editor.
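If you don’t have a hex editor handy, you can also check the leading bytes from PowerShell itself (a sketch; 'hi.txt' is a placeholder file name):

```powershell
# Print the first two bytes of a file as hex.
# FF FE  => UTF-16 LE BOM; EF BB BF (3 bytes) => UTF-8 BOM.
$first = [IO.File]::ReadAllBytes('hi.txt')[0..1]
($first | ForEach-Object { $_.ToString('X2') }) -join ' '
```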
$file = 'hi.txt'
'hi there' | Set-Content -Encoding Unicode $file
$bytes = [IO.File]::ReadAllBytes($file)
# Skip the first two bytes (the FF FE BOM) and rewrite the file
[IO.File]::WriteAllBytes($file, $bytes[2..($bytes.Length - 1)])
# Without the BOM, Get-Content misreads the UTF-16 content, leaving a
# $null between each letter, which is why this pattern matches:
Get-Content $file | Select-String ('h' + [char]$null + 'i')

h i t h e r e