The resulting files look the same in Notepad++ and when `cat`-ed. But when the file sizes are checked, the first command always creates a file about twice the size of the second command. I have opened both files in a hex editor and the larger file shows NULL (hex code 00) characters separating every character, which accounts for the size difference. Why is this happening?
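For anyone who wants to reproduce the check without a separate hex editor, PowerShell's built-in `Format-Hex` cmdlet (available in 5.0+) shows the interleaved NUL bytes directly. A minimal sketch; the file names are placeholders:

```powershell
# Write the same string via redirection (Out-File's default) and via Set-Content
'hello' > .\redirected.txt             # Windows PowerShell 5.1 default: UTF-16 LE ("Unicode")
'hello' | Set-Content .\setcontent.txt # defaults to the system ANSI code page in 5.1

# Inspect the raw bytes - the redirected file shows 00 between every character
Format-Hex .\redirected.txt
Format-Hex .\setcontent.txt

# Compare the sizes in bytes
(Get-Item .\redirected.txt).Length
(Get-Item .\setcontent.txt).Length
```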
Ran a few tests myself, and I can say that it’s not the CSV cmdlets causing the difference. The issue appears to be the Out-File cmdlet, and it’s only present in Windows PowerShell 5.1, not PS Core (6.1.0 RC1). Unsure of prior versions, but it’s likely that it’s a long-standing bug that was fixed for PS Core at some point.
Instead, I’d suggest using the Set-Content or Add-Content cmdlet.
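A minimal sketch of that approach (cmdlet pipeline and file names are illustrative); in Windows PowerShell 5.1 these cmdlets default to the system ANSI code page rather than UTF-16, and you can also pin the encoding explicitly:

```powershell
# Export without the UTF-16 size penalty
Get-Process | ConvertTo-Csv -NoTypeInformation | Set-Content .\processes.csv

# Append further lines to the same file
'extra,line,here' | Add-Content .\processes.csv

# Or be explicit about the encoding rather than relying on the default
Get-Process | ConvertTo-Csv -NoTypeInformation | Set-Content .\processes.csv -Encoding UTF8
```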
I found out what is happening but not the why. I went back to double-check the files in Notepad++ as @olaf-soyk suggested but couldn’t see any differences. I did notice that Notepad++ had decided that the files had different encodings.
The smaller file was UTF-8.
The larger file was UCS-2 BE BOM.
I hadn’t come across the UCS-2 BE BOM encoding before, but a quick web search showed it to be a 16-bit encoding, as opposed to UTF-8, which uses 8-bit code units. I suppose it should have been obvious when I saw the extra empty chars in the hex editor!
Using Out-File with -Encoding utf8 gives files of equivalent size. There are still some BOM bytes at the beginning of the file, though. Hope this helps someone.
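A sketch of forcing UTF-8 output (file names are placeholders). In Windows PowerShell 5.1, -Encoding utf8 always writes a BOM (the EF BB BF bytes mentioned above); PowerShell 6+ adds a utf8NoBOM value and makes BOM-less UTF-8 the default:

```powershell
# Windows PowerShell 5.1: UTF-8, but always with a BOM
Get-Process | ConvertTo-Csv -NoTypeInformation | Out-File .\out.csv -Encoding utf8

# PowerShell 6+ only: UTF-8 without a BOM
Get-Process | ConvertTo-Csv -NoTypeInformation | Out-File .\out.csv -Encoding utf8NoBOM

# In 5.1, one way to strip the BOM is the .NET API, whose default is BOM-less UTF-8
# (note: .NET resolves relative paths differently, so use a full path here)
[System.IO.File]::WriteAllLines('C:\temp\out.csv', (Get-Content .\out.csv))
```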
BTW: that does not affect the functionality of the files. It takes a little more space, and you could store some more “exotic” characters from the Unicode table. But it will work the same as UTF-8 encoded files in common environments.
Yep. I had a thread about this a little while ago. At first I thought it was Unix text. PS 5’s Out-File (or “>”) encodes in what Notepad calls “Unicode”, while most other commands output in what Notepad calls “ANSI”. Some applications won’t like it, like Infoblox (for CSV import).