ConvertTo-Csv and Out-File vs Export-Csv

ahart162 · September 6, 2018, 3:10pm

Hi,

Was just playing with exporting directory listings to CSV and noticed a little strangeness. I hope someone can enlighten me why this happens?

Two commands which seem to work the same,

[pre]Get-ChildItem | Select-Object fullname,length | ConvertTo-Csv | Out-File -FilePath dir-list.csv[/pre]

[pre]Get-ChildItem | Select-Object fullname,length | Export-Csv -Path dir-list2.csv[/pre]

The resulting files look the same in Notepad++ and when cat'ed. But when the file sizes are checked, the first command always creates a file about twice the size of the second command's. I have opened both files in a hex editor, and the larger file shows a NULL (hex code 00) byte after every character, which accounts for the size difference. Why is this happening?

ta11ow · September 6, 2018, 3:26pm

Ran a few tests myself, and I can say that it’s not the CSV cmdlets causing the difference. The difference comes from the Out-File cmdlet, and it’s only present in Windows PowerShell 5.1, not PS Core (6.1.0 RC1). Unsure of prior versions, but it looks like a long-standing Out-File default (UTF-16 output) rather than a bug, and that default was changed to UTF-8 for PS Core.

Instead, I’d suggest using the Set-Content or Add-Content cmdlet.

Olaf · September 6, 2018, 3:27pm

Use Notepad++ to check the encoding of the files. There you will see the difference. If you want both files to use the same encoding, use this:

Get-ChildItem -Exclude 'dir-list*.csv' | Select-Object fullname,length | ConvertTo-Csv -NoTypeInformation | Out-File -FilePath dir-list.csv -Encoding utf8

Get-ChildItem -Exclude 'dir-list*.csv' | Select-Object fullname,length | Export-Csv -Path dir-list2.csv -Encoding utf8 -NoTypeInformation

ahart162 · September 7, 2018, 4:44am

I found out what is happening but not the why. I went back to double check the files in Notepad++ as @olaf-soyk suggested but couldn’t see any differences in the content. I did notice that Notepad++ had decided that the files had different encodings.
The smaller file was UTF-8
The larger file UCS-2 LE BOM

I haven’t come across UCS-2 LE BOM encoding before, but a quick web search showed it to be a 16-bit encoding, as opposed to UTF-8, which uses 8 bits for each plain ASCII character. I suppose it should have been obvious when I saw the extra empty chars in the hex editor!
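To make the 16-bit versus 8-bit difference concrete, here is a minimal sketch that encodes the same text both ways and compares the bytes. The sample header line is only an illustration (it matches the columns selected above), and the sketch assumes the larger file is UTF-16 little-endian, which is what the -Encoding value "Unicode" means in Windows PowerShell.

```powershell
# Encode the same CSV header line two ways and compare the raw bytes.
$text = '"FullName","Length"'   # 19 characters, all ASCII (illustrative sample)

# [System.Text.Encoding]::Unicode is UTF-16 little-endian.
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($text)
$utf8  = [System.Text.Encoding]::UTF8.GetBytes($text)

# Format the first six bytes as hex, for comparison with a hex editor view.
$utf16Hex = ($utf16[0..5] | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex  = ($utf8[0..5]  | ForEach-Object { $_.ToString('X2') }) -join ' '

"UTF-16 LE: $($utf16.Length) bytes, starts with $utf16Hex"   # 38 bytes, 22 00 46 00 75 00
"UTF-8:     $($utf8.Length) bytes, starts with $utf8Hex"     # 19 bytes, 22 46 75 6C 6C 4E
```

Every ASCII character takes two bytes in UTF-16, with the second byte being 00, which matches the NULL bytes seen in the hex editor and the doubled file size.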

Using Out-File with -Encoding utf8 gives files of equivalent size. There are still some BOM bytes at the beginning of the file, though. Hope this helps someone.
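If the leading BOM is a problem for whatever consumes the file, one possible workaround in Windows PowerShell 5.1 is to skip Out-File and write the lines through .NET with a UTF-8 encoder that has the BOM switched off. This is a sketch, not the only way to do it: the file name dir-list-nobom.csv is made up for the example, and it assumes the current location is a normal file system folder.

```powershell
# Collect the CSV lines first; @() keeps this an array even for 0 or 1 lines.
[string[]]$lines = @(
    Get-ChildItem -Exclude 'dir-list*.csv' |
        Select-Object FullName, Length |
        ConvertTo-Csv -NoTypeInformation
)

# .NET does not follow PowerShell's current location, so build a full path.
# 'dir-list-nobom.csv' is a hypothetical file name for this example.
$path = Join-Path $PWD.ProviderPath 'dir-list-nobom.csv'

# $false = do not emit a byte order mark at the start of the file.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllLines($path, $lines, $utf8NoBom)

# Show the first three bytes; a UTF-8 BOM would appear as EF BB BF.
$first = [System.IO.File]::ReadAllBytes($path) | Select-Object -First 3
($first | ForEach-Object { $_.ToString('X2') }) -join ' '
```

In PowerShell 6 and later this should not be necessary, since Out-File writes UTF-8 without a BOM by default there and -Encoding accepts utf8NoBOM explicitly.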

Olaf · September 7, 2018, 5:39am

BTW: That does not affect the functionality of the files. It takes a little more space and you could save some more “exotic” characters from the Unicode table. But it will work the same as UTF-8 encoded files in common environments.

js2010 · September 7, 2018, 7:23am

Yep. I had a thread about this a little while ago. At first I thought it was unix text. PS 5’s Out-File (or “>”) encodes in what Notepad calls “unicode” and most other commands output in what Notepad calls “ansi”. Some applications won’t like it, like Infoblox (for csv import).
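For anyone who wants to check what a given command actually wrote without opening Notepad or Notepad++, a rough check is to look at the first bytes of the file for a byte order mark. The function name Get-BomName below is made up for this sketch, and it only recognises the three most common BOMs; a file without a BOM could be ANSI or BOM-less UTF-8, and the BOM alone cannot tell those apart.

```powershell
# Hypothetical helper: report which byte order mark (if any) a file starts with.
function Get-BomName {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # Resolve to a full path because .NET ignores PowerShell's current location.
    $full  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)

    # Hex string of up to the first three bytes, e.g. 'FF FE 22'.
    $hex = ($bytes | Select-Object -First 3 |
        ForEach-Object { $_.ToString('X2') }) -join ' '

    if     ($hex -eq 'EF BB BF')      { 'UTF-8 with BOM' }
    elseif ($hex.StartsWith('FF FE')) { 'UTF-16 LE with BOM' }   # Notepad: "Unicode"
    elseif ($hex.StartsWith('FE FF')) { 'UTF-16 BE with BOM' }
    else                              { 'No BOM (ANSI or UTF-8 without BOM)' }
}
```

Usage would look like `Get-BomName .\dir-list.csv` and `Get-BomName .\dir-list2.csv`, using the two files from the first post.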
