Reading from Excel and Exporting to CSV?

by yodafett at 2012-09-12 14:57:02

Hi all, first time poster.

Disclaimer: I have LITTLE scripting experience, mostly with VB, and more than a couple years ago. But as Powershell is the new hotness, and has a LOT of extra features for SysAdmin, and as such, I’m trying to make a leap.

I have a task of pulling some data from about 700 Excel worksheets and compiling them into a single .CSV file.

The Excel files will all be formatted identically, and it is the same data I need from each one, and the files are named sequentially, which are all bonuses.
In essence, I need to read cells A1, B2, C3, D3 from file 000001.xlsb, then A1, B2, B3, D3 from 000002.xlsb and export the whole thing into a single .csv file.

I’m somewhat lost about how to even start, frankly.

I’ve done some google and have bits and pieces, but I’m not getting my head around the logic of pulling it all together.

Obviously, a counter/loop to track how many files have been gone through/how many total… (that part will probably be the easiest part)
Or more likely something along the lines of:

$Directory = "c:\file_folder"

Get-ChildItem -path $Directory *.xlsb | ForEach-Object #insert cmdlet/script to extract the data here

I’ve found how to pull cell data using bits like:
$XLSDoc = “C:\file_folder\test.xlsb”
$SheetName = “Sheet1”
$Excel = New-Object -ComObject “Excel.Application”

$Workbook = $$XLSDoc)
$Sheet = $Workbook.Worksheets.Item($SheetName)

$Sheet.Cells.Item(1,1).Text #(row,col)

I assume the best is to store each cell I need as an individual variable, then string them together before appending them to the exported data?

I’m planning to use Dmitry Sotnikov’s Export-CSV -Append cmdlet for the appending. ( … sv-append/)

I guess I am wondering if someone could offer some pointers on how to pull it all together? I kind of feel like I have most of the pieces, but no real clue of what the final picture should look like… Would love if someone could offer some assistance to a n00b trying to get his feet wet in the deeper end of the pool.
by DonJ at 2012-09-12 15:06:14
So, there’s no native cmdlet that’ll consume an Excel file. You could certainly make on, in the form of a function.

function Get-ExcelContent {
foreach ($obj in $file) {
# $obj contains a single file object. Use its Name, Path, or other property
# to open the file.

That’s how you’d build the basic framework. In terms of the output you’d create… you gotta think more object oriented.

# assume $a1 is cell A1, $b2 is B2, $c3 is C3, and $d3 is D3. Just for illustration.
$props = @{‘A1’=$a1;‘B2’=$b2;‘C3’=$c3;‘D3’=$d3}
$out = New-Object -TypeName PSObject -Prop $props
Write-Output $out

That creates output another cmdlet can consume.

Get-ChildItem -path $Directory .xlsb | Get-ExcelContent | Export-CSV -FilePath whatever.csv -Append

Along those lines.
by yodafett at 2012-09-15 09:25:46
Thanks for the quick reply and assistance!

Ok, I’ve got it pulling from Excel, I’ve got the data formatting, and I have it appending to the .csv, but I have some kind of formatting issue now.

I gather data from cells with:
$date5000 = $ExcelWorkSheet.cells.Item(3,1).Text

and I concatenate them into a single string with:
$fifty = ($date5000+","+$Denom5000+","+$BegBal5000+","+$EndBal5000+","+$CashIn5000+","+$CashOut5000+","+$ShipIn5000+","+$ShipOut5000)

When I
write-Output $fifty, to visually verify what it’s pulling and concatennating, I get the correct output:
“Processed on: 11/07/2011,$50 ,345,000.00,215,000.00,0.00,130,000.00,0.00,0.00"

But when I pass the following to export and append to CSV
write-Output $fifty | Export-CSV -Path C:\Users\raubrey\Desktop\ICM\whatever.csv -Append
I get the following in the .CSV file
#TYPE System.String

As I understand, I’m extracting only text formatted data, and am storing it in a String as opposed to an array. It outputs on console correctly, why is it hosing writing to .CSV?
by DonJ at 2012-09-15 10:02:33
PowerShell doesn’t really like strings. Or, at least, it doesn’t work with them as well as objects.

$props = @{‘Date’=$date5000;
$obj = New-Object -Type PSObject -Prop $props
Write-Output $obj

That constructs an object with all the data for a single line. You’ll essentially repeat that for each line you want in the CSV. The labels in single quotes will become the column headers. Remember that a CSV only has one header row, so every object you create for the data lines needs to use the same column headers.

Your problem was that you were constructing your own preformatted CSV text, and then asking the Export-CSV command to turn that into CSV data. But it already was CSV data. Whenever you find yourself manually formatting strings in PowerShell - as you were doing by concatenating in all those commas - you’re probably working at cross-purposes with the shell.

If you take what I’ve given you here and view it in the shell console, it should appear as a list (since it has more than 4 columns; if it had 4 or fewer, it’d be in a table). That’s okay - it’s exactly what you want. If you then pipe the output to Export-CSV, you’ll get a properly formatted CSV file. The trick is to always remember that PowerShell is an object-oriented shell, not a text-oriented shell. It wants to work with objects, which are basically just in-memory data structures. Think of an object as a table row, with the properties as its columns. All of PowerShell’s commands are designed to deal with that kind of data structure - not text.

(If you’re interested in a more cogent walkthrough of this whole object thing, consider picking up my book, Learn Windows PowerShell 3 in a Month of Lunches at The whole text-vs-objects thing is crucial to understanding how to use PowerShell effectively, and it’s something that trips up most newcomers in the beginning.)
by yodafett at 2012-09-18 07:34:06
Can you tell me why it is outputting them in a different order than what they are assigned to the array in? I actually copied the code you supplied above, so they should be in that order, but when I run it, it outputs the order as:

CashIn Denom ShipOut CashOut EndBal BegBal Date ShipIn

I need them actually set in a specific order in the .csv, and I assumed building them in an array like that would preserve it, but I guess it doesn’t?

Thanks again for all the help, I think this will finish the script off, aside from a little polishing…
by yodafett at 2012-09-18 09:36:22
Ok, so the [ordered] command isn’t working with the hashtable.

$Fifty = [ordered]@{‘Date’=$date5000; ‘Denom’=$denom5000; ‘BegBal’=$BegBal5000; ‘EndBal’=$EndBal5000; ‘CashIn’=$CashIn5000; ‘CashOut’=$CashOut5000; ‘ShipIn’=$ShipIn5000; ‘ShipOut’=$ShipOut5000}

it kicks back this error: ordered : Unable to find type [ordered]: make sure that the assembly containing this type is loaded.
+ CategoryInfo : InvalidOperation: (ordered:String) [], RuntimeException
+ FullyQualifiedErrorId : TypeNotFound

new-object : Cannot validate argument on parameter ‘Property’. The argument is null or empty. Supply an argument that is not null or empty and then try the command again

So I prefixed all of the name fields with numerals and did a sort command, a la

$Fifty = @{‘1Date’=$date5000; ‘2Denom’=$denom5000; ‘3BegBal’=$BegBal5000; ‘4EndBal’=$EndBal5000; ‘5CashIn’=$CashIn5000; ‘6CashOut’=$CashOut5000; ‘7ShipIn’=$ShipIn5000; ‘8ShipOut’=$ShipOut5000}
$fiftyfinal = $fifty.GetEnumerator() | Sort Name

Orders them correctly, but breaks each one down to individual rows now, so the output looks exactly like the following, instead of being on a single line now (in the actual .csv file, not just on the host)…

1Date Processed on: 11/07/2011
2Denom $50
3BegBal 345,000.00
4EndBal 215,000.00
5CashIn 0.00
6CashOut 130,000.00
7ShipIn 0.00
8ShipOut 0.00

I seem to keep losing the formatting I want (and which your snipped above provided) anytime I try to enforce a specific order on the contents…

EDIT: Awesome, apparently the [ordered] conversion only works with literals, not variables, so that wouldn’t have helped me much regardless… on to investigating ordereddictionary objects to see if they can help with this.
by poshoholic at 2012-09-18 20:58:28
Are you using PowerShell 3? Because [ordered] was only introduced in that version.

If you’re using PowerShell 3, replace [ordered] in your script above with [pscustomobject]. That should give you a custom object with ordered property names.

If you’re using PowerShell 2, you can create a custom object with no properties and then use Add-Member to add the properties to that object in an ordered fashion. That’s how I ensure property order in PowerShell 2.
by yodafett at 2012-09-19 06:11:45
Yes, I downloaded PowerShell3. As I understand it, [ordered] only works with literal values or assignments. Does [pscustomobject] work with variables?

I modified the code to $Fifty = [pscustomobject]@{‘1Date’=$date5000; ‘2Denom’=$denom5000; ‘3BegBal’=$BegBal5000; ‘4EndBal’=$EndBal5000; ‘5CashIn’=$CashIn5000; ‘6CashOut’=$CashOut5000; ‘7ShipIn’=$ShipIn5000; ‘8ShipOut’=$ShipOut5000}
write-output $fifty

But it still outputs out of order, as
[quote]1Date Processed on: 11/07/2011
4EndBal 215,000.00
7ShipIn 0.00
6CashOut 130,000.00
8ShipOut 0.00
2Denom $50
5CashIn 0.00
3BegBal 345,000.00[/quote]

When I run $host.version I get
[quote]Major Minor Build Revision
----- ----- ----- --------
2 7 1 1305 [/quote]

I tried this
$fifty = New-Object -Type System.Collections.Specialized.OrderedDictionary

write-output $fifty

And again, I’m back at a point of it keeping things in order, but breaking each item into individual rows… I just can’t seem to get it to output them in order, on a single row…
by poshoholic at 2012-09-19 07:02:06
Wait, your results from $host.Version seem to show you’re working in a different host. PowerSE maybe? Here’s what you should get from the native console or PowerShell ISE with version 3 installed:

Major Minor Build Revision
----- ----- ----- --------
3 0 -1 -1

What do you get when you invoke $PSVersionTable? Before going further, we need to make sure you are working with the RTM version of PowerShell 3 and not one of the earlier consumer preview or release preview versions. Assuming you have the correct RTM version installed, try using ordered or pscustomobject from native PowerShell or PowerShell ISE. They should both work, even when assigning to a variable.
by yodafett at 2012-09-19 07:18:58
Yes, I am indeed using PowerSE. if that is causing an issue, I can certainly switch over, I just like the interface of it.

In PowerSE, $PSVersionTable, I get
[quote]CLRVersion 2.0.50727.5456
BuildVersion 6.1.7601.17514
PSVersion 2.0
WSManStackVersion 2.0
PSCompatibleVersions {1.0, 2.0}
PSRemotingProtocolVersion 2.1[/quote]

In the native console, $host.version and $PSVersionTable return
[quote]Major Minor Build Revision
----- ----- ----- --------
3 0 -1 -1[/quote]

[quote]WSManStackVersion 3.0
PSCompatibleVersions {1.0, 2.0, 3.0}
BuildVersion 6.2.8370.0
PSVersion 3.0
CLRVersion 4.0.30319.269
PSRemotingProtocolVersion 2.2[/quote]

I’ll try [ordered] and the pscustomobject in native and see what happens. Will that help with keeping the output on a single line?
by poshoholic at 2012-09-19 07:25:55
For PowerShell 3, you should definitely use PowerShell ISE. PowerSE doesn’t support PowerShell 3 yet (if you look at the PSVersion when you invoke $PSVersionTable from PowerSE it shows you that it is using version 2).

Regarding getting tabular output with everything on a single line, have a look at this blog post:
by yodafett at 2012-09-19 07:53:50
Ok, again, I have LITTLE scripting experience, but either I’m not following that post, or I’m going in circles. My take away from that was to use it as a single string, which I tried earlier, but it gave me incorrect output, stuff like
#TYPE System.String

instead of the actual output of the string. That’s why I started using hashtables and the [ordered] or [pscustomobject] dictionaries, which kept the data in tact, but it’s on multi-line output. Do I need start from scratch and look at tables now?

Don’t get me wrong, I’m learning a LOT with this script that I thought would be an easy starting point, but I’m just getting a little frustrated that it seems every step forward is a step backward, lol…

This is the body of my code so far:
#Initializing Excel
$excel = New-Object -com excel.application
$excel.Visible = $false
$excel.DisplayAlerts = $false

#Declaring Excel file properties
$sheet = ‘Daily_Inventory_Report_PEORIA_F’
$ExcelWorkbook = $“C:\Users\raubrey\Desktop\ICM\test.xlsb”)
$ExcelWorkSheet = $Excel.Worksheets.Item($Sheet)

#Declaring variables and pulling data from the cells based on denomination determined by suffix codes
$date5000 = $ExcelWorkSheet.cells.Item(3,1).Text
$denom5000 = $ExcelWorkSheet.cells.Item(4,4).Text
$BegBal5000 = $ExcelWorkSheet.cells.Item(6,4).Text
$EndBal5000 = $ExcelWorkSheet.cells.Item(26,4).Text
$CashIn5000 = $ExcelWorkSheet.cells.Item(15,4).Text
$CashOut5000 = $ExcelWorkSheet.cells.Item(24,4).Text
$ShipIn5000 = $ExcelWorkSheet.cells.Item(14,4).Text
$ShipOut5000 = $ExcelWorkSheet.cells.Item(23,4).Text

$Fifty = [ordered]@{‘Date’=$date5000; ‘Denom’=$denom5000; ‘BegBal’=$BegBal5000; ‘EndBal’=$EndBal5000; ‘CashIn’=$CashIn5000; ‘CashOut’=$CashOut5000; ‘ShipIn’=$ShipIn5000; ‘ShipOut’=$ShipOut5000}
write-Output $fifty | Export-CSV -Path C:\Users\raubrey\Desktop\ICM\whatever.csv -Append

My goal is to have that whole string of data, the $fifty to output on a single line in the whatever.csv file. I will have to cycle through that, 12 times, for about 600 .xlsb files. The cycling is an issue I’m not even worried about yet, until I get the output working right.
by poshoholic at 2012-09-19 08:15:15
Oh, sorry, you’re using CSV. My mistake then, you don’t need that blog post.

Just switch from [ordered] to [pscustomobject] so that you’re creating and outputting objects and not hashtables, like this:
$Fifty = [pscustomobject]@{‘Date’=$date5000; ‘Denom’=$denom5000; ‘BegBal’=$BegBal5000; ‘EndBal’=$EndBal5000; ‘CashIn’=$CashIn5000; ‘CashOut’=$CashOut5000; ‘ShipIn’=$ShipIn5000; ‘ShipOut’=$ShipOut5000}
$Fifty | Export-CSV -Path C:\Users\raubrey\Desktop\ICM\whatever.csv -Append
by yodafett at 2012-09-19 08:21:30
Well, that gets back into a single line, but it gives me what appears to be property output instead of data output again. I can’t seem to get the correct combination of data and single line…

[quote]FALSE FALSE FALSE System.Collections.Hashtable+KeyCollection System.Collections.Hashtable+ValueCollection System.Object 8[/quote]

Again, thanks for the quick reply. I really DO appreciate the help… this whole thing just is reminding me why I went into PC and networking support instead of OO programming, lol
by juneb_msft at 2012-09-19 09:24:09
Just FYI, I’m watching this thread carefully. If I need to add anything to the documentation of [ordered] or [pscustomobject], please let me know.
by yodafett at 2012-09-19 09:34:23
Ok, it’s ugly as sin, and cludgy, but I think I have a work around.

#Initializing Excel
$excel = New-Object -com excel.application
$excel.Visible = $false
$excel.DisplayAlerts = $false

#Declaring Excel file properties
$sheet = ‘Daily_Inventory_Report_PEORIA_F’
$ExcelWorkbook = $“C:\Users\raubrey\Desktop\ICM\test.xlsb”)
$ExcelWorkSheet = $Excel.Worksheets.Item($Sheet)

#Declaring variables and pulling data from the cells based on denomination determined by suffix codes
$date5000 = $ExcelWorkSheet.cells.Item(3,1).Text
$denom5000 = $ExcelWorkSheet.cells.Item(4,4).Text
$BegBal5000 = $ExcelWorkSheet.cells.Item(6,4).Text
$EndBal5000 = $ExcelWorkSheet.cells.Item(26,4).Text
$CashIn5000 = $ExcelWorkSheet.cells.Item(15,4).Text
$CashOut5000 = $ExcelWorkSheet.cells.Item(24,4).Text
$ShipIn5000 = $ExcelWorkSheet.cells.Item(14,4).Text
$ShipOut5000 = $ExcelWorkSheet.cells.Item(23,4).Text
$fifty = ($date5000+”:"+$Denom5000+":"+$BegBal5000+":"+$EndBal5000+":"+$CashIn5000+":"+$CashOut5000+":"+$ShipIn5000+":"+$ShipOut5000)
add-Content -Path c:\Users\raubrey\Desktop\icm\test2.txt -value “$fifty"

I output the data string to a text file, open in WordPad, do a couple Replace All for some formatting clean up, and Save As .csv.

Now I just need to have it loop through every file in the directory… Back to Google.
by poshoholic at 2012-09-19 09:42:50
Are you deleting your old file before you run this again with Export-Csv? i.e. What happens if you remove the -Append flag? Sure you may need append later (or maybe not, depending on how you structure your script), but you should make sure you are getting back to a clean state.

Here’s an example showing this working on PowerShell 3, with objects being properly written to the csv file:
PS C:> $drSeuss = [pscustomobject]@{‘Onefish’=1;‘Twofish’=2;‘Redfish’=‘Red’;‘Bluefish’=‘Blue’}
PS C:> $drSeuss

Onefish Twofish Redfish Bluefish
------- ------- ------- --------
1 2 Red Blue

PS C:> $drSeuss | Export-Csv C:\drSeuss.csv -NoTypeInformation
PS C:> Get-Content C:\drSeuss.csv

You really shouldn’t have to resort to writing text content to the file. There’s just something missing in the steps you are taking.

My recommendation is:
1. Delete the csv file so that you start fresh.
2. Remove the -Append parameter of Export-Csv so that you create the file new every time.
3. Verify your assumptions along the way. i.e. Don’t just create a script and run it blindly. Do it in steps. Run the content up to the point where you have your $fifty object loaded. Then output that object to the console, see if it looks right. It will display in a list format due to the number of properties you have, but when exporting it to csv you’ll get one single row for that object. Do the values appear correct when you display $fifty (by simply invoking $fifty at the command line)? If not, then you have a problem with how you set those values. If so, then you should be able to export it to csv without issues.
by yodafett at 2012-09-19 10:37:51
Well, I have it working with via the add-content method, and while ugly, it’s working at least. now I just need to get the looping to work.

I’m using

$files=get-ChildItem -Path $Path -Filter "
foreach ($file in $files)
#Initializing Excel
$excel = New-Object -com excel.application
$excel.Visible = $false
$excel.DisplayAlerts = $false

#Declaring Excel file properties
$sheet = ‘Daily_Inventory_Report_PEORIA_F’
$ExcelWorkbook = $$file)
$ExcelWorkSheet = $Excel.Worksheets.Item($Sheet)

My code above

and it dies with
[quote]open : Exception calling “Open” with “1” argument(s): "‘02232012.xlsb’ could not be found. Check the spelling of the file name, and verify that the file location is correct.

If you are trying to open the file from your list of most recently used files, make sure that the file has not been renamed, moved, or deleted."

At C:\Users\raubrey\Desktop\ICM\data\ICM V2.ps1:13 char:39
+ $ExcelWorkbook = $ <<<< ($file)
+ CategoryInfo : NotSpecified: (:slight_smile: [], MethodInvocationException
+ FullyQualifiedErrorId : ComMethodTargetInvocation

Item : You cannot call a method on a null-valued expression.[/quote]

The .xlsb files ARE in that directory, and if I do a

$files=get-ChildItem -Path $Path -Filter "
write-output $files

It DOES return a list of the files, so doesn’t that mean that the issue is with
#Declaring Excel file properties
$sheet = ‘Daily_Inventory_Report_PEORIA_F’
$ExcelWorkbook = $$file)
$ExcelWorkSheet = $Excel.Worksheets.Item($Sheet)
section? It’s not liking to pass the file names as a $file variable?
by poshoholic at 2012-09-19 10:42:50
Change $$file) to $$file.FullName). That will pass the full path to the file to the open method so that Excel can open it properly.
by yodafett at 2012-09-19 11:16:14
You’re a rockstar! It’s ugly, it’s clunky, it’s like my first car, but I only have to do this once to back load 3 years worth of inventory data, so after it finishes, it’s done. Thanks again for all the help! :slight_smile:
by poshoholic at 2012-09-19 11:20:27
I’m glad you got it working. :slight_smile: