Extracting data from excel and updating variables

Hi all,

I am trying to take content from an excel document to update and append some other code, and struggling to get it to work efficiently. I want to look at the excel file, find new entries, grab 3 cells information and insert them into another PS script.

My excel table looks like the below

Subscription Name Category 1 Category 2
Sub1 A X
Sub2 B Y
Sub3 C Z
And my code looks like below...

Subscription Category 1 =
If(CONTAINS(‘Azure Subscription’,‘Azure Subscription’[Subscription Name],“Sub1”),“A”
,IF(CONTAINS(‘Azure Subscription’,‘Azure Subscription’[Subscription Name],“Sub2”),“B”
,IF(CONTAINS(‘Azure Subscription’,‘Azure Subscription’[Subscription Name],“Sub3”),“C”

Subscription Category 2 =
If(CONTAINS(‘Azure Subscription’,‘Azure Subscription’[Subscription Name],“Sub1”),“X”
,IF(CONTAINS(‘Azure Subscription’,‘Azure Subscription’[Subscription Name],“Sub2”),“Y”
,IF(CONTAINS(‘Azure Subscription’,‘Azure Subscription’[Subscription Name],“Sub3”),“Z”

This information is then used in PowerBI as a Measure, so don’t question the format in terms of PowerShell (it works :slight_smile: )

What I am looking to do, is run a PS script to check the excel file for new lines, take the 3 pieces of information and append them to the end of the code in the lines above!

Thanks in advance!

Not really clear on the content you shared above. CSV is fine , but the other Subscription Category 1 and 2, What are those. Where and How in the code are you putting them ?

Do you have any code which can be shared here ? IT should make anyone helpful to understand your issue clearly.

Hi again,

Sorry - I know it is a bit vague.

Basically, I currently have the below in a PS1 file…

Subscription Category 1 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”A”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”B”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”C”

Subscription Category 2 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”X”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”Y”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”Z”

So lets assume the Excel file gets updated, so a new line is added and now looks like the following

 

Subscription Name Category 1 Category 2
Sub1 A X
Sub2 B Y
Sub3 C Z
Sub4 D W
I want to be able to run a new PS script to look at the table in excel and add the following new lines

Subscription Category 1 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”A”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”B”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”C”

,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub4″),”D”

Subscription Category 2 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”X”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”Y”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”Z”

,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub4″),”W”

The code you are referencing is not Powershell. It appears to be Excel expression language which is related to VBA. There is no Powershell posted that I can see thus far. Typically, to detect new lines you want have a date or a copy of the data from the last run to compare against the current data. Where is the data coming from originally? If Excel can be taken out of the equation, that would be a much better solution.


Hi,

I am aware that the output is not Powershell, I just store it in a PS1 file as it is easier to manage than a .txt file. I don’t know where to start in PowerShell to get it to look at the Excel file (which has to be Excel, as thats how the users currently record the data) and populate the resulting output files, be them PS1 or txt.

I am able to add a new field into the Excel table that records date and time of entry, and if I were to set the PowerShell script to run on a weekly basis I assume I could set it to only look at rows where the date/time was in the last week?

The powershell below was taken from somewhere else but I am not having any luck modifying it…

$xl = New-Object -COM "Excel.Application"
$xl.Visible = $true
$wb = $xl.Workbooks.Open("filename.xlsx")
$ws = $wb.Sheets.Item(1)

for ($i = 1; $i -le 3; $i++) {
if ( $ws.Cells.Item($i, 1).Value -eq $num ) {
$GoLiveDate = $ws.Cells.Item($i, 2).Value
break
}
}

There is a module that may accelerate things:

PS C:\WINDOWS\system32> Find-Module PSExcel

Version    Name                                Repository           Description
-------    ----                                ----------           -----------
1.0.2      PSExcel                             PSGallery            Work with Excel without installing Excel

Anything you are doing in Excel is still using an old Excel COM object, Powershell is just executing the old COM code. For instance, if I wanted to get data from the last row in a certain column, I would run a macro and search for how to do that in vba:

Then you have to convert that VBA to Powershell, so I would search for something like ‘Powershell find xlCellTypeLastCell’

Before you can write a script that solves your problem, you need to have a clear understanding of what you have to start with, what you need to end with, and a process that will move you from the first to the second. In order to get help writing this script, you need to be able to convey those things to other people in a clear and concise way. So, let me try to break this down:

  1. You have an Excel file that is generated by an external company process over which you have no control. The Excel file contains data that you need to extract.
  2. You need an output file that contains specific data from the Excel file, and you want to update the output file regularly as new information is added to it.
  3. You do not have a clear idea of a process to get from (1) to (2).

Let’s take a closer look at these.
In regard to (1), an Excel file (Workbook) contains pages called Worksheets (Sheets) which contain cells with information in them. Excel files also have formatting and frequently empty columns and rows that separate sections for the sake of human readability. This presents a problem when dealing with the file programmatically, because you cannot predict (with an algorithm) which columns/rows will be empty when trying to extract data. When dealing with spreadsheets from a scripting perspective, it’s common to convert them to CSV files, which effectively strips all the formatting and extra fancy features of an Excel file and just gives you the values stored in individual cells, separated with commas and line breaks (columns and rows). The CSV can be read as a simple TXT file, or easily imported into PowerShell and manipulated as a data object.

In regard to (2), the purpose of your output file is unclear. Are you generating a record of changes? Are you extracting a specific set of data from a spreadsheet that contains more information than you need? Does the result need to be human readable? Are you creating a file that will later be used as input for another process? Does the result need to fit a particular format so that it can be used in another program?

If you just need a specific section of the Excel file, converting that particular Worksheet to a CSV might suit your needs and is fairly simple to accomplish in PowerShell. Rather than run a comparison against a previous version of the file (to discover new entries), you could simply overwrite the existing file with a new file whenever the script is executed. Is the date of addition of new information important? Does the output need to be a historical record of changes?
And on that note, what should happen in the output file if information is deleted from the Excel file? Do you also need the script to discover cells that have been removed? That will be more complicated.

In order to create the process (3) you will need to answer most if not all of these questions, and then figure out a logical step-by-step method to get from (1) to (2). Once you’ve done that you can write a PowerShell script that represents that process. Without it, you’re stumbling around blind.

Many thanks to you both for your replies… I fear I may have confused the situation somewhat so I shall seek to clarify the position I am in, in the way it has been articulated but Grokkit above.

  1. I have a bunch of fields in an excel document that is updated externally
    1. I am able to add fields if required, such as a date or time of entry of a new row
    2. I am able to convert to CSV as part of this process, if required
    3. The table looks similar to below, with more and more entries being added by users
Subscription Name Subscription Category 1 Subscription Category 2 Entered
Sub1 A X 1/4/20 09:00
Sub2 B Y 1/4/20 09:00
Sub3 C Z 1/4/20 09:00
Sub4 D W 1/4/20 09:00
2. I have external code that I use in another application (PowerBI) that looks like the lines below. I have a means to automate this process on a schedule, so there is no need for the PowerShell to control or manage when its run.

Subscription Category 1 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”A”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”B”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”C”

Subscription Category 2 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”X”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”Y”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”Z”

This can be stored in any file type, such as .txt if easiest. I simply used PowerShell ISE when doing this manually as it made if easier for me to format and idefity the fields I was manually updating/adding

  1. As per my example in the table above, a user has entered a new line “Sub4”. I would like to create a PowerShell script that opens this spreadsheet (potentially converting to CSV first?) and updates the PowerBI code in an output file. In the example above, this would result in the output file now looking like below, with the bold values identifying what has been pulled from the CSV.

Subscription Category 1 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”A”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”B”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”C”

,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub4″),”D

Subscription Category 2 =
If(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub1″),”X”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub2″),”Y”
,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub3″),”Z”

,IF(CONTAINS(‘Azure Subscription’,’Azure Subscription’[Subscription Name],”Sub4″),”W

Does that make sense now?

Yes, that clarifies a few things.

Because you don’t care about the date or time in the output file, I suggest not bothering with tracking that information at all and not attempting to do any comparison between new and old versions of the input file. Read the contents of the input file each time, process them into the format you need, and then write the processed content to your output file. As a bonus, this will also account for any information deleted from the input file without you having to write specific code to handle it.

You can get the data out of the Excel file directly, and without installing extra PS modules, although using a module that’s built for it may be easier. This article will lead you through finding data in an Excel file and building an object with it: Finding Data in Excel Using PowerShell. Once you can do that reliably, it should be relatively easy to build the output lines from the collected data.

Let’s start with something like this. Excel is the devil. It uses an old COM object and is painful. Take something like this:

To extract information you have to emulate what you would do the in gui to select the cells and the loop through them to get data. Powershell uses PSObject, so you would have to loop through it to generate a PSObject to then export to a CSV to keep a copy for comparison. A bit ugly. Powershell can import a CSV directly into a PSObject and then you get all of the power of Powershell to do comparisons, filters, etc. on the content. If it were me, I would just try to grab the sheet I’m looking for and save it as a CSV like this:

This code create a CSV from the XLSX, if you have a previous csv to perform deltas then it will show you what changed. Otherwise, you and comparing nothing, but Compare-Object is used to output just for consistent output:

$xlCSV = 6
$Sheet = 1 #Can reference index or name
$xls = "C:\Scripts\Subscriptions.xlsx"
$csv = "C:\Scripts\Subscriptions.csv"
$csv_prev = "C:\Scripts\Subscriptions_Prev.csv"

#Get the sheet and create a csv
$objExcel = New-Object -ComObject Excel.Application
$objExcel.Visible = $False
$objExcel.DisplayAlerts = $False
$WorkBook = $objExcel.Workbooks.Open($xls)
$WorkSheet = $WorkBook.sheets.item($Sheet)
$WorkBook.SaveAs($csv,$xlCSV)
$objExcel.quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($objExcel)

$psCsv = Import-Csv -Path $csv

if (Test-Path -Path $csv_prev) {
    $psCSvPrev = Import-Csv -Path $csv_prev
    #Here we have a new file and previous file, so we pick a property like Subscription Name to see what's changed
    Compare-Object -ReferenceObject $psCSvPrev -DifferenceObject $pscsv -Property 'Subscription Name' -IncludeEqual -PassThru
}
else {
    #Output the data, but you are comparing the same content to iteself, just doing it for consistent output with the SideIndicator
    Compare-Object -ReferenceObject $pscsv -DifferenceObject $pscsv -Property 'Subscription Name' -IncludeEqual -PassThru
}
#Save the prev file for Deltas next run
$pscsv | Export-Csv -Path $csv_prev -NoTypeInformation

The first run you have no previous data to perform deltas, so it would just be like this:

Subscription Name       : Sub1
Subscription Category 1 : A
Subscription Category 2 : X
Entered                 : 1/4/20 9:00
SideIndicator           : ==

Subscription Name       : Sub2
Subscription Category 1 : B
Subscription Category 2 : Y
Entered                 : 1/4/20 9:00
SideIndicator           : ==

Subscription Name       : Sub3
Subscription Category 1 : C
Subscription Category 2 : Z
Entered                 : 1/4/20 9:00
SideIndicator           : ==

Subscription Name       : Sub4
Subscription Category 1 : D
Subscription Category 2 : W
Entered                 : 1/4/20 9:00
SideIndicator           : ==

When I ran this, it generated the previous file. The next time I ran it, I removed a row and added a new row to the XLSX:

Subscription Name       : Sub1
Subscription Category 1 : A
Subscription Category 2 : X
Entered                 : 1/4/20 9:00
SideIndicator           : ==

Subscription Name       : Sub3
Subscription Category 1 : C
Subscription Category 2 : Z
Entered                 : 1/4/20 9:00
SideIndicator           : ==

Subscription Name       : Sub4
Subscription Category 1 : D
Subscription Category 2 : W
Entered                 : 1/4/20 9:00
SideIndicator           : ==

Subscription Name       : Sub 5
Subscription Category 1 : E
Subscription Category 2 : V
Entered                 : 9/16/20 14:21
SideIndicator           : =>

Subscription Name       : Sub2
Subscription Category 1 : B
Subscription Category 2 : Y
Entered                 : 1/4/20 9:00
SideIndicator           : <=

The side indicators show what in the previous (on the left) and the current file (on the right) based on ReferenceObject and DiffrenceObject respectively. If you run it again, then you would see that it’s all == because we overwrote the previous file. You could keep dated copies of the previous files for reference if you wanted to see changes. You could email the changes only to the subscriptions. Now that it’s a PSObject, you have the full flexibility of Powershell.