Powershell Regex and Excel

I have a user who gets a file via email and has to take certain parts from the email and insert it into Excel. So for example: the email come is with the following info:

User: John Doe
Purchase: Laptop
Cost: 1,300.00

For now I am saving the contents of the email into a text file and using Regex to get the pattern and save the value into excel. The problem is that it I have more than one user who purchased a computer, it just reads to first user and that’s it.

This is the code I am using
#start Excel

Add-Type -AssemblyName Microsoft.Office.Interop.Excel
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $true
$workbook = $excel.Workbooks.add()
$sheet1 = $workbook.Worksheets.Item(“Sheet1”)
$sheet1.Name = “Week ending in April”
$sheet2 = $workbook.Worksheets.Item(“Sheet2”)
$sheet2.Name = “April 2015”
$sheet3 = $workbook.Worksheets.Item(“Sheet3”)
$sheet3.Name = “Total by Funds”

#Add Heading Names
$sheet1.Cells.item(1,1) = ‘User’
$sheet1.Cells.item(1,2) = ‘Purchase’
$sheet1.Cells.item(1,3) = ‘Cost’

#Format Spreadsheet
$sheet1.Columns.Item(“A”).columnwidth=“25”
$sheet1.Columns.Item(“B”).columnwidth=“15”
$sheet1.Columns.Item(“C”).columnwidth=“15”

$range = $excel.Range(“A1”, “O1”)
$range.font.bold=$true
#$range.Columns(“A1”, “O1”).HorizonTalAlignment = -4108

#$range.ColumnWidth =“30”

$introw = 2

Start reading the file**************************************************************************************

$file = Get-Childitem C:\Test\FinancialStatements*.* -Include *.txt
foreach ($file in $files)
{

$sheet1.Cells.item($intRow,1) = gc -Path $file | Select-String -Pattern ‘User: (?[a-z]})’ | select -Expand Matches | foreach {$.groups[“digit”].value}
$sheet1.Cells.item($intRow,2) = gc -Path $file | Select-String -Pattern ‘Purchase: (?\d{2}[/]\d{2}[/]\d{2})’ | select -Expand Matches | foreach {$
.groups[“digit”].value}
$sheet1.Cells.item($intRow,3) = gc -Path $file | Select-String -Pattern ‘Cost: (?\d[0-9a-z]+)’ | select -Expand Matches | foreach {$_.groups[“digit”].value}
}

Looks like you’re running everything against the second row. You’ll need to increment the $introw variable in each iteration of the foreach loop:

foreach ($file in $files)
{
    $sheet1.Cells.item($intRow,1) = gc -Path $file | Select-String -Pattern 'User: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $sheet1.Cells.item($intRow,2) = gc -Path $file | Select-String -Pattern 'Purchase: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $sheet1.Cells.item($intRow,3) = gc -Path $file | Select-String -Pattern 'Cost: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $introw++
}

I do have $introw++ at the end. It did not copy over

Can you edit your OP so we can see exactly what the script looks like?

Also, can you use the code formatting tags so that we can read this a little easier? It’s in the top ribbon of the post editor - just put all your code between the ‘pre’ tags.

I would parse the file into a Powershell object and then just select the first line using Select-Object. There is probably a regex that could capture the data, but this works too. Simply split the lines at the semi-colon. If it’s USER it is the start of new record, so initialize the hashtable. If it’s COST, it’s the last part of a record, so generate an object and return to $result. Anything in the middle, add it to the hashtable:

$content = Get-Content C:\Users\Rsimmers\Desktop\test.txt.txt

$results = foreach ($line in $content) {
    $arr = $line -split ":"
    switch ($arr[0].ToUpper()){
        "USER" {
            $props = @{}
            $props.Add($arr[0].Trim(),$arr[1].Trim())
        }
        "COST" {
            $props.Add($arr[0].Trim(),$arr[1].Trim())
            New-Object -TypeName PSObject -Property $props
        }
        default {
            $props.Add($arr[0].Trim(),$arr[1].Trim())
        }
    }
}

$results | Select User, Purchase, Cost | Format-Table -AutoSize

Output:

User        Purchase Cost    
----        -------- ----    
John Doe    Laptop   1,300.00
Sally Smith Desktop  800.00 

Then you would just need to do something like this:

foreach ($record in $results | Select -First 1) {

    $sheet1.Cells.item($intRow,1) = $record.User
    $sheet1.Cells.item($intRow,2) = $record.Purchase
    $sheet1.Cells.item($intRow,3) = $record.Cost
}

I have three text file in a dir and I am using this code to parse each file

$files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
foreach($file in $files)
{

Get-Content $file | Select-String -Pattern 'Transaction Type: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value}
Get-Content $file | Select-String -Pattern 'Settlement Date: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
Get-Content $file | Select-String -Pattern 'CUSIP Number: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
}

**It reads all three text files

But when try to write to excel, it only read one text file and closes.

$fileDir = Get-ChildItem C:\Test\FinancialStatements\
foreach($file in $files)
{
    $files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
    $sheet1.Cells.item($intRow,1) = Get-Content -Path $file | Select-String -Pattern 'User Name: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value} 
    $sheet1.Cells.item($intRow,2) = Get-Content -Path $file | Select-String -Pattern 'Purchase: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $sheet1.Cells.item($intRow,3) = Get-Content -Path $file | Select-String -Pattern 'Cost:: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $introw ++
    
}

Debug one thing at a time. First you need to determine if your for loop is returning the information you expect and each file is being enumerated, so try this:

$files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
foreach($file in $files) {
    "Processing {0}" -f $file.Name
    $TransType = Get-Content $file | Select-String -Pattern 'Transaction Type: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $SettlementDate = Get-Content $file | Select-String -Pattern 'Settlement Date: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $CUSIP = Get-Content $file | Select-String -Pattern 'CUSIP Number: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}

    "Transaction Type: {0}" -f $TransType
    "Settlement Date:  {0}" -f $SettlementDate
    "CUSIP Number: {0}" -f $CUSIP
}

Are you seeing each file you expect and the expected results?

Rob,

I can see that it’s reading all three text files beacuse of the info it writes to the console. The problem is when I start to write to excel, it only looks at one text file and closes. It does not finish reading the text file and moves to the next text file.

Thanks
Freddy

The code you posted above looks incorrect, it should look like this:

#$fileDir = Get-ChildItem C:\Test\FinancialStatements
$intRow = 0
$files = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
foreach($file in $files) {  
    $sheet1.Cells.item($intRow,1) = Get-Content -Path $file | Select-String -Pattern 'User Name: (?[a-z]})' | select -Expand Matches | foreach {$_.groups["digit"].value} 
    $sheet1.Cells.item($intRow,2) = Get-Content -Path $file | Select-String -Pattern 'Purchase: (?\d{2}[/]\d{2}[/]\d{2})' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $sheet1.Cells.item($intRow,3) = Get-Content -Path $file | Select-String -Pattern 'Cost:: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $intRow ++
}

Rob,
Can you try the below script.

#start Excel

Add-Type -AssemblyName Microsoft.Office.Interop.Excel
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $true
$workbook = $excel.Workbooks.add()
$sheet1 = $workbook.Worksheets.Item("Sheet1")
$sheet1.Name = "Week ending in April"
$sheet2 = $workbook.Worksheets.Item("Sheet2")
$sheet2.Name = "April 2015"
$sheet3 = $workbook.Worksheets.Item("Sheet3")
$sheet3.Name = "Total by Funds"

#Add Heading Names
$sheet1.Cells.item(1,1) = 'Week Ending Date'
$sheet1.Cells.item(1,2) = 'Trade Date'
$sheet1.Cells.item(1,3) = 'Cusip'
$sheet1.Cells.item(1,4) = 'SSID'
$sheet1.Cells.item(1,5) = 'Fund Name and Class'
#$sheet1.Cells.item(1,6) = 'Class'
$sheet1.Cells.item(1,6) = 'Affiliate'
$sheet1.Cells.item(1,7) = 'Type of Trade'
$sheet1.Cells.item(1,8) = 'Subscriptions'
$sheet1.Cells.item(1,9) = '# of Sub Trades'
$sheet1.Cells.item(1,10) = 'Redemptions'
$sheet1.Cells.item(1,11) = '# of Red Trades'
$sheet1.Cells.item(1,12) = 'Net Flows'
$sheet1.Cells.item(1,13) = 'Dealer'
$sheet1.Cells.item(1,14) = '# of Trades'
$sheet1.Cells.item(1,7).HorizonTalAlignment = -4108
#$sheet1.Cells.Item(1,1).font.bold=$true

#Format Spreadsheet
$sheet1.Columns.Item("A").columnwidth="25"
$sheet1.Columns.Item("B").columnwidth="15"
$sheet1.Columns.Item("C").columnwidth="15"
$sheet1.Columns.Item("D").columnwidth="10"
$sheet1.Columns.Item("E").columnwidth="60"
$sheet1.Columns.Item("F").columnwidth="25"
$sheet1.Columns.Item("G").columnwidth="15"
$sheet1.Columns.Item("H").columnwidth="20"
$sheet1.Columns.Item("I").columnwidth="20"
$sheet1.Columns.Item("J").columnwidth="20"
$sheet1.Columns.Item("K").columnwidth="20"
$sheet1.Columns.Item("L").columnwidth="15"
$sheet1.Columns.Item("M").columnwidth="30"
$sheet1.Columns.Item("N").columnwidth="15"

$range = $excel.Range("A1", "O1")
$range.font.bold=$true
#$range.Columns("A1", "O1").HorizonTalAlignment = -4108

#$range.ColumnWidth ="30"

$introw = 2

$files = Get-ChildItem C:\Test\*.* -Include *.txt 
$txtFile = gc $files.
Foreach ($file in $files)
{
    Foreach($txtfile in $file){
           $sheet1.Cells.item($intRow,2) =  Get-Content $txtfile | Select-String -Pattern 'CUSIP Number: (?\d[0-9a-z]+)' | select -Expand Matches | foreach {$_.groups["digit"].value}
    $introw++
    $txtFile 
    }

}

It give me the name of the three text files, but its not reading each text from the beginning to end. It reads the first few line than it goes to the next file and so on. I need it to read the whole text file. I think the excel part is the problem. I am trying different things.

Your logic is still not correct. I think what you are not understanding is that Get-Content returns an array of lines, not the entire text file, so you can’t search for the pattern in the entire file.

-Get the files
–for each file
—get-content of file
----for each line in content
-----if this line has foo in it, parse with this regex

This goes back to my original post, I would parse the files and create objects FIRST and then process with Excel. With that logic, you can also validate if files exist and content exists prior to opening excel and creating all of those sheets to put nothing in it. Re-read my first post and see how I’m checking what is in each line before I create the object.

OK, I will try it