Read text file and find match

Here is the code I am using

$Gettext = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 

Foreach ($text in $Gettext){
$txtline = Get-Content -Path $text 
   
    Foreach($line in $txtline){
    $line1 = Get-Content -Path $line | select-string -Pattern  'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
    $line1
    }

}

$line1 should give me a match but instead it give me the content of the file.

You are calling Get-Content too many times:

$Gettext = Get-ChildItem C:\Test\FinancialStatements\*.* -Include *.txt 
 
Foreach ($text in $Gettext){
$txtline = Get-Content -Path $text 
 
    Foreach($line in $txtline){
        $line1 = $line | select-string -Pattern  'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
        $line1
    }
}

Keep in mind that if there are 3 lines and line 2 has what you are looking for, $line1 would be blank. You are overwriting $line1 for each line, so on the third line there is no match and it would write nothing.

I wanted to show you an example of search the entire contents of a file. Here is what is in the text file:

Some Column: Some Data
Transaction Type: 3523tergdg
Another Column: More Data

Next we can use the FileSystemObject which was popular in vbScript days to do a .ReadAll() method and get all of the content and search for a regex match:

$objFSO = New-Object -ComObject Scripting.FileSystemObject
$textFile = $objFSO.OpenTextFile("C:\Users\Rob\Desktop\test.txt", 1, $false)
$textFile.ReadAll() -match "Transaction Type:\s+(.*)"
$textFile.Close()

$Matches

Rob,
I would like to use the select-string in the Foreach loop. I will try your way first. So if I have 14 different matches can I use the select-string in the loop?

Thanks
Freddy

I did try your script and once it finds a match it exits. How can I make a loop so it can find more matches

I think you are overcomplicating this. Do all of the files have the same “headers” in them with the data separated by colons like the example file I used above? You are trying to use a regex solution which I think is overboard to just pull data out of colon-separated data. If your files had no similarity and say Transaction Type was just hidden in all of this random junk:

Foo DataFoo DataFoo DataFoo DataFoo DataFoo
Foo DataFoo DataFoo Data Transaction Type: 3523tergdgFoo DataFoo DataFoo DataFoo Data
Foo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo Data
Foo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo Data

Foo DataFoo DataFoo DataFoo DataFoo DataFoo DataFoo Data
Foo DataFoo DataFoo DataFoo DataFoo DataFoo Data

In the above, I would use a regex pattern, which is what you are doing in Select-String. However, if your files have the same headers with a colon to separate data, then I would really consider something like this:

#Get all text files
$results = Get-ChildItem C:\Users\rsimmers\Desktop\* -Include *.txt -File | foreach {
    #Get the content of the file
    $content = Get-Content $_.FullName
    #Create a blank hash table for properties
    $props = @{}
    #Add a property with the File Name
    $props.Add("File Name", $_.Name)
    foreach($line in $content) {
        #Split the line at the colon to make an array
        $arrLine = $line.Split(":")
        #Add the header ($arrLine[0]) and data ($arrLine[1]) into a hash table
        $props.Add(($arrLine[0]).Trim(), ($arrLine[1]).Trim())
    }
   #Create a new object with the properties and store in $results
   New-Object -TypeName PSObject -Property $props
}

$results | Format-Table -AutoSize

I placed 3 text files with the same headers with different data on my desktop and run the script would produce something like this:

File Name Another Column Transaction Type Some Column   
--------- -------------- ---------------- -----------   
test.txt  More Data      3523tergdg       Some Data     
test1.txt Foo Data       35w46235ythdg    Different Data
test2.txt Data Foo       o68rg352ty34tyr  Junk Data   

Now that the files are parsed, you know what file the data came from, you can do whatever you want with it. If you are convinced you have to use regex, to get multiple matches you would do something like this:

$objFSO = New-Object -ComObject Scripting.FileSystemObject
$textFile = $objFSO.OpenTextFile("C:\Users\rsimmers\Desktop\test.txt", 1, $false)
$content = $textFile.ReadAll()
$transaction = ([regex]"Transaction Type:\s+(.*)").match($content).groups[1].value
$anothercolumn = ([regex]"Another Column:\s+(.*)").match($content).groups[1].value
$textFile.Close()
 
$transaction
$anothercolumn

There are a lot of ways to do things. If you really want help, you need to be specific with what you are trying to do and the challenges you are having. Provide examples and expected results. Tell us how you want to use the data.

Rob,

I have 3 + text files in a folder. Each text file have the same info like Transaction Type: Purchase, Cost: 250.00, etc. I already have the code to open Excel and format the sheets. My problem is that it loops through a file once and it goes to the next text file.

This is a sample code:

#start Excel

Add-Type -AssemblyName Microsoft.Office.Interop.Excel
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.add()
$wrksht = $workbook.Worksheets.Item("sheet1")
$wrksht.name = "Hello"
$excel.Visible = $true

$introw = 2

$files = Get-ChildItem -Path C:\Test\FinancialStatements\*.* -Include *.txt
foreach ($txtfile in $files){
     $wrksht.Cells.item($intRow,3) = Get-Content -Path $txtfile | select-string -Pattern  'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
     $introw ++
}

So if a have more than 1 Transaction Type: Purchase it will only match the first one and move to the next Text file. I think I need another foreach loop

Does this code work?

#Get all text files
$results = Get-ChildItem C:\Users\rsimmers\Desktop\* -Include *.txt -File | foreach {
    #Get the content of the file
    $content = Get-Content $_.FullName
    #Create a blank hash table for properties
    $props = @{}
    #Add a property with the File Name
    $props.Add("File Name", $_.Name)
    foreach($line in $content) {
        #Split the line at the colon to make an array
        $arrLine = $line.Split(":")
        #Add the header ($arrLine[0]) and data ($arrLine[1]) into a hash table
        $props.Add(($arrLine[0]).Trim(), ($arrLine[1]).Trim())
    }
   #Create a new object with the properties and store in $results
   New-Object -TypeName PSObject -Property $props
}
 
$results | Format-Table -AutoSize

This script is some what working for me:

#start Excel

Add-Type -AssemblyName Microsoft.Office.Interop.Excel
$excel = New-Object -ComObject Excel.Application
$workbook = $excel.Workbooks.add()
$wrksht = $workbook.Worksheets.Item("sheet1")
$wrksht.name = "Hello"
$excel.Visible = $true

$introw = 2
$files = Get-ChildItem -Path C:\Test\FinancialStatements\*.* -Include *.txt
Foreach($txtfile in $files){
   Foreach ($line in [System.IO.File]::OpenRead($txtfile)){
   $Trans = $line| select-string -Pattern 'Transaction Type:\s(?[a-z].+)' | select -expand Matches | foreach {$_.groups["digit"].value}
   $Trans
   #$wrksht.Cells.item($intRow,3) = $Trans
   $introw ++

   }
   
}

Sample file

Transaction Type: Purchase
Cost:250.00

Transaction Type: Purchase
Cost:250.00

Transaction Type: Purchase
Cost:250.00

Instead of writing to excel line after line it skips two lines and write Transaction Type: Purchase, skips two lines and write. I would like it like so"
Transaction Type: Purchase
Transaction Type: Purchase
Transaction Type: Purchase

Not like:
Transaction Type: Purchase

Transaction Type: Purchase

Transaction Type: Purchase
.

Hi Wilfredo,

The regular expression in your code is invalid.
This part of the expression:

(?[a-z].+)

In a regular expression, the match of something within parenthesis is called a group.
By default, each match of a group is assigned a number as an index. To be able to refer
to a specific group by name you can add the following to the beginning of the group:

?

This is what your regex almost does. Since you later on refer to the group as “digit” this is propably the regex you need:

"Transaction Type:\s(?[a-z].+)"

This will match any row containing “Transaction Type:” followed by a whitespace (like space or tab) followed by one or more of any char. The group “digit” will contain the part after the whitespace.

Regarding the looping, if you want multiple matches from one file to appear in the same cell, you need to combine the values to one string. For example like this:

foreach ($txtfile in $files){
     $Strings = Get-Content -Path $txtfile | 
     	select-string -Pattern  "Transaction Type:\s(?[a-z].+)" | 
     		select -expand Matches | 
     			foreach {$_.groups["digit"].value}
     $wrksht.Cells.item($intRow,3) = $Strings -join ","
     $introw ++
}

If your file contains two matching rows like this:
Transaction Type: Purchase
Transaction Type: Sale
The cell would then have the value: Purchase,Sale

If you would like to add each value to its own row, you should probably do something like this:

foreach ($txtfile in $files){
    Get-Content -Path $txtfile | 
    	select-string -Pattern  "Transaction Type:\s(?[a-z].+)" | 
    		select -expand Matches | 
    			foreach {
    				$wrksht.Cells.item($intRow,3) = $_.groups["digit"].value
        			$intRow++
				}
}

Note how I"ve added a line break after each pipechar (|) which I think makes the code a little easier to read.

EDIT: Tried to get code to display properly but I give up

Ops, I just realized that the part of the regex that I thought was missing from your code was just parsed away by the HTML, sorry!

I’ll try to edit my post to get it correct.

The code you posted worked. but I need to match around 14 different words

13 alternatives to “Transaction Type”?

Does the files you read only contain things you want or alot of other data too?
It is hard to design a search pattern for a file without knowing the whole content of the file.

Here are the contents:

Please see the large trade(s) below:

Transaction Type: PURCHASE
Trade Amount: 250.00
Trade Date: 05/14/15
Settlement Date: 05/15/15
Fund Name and Class: ABCDEFG
Symbol: ABCD
CUSIP Number: 12345
Transfer Agent Fund Number: 123
State Street Fund Number: 1111


Transaction Type: PURCHASE
Trade Amount: 250.00
Trade Date: 05/14/15
Settlement Date: 05/15/15
Fund Name and Class: ABCDEFG
Symbol: ABCD
CUSIP Number: 12345
Transfer Agent Fund Number: 123
State Street Fund Number: 1111

In some text files there is more of the some data. So I may have 5 text files with the same king of data. I need to match the Trade Date: and input the value “05/14/15” in Excel and do the same for each text file.

Ok, so each files contains many transactions and you want to break out Transaction Type and Trade Date from each transaction?

I would suggest looking into the new cmdlet ConvertFrom-String in PowerShell V5.

If that is not an option, try to read each file and split it in a way that each part contains just one transaction, for that part, replace : with = and use ConvertFrom-StringData to create objects.

If each transaction always starts and finishes with the same data, I would use what I posted in the other post you opened for the same thing. Understand that putting the data into Excel is trivial, the important part of this process is parsing the data and making it useable to place in Excel. Your feeding small pieces of information instead of providing examples and requirements up front, so it’s really hard for anyone to actually help you. With that said, here is modified code from the other post:

$results = foreach ($file in (Get-ChildItem -Path C:\Users\rsimmers\Desktop\* -Include *.txt -File )) {
    $content = Get-Content $file.FullName
    #Start a record number, so if there are 4 entries in a file, you would see what
    #was the first entry in the file
    $intRecordNum = 1
    foreach ($line in $content) {
        #Process any line that has a colon in it
        if ($line -like "*:*") {
            #Split the line at the colon to create an array with the header and data
            $arr = $line -split ":"
            #Create a switch with the header
            switch ($arr[0].Trim().ToUpper()){
                "TRANSACTION TYPE" {
                    #Transaction Type is the start of a new record, so we create a blank property hashtable
                    $props = @{}
                    #Add the transaction type data
                    $props.Add($arr[0].Trim(),$arr[1].Trim())
                    #Add the file name for reference
                    $props.Add("FileName", $file.Name)
                    #Add the record number to indicate the sequence of data in the file
                    $props.Add("RecordNumber", $intRecordNum)

                } #TRANSACTION TYPE
                "STATE STREET FUND NUMBER" {
                    #State Street is the last entry of a record, so add the data
                    $props.Add($arr[0].Trim(),$arr[1].Trim())
                    #Since it's the last record, we want to actually create the object
                    #with all of the properties we've gathered.  This is returned to the $results variable
                    New-Object -TypeName PSObject -Property $props
                    #Again, this our last record, so we increment the record number
                    $intRecordNum++
                } #STATE STREET FUND NUMBER
                default {
                    #This is anything in between the start and finish headers, we're just creating a property for it
                    $props.Add($arr[0].Trim(),$arr[1].Trim())
                } #default
            }#switch header name
        } #if line contains a colon
    } #foreach line
} #foreach file
 
$results | Select * | Format-Table -AutoSize

I have 3 text files, one with two transactions, one with 4 and another with 3 using the format you posted above. When I run this against the files, I get the following:

Symbol RecordNumber FileName  State Street Fund Number Settlement Date CUSIP Number Trade Amount Transaction Type Trade Date Fund Name and Class
------ ------------ --------  ------------------------ --------------- ------------ ------------ ---------------- ---------- -------------------
ABCD              1 test.txt  1111                     05/15/15        12345        250.00       PURCHASE         05/14/15   ABCDEFG            
ABCD              2 test.txt  1111                     05/15/15        12345        1354.00      PURCHASE         05/14/15   ABCDEFG            
RVFE              1 test1.txt 5342                     05/15/15        463432       451.00       PURCHASE         05/14/15   REGFSFVF           
RVFE              2 test1.txt 5342                     05/15/15        463432       4845.00      PURCHASE         05/14/15   ABCDEFG            
RVFE              3 test1.txt 5342                     05/15/15        463432       5621.00      PURCHASE         05/14/15   REGFSFVF           
RVFE              4 test1.txt 5342                     05/15/15        463432       250.00       PURCHASE         05/14/15   ABCDEFG            
GFGE              1 test2.txt 8634                     05/15/15        789453       1434.00      PURCHASE         05/14/15   LIOBDMSS           
GFGE              2 test2.txt 8634                     05/15/15        789453       3351.00      PURCHASE         05/14/15   ABCDEFG            
GFGE              3 test2.txt 8634                     05/15/15        789453       121.00       PURCHASE         05/14/15   LIOBDMSS

We now have the filename the transactions came from, the order of the transactions in the file (i.e.RecordNumber) and all of the data for each transaction. Now I highly recommend that you do NOT try to rip this apart and put it in Excel. Validate that that this code correctly parses that data in your files. You should see similar output as above. Again, no Excel, does the data parse correctly? If the data parses, I would convert this code into a function, like Get-TransactionData. Then it’s a simple to get it into Excel.