Search multiple files for specific pattern

My first post in this forum:-)

I have a folder with some .txt files. I want to search through the files and find lines that contain this pattern: BDAS. The lines could look like this:

This is a line number 1 with BDAS-1111

This is a line number 2 with BDAS-1111

This is a line number 3 with BDAS-2222

BDAS-1111 and BDAS-2222 should both be recorded once and the result (filename, BDAS-code) should be written to a .csv file.

I have been able to write the line and filename to a .csv file, so the content looks like this:

This is a line number 1 with BDAS-1111;filename.txt

This is a line number 3 with BDAS-2222;filename.txt

This was done with this code:

#Get All objects with 'BDAS'
Get-ChildItem `
-Path $SourceFolderForSplittedFiles -recurse | `
Select-String -pattern "BDAS" -AllMatches | `
Select-Object -Property @{label='JIRA';expression=({$_.Line})}, @{label='Objekt';expression=({$_.Filename})}| `
#Select-Object -Property @{label='JIRA';expression=({$_.Line})}, @{label='Objekt';expression={$ObjektID = $_.Filename; @($ObjektID).GetType() } }| `
Export-CSV "C:\ResultFile2.csv" -Delimiter ';'

But I would like the output of my .csv file to be:

BDAS-1111;filename

BDAS-2222;filename

How could I accomplish that?

 

The simplest version should be something like this:

Get-ChildItem -Path $SourceFolderForSplittedFiles -Filter *.txt | 
    ForEach-Object {
        $File = $_
        Select-String -Path $_.FullName -Pattern 'BDAS-\d{4}' | 
            Select-Object -Property @{Name='Match';Expression={$_.Matches.Value}}, @{Name='FileBaseName';Expression={$File.BaseName}}
    }

But please please please do not use backticks. That’s the worst style / habit you can have for PowerShell scripts. Especially when you place them after the pipe symbol … a pipe at the end of a line already continues the command on the next line anyway.
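The question also asked for each code to be recorded only once. A minimal sketch of one way to get there, assuming duplicates should be removed per file; the temporary folder, the sample file and the `Result.csv` name are made up here purely so the snippet runs on its own:

```powershell
# Throwaway folder with one sample file (the three lines from the question).
$SourceFolderForSplittedFiles = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $SourceFolderForSplittedFiles | Out-Null
@(
    'This is a line number 1 with BDAS-1111'
    'This is a line number 2 with BDAS-1111'
    'This is a line number 3 with BDAS-2222'
) | Set-Content -Path (Join-Path $SourceFolderForSplittedFiles 'filename.txt')

# For each file: collect every matched code, keep each code once, emit one object per code.
$Result = Get-ChildItem -Path $SourceFolderForSplittedFiles -Filter *.txt |
    ForEach-Object {
        $File = $_
        Select-String -Path $File.FullName -Pattern 'BDAS-\d{4}' -AllMatches |
            ForEach-Object { $_.Matches.Value } |
            Sort-Object -Unique |
            ForEach-Object {
                [PSCustomObject]@{ Match = $_; FileBaseName = $File.BaseName }
            }
    }

# Hypothetical output location; use whatever path suits you.
$Result | Export-Csv -Path (Join-Path $SourceFolderForSplittedFiles 'Result.csv') -Delimiter ';' -NoTypeInformation
```

Every line break sits after a pipe, so no backticks are needed. `Sort-Object -Unique` compares strings case-insensitively by default, which should be fine for codes like these.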

Hi Olaf

Thanks for your (quick) reply - I’ll try it out as soon as possible. So ‘$_.Matches.Value’ will return ‘BDAS-’ plus whatever comes after that (i.e. BDAS-1111, BDAS-2222)? That was exactly what I was looking for:-)

I haven’t been able to find any documentation for your -Pattern string ‘BDAS-\d{4}’. What does ‘\d{4}’ mean?

And I’ll not be using backticks from now on:-)
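For anyone else wondering about the two questions above, here is a small sketch that can be pasted into a console. The variable names are only for illustration:

```powershell
$Line = 'This is a line number 3 with BDAS-2222'

# Select-String accepts strings from the pipeline as well as files.
$Found = $Line | Select-String -Pattern 'BDAS-\d{4}'

# Matches.Value holds only the text the pattern matched, not the whole line.
$Code = $Found.Matches.Value

# \d matches a single digit and {4} repeats it exactly four times,
# so three digits after the hyphen are not enough for a match.
$TooShort = 'BDAS-123' -match 'BDAS-\d{4}'
```

Here `$Code` should hold the matched code on its own and `$TooShort` should be `$false`.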

 

 

It works like a charm:-)

Aha, ‘\d{4}’ is a regular expression: the pattern finds every occurrence of ‘BDAS-’ followed by a 4-digit number:-) I have changed the script a little to this:

Get-ChildItem -Path $SourceFolderForSplittedFiles -Filter *.txt |
    ForEach-Object {
        $File = $_
        Select-String -Path $_.FullName -Pattern 'PAAS-\d{1,4}', 'WK-\d{1,4}' |
            Select-Object -Property @{Name='Match';Expression={$_.Matches.Value}}, @{Name='FileBaseName';Expression={$File.BaseName}}
    } |
    Export-CSV "C:\ResultFile_FB.csv" -Delimiter ';'

I’m searching for occurrences of both ‘BDAS-’ and ‘WK-’ followed by a number with 1-4 digits. The result is piped to a CSV file.

[quote quote=198326]It works like a charm:-)

I’m searching for occurrences of both ‘BDAS-’ and ‘WK-’ followed by a number with 1-4 digits. The result is piped to a CSV file.[/quote]

I’m glad I could be of help and I’m proud that you figured it out by yourself. Great. You can simplify your pattern a little bit like this:

Select-String -Path $_.FullName -Pattern '(?:PAAS|WK)-\d{1,4}'

Here is a good source to learn more about regex: https://www.regular-expressions.info
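To see that the single pattern finds the same codes as the two separate ones, a quick sketch. The second and third sample lines are invented for illustration:

```powershell
$Lines = @(
    '250419  MMM EV50.01 PAAS-1800  Added field "Job No."'
    'some line with WK-12 in it'
    'a line with neither code'
)

# Two separate patterns: a line is reported if either pattern matches.
$TwoPatterns = $Lines | Select-String -Pattern 'PAAS-\d{1,4}', 'WK-\d{1,4}' |
    ForEach-Object { $_.Matches.Value }

# One pattern: (?:...) groups the alternatives PAAS and WK without capturing them.
$OnePattern = $Lines | Select-String -Pattern '(?:PAAS|WK)-\d{1,4}' |
    ForEach-Object { $_.Matches.Value }
```

Both variables should end up with the same two codes, so the shorter pattern is mostly a matter of readability.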

BTW: You can format your code for this forum by selecting the code you pasted and clicking the code tag button labeled “PRE”.

I’m struggling with a small modification. Each line starts with a date in the format DDMMYY. How can I get this information into my CSV file? Some kind of “copystring” might do the trick, but I don’t know which string to copy.

Could you please share some sample data to actually show what you mean? Of course you should sanitize or obfuscate sensitive information.

And please format it as code by clicking the “preformatted text” button ( </> ) and pasting the data into the space between the tags.

Thanks in advance

Hi Olaf
Thx for the quick reply:-) The files look like this

    LOCAL PROCEDURE UpdatePDFViewer@50000();
    VAR
      TempBlob@50000 : Record 99008535;
      Calls@50001 : Integer;
    BEGIN
      CALCFIELDS("PDF Document");
      TempBlob.Blob := "PDF Document";
      CurrPage.PDFViewerFactBox.PAGE.LoadPDF(TempBlob)
    END;

    BEGIN
    {
      100117  BNP EV.001  EVSBA-122  Inserted the field "Possible requsitions" in the group Invoice Details.
      250419  MMM EV50.01 PAAS-1800  Added field "Job No."
      090519  YYY EV50.02 PAAS-1801 GB-Code Added Fields: "Buy-from County", "Ship-to County", "Pay-to County", "Invoice Receipt Date"
      060609  MMM EV50.03 PAAS-1829 Logic about "Assigned User ID" not editable
      011119  YYY EV50.04 PAAS-1988 New Action Group: Document
                                    New Function: UpdatePDFViewer
                                    Code Added on: OnAfterGetCurrRecord
      300320  XXX EV50.05 PAAS-2064 New Action: MatchedDocuments
    }
    END.
  }
}

I would like to extract the values starting with ‘PAAS-xxxx’ and, from the same line, the date information, e.g. 300320

$Path    = 'D:\sammple\*.txt'
$Pattern = '^\s+(?<DayMonthYear>\d{6}).+(?<Match>(?:PAAS|WK)-\d{1,4})'

Select-String -Path $Path -Pattern $Pattern |
ForEach-Object {
    [PSCustomObject]@{
        FileName    = $_.Path
        LineNumber  = $_.LineNumber
        DateMatch   = $_.matches.groups | Where-Object -Property Name -EQ -Value DayMonthYear
        DateTime    = [datetime]::ParseExact(($_.matches.groups | Where-Object -Property Name -EQ -Value DayMonthYear), 'ddMMyy', $null)
        Match       = $_.matches.groups | Where-Object -Property Name -EQ -Value Match
        MatchedLine = $_.Line
    }
} |
Export-CSV 'C:\ResultFile_FB.csv' -Delimiter ';' -NoTypeInformation

Now you have to learn about named groups in regular expressions to understand what we just did here with the regex pattern. :wink:
And I used a PSCustomObject to make the code somewhat easier to read. Here you can read more about it:

I added some additional properties for convenience: LineNumber, the actual DateTime parsed from the DayMonthYear string (ddMMyy) so you can sort properly by date, and the complete line just for reference or to check. Of course you can remove them if you don’t need them.
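As a small aid for the named groups: a sketch of a slightly different way to read them, using the group name as an index instead of filtering with Where-Object. This is an alternative, not what the code above does, and the invariant culture passed to ParseExact is my own choice here:

```powershell
$Pattern = '^\s+(?<DayMonthYear>\d{6}).+(?<Match>(?:PAAS|WK)-\d{1,4})'
$Line    = '      300320  XXX EV50.05 PAAS-2064 New Action: MatchedDocuments'

$Found  = $Line | Select-String -Pattern $Pattern

# Groups can be indexed by the name given in (?<Name>...).
$Groups = $Found.Matches[0].Groups

# .Value is the plain matched text of each named group.
$DateText = $Groups['DayMonthYear'].Value
$Code     = $Groups['Match'].Value

# ddMMyy means two-digit day, month and year.
$Date = [datetime]::ParseExact($DateText, 'ddMMyy', [cultureinfo]::InvariantCulture)
```

With the sample line, `$DateText` is the six digits at the start of the line, `$Code` is the PAAS code, and `$Date` is a real DateTime that sorts chronologically.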

Hi Olaf
This works just like a charm:-) I knew about the PSCustomObject but have never used it. And the regex is just awesome:-) I needed to iterate over the .txt files in a folder, so my script ended up like this:

$Pattern = '^\s+(?<DayMonthYear>\d{6}).+(?<Match>(?:PAAS|WK)-\d{1,4})'


Get-ChildItem -Path $SourceFolderForSplittedFiles -Filter *.txt | 
    ForEach-Object {
        Select-String -Path $_.FullName -Pattern $Pattern |
        ForEach-Object {
            [PSCustomObject]@{
                FileName    = $_.Path
                LineNumber  = $_.LineNumber
                DateMatch   = $_.matches.groups | Where-Object -Property Name -EQ -Value DayMonthYear
                DateTime    = [datetime]::ParseExact(($_.matches.groups | Where-Object -Property Name -EQ -Value DayMonthYear), 'ddMMyy', $null)
                Match       = $_.matches.groups | Where-Object -Property Name -EQ -Value Match
                MatchedLine = $_.Line
            }
        }
    } |
Export-CSV $PathResultFile -Delimiter ';' -NoTypeInformation

The construct with a ForEach-Object inside a ForEach-Object may not be the prettiest one, but it does the job:-)
Again: Thanks a lot for helping:-)

I assumed that from the code you posted previously. But you don’t need another loop for that. You can provide a path with a wildcard in it for Select-String just like I showed in my code suggestion. That should even be a little faster.
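A sketch of what that could look like with a single loop. The temporary folder and the two sample files are invented so the snippet is self-contained; in your script you would build the wildcard path from your own folder variable instead:

```powershell
# Throwaway folder with two sample files (lines taken from the sample data above).
$Folder = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $Folder | Out-Null
Set-Content -Path (Join-Path $Folder 'a.txt') -Value '      250419  MMM EV50.01 PAAS-1800  Added field "Job No."'
Set-Content -Path (Join-Path $Folder 'b.txt') -Value '      300320  XXX EV50.05 PAAS-2064 New Action: MatchedDocuments'

$Pattern = '^\s+(?<DayMonthYear>\d{6}).+(?<Match>(?:PAAS|WK)-\d{1,4})'

# The wildcard in -Path lets Select-String go through the files itself,
# so the outer Get-ChildItem | ForEach-Object loop is not needed.
$Result = Select-String -Path (Join-Path $Folder '*.txt') -Pattern $Pattern |
    ForEach-Object {
        [PSCustomObject]@{
            FileName  = $_.Filename
            DateMatch = $_.Matches[0].Groups['DayMonthYear'].Value
            Match     = $_.Matches[0].Groups['Match'].Value
        }
    }
```

`$Result` can then be piped to Export-Csv exactly as before.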

That’s the main objective. :wink: :+1:t4: