Group by block of lines between two strings

ricardo-braganca · June 24, 2016, 11:25am

Good afternoon everyone,
I have a .txt file like this:

BLUE|        |     |     |1111  | BEGIN
BLUE|        |     |     |1112  | BTATI
YELL|        |     |     |1113  | CBDE2
YELL|        |     |     |1114  | MATTT
YELL|        |     |     |1115  | MATTT
YELL|        |     |     |1116  | END00
BLUE|        |     |     |1117  | BEGIN
BLUE|        |     |     |1118  | BTATI
BLUE|        |     |     |1119  | CBDE2
BLUE|        |     |     |1110  | MATTT
BLUE|        |     |     |1121  | END00

I want to group the .txt file into blocks of lines that start at a “BEGIN” line and end at an “END00” line. When the first word of the “BEGIN” line is BLUE and the first word of the “END00” line is YELL, I want to replace YELL with BLUE in that block, like this:

BLUE|        |     |     |1111  | BEGIN
BLUE|        |     |     |1112  | BTATI
BLUE|        |     |     |1113  | CBDE2
BLUE|        |     |     |1114  | MATTT
BLUE|        |     |     |1115  | MATTT
BLUE|        |     |     |1116  | END00
BLUE|        |     |     |1117  | BEGIN
BLUE|        |     |     |1118  | BTATI
BLUE|        |     |     |1119  | CBDE2
BLUE|        |     |     |1110  | MATTT
BLUE|        |     |     |1121  | END00

Important: The output must be a .txt file with the same characteristics as the original (field sizes, delimiter, …) but with the changes made.

Can someone help me?

donj · June 25, 2016, 3:52pm

I would use Import-CSV to import the file, specifying a -Delimiter “|” to have it break on the pipe characters. From there, a ForEach loop would let you enumerate each line one at a time and work with the column values.

ForEach ($line in (Import-CSV whatever.txt -Delimiter "|")) {
    $line[0] # first column
    $line[5] # last column
}

Unfortunately, I’m not understanding what it is you want to do with the data. Additionally, Export-CSV, while it can use the | as a delimiter, might not preserve the column widths. You might need to manage that yourself.
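One way to "manage that yourself" is to skip the CSV cmdlets and split each line on the pipe character, since Import-Csv returns objects whose columns are read by property name rather than by numeric index, and this file has no header row. The following is only a minimal sketch under that assumption; the two sample lines and the variable names are hypothetical stand-ins for the real file.

```powershell
# Hypothetical sample lines standing in for the contents of the .txt file
$lines = @(
    'BLUE|        |     |     |1111  | BEGIN'
    'YELL|        |     |     |1116  | END00'
)

$rewritten = foreach ($line in $lines) {
    # -split takes a regular expression, so the pipe must be escaped
    $fields = $line -split '\|'
    # Only the first field is changed; the padding of the others is never trimmed
    if ($fields[0] -eq 'YELL') { $fields[0] = 'BLUE' }
    # Rejoining with the same delimiter restores the original layout
    $fields -join '|'
}
$rewritten
```

Because the fields are never trimmed or re-padded, each output line keeps the width of its input line as long as the replacement word has the same length as the original.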

ricardo-braganca · June 25, 2016, 6:03pm

Good evening, first of all thanks for the help.

These lines inside the foreach will not work for me because, as the example shows, the “END” is not always in position 5; it can appear in position 4, position 6, or any other.

$line[0] # first column
$line[5] # last column

I think it should be something that searches for the strings “BEGIN” and “END”, but I do not know what.

What I want to do with the data is something like this:

foreach {IF (H5 like “BEGIN” and H1 like “BLUE” and H5 Like “END” and H1 like “YELL”) {
H1 replace “YELL”, “BLUE”
}
}

Only these cases should be changed in the original document; the rest, which does not meet this condition, should be kept as it is.

How can I manipulate the size of the fields?

random-commandline · June 27, 2016, 10:07am

Do you need to change the word ‘YELL’ to ‘BLUE’? If so, this should work.

(Get-Content \\path\to\textfile) -replace 'YELL','BLUE'

Do all lines start with ‘BLUE’ or ‘YELL’?
If so, do you need to capture all lines between the ‘BEGIN’ and ‘END00’ lines of each text file?
If not, do you need to capture ONLY lines that begin with ‘BLUE’ or ‘YELL’ that are between the
‘BEGIN’ and ‘END00’ lines of each text file?

Should your results look like this?

Group 1
BLUE|        |     |     |1111  | BEGIN
BLUE|        |     |     |1112  | BTATI
YELL|        |     |     |1113  | CBDE2
YELL|        |     |     |1114  | MATTT
YELL|        |     |     |1115  | MATTT
YELL|        |     |     |1116  | END00

Group 2
BLUE|        |     |     |1117  | BEGIN
BLUE|        |     |     |1118  | BTATI
BLUE|        |     |     |1119  | CBDE2
BLUE|        |     |     |1110  | MATTT
BLUE|        |     |     |1121  | END00

ricardo-braganca · July 1, 2016, 4:44am

Yes, all lines start with “BLUE” or “YELL”.

I need to capture ONLY the lines between ‘BEGIN’ and ‘END00’ in blocks that have “BLUE” on the ‘BEGIN’ line and “YELL” on the ‘END00’ line.

The next step is to replace “YELL” with “BLUE” in the captured lines.

Important: the columns of the output document must have the same widths as in the imported document.

The result I want is something like:

BLUE|        |     |     |1111  | BEGIN
BLUE|        |     |     |1112  | BTATI
BLUE|        |     |     |1113  | CBDE2
BLUE|        |     |     |1114  | MATTT
BLUE|        |     |     |1115  | MATTT
BLUE|        |     |     |1116  | END00
BLUE|        |     |     |1117  | BEGIN
BLUE|        |     |     |1118  | BTATI
BLUE|        |     |     |1119  | CBDE2
BLUE|        |     |     |1110  | MATTT
BLUE|        |     |     |1121  | END00

random-commandline · July 1, 2016, 8:07am

If I am understanding, this should work.

(Get-Content \\path\to\textfile1.txt) -replace 'YELL','BLUE' | Add-Content \\path\to\textfile2.txt

ricardo-braganca · July 1, 2016, 8:58am

But one thing is missing first:

I need to capture ONLY the lines between ‘BEGIN’ and ‘END00’ in blocks that have “BLUE” on the ‘BEGIN’ line and “YELL” on the ‘END00’ line.

Only the lines that meet this condition should pass through. This code passes all the lines and changes every “YELL” to “BLUE”, even on lines that do not meet the condition.

random-commandline · July 1, 2016, 12:43pm

I think a switch statement would solve your problem. I am still confused as to why my previous post does not work for you.

$files = Get-ChildItem "\\path\to\textfile1.txt"
$count = $null
switch -Regex -File $files
{
    'BEGIN' {$count++;Write-Verbose "Group $count found" -Verbose} 
    '^BLUE|^YELL' {$_ -replace 'YELL','BLUE' | Add-Content -Path "\\path\to\textfile2.txt"}
    'END00' {continue}
}
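For readers unfamiliar with how a regex switch walks its input: every clause whose pattern matches runs for each line, and `continue` skips the remaining clauses for the current line only. The sketch below illustrates that behaviour on two shortened, hypothetical lines passed as an array instead of with -File.

```powershell
# Hypothetical, shortened lines used only to show the control flow
$lines = 'BLUE|1111| BEGIN', 'YELL|1116| END00'

$hits = switch -Regex ($lines) {
    'BEGIN$'       { "begin: $_" }
    'END00$'       { "end: $_"; continue }   # skip the remaining clauses for this line
    '^(BLUE|YELL)' { "colour: $_" }          # also runs for the BEGIN line
}
$hits
```

The first line matches two clauses and so produces two entries, while the second stops after its `continue`. The order of the clauses therefore matters when one of them is meant to short-circuit the others.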

ricardo-braganca · July 3, 2016, 5:36am

Good morning,

Sorry if I did not explain myself properly.

The problem with both posts is that they search every line of the .txt file and, whenever they find the string “YELL”, replace it with “BLUE”, regardless of whether the string is in the first column or the 10th.

A big problem: if a line contains the string “YELLOW”, both posts turn it into “BLUEOW”, because they replace “YELL” with “BLUE”.

Another problem is that I want to replace “YELL” with “BLUE” only when I have something like this:

BLUE|        |     |     |1111  | BEGIN
BLUE|        |     |     |1112  | BTATI
YELL|        |     |     |1113  | CBDE2
YELL|        |     |     |1114  | MATTT
YELL|        |     |     |1115  | MATTT
YELL|        |     |     |1116  | END00

If I have something like this, I do not want to replace “YELL” with “BLUE”:

YELL|        |     |     |1111  | BEGIN
YELL|        |     |     |1112  | BTATI
YELL|        |     |     |1113  | CBDE2
YELL|        |     |     |1114  | MATTT
YELL|        |     |     |1115  | MATTT
YELL|        |     |     |1116  | END00
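The “BLUEOW” effect described above comes from an unanchored pattern. As a small sketch, assuming the value to change is always the first field and is followed directly by a pipe, anchoring the pattern to the start of the line limits the replacement to that field. The sample line is hypothetical.

```powershell
# Hypothetical line with YELLOW in a later field
$line = 'YELL|  YELLOW   |1113  | CBDE2'

# Unanchored: every occurrence of YELL is replaced, including inside YELLOW
$unanchored = $line -replace 'YELL', 'BLUE'

# Anchored: only YELL at the start of the line, followed by the delimiter
$anchored = $line -replace '^YELL\|', 'BLUE|'

$unanchored
$anchored
```

This only addresses which text on a line is replaced; deciding which lines belong to a qualifying BEGIN/END00 block is a separate step.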

curtis-smith · July 3, 2016, 4:51pm

Hi Ricardo,
Here is an example of the logic you need. Basically, you have to test each line to determine whether it is an appropriate begin or end line, then match them up to create your block of text. You can then do your replace on just that block. Let me know if you need an explanation of what the code below is doing.

$data = @'
BLUE|        |     |     |1111  | BEGIN
BLUE|        |     |     |1112  | BTATI
YELL|        |     |     |1113  | CBDE2
YELL|        |     |     |1114  | MATTT
YELL|        |     |     |1115  | MATTT
YELL|        |     |     |1116  | END00
BLUE|        |     |     |1117  | BEGIN
BLUE|        |     |     |1118  | BTATI
BLUE|        |     |     |1119  | CBDE2
BLUE|        |     |     |1110  | MATTT
BLUE|        |     |     |1121  | END00
BLUE|        |     |     |2111  | BEGIN
BLUE|        |     |     |2112  | BTATI
YELL|        |     |     |2113  | CBDE2
YELL|        |     |     |2114  | MATTT
YELL|        |     |     |2115  | MATTT
YELL|        |     |     |2116  | END00
BLUE|        |     |     |2117  | BEGIN
BLUE|        |     |     |2118  | BTATI
BLUE|        |     |     |2119  | CBDE2
BLUE|        |     |     |2110  | MATTT
BLUE|        |     |     |2121  | END00
'@ -split "`r?`n"

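$bluebegin = $null   # index of the most recent BLUE ... BEGIN line, if any
For ($i = 0; $i -lt $data.Count; $i++) {
    Switch -Regex ($data[$i]) {
        # A BEGIN line opens a block; remember it only when it starts with BLUE
        "BEGIN$" { $bluebegin = if ($_ -match "^BLUE") { $i } else { $null } }
        "END00$" {
            # Emit the block only when it opened with BLUE and closes with YELL
            If (($_ -match "^YELL") -and ($null -ne $bluebegin)) {
                $data[$bluebegin..$i] | ForEach-Object { $_ -replace "^YELL\|", "BLUE|" }
            }
            # Forget the BEGIN index so it cannot leak into a later block
            $bluebegin = $null
        }
    }
}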

Results:

BLUE|        |     |     |1111  | BEGIN
BLUE|        |     |     |1112  | BTATI
BLUE|        |     |     |1113  | CBDE2
BLUE|        |     |     |1114  | MATTT
BLUE|        |     |     |1115  | MATTT
BLUE|        |     |     |1116  | END00
BLUE|        |     |     |2111  | BEGIN
BLUE|        |     |     |2112  | BTATI
BLUE|        |     |     |2113  | CBDE2
BLUE|        |     |     |2114  | MATTT
BLUE|        |     |     |2115  | MATTT
BLUE|        |     |     |2116  | END00
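The loop above emits only the qualifying blocks. An earlier post in this thread also asks for the remaining lines to be kept unchanged, so the following is a hedged sketch of a variant that rewrites the matching blocks in place and leaves every other line alone. The shortened sample data and the variable names are assumptions for illustration.

```powershell
# Hypothetical sample; in practice $data would hold the lines of the file
$data = @(
    'BLUE|        |     |     |1111  | BEGIN'
    'YELL|        |     |     |1113  | CBDE2'
    'YELL|        |     |     |1116  | END00'
    'YELL|        |     |     |1117  | BEGIN'
    'YELL|        |     |     |1118  | END00'
)

$blueBegin = $null   # index of the most recent BEGIN line that starts with BLUE
for ($i = 0; $i -lt $data.Count; $i++) {
    if ($data[$i] -match 'BEGIN$') {
        $blueBegin = if ($data[$i] -match '^BLUE') { $i } else { $null }
    }
    elseif ($data[$i] -match 'END00$') {
        if (($data[$i] -match '^YELL') -and ($null -ne $blueBegin)) {
            # Rewrite the block in place; lines outside it are left alone
            for ($j = $blueBegin; $j -le $i; $j++) {
                $data[$j] = $data[$j] -replace '^YELL\|', 'BLUE|'
            }
        }
        $blueBegin = $null   # the block is closed either way
    }
}
$data
```

With this sample, the first three lines end up starting with BLUE, while the last two, which form a block that begins with YELL, stay as they were.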

ricardo-braganca · July 4, 2016, 3:33am

Hi Curtis,

Two questions for now. If I want to import a .txt file instead of a multi-line string, how do I do that?
And how do I export to a .txt file with the same structure as the original files?

curtis-smith · July 4, 2016, 1:13pm

Hi Ricardo,
You would use Get-Content to get the content of the text file.

You would use Out-File to write the data out to a file. Since the output data is being generated in a For loop, you would need to either write the data with Out-File using the -Append parameter inside the loop, or collect all the data into a variable inside the loop and then write it out to the file once after the loop has completed.
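The second option (collect, then write once) might look like the sketch below. The temp-folder paths and the two sample lines are hypothetical, and the loop body is reduced to a plain replace so that the file handling stays in focus.

```powershell
# Hypothetical paths in the temp folder, used only for illustration
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'blocks-in.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'blocks-out.txt'
Set-Content -Path $inPath -Value 'YELL|  |1113  | CBDE2', 'YELL|  |1116  | END00'

# @() keeps $data an array even if the file has only one line
$data = @(Get-Content -Path $inPath)

# Collect everything the loop emits, then write the file once
$result = for ($i = 0; $i -lt $data.Count; $i++) {
    $data[$i] -replace '^YELL\|', 'BLUE|'
}
$result | Out-File -FilePath $outPath

Get-Content -Path $outPath
```

Writing once after the loop avoids reopening the file for every line, and because Out-File overwrites by default, rerunning the script does not keep appending to an old result the way -Append would.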

ricardo-braganca · July 5, 2016, 11:43am

Hi curtis,

I made my changes to the code, but the result is not what I expected.
I don’t know why; I think I did something wrong. This is what I have:

$data = get-content C:\Users\ricardo.braganca\Desktop\NEW\*.txt 
$Output = "C:\Users\ricardo.braganca\Desktop\NEW\dados.txt"


For ($i=0;$i -lt $data.count;$i++) {
    Switch -Regex ($data[$i]) {
        "^FE.*MAI000$" {$bluebegin = $i}
        "MAIFIM$" {
                    If (($data[$i] -match "^01") -and (Get-variable bluebegin -ErrorAction SilentlyContinue)) {
                        $data[$bluebegin..$i] | ForEach-Object {$_ -replace "^FE\|", "01|"}
                    } elseif (Get-variable bluebegin -ErrorAction SilentlyContinue) {
                        Remove-Variable bluebegin
                    }
                 }
    }
   $_ |Add-Content $Output 
}

curtis-smith · July 5, 2016, 5:48pm

Your Add-Content is in the wrong place and is adding the wrong content: at that point, outside the Switch and ForEach-Object blocks, $_ does not hold the replaced lines. Additionally, your replace logic is backward from the original example. Replacing FE with 01 is like replacing BLUE with YELL instead of YELL with BLUE. In any case, here is a corrected version that replaces 01 with FE and calls Add-Content in the correct spot.

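# Caution: C:\temp\*.txt also matches out.txt, and Add-Content appends,
# so delete out.txt (or write it to another folder) before re-running
$data = Get-Content C:\temp\*.txt
$output = "C:\temp\out.txt"
$bluebegin = $null   # index of the most recent FE ... MAI000 line, if any

For ($i = 0; $i -lt $data.Count; $i++) {
    Switch -Regex ($data[$i]) {
        # A MAI000 line opens a block; remember it only when it starts with FE
        "MAI000$" { $bluebegin = if ($_ -match "^FE") { $i } else { $null } }
        "MAIFIM$" {
            # Write the block only when it opened with FE and closes with 01
            If (($_ -match "^01") -and ($null -ne $bluebegin)) {
                $data[$bluebegin..$i] | ForEach-Object { $_ -replace "^01\|", "FE|" } | Add-Content $output
            }
            # Forget the start index so it cannot leak into a later block
            $bluebegin = $null
        }
    }
}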

Sample input files:
1.txt

FE|        |     |     |1111  | MAI000
FE|        |     |     |1112  | BTATI
01|        |     |     |1113  | CBDE2
01|        |     |     |1114  | MATTT
01|        |     |     |1115  | MATTT
01|        |     |     |1116  | MAIFIM
FE|        |     |     |1117  | MAI000
FE|        |     |     |1118  | BTATI
FE|        |     |     |1119  | CBDE2
FE|        |     |     |1110  | MATTT
FE|        |     |     |1121  | MAIFIM

2.txt

FE|        |     |     |2111  | MAI000
FE|        |     |     |2112  | BTATI
01|        |     |     |2113  | CBDE2
01|        |     |     |2114  | MATTT
01|        |     |     |2115  | MATTT
01|        |     |     |2116  | MAIFIM
FE|        |     |     |2117  | MAI000
FE|        |     |     |2118  | BTATI
FE|        |     |     |2119  | CBDE2
FE|        |     |     |2110  | MATTT
FE|        |     |     |2121  | MAIFIM

Results: out.txt

FE|        |     |     |1111  | MAI000
FE|        |     |     |1112  | BTATI
FE|        |     |     |1113  | CBDE2
FE|        |     |     |1114  | MATTT
FE|        |     |     |1115  | MATTT
FE|        |     |     |1116  | MAIFIM
FE|        |     |     |2111  | MAI000
FE|        |     |     |2112  | BTATI
FE|        |     |     |2113  | CBDE2
FE|        |     |     |2114  | MATTT
FE|        |     |     |2115  | MATTT
FE|        |     |     |2116  | MAIFIM
