removing multiline blocks from a text file based on a pattern

john-mooper · October 11, 2017, 9:09am

Hello. I have a text file that looks like this:
{
“something” “else”
“even” “2”
“moretext” “704 1696 -40”
}
{
“text” “random”
“odd” “1”
“never” “more”
}
…

Basically it’s a series of multiline blocks enclosed in curly brackets. There can be any number of lines between the brackets.
What I need is to search this file for a string and if the string is found then remove the block(s) in curly brackets (including them) that contain it. So for example if I search for ‘more’, it should delete the whole block:
{
“text” “random”
“odd” “1”
“never” “more”
}
If I search for ‘1’, it should remove both blocks.

I’m sure it can be done with regex, but my regex skills are too low and I can’t figure this out.
Any help is appreciated.

jmmurrah · October 11, 2017, 9:26am

Short answer:

$($(get-content sample.txt) -join '').split('}').trimstart('{') | where-object {$_ -notlike "*random*"}

Long answer:
get-content will import your text file with each line as an item in an array.

get-content sample.txt

{
"something" "else"
"even" "2"
"moretext" "704 1696 -40"
}
{
"text" "random"
"odd" "1"
"never" "more"
}

Step 1 would be to use -join to make one long string out of the input.

$(get-content sample.txt) -join ''

{"something" "else""even" "2""moretext" "704 1696 -40"}{"text" "random""odd" "1""never" "more"}

Step 2 would be to then turn that string back into an array by splitting on the closing curly brace.

$($(get-content sample.txt) -join '').split('}')

{"something" "else""even" "2""moretext" "704 1696 -40"
{"text" "random""odd" "1""never" "more"

Step 3 is to clean up the opening curly brace.

$($(get-content sample.txt) -join '').split('}').trimstart('{')

"something" "else""even" "2""moretext" "704 1696 -40"
"text" "random""odd" "1""never" "more"

Step 4 is to use where-object to filter out any items that have the magic word.

$($(get-content sample.txt) -join '').split('}').trimstart('{') | where-object {$_ -notlike "*random*"}

"something" "else""even" "2""moretext" "704 1696 -40"

john-mooper · October 11, 2017, 10:14am

Jeremy, thank you, that works.
I need to preserve the original file format though (I didn’t mention that explicitly above). How can I achieve that with your code?

lwajswaj · October 11, 2017, 10:24am

Hi John,

You should use a RegEx like this:

{(.|\n)*?more(.|\n)*?}

In this case if it finds the word “more” it will mark the whole text within {}; however, it matches even part of the word as well. In your example, it will select both since the first group has the word “moretext” and the second one “more”
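A small sketch of the partial-match point, assuming you wanted whole words only (the thread does not require this): wrapping the search text in \b word boundaries stops 'more' from matching inside 'moretext'. The variable names are just for illustration.

```powershell
# A plain pattern matches anywhere in the line, including inside a longer word.
$partial = '"moretext" "704 1696 -40"' -match 'more'

# \b anchors the pattern to word boundaries, so 'moretext' no longer qualifies.
$whole = '"moretext" "704 1696 -40"' -match '\bmore\b'

# A line where 'more' stands alone between quotes still matches with \b.
$exact = '"never" "more"' -match '\bmore\b'
```

Here $partial and $exact come out True and $whole comes out False.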

john-mooper · October 11, 2017, 10:43am

Leandro, I’m not getting any matches using your regex. Maybe because Get-Content loads the file as an array of lines?
Edit: but yes, it should include partial matches too. My example above was incorrect; I didn’t notice both blocks contained ‘more’.

lwajswaj · October 11, 2017, 12:40pm

Yeah, Get-Content reads the file as an array of lines, while the regex needs the full text as a single string. Try:

Get-Content -Path FILE_PATH -Raw
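To illustrate why this matters, here is a rough sketch using in-memory data instead of a real file (the sample lines and variable names are made up for the example). $lines stands in for what Get-Content returns by default, and $raw for what -Raw returns.

```powershell
# Default Get-Content behaviour: one string per line.
$lines = '{', '"text" "random"', '"never" "more"', '}'

# Get-Content -Raw behaviour: the whole file as a single string.
$raw = $lines -join "`n"

# (?s) lets . match newlines, so one match can span a whole block.
$pattern = '(?s)\{.*?more.*?\}'

# Against an array, -match tests each element separately and returns the
# elements that match. No single line contains both { and }, so none do.
$lineHits = @($lines -match $pattern)

# Against a single string, -match returns a Boolean.
$rawHit = $raw -match $pattern
```

With this data $lineHits is empty and $rawHit is True.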

curtis-smith · October 11, 2017, 4:23pm

You are going to run into a problem where it crosses multiple blocks with the {(.|\n)*?more(.|\n)*?} pattern.

For example: {(.|\n)*?never(.|\n)*?} matches

{
"something" "else"
"even" "2"
"moretext" "704 1696 -40"
}
{
"text" "random"
"odd" "1"
"never" "more"
}

Rather than just

{
"text" "random"
"odd" "1"
"never" "more"
}

What I would do is first find all of my blocks, then filter out the ones I don’t want, then join all the remaining blocks back together.

Something like this:

cls
$exclude = "odd"
((Get-Content -Path "D:\New Text Document.txt" -Raw | Select-String -Pattern "(?s)\{.*?\}" -AllMatches).matches.value | Select-String -Pattern $exclude -NotMatch) -join "`n"
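Broken into separate steps, the same idea could look roughly like the sketch below. It is only an illustration: the sample data is made up and held in memory rather than read from a file, and it uses [regex]::Matches and Where-Object instead of Select-String. The search text is passed through [regex]::Escape so it is treated literally, which is an assumption about what you want.

```powershell
# Hypothetical sample; in practice this would come from Get-Content -Raw.
$raw = @(
    '{', '"something" "else"', '"even" "2"', '}',
    '{', '"text" "random"', '"odd" "1"', '}'
) -join "`n"

$exclude = 'odd'

# 1. Find every block. (?s) lets . match newlines; .*? stops at the first }.
$blocks = [regex]::Matches($raw, '(?s)\{.*?\}') | ForEach-Object { $_.Value }

# 2. Keep only the blocks that do not contain the search text.
$kept = @($blocks | Where-Object { $_ -notmatch [regex]::Escape($exclude) })

# 3. Join the surviving blocks back together, one after another.
$result = $kept -join "`n"
```

Because each block is matched on its own before filtering, a hit in one block cannot drag a neighbouring block along with it.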

postanote · October 11, 2017, 7:01pm

One more for your consideration…

$RandomData = @'
{
“something” “else”
“even” “2”
“A new record” “704 1696 -40”
}
{
“something” “else”
“even” “2”
“I want this one” “704 1696 -40”
}
{
“text” “random”
“odd” “1”
“never” “more”
}
{
“something” “else”
“even” “2”
“moretext” “704 1696 -40”
}
{
“text” “random”
“odd” “1”
“never” “And this one also”
}
{
“something” “else”
“even” “2”
“moretext” “704 1696 -40”
}
{
“something” “else”
“even” “2”
“The last record” “704 1696 -40”
}
'@

Remove all record entries that match the string ‘more’

Validate pattern match

$RandomData -match ‘.[^]\b[^?{](.more.)\b(.|\n)?}*}’

True

Get all matches

{
“text” “random”
“odd” “1”
“never” “more”
}
{
“something” “else”
“even” “2”
“moretext” “704 1696 -40”
}
{
“something” “else”
“even” “2”
"m

Remove matches from the data

$RandomData -replace ‘.[^]\b[^?{](.more.)\b(.|\n)?}*}’

{
“something” “else”
“even” “2”
“A new record” “704 1696 -40”
}
{
“something” “else”
“even” “2”
“I want this one” “704 1696 -40”
}
{
“text” “random”
“odd” “1”
“never” “And this one also”
}
{
“something” “else”
“even” “2”
“The last record” “704 1696 -40”
}

curtis-smith · October 11, 2017, 9:42pm

Nice, I never even considered using -replace, but it makes a lot of sense.

Here is another regex that could be used with -replace and takes fewer steps to process.

$exclude = "odd"
(Get-Content -Path "D:\New Text Document.txt" -Raw) -replace "{[^\}]*$exclude[^\}]*}(\r\n|\n)"
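If you want to reuse this, one possible way to wrap it is sketched below. Remove-BraceBlock is a made-up name, not a built-in cmdlet. The sketch makes two assumptions beyond the one-liner above: the search text should be treated literally (hence [regex]::Escape), and the last block in the file may not end with a newline (hence the optional trailing line break).

```powershell
function Remove-BraceBlock {
    param(
        [string]$Text,     # whole file content, e.g. from Get-Content -Raw
        [string]$Exclude   # literal text; blocks containing it are removed
    )

    # Escape the search text so characters like . or ( are matched literally.
    $literal = [regex]::Escape($Exclude)

    # [^}]* cannot cross a closing brace, so a match stays inside one block.
    # The trailing line break is optional so the final block is handled too.
    $pattern = '\{[^}]*' + $literal + '[^}]*\}(\r?\n)?'

    # -replace with no replacement string deletes every match.
    return $Text -replace $pattern
}

# Hypothetical sample with no newline after the last block.
$sample = @(
    '{', '"something" "else"', '}',
    '{', '"odd" "1"', '}'
) -join "`n"

$cleaned = Remove-BraceBlock -Text $sample -Exclude 'odd'
```

The result could then be written back with Set-Content if you want to update the file in place.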

john-mooper · October 12, 2017, 12:46am

Thanks everyone for the input. I went with Curtis’ code in the end, works well.
At least I got some regexes to study.
