Could someone give me some pointers on how to extract 3 values that are grouped together from a .txt file?
The text file will contain multiple email notifications, and the 3 values need to be grouped by email, i.e. Email 1 = Value 1, Value 2, Value 3. Each value can be identified by a preceding label, e.g. “Customer Id:”, but other than that they sit in a single string and are not delimited in any way.
The most progress I’ve made so far is with regex.matches($my_text).value, where “regex” refers to an expression that identifies one of the values I need, but I don’t really understand how it works, so I’m struggling to develop it further.
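To illustrate what `.Matches(...).Value` does, here is a minimal sketch. It assumes that “regex” in the question is a [regex] object built from one of the patterns used later in this thread, and it runs against two lines shortened from the sample data posted below. `Matches()` collects every hit in the whole text, and `.Value` then pulls the matched text out of each hit.

```powershell
# Two sample lines, shortened from the example data in this thread
$my_text = @'
ipsum House No: 1234567 elit Street No: 6845234 arcu Customer Id, 8-7654123 et
ipsum House No: 9876543 elit Street No: 2796481 arcu Customer Id, 8-6684523 et
'@

# Assumption: "regex" is a [regex] object; here it finds 7 digits after "House No: "
$regex = [regex]'(?<=House\sNo:\s)\d{7}'

# Matches() returns a MatchCollection with one Match object per hit,
# searched across the whole text (all lines at once)
$matchList = $regex.Matches($my_text)

# Each Match carries more than the text: Index, Length, Groups, Value, ...
$matchList | Select-Object Index, Value

# The collection itself has no .Value property, so PowerShell applies
# member enumeration: it reads .Value from every Match in the collection
$houseNumbers = $matchList.Value
$houseNumbers   # -> 1234567 and 9876543
```

Because one pattern only describes one of the three values, this yields one flat list per value with nothing tying it back to the email it came from, which is why the replies below work line by line instead.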
Your description is a little vague. It would help if you posted some sample data and, ideally, the code you have so far.
There are cmdlets available for tasks like this. I’d recommend reading the complete help, including the examples, for
That should be the best way to extract the text you’re after. Then you could use
to group the result the way you need.
Regardless of that: when you post code, error messages, console output or sample data, please format it as code using the preformatted text button ( </> ).
Thank you for the response. I had initially tried Select-String with the -Pattern parameter, but it returned the full string rather than just the values I wanted from it.
For example, the text below is a sample text file with 2 lines, each containing the 3 values I want.
```
Lorem ipsum dolor sit amet consectetur adipiscing House No: 1234567 elit Aliquam dapibus congue Street No: 6845234 arcu sed fringilla Duis id ligula vel risus tristique mattis suscipit Customer Id, 8-7654123 et metus In ipsum lectus faucibus non lacus
Lorem ipsum dolor sit amet consectetur adipiscing House No: 9876543 elit Aliquam dapibus congue Street No: 2796481 arcu sed fringilla Duis id ligula vel risus tristique mattis suscipit Customer Id, 8-6684523 et metus In ipsum lectus faucibus non lacus
```
The values I want from each line are just the numbers following “House No:”, “Street No:” and “Customer Id”.
That is definitely progress, but I’ve got stuck on a couple of things. One, why does it work without a ForEach? Two, how do I get the other values (Street No and Customer Id) in there and output them in a readable format?
It actually returns much, much more than that. Try piping the output to Select-Object * and you will see what I mean.
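A short sketch of what that inspection reveals, using two lines shortened from the sample data above. Select-String runs once for every object that comes down the pipeline (which is also why no explicit ForEach is needed) and emits a MatchInfo object for each line that matches. The console only shows its Line property, i.e. the whole line, but the matched text and the capture groups live in its Matches property.

```powershell
# Two sample lines, shortened from the example data in this thread
$lines = @(
    'ipsum House No: 1234567 elit Street No: 6845234 arcu Customer Id, 8-7654123 et'
    'ipsum House No: 9876543 elit Street No: 2796481 arcu Customer Id, 8-6684523 et'
)

# One MatchInfo object per matching input line; @() forces an array
# even if only a single line matches
$result = @($lines | Select-String -Pattern '(?<=House\sNo:\s)(\d{7})')

# Everything a MatchInfo carries (Line, LineNumber, Matches, Pattern, ...)
$result | Select-Object *

# .Line is the complete input line: what the console shows by default
$result[0].Line

# The matched text sits in .Matches; Groups[1] is the capture group (\d{7})
$houseNos = $result | ForEach-Object { $_.Matches[0].Groups[1].Value }
$houseNos
```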
In your case, where you’re looking for more than one match per line, it might be easier for a beginner to take a more procedural approach: read the file or input text line by line and apply three separate regex patterns to each line, something like this:
```powershell
$InputText = @'
Lorem ipsum dolor sit amet consectetur adipiscing House No: 1234567 elit Aliquam dapibus congue Street No: 6845234 arcu sed fringilla Duis id ligula vel risus tristique mattis suscipit Customer Id, 8-7654123 et metus In ipsum lectus faucibus non lacus
Lorem ipsum dolor sit amet consectetur adipiscing House No: 9876543 elit Aliquam dapibus congue Street No: 2796481 arcu sed fringilla Duis id ligula vel risus tristique mattis suscipit Customer Id, 8-6684523 et metus In ipsum lectus faucibus non lacus
'@

# Split into single lines ('\r?\n' also copes with Windows CRLF line endings)
$InputText -split '\r?\n' |
    ForEach-Object {
        # One object per line: each property runs its own -match and then
        # reads capture group 1 from the automatic $Matches variable
        [PSCustomObject]@{
            HouseNo    = $($_ -match '(?<=House\sNo:\s)(\d{7})' | Out-Null ; $Matches[1])
            StreetNo   = $($_ -match '(?<=Street\sNo:\s)(\d{7})' | Out-Null ; $Matches[1])
            CustomerID = $($_ -match '(?<=Customer\sId,\s)(\d-\d{7})' | Out-Null ; $Matches[1])
        }
    }
```
You just have to come up with a strategy for lines where only one or two of the patterns match. In such cases you would get either no results or potentially wrong results, because the automatic variable $Matches is only updated when the -match operator returns $true; otherwise it still holds the values from the previous successful match.
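One possible strategy, sketched below, is to avoid $Matches altogether. `Get-FirstCapture` is a hypothetical helper (not a built-in cmdlet) that uses the .NET [regex]::Match method and returns $null when a pattern does not match, so a failed match can never pick up a value left over from an earlier line. Note that [regex]::Match is case-sensitive by default, unlike -match.

```powershell
# Hypothetical helper: capture group 1 of the first match, or $null if none.
# [regex]::Match does not touch $Matches, so nothing leaks between lines.
function Get-FirstCapture {
    param(
        [string]$Text,
        [string]$Pattern
    )
    $m = [regex]::Match($Text, $Pattern)
    if ($m.Success) { $m.Groups[1].Value } else { $null }
}

# Sample input: a complete line, a filler line, and a line with only House No
$InputText = @'
ipsum House No: 1234567 elit Street No: 6845234 arcu Customer Id, 8-7654123 et
Lorem ipsum dolor sit amet
House No: 9876543 elit Aliquam
'@

$rows = $InputText -split '\r?\n' | ForEach-Object {
    [PSCustomObject]@{
        HouseNo    = Get-FirstCapture -Text $_ -Pattern '(?<=House\sNo:\s)(\d{7})'
        StreetNo   = Get-FirstCapture -Text $_ -Pattern '(?<=Street\sNo:\s)(\d{7})'
        CustomerID = Get-FirstCapture -Text $_ -Pattern '(?<=Customer\sId,\s)(\d-\d{7})'
    }
}

# Show only rows where at least one value was found
$rows | Where-Object { $_.HouseNo -or $_.StreetNo -or $_.CustomerID }
```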
What do I do about the lines that don’t contain any matches but are screwing up my results?
In my actual text file, because the contents are saved from emails, there are many additional lines that don’t contain the information I want: each email produces 15 lines of text, and only the last one contains the information I need.
If Select-String excludes the strings that don’t match, I guess I should be using your second suggestion then?
The text file contains multiple emails. The number varies: one day it might be 3, another day it might be 300 if something breaks.
Ah, so that explains why, on lines without any matching values, it returns the last value it found? How would I remove it then? And would that prevent it from running on the lines where there are no matches?
If the lines you’re after ALWAYS have all three patterns then this might be a good idea.
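A minimal sketch of that idea, assuming the relevant line of each email always contains all three values: read the file with Get-Content, drop every line where any of the three patterns fails, and build one object per remaining line. The file is a stand-in written to the temp folder so the sketch runs as-is; the filler lines only imitate the extra email text, and $path should point at your real file instead.

```powershell
# Stand-in for the real export: a small sample file in the temp folder
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'emails-sample.txt'
@(
    'Lorem ipsum dolor sit amet'
    'ipsum House No: 1234567 elit Street No: 6845234 arcu Customer Id, 8-7654123 et'
    'consectetur adipiscing elit'
    'ipsum House No: 9876543 elit Street No: 2796481 arcu Customer Id, 8-6684523 et'
) | Set-Content -Path $path

$houseRx    = '(?<=House\sNo:\s)(\d{7})'
$streetRx   = '(?<=Street\sNo:\s)(\d{7})'
$customerRx = '(?<=Customer\sId,\s)(\d-\d{7})'

# -cmatch (case-sensitive) so the filter agrees with [regex]::Match below
$records = Get-Content -Path $path |
    Where-Object { $_ -cmatch $houseRx -and $_ -cmatch $streetRx -and $_ -cmatch $customerRx } |
    ForEach-Object {
        # Only lines with all three values get here, so each Match succeeds
        [PSCustomObject]@{
            HouseNo    = [regex]::Match($_, $houseRx).Groups[1].Value
            StreetNo   = [regex]::Match($_, $streetRx).Groups[1].Value
            CustomerID = [regex]::Match($_, $customerRx).Groups[1].Value
        }
    }

$records | Format-Table
Remove-Item -Path $path   # clean up the sample file
```

The number of emails in the file doesn’t matter here: the pipeline processes however many lines Get-Content returns.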
You’re right.
Hmmm … you are allowed to try to solve some minor problems by yourself from time to time!!! If you want to remove a variable you could use
If you’re unsure whether there’s a cmdlet for the task you want to achieve, you could try to find one with
Please read the complete help including the examples to learn how to use it.
In this special case I would have used it like this:
```powershell
Get-Command -Noun *variable*
```
I’d expect that.
Solutions based on regular expressions depend on the input text being uniform and consistent. If one of the numbers has 8 digits instead of 7, the pattern would still match, but you wouldn’t get the last digit in your results. If a colon, a comma or a whitespace character changes, your patterns will not match anymore. Keep that in mind and check the process and the results on a regular basis.
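One way to act on that advice, sketched under assumptions about which variations are acceptable: loosen the patterns so small changes in spacing or punctuation still match and numbers are never truncated, then check the extracted values against the expected shape and warn instead of silently accepting them. The sample line is a made-up variant of the thread’s data with an 8-digit house number, no space after “Street No:” and a colon after “Customer Id”.

```powershell
# Made-up variant of the sample data to simulate format drift
$line = 'ipsum House No: 12345678 elit Street No:6845234 arcu Customer Id: 8-7654123 et'

# Strict pattern from this thread: silently keeps only the first 7 of 8 digits
$strict = [regex]::Match($line, '(?<=House\sNo:\s)(\d{7})').Groups[1].Value

# Looser patterns (assumption about acceptable variation):
#   \s*  any amount of whitespace, including none
#   [:,] a colon or a comma after the label
#   \d+  numbers of any length, so nothing gets truncated
$loose = [ordered]@{
    HouseNo    = 'House\s*No\s*[:,]\s*(\d+)'
    StreetNo   = 'Street\s*No\s*[:,]\s*(\d+)'
    CustomerID = 'Customer\s*Id\s*[:,]\s*(\d+-\d+)'
}

$record = [ordered]@{}
foreach ($name in $loose.Keys) {
    $m = [regex]::Match($line, $loose[$name])
    $record[$name] = if ($m.Success) { $m.Groups[1].Value } else { $null }
}

# Expected shapes; anything else is reported rather than trusted
$expected = @{ HouseNo = '^\d{7}$'; StreetNo = '^\d{7}$'; CustomerID = '^\d-\d{7}$' }
foreach ($name in $expected.Keys) {
    if ($record[$name] -notmatch $expected[$name]) {
        Write-Warning "$name has an unexpected value: '$($record[$name])'"
    }
}

[PSCustomObject]$record
```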
I’m trying to implement your Select-String suggestion below:
I’m assuming that I need to replace $Pattern with my own regex:
```powershell
'(?<=House\sNo:\s)(\d{7})'
```
That works perfectly by itself. It returns only 2 lines in the output, with the correct values, but how do I add the other 2 regexes? From what I’ve found online, I only need to separate them with commas, but when I do that, the script either stops producing output altogether or still returns only the first value (HouseNo).
Do not describe it; show it. Ideally, post the code exactly as you use it (if it does not contain sensitive information).
I just noticed that I forgot to post the regex pattern I used along with my code suggestion. I added it in my answer above.
Instead of 3 separate regex patterns I used a big one and used the grouping feature to separate them in the output.
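The combined pattern from that answer isn’t reproduced here, so the following is only a sketch of the same idea, not necessarily the exact pattern that was posted. It uses named groups, (?<Name>...), and assumes the three labels always appear in this order on the same line. It also shows why the comma-separated patterns didn’t help: Select-String emits at most one MatchInfo object per line, and without -AllMatches that object holds just one match, so extra patterns don’t produce three separate values.

```powershell
# Sample lines, shortened from the example data; the last one has no values
$lines = @(
    'ipsum House No: 1234567 elit Street No: 6845234 arcu Customer Id, 8-7654123 et'
    'ipsum House No: 9876543 elit Street No: 2796481 arcu Customer Id, 8-6684523 et'
    'Lorem ipsum dolor sit amet'
)

# One combined pattern; each (?<Name>...) is a named capture group and
# .*? lazily skips the filler text between the labels
$pattern = 'House\sNo:\s(?<HouseNo>\d{7}).*?Street\sNo:\s(?<StreetNo>\d{7}).*?Customer\sId,\s(?<CustomerID>\d-\d{7})'

# Lines that don't match are dropped by Select-String automatically
$records = $lines |
    Select-String -Pattern $pattern |
    ForEach-Object {
        # The single Match in each MatchInfo holds all three named groups
        $groups = $_.Matches[0].Groups
        [PSCustomObject]@{
            HouseNo    = $groups['HouseNo'].Value
            StreetNo   = $groups['StreetNo'].Value
            CustomerID = $groups['CustomerID'].Value
        }
    }

$records | Format-Table
```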
What a pity. It should actually be the other way around.
You’re totally right. Regular expressions can be overwhelming for beginners, but it’s worth keeping at it. They’re powerful once you know them a little. I’m not fluent in regex either. When I have to look something up, I usually check this site first: