Grabbing last 4 characters of matches

shodredux · October 24, 2016, 3:42pm

This is the script I have written so far:

Select-String 'D:\Powershell\Sample.txt' -Pattern '~TRN\*1\*\w+\*' -AllMatches | Select Matches

This is the result:

Matches

{~TRN*1*10100000000*}
{~TRN*1*10100000001*}
{~TRN*1*10100000002*}
{~TRN*1*10100000003*}
{~TRN*1*10100000004*}
{~TRN*1*10100000005*}
{~TRN*1*10100000006*}

What I would like is to search the whole text file, look between each “~TRN*1*” and the next “*”, and show only the last 4 characters in between, so the end result would be just the following:

Matches

0000
0001
0002
0003
0004
0005
0006

Any help would be greatly appreciated, thanks!

Olaf · October 24, 2016, 4:15pm

Something like this?

Get-Content -Path C:\_Temp\test\sample.txt | ForEach-Object -Process {$_ -match '~TRN\*1.*(\d{4})\*' | Out-Null ; $Matches[1]}

shodredux · October 24, 2016, 5:05pm

Thank you for the reply!

Reading your setup has helped me get closer…

The regex output is where I am getting stuck.

All I want is the last 4 digits of each match. Let me know if you need more info, and thank you!

What I currently get:
{~TRN*1*10100000000*}
{~TRN*1*10100000001*}
{~TRN*1*10100000002*}
{~TRN*1*10100000003*}
{~TRN*1*10100000004*}
{~TRN*1*10100000005*}
{~TRN*1*10100000006*}

What I want to get:

0000
0001
0002
0003
0004
0005
0006

shodredux · October 24, 2016, 5:38pm

Making progress…

Current script:
Select-String 'D:\Powershell\Sample.txt' -Pattern '(?<=~TRN\*1\*)(\w+)' -AllMatches | Select Matches

Current Results:

Matches

{10100000000}
{10100000001}
{10100000002}
{10100000003}
{10100000004}
{10100000005}
{10100000006}

Now I want it to return only the last 4 characters of these numbers, for example:

Matches

{0000}
{0001}
{0002}
{0003}
{0004}
{0005}
{0006}
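
Editor's note: once the full number has been matched, one way to keep only its last 4 characters is plain string slicing. The following is a minimal sketch, assuming every value is at least 4 characters long (Substring throws otherwise); the $values array is a hypothetical stand-in for the matches shown above.

```powershell
# Hypothetical sample values standing in for the Select-String matches above.
$values = '10100000000', '10100000001', '10100000002'

# Keep only the last four characters of each value.
$last4 = $values | ForEach-Object { $_.Substring($_.Length - 4) }
$last4
```

For these three sample values this prints 0000, 0001 and 0002, one per line.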

Olaf · October 24, 2016, 6:02pm

What do you get when you run my code? What does the txt file look like?

rob-simmers · October 24, 2016, 7:28pm

Olaf’s code worked for me, although I’m not sure how the regex knows to get the last 4 digits rather than the first 4.

peter-jurgens-2 · October 24, 2016, 7:46pm

In Olaf’s regex he uses a capture group, and the key element is placing the “\*” right after it. That tells the regex that the 4 digits in the capture group “(\d{4})” must come right before an asterisk in the sample text.

Personally, I prefer to name my capture groups whenever I use them; then you can access them via $Matches['name'] rather than by numerical index.
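
Editor's note: a named capture group could look roughly like this. It is a sketch only; the group name last4 and the one-segment sample string are made up for illustration, and the real file content may differ.

```powershell
# Hypothetical one-segment sample; the real file content may differ.
$line = '~TRN*1*10100000000*1000000000~'

# (?<last4>...) names the group, so the result is read by name instead of by index.
$last4 = if ($line -match '~TRN\*1\*\d*(?<last4>\d{4})\*') { $Matches['last4'] }
$last4
```

With this sample the named group holds 0000, the 4 digits directly before the asterisk that ends the number.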

shodredux · October 24, 2016, 9:03pm

This is the result I get with Olaf’s code:

0580
0580
0580
0580
0580
0580
0580
0580
0580

The text file I have does not contain any spaces; it uses ‘*’ as the separator everywhere, if that makes sense. For example, here is a small segment of the text file:

ACHCCP01111DA331234567890**01111DA2220100101~TRN1101000000001000000000~REFEVETIN~DTM40520100101~N1PRNYSDOH~N3OFFICE OF HEALTH INSURANCE PROGRAMSCORNING TOWER, EMPIRE STATE PLAZA~N4ALBANYNY122370080~PERBLPROVIDER SERVICESTE8003439000URwww.emedny.org~N1PEMAJOR MEDICAL PROVIDERXX9999999995~REFTJ000000000~LX1~CLPPATIENT ACCOUNT NUMBER134.2534.25**MC100021000000003011~NM1QC1SUBMITTED LASTSUBMITTED FIRSTMILL99999L~NM1741CORRECTED LASTCORRECTED FIRST~REFEAPATIENT ACCOUNT NUMBER~DTM232*20100101~DT

Now this text file may have multiple ~TRN*1* segments in it, and the numbers that follow can vary in length. I want to grab the last four digits between ~TRN*1* and the following ‘*’ for every instance found in this text file. (Forgive me, guys, I am new to a lot of this and I am trying to understand, so if I am missing something, please just let me know.) I appreciate all the help!

shodredux · October 24, 2016, 9:35pm

This is the result I have:
0580
0580
0580
0580
0580
0580
0580
0580
0580

This is a small example of the text document and how it is laid out:

ACHCCP01111DA331234567890**01111DA2220100101~TRN1101000000001000000000~REFEVETIN~DTM40520100101~N1PRNYSDOH~N3OFFICE OF HEALTH INSURANCE PROGRAMSCORNING TOWER, EMPIRE STATE PLAZA~N4ALBANYNY122370080~PERBLPROVIDER SERVICESTE8003439000URwww.emedny.org~N1PEMAJOR MEDICAL PROVIDERXX9999999995~REFTJ000000000~LX1~CLPPATIENT ACCOUNT NUMBER134.2534.25MC100021000000003011~NM1QC1SUBMITTED LASTSUBMITTED FIRSTMILL99999L~NM1741CORRECTED LASTCORRECTED FIRST~REFEAPATIENT ACCOUNT NUMBER~DTM23220100101~DTM23320100101~AMTAU34.25~SVCHC:V2020:RB661~DTM47220100101~AMTB66~SVCHC:V2700:RB2.752.75**1~DTM47220100101~AMTB62.75~SVCHC:V2103:RB5.55.51~DTM47220100101~AMTB65.5~SVCHC:S058020*202~DTM47220100101~AMTB6

I just want to find all the numbers between every ~TRN*1* and the next asterisk in the text file, and display only the last 4 digits of each, regardless of how long the number is; I just need it to always be the last 4. Thank you all so much for your assistance with this. I am a total newbie at this and trying to learn.

shodredux · October 24, 2016, 9:54pm

This is what the text file looks like (small sample but throughout):

ACHCCP01111DA331234567890**01111DA2220100101~TRN1101000000001000000000~REFEVETIN~DTM40520100101~N1PRNYSDOH~N3OFFICE OF HEALTH INSURANCE PROGRAMSCORNING TOWER, EMPIRE STATE PLAZA~N4ALBANYNY122370080~PERBLPROVIDER SERVICESTE8003439000URwww.emedny.org~N1PEMAJOR MEDICAL PROVIDERXX9999999995~REFTJ000000000~LX1~CLPPATIENT ACCOUNT NUMBER134.2534.25MC100021000000003011~NM1QC1SUBMITTED LASTSUBMITTED FIRSTMILL99999L~NM1741CORRECTED LASTCORRECTED FIRST~REFEAPATIENT ACCOUNT NUMBER~DTM23220100101~DTM23320100101~AMTAU34.25~SVCHC:V2020:RB661~DTM47220100101~AMTB66~SVCHC:V2700:RB2.752.75**1~DTM47220100101~AMTB62.75~SVCHC:V2103:RB5.55.51~DTM47220100101~AMTB65.5~SVCHC:S058020*202~DTM47220100101~AMTB6

The results I get back from running your code:

0580
0580
0580
0580
0580
0580
0580
0580
0580

Olaf · October 25, 2016, 8:18am

OK, now I know what’s wrong. Try this:

Get-Content -Path C:\_Temp\test\sample.txt | ForEach-Object -Process {$_ -match '~TRN\*1.*?(\d{4})\*' | Out-Null ; $Matches[1]}

I assume your file has line breaks.
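
Editor's note: the difference between the two patterns is greedy versus lazy matching. A greedy .* runs to the end of the input and backtracks, so the capture group ends up on the last run of 4 digits followed by an asterisk; the lazy .*? stops at the first one. The sketch below uses a shortened, hypothetical sample string, not the poster's actual file.

```powershell
# Hypothetical shortened sample: one TRN segment followed by an unrelated segment.
$text = '~TRN*1*10100000000*1000000000~SVC*HC:S0580*20*20~'

# Greedy .* consumes as much as possible, so the group lands on the LAST "4 digits + *".
$null = $text -match '~TRN\*1.*(\d{4})\*'
$greedy = $Matches[1]

# Lazy .*? consumes as little as possible, so the group lands on the FIRST "4 digits + *".
$null = $text -match '~TRN\*1.*?(\d{4})\*'
$lazy = $Matches[1]

"greedy: $greedy"
"lazy:   $lazy"
```

On this sample the greedy pattern yields 0580 and the lazy pattern yields 0000, which fits the repeated 0580 results reported earlier.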

shodredux · October 25, 2016, 2:25pm

There are no line breaks, it all runs on a single line.

I also ran your code and the results are:

0000

It seems to see only the single instance of this match that comes first. Now how do we make it display all the matches?

Again, thank you so much for your assistance, Olaf!

Olaf · October 25, 2016, 3:23pm

I will not give up … yet.

Try this:

(Select-String 'D:\Powershell\Sample.txt' -Pattern '~TRN\*1\*\w+' -AllMatches | Select-Object -ExpandProperty Matches).Value | ForEach-Object -Process {$_ -match '~TRN\*1.*?(\d{4})$' | Out-Null ; $Matches[1]}

… and BTW: it’s really weird to have a file without any line breaks. I’m just curious: where do you get this file from?

shodredux · October 25, 2016, 4:09pm

The results are showing now! Thank you, thank you, thank you! You have been an overwhelmingly great help, Olaf!

To be more specific about why I was trying to figure this out: I work in ambulance billing, and I receive Electronic Remittance Advices (ERAs) that are supposed to be opened in an application called Medicare EasyPrint reader. These files have the .EDI file extension (example.EDI). What I currently have to do is open these files one by one and rename them based on the insurance payor and the check numbers contained within: sometimes a single check number, other times a whole bunch of check numbers, but only the last 4 digits of each check number. Since this is a repetitive task, I figured I could use PowerShell to automate the process. Eventually I would like to run a PowerShell script that pulls that information from each EDI file and renames the file based on those specific matches you have assisted me so awesomely with, Olaf, and I thank you soooooo much for that!

I didn’t want to tell everyone what the bigger picture was because I didn’t want someone to just make the whole thing for me, or people to think I wanted them to do this for me. I am trying to study and understand everything being used, and the code you provided gives me plenty to study, Olaf.

Also, since these files come with the .EDI file extension, I just mass-change them to the .txt extension to make them readable, which is the sample I gave you, all on one line.

Here is an example of an ERA.EDI file (not the best one, because it only has 1 check number in it):
https://www.emedny.org/HIPAA/5010/5010_sample_files/835%20Sample%20(Professional%20Claims%20only-w%20payment).2014.txt

Olaf · October 25, 2016, 4:16pm

Ronald,

cool … thanks for the explanation. So we both learned something today. Great.

BTW: you don’t have to rename the files to read them with PowerShell.
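
Editor's note: to illustrate that the extension does not matter, here is a small self-contained sketch. It writes a hypothetical two-segment sample to a temporary file named sample.EDI (both the file name and its content are made up), reads it back with Get-Content, and counts the ~TRN*1* markers.

```powershell
# Hypothetical sample written to a temporary .EDI file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.EDI'
Set-Content -Path $path -Value '~TRN*1*10100000000*1000000000~TRN*1*10100000001*1000000000~' -NoNewline

# Get-Content does not care about the extension; -Raw returns the whole file as one string.
$content = Get-Content -Path $path -Raw
$count = [regex]::Matches($content, '~TRN\*1\*').Count
$count

# Clean up the temporary file.
Remove-Item -Path $path
```

Here $count is 2, one for each ~TRN*1* marker in the sample.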

Olaf · October 25, 2016, 7:18pm

Ronald,

I’ve played around a little and found an even shorter one (maybe faster):

(Select-string 'D:\Powershell\sample.txt' -pattern '~TRN\*1\*\d*(\d{4})\*' -AllMatches).Matches.Groups.value[(0..1000|%{$_*2+1})]

But … and it’s a big BUT: it assumes that you don’t have more than 1000 hits per file. If you do, you have to increase the number in the expression.
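
Editor's note: the index expression above picks every second entry because each match contributes two groups, the full match followed by capture group 1. One possible way to avoid the fixed upper bound is to read group 1 from each match directly. This is a sketch under the assumption that the pattern keeps exactly one capture group; the sample string is hypothetical and is piped in instead of being read from a file.

```powershell
# Hypothetical single-line sample piped in as a string instead of read from a file.
$text = '~TRN*1*10100000000*1000000000~TRN*1*10100000001*1000000000~TRN*1*10100000002*1000000000~'

# Take capture group 1 of every match, however many matches there are.
$last4 = ($text | Select-String -Pattern '~TRN\*1\*\d*(\d{4})\*' -AllMatches).Matches |
    ForEach-Object { $_.Groups[1].Value }
$last4
```

For this sample the output is 0000, 0001 and 0002, one per line.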
