Read through raw text files in a directory and report out a count of unique data

Hello. I need help. I’m tinkering with a ps script, but I am a novice and cannot get this to work. Here’s what I need:1 - open a directory2 - read in the text files in the directory (*.txt)3 - open each one & do a delimiter functions. Each text file is raw data. It doesn’t have any headers, etc. The “|” pipe symbol is the delimiter; each field is separated by a pipe.4 - I want it to count for unique last name.5 - what’s the easiest way to output that data to summarize for all files? Is it easier in a csv output or a text file output?
I was able to get this code to work for one text file.
[PRE]@(Import-Csv C:\itdept\494-Billing_201808010300.txt -Delimiter “|” -Header a,b,c,d,e | Select-Object -unique d,e[/PRE]

The directory is like this D:\RASLO-BILLING
The format of the text file is like this:

1|2R149653||DOE|JANE||19790127|DOE|JANE||231 STREET AVE||CITY|CA|93420||805|1234567|N|18||||||v|

2|2R149653||DOE|JANE||19790127|231 STREET AVE||CITY|CA|93420||805|1234567|||F||6230|||||||||||||||

3|2R149653||DOE|JANE||19790127|P|136391580|||2950|BC||SIF928A56074|DOE|JANE||231 STREET AVE||CITY|CA|93420||805|1234567|F|19790127||18||||||||||INSURANCE 60007|PO BOX 60007||LOS ANGELES|CA|90060||INSURANCE 60007|(800) 227-3771||||||

3|2R149653||DOE|JANE||19790127|S||||RAS1159|CAP||XEH903377346|DOE|JANE||231 STREET AVE||CITY|CA|93420||805|1234567|F|19790127||18||||||||||INSURANCE|PO BOX 13518||BAKERSFIELD|CA|93389||CCPN|(800)604-8752||||||

4|2R149653||DOE|JANE||19790127|2950|3JXWVOV1|1|20180731|493|100|6230|70543||||||||||v|||WEISS|WENDY|E|892506|892506|C4|MRI SOFT TISSUE NECK W/O+W (35/40)|Procedure|1609867738|PISMO FAM PRAC|D|(805) 773-0707|N|1|136391580|494-707|2018079JNSGG2X6|

1|2R149822||SMITH|JANET||20160919|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|3694564|||||||||

2|2R149822||SMITH|JANET||20160919|540 STREET AVE|#119|CITY|CA|93446||805|3694564|||F||2782|||||||||||||||

3|2R149822||SMITH|JANET||20160919|P||||031809SS|MC||97814821G|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|3694564|F|20160919||||||||||||INSURANCE|PO BOX 948||GOLETA|CA|93110||INSURANCE|(877) 814-1861||||||

4|2R149822||SMITH|JANET||20160919|031809SS|3K08828P|1|20180731|493|132|2782|73100|||||||||||||CORYELL|JAMES|D|893024|893024|C4|XR RIGHT WRIST 2 VIEWS|Procedure|1427006212|CENT CST PEDS-TP|D|(805) 434-3796||0||493-702|20180800IHMY9ZH|

1|2R149822||SMITH|JANET||20160919|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|1234567|||||||||

2|2R149822||SMITH|JANET||20160919|540 STREET AVE|#119|CITY|CA|93446||805|1234567|||F||2782|||||||||||||||

3|2R149822||SMITH|JANET||20160919|P||||031809SS|MC||97814821G|SMITH|JANET||540 STREET AVE|#119|CITY|CA|93446||805|1234567|F|20160919||||||||||||INSURANCE|PO BOX 948||GOLETA|CA|93110||INSURANCE|(877) 814-1861||||||

4|2R149822||SMITH|JANET||20160919|031809SS|3K08A1AE|1|20180731|493|132|2782|73100|||||||||||||CORYELL|JAMES|D|893023|893023|C4|XR LEFT WRIST 2 VIEWS|Procedure|1427006212|CENT CST PEDS-TP|D|(805) 434-3796||0||493-702|20180800IHMY9ZH|

 

 

 

I think this is what you are looking for?

$Path = 'd:\RASLOW-BILLING\*.txt'
Import-Csv -Path $path -Delimiter '|' -Header a,b,c,d,e | Group-Object -Property d | Select-Object Name,Count

Hey Gina. The key here is in order to do comparison or grouping, we need column headers. You can do it like you started, like so:

$header = Idx,
          Hdr1,
          LastName,
          FirstName...

I am lazy and I also have no idea what the data is to create a header, so we can take a line and make it a character array, find all of the pipe symbols and get a count (which is 44).

$test = '4|2R149653||DOE|JANE||19790127|2950|3JXWVOV1|1|20180731|493|100|6230|70543||||||||||v|||WEISS|WENDY|E|892506|892506|C4|MRI SOFT TISSUE NECK W/O+W (35/40)|Procedure|1609867738|PISMO FAM PRAC|D|(805) 773-0707|N|1|136391580|494-707|2018079JNSGG2X6|'
($test.ToCharArray() | Where{$_ -eq '|'}).Count

If there are 44 pipes, there could be a data after the last pipe, so I went with 45 generic headers:

$header = 1..45 | foreach {'Hdr{0}' -f $_}

Now I have a generic header row that is Hdr1, Hdr2 and so on up to Hdr45. Next we get 3 files, import them into a single collection so that we can start doing analysis:

<pre
#Create a generic header array
$header = 1…45 | foreach {'Hdr{0}' -f $_}
#Get the files
$files = Get-ChildItem 'C:\Scripts*.txt'

#Go thru each file and import it with the header
$results = foreach ( $file in $files ) {
$csv = Import-CSV -Path $file.FullName -Delimiter '|' -Header $header
#Get all of the header information and add on the file we got the data from
$csv | Select *, @{Name='FileName';Expression={$file.BaseName}}
}

#Assume header 4 is what we want to group
$results | Group-Object -Property hdr4 -NoElement
#See how many results we got per file
$results | Group-Object -Property FileName -NoElement

Now you can get grouped information:

PS C:\Users\Rob> $results | Group-Object -Property hdr4 -NoElement


Count Name                     
----- ----                     
    5 DOE                      
    8 SMITH                    



PS C:\Users\Rob> $results | Group-Object -Property FileName -NoElement

Count Name                     
----- ----                     
    5 file1                    
    4 file2                    
    4 file3                    

Hey Gina. The key here is in order to do comparison or grouping, we need column headers. You can do it like you started, like so:

$header = Idx,
          Hdr1,
          LastName,
          FirstName...

I am lazy and I also have no idea what the data is to create a header, so we can take a line and make it a character array, find all of the pipe symbols and get a count (which is 44).

$test = '4|2R149653||DOE|JANE||19790127|2950|3JXWVOV1|1|20180731|493|100|6230|70543||||||||||v|||WEISS|WENDY|E|892506|892506|C4|MRI SOFT TISSUE NECK W/O+W (35/40)|Procedure|1609867738|PISMO FAM PRAC|D|(805) 773-0707|N|1|136391580|494-707|2018079JNSGG2X6|'
($test.ToCharArray() | Where{$_ -eq '|'}).Count

If there are 44 pipes, there could be a data after the last pipe, so I went with 45 generic headers:

$header = 1..45 | foreach {'Hdr{0}' -f $_}

Now I have a generic header row that is Hdr1, Hdr2 and so on up to Hdr45. Next we get 3 files, import them into a single collection so that we can start doing analysis:

#Create a generic header array
$header = 1..45 | foreach {'Hdr{0}' -f $_}
#Get the files
$files = Get-ChildItem 'C:\Scripts\*.txt'

#Go thru each file and import it with the header
$results = foreach ( $file in $files ) {
    $csv = Import-CSV -Path $file.FullName -Delimiter '|' -Header $header
    #Get all of the header information and add on the file we got the data from
    $csv | Select *, @{Name='FileName';Expression={$file.BaseName}}
}

#Assume header 4 is what we want to group
$results | Group-Object -Property hdr4 -NoElement
#See how many results we got per file
$results | Group-Object -Property FileName -NoElement

Now you can get grouped information:

PS C:\Users\Rob> $results | Group-Object -Property hdr4 -NoElement


Count Name                     
----- ----                     
    5 DOE                      
    8 SMITH                    



PS C:\Users\Rob> $results | Group-Object -Property FileName -NoElement

Count Name                     
----- ----                     
    5 file1                    
    4 file2                    
    4 file3                    

Thanks. I will try that and see what it produces!

It worked! Your code helped me get it working. I was able to also output to a csv file and email the file to the IT department. Thank you so very much!

Glad it worked!! Now you can show those IT guys how to use Powershell :slight_smile: