Read in Array elements, Compare to Word Document, Add & Count to CSV File

I need help modifying this script.

https://gist.github.com/gbautista72/a319218beaa53df8a935906f9be0c91d

It does a lot and maybe it can be re-written to be more efficient.

Currently, this script has an array of doctor names. When a new doctor comes on board, the name is added to this array.

The script looks in a folder on the server and reads in word documents.

The script looks for the line that starts with “Dictated by:” and then it finds and compares the doctor name on the report and the array. If found, it counts the occurrence and adds to a csv file.

Then the script emails the csv as an attachment.

There is one doctor name that is non-standard and coming over in an unknown format: YAP, FELIX Y.

I want the script to accept this name and continue. I want the script to count it on the csv. Also how can I get the outputted csv file to total each column and display the column totals? I’m scratching my head on this.

Help! Thanks.

 

Hello Gina,

Can you give an example of the non-standard format issue you are facing with Yap, Felix Y?

Hello. In my doctor name array, I have John X. Smith, etc. There are two names that do not come over in this format. We have YAP, FELIX Y. and we have *Mark Ziemba. I have a “exceptions” hash or a "synomym hash table but for some reason the script is still not recognizing YAP, FELIX Y.

hash table of alternative names in format @{‘alias’ = ‘standard name’, ‘alias’ = ‘standard name’}

[pre]$docSynonyms = @{‘YAP, FELIX Y.’ = ‘Felix Y. Yap’
‘*Mark Ziemba’ = ‘Mark Ziemba’}[/pre]

How do I get the script to work with these exceptions? How do I get it so that if I have other exceptions, it will see this hash table and carry on successfully with what I set in the hash table?

Also how can I get it to add up columns and put a total at the bottom of each column? Thank you!

 

Hello Gina,

I’m analyzing the code in further, you stated you are using Synonyms for those doctors name. At what point in the code are you pulling the doctor’s name that needs to be converted?

I’m trying to understand why this was done?

$Match = Select-String -Path $document -Pattern '^*Dictated\s+by\s*:?\s*(.*?)\s*\Z' | Select-Object -Last 1 -ExpandProperty Matches

Why not just use the following?

Select-String -Pattern "Dictabed by"

Hi Jason. There is now a portion of code that sits beneath the doclist array. Also, the name of the doctor varies and it follows the “Dictated by:” text. So on the bottom of every report you have this line of text in this standard format:
Dictated by: Timothy L. Davis, M.D. on 8/28/2019 at 7:11

There are two cases where I know of right now that the doctor name is NOT in standard format:
Dictated by: YAP, FELIX Y., M.D. on x/xx/xxxx at hh:mm
AND
Dictated by: *Mark Ziemba on x/xx/xx at hh.mm

 

[pre] # hash table of alternative names in format @{‘alias’ = ‘standard name’, ‘alias’ = ‘standard name’}
$docSynonyms = @{‘YAP, FELIX Y.’ = ‘Felix Y. Yap’}

$doc = @{}

try {

predefine doctor structure

foreach ($name in $docList)
{
$doc[$name] = @{}
foreach ($site in $sites) { $doc[$name] += @{ $site = @() } }
}

process documents based on site

foreach ($site in $sites)
{
Get-ChildItem ($(if ($site -eq ‘rdc’) { “D:\reports\dr_reports” } else { “D:\ininbound$site” }) + “\archive$date”) |
% {
$document = $_.FullName
if ($Match = Select-String -Path $document -Pattern ‘^Dictated\s+by`:\s**?(.*?),\s’ | Select-Object -Last 1 -ExpandProperty Matches)
{
$content = $Match.Groups[1].Value
Write-Host “`r`n”
Write-Host $document -ForegroundColor Cyan
Write-Host $content

if ($content -in $docSynonyms) { $content = $docSynonyms[$content] }
if ($doc[$content])
{
$doc[$content].$site += $document
Write-Host “$document matched $content” -ForegroundColor Yellow
Add-Content -path $doctorLog -value “$content found on $document”
} else {
Write-Host -f red “$content not in list of doctors”
}
} else {

String not found[/pre]

 

Gina,

I’m under the impression the $DocList is was originally the $Doctors array you created? Just want to get some verification.

Hey Gina,

Okay so after parsing your code and revamping some of it to make more sense to me, I’ve found what is happening. The regular expression you have is stopping at the first comma found. Below is an example of the output. This is what you need to fix to finish pulling all of the data. All I did was create 2 files with varying data you had created.

PS C:\Folder> Get-Content File*.txt
Dictated by: Timothy L. Davis, M.D. on 8/28/2019 at 7:11
Dictated by: YAP, FELIX Y., M.D on 8/28/2019 at 7:11
PS C:\Folder> (Get-ChildItem | Select-String -Pattern '^*Dictated\s+by\`*:\s*\*?(.*?),\s' | Select-Object -ExpandProperty Matches)

Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 31
Value : Dictated by: Timothy L. Davis,

Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 18
Value : Dictated by: YAP,

Gina,

Great news, spoke with a colleague who uses Regex more than me and we worked it out. Try this Regex now.

‘^Dictated\s+by*:\s*?(.?)(?=,\sM.D)’

or

.+?(?=,\sM.D)

(Get-childitem .\ | Select-String -Pattern '^Dictated\s+by*:\s\*?(.*?)*(?=,\sM.D)' | Select-object -expand matches).Groups.Value
Dictated by: Timothy L. Davis

Dictated by: YAP, FELIX Y.
PS C:\Folder> (Get-childitem .\ | Select-String -Pattern '.+?(?=,\sM.D)' | Select-object -expand matches).Groups.Value
Dictated by: Timothy L. Davis
Dictated by: YAP, FELIX Y.

Awesome. Thank you! Sorry for the delayed response. Big projects in the mix. I will play with this code and let ya know what happens! Thank you so much. This is great. :slight_smile:

I must be typing in something wrong. This isn’t working for me. I get an error. Here’s the code I’m trying and I’ll try and error below too.

[pre]

foreach ($site in $sites)
{
Get-ChildItem ($(if ($site -eq ‘rdc’) { “D:\reports\dr_reports” } else { “D:\ininbound$site” }) + “\archive$date”) |
% {
$document = $_.FullName
if ($Match = Select-String -Path $document -Pattern ‘^Dictated\s+by*:\s*?(.?)(?=,\sM.D)’ | Select-Object -Last 1 -ExpandProperty Matches)
{
$content = $Match.Groups[1].Value

[/pre]

ERROR I receive:

D:\reports\dr_reports\archive\2019-08-29\1102211.doc
not in list of doctors
D:\reports\dr_reports\archive\2019-08-29\20195LD9VUVY9.doc
not in list of doctors

Get-Content : Cannot find path ‘D:\script\Isaacs-Scripts\scripts\get-doctorcount\logs\08-29-2019.02.25.txt’ because it does not exist.
At D:\script\Isaacs-Scripts\scripts\get-doctorcount\TESTING-082919.ps1:113 char:22

  • $sortedContent = Get-Content $doctorLog | Sort-Object
  • CategoryInfo : ObjectNotFound: (D:\script\Isaac…-2019.02.25.txt:String) [Get-Content], ItemNotFoundException
  • FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

 

Gina,

The issue is because the item in question doesn’t exist. Its not an issue with the Regex.

Good morning. Thanks for helping with the names and REGEX. I’m going to let that go for now because it’s more complex than I thought. I’m not sure how to make the script accept a name that isn’t in standard format and add it to the running total. It’s a complicated task.
Lastly though, is there any way to total each column? Right now the script does a grand total on the far left. I can’t figure out how to get each column to total.

Hello Gina,

Please provide a copy of the updated script for review. With so many pieces having been worked on I’m not sure what you are using and not at the moment. If you are adding each of the values into its own array you could do a count on the property. I.e.

$Object.Doctorname.Count

Yes, I agree. Lots of tweeking going on. Here’s the latest.

[PRE]

$sites = ‘cdc’,‘fh’,‘mrmc’,‘svh’,‘scdc’,‘rdc’

$date = (Get-Date(Get-Date).AddDays(-1) -Format ‘yyyy-MM-dd’)
$dateTime = (Get-Date(Get-Date).AddDays(-1) -Format ‘MM-dd-yyyy.hh.mm’)

$csvPath = “D:\script\Isaacs-Scripts\scripts\get-doctorcount\CSVs$dateTime.csv”
$errorLog = “D:\script\Isaacs-Scripts\scripts\get-doctorcount\errorlogs$dateTime.html”
$errordoc = “D:\script\Isaacs-Scripts\scripts\get-doctorcount\errorlogs$dateTime-doctor-error.txt”
$doctorLog = “D:\script\Isaacs-Scripts\scripts\get-doctorcount\logs$dateTime.txt”

array of all doctor names

to add to it, simply add quotes and a comma (unless its the last in the array)

$docList = @(
‘Benjamin T. Addicott’,
‘Timothy L. Auran’,
‘Eugene F. Barasch’,
‘William C. Burnette’,
‘Kevin R. Connolly’,
‘Harry F. Corbett’,
‘Timothy L. Davis’,
‘Arthur C. Duberg’,
‘Stephen R. Holtzman’,
‘Taylor R. Jordan’,
‘Thomas L. Miller’,
‘Linda D. Mulder’,
‘Ragu C. Nathan’,
‘Erik M. Olson’,
‘Jaywant P. Parmar’,
‘Mohsin M. Rahman’,
‘David A. Rigual’,
‘William M. Russell’,
‘Farhad W. Sani’,
‘Melissa E. Sims’,
‘Laura E. Traube’,
‘David J. Tuttle’,
‘Erica Vergara’,
‘Donna E. Winingham’,
‘Felix Y. Yap’,
‘Mark Ziemba’,
“Teresa O’Neill”,
“Sonja M. O’Brien”
)

hash table of alternative names in format @{‘alias’ = ‘standard name’, ‘alias’ = ‘standard name’}

Would like some way of capturing the KNOWN maflormed names so that the script continues. Take that malformed name and add it to the running total for the doctor and for the site.

$docSynonyms = @{‘YAP, FELIX Y.’ = ‘Felix Y. Yap’}

$doc = @{}

try {

predefine doctor structure

foreach ($name in $docList)
{
$doc[$name] = @{}
foreach ($site in $sites) { $doc[$name] += @{ $site = @() } }
}

process documents based on site

foreach ($site in $sites)
{
Get-ChildItem ($(if ($site -eq ‘rdc’) { “D:\reports\dr_reports” } else { “D:\ininbound$site” }) + “\archive$date”) |
% {
$document = $_.FullName
if ($Match = Select-String -Path $document -Pattern ‘^Dictated\s+by`:\s**?(.*?),\s’ | Select-Object -Last 1 -ExpandProperty Matches)

{
$content = $Match.Groups[1].Value
Write-Host “`r`n”
Write-Host $document -ForegroundColor Cyan
Write-Host $content

if ($content -in $docSynonyms) { $content = $docSynonyms[$content] }
if ($doc[$content])
{
$doc[$content].$site += $document
Write-Host “$document matched $content” -ForegroundColor Yellow
Add-Content -path $doctorLog -value “$content found on $document”
} else {
Write-Host -f red “$content not in list of doctors”
Add-Content $errordoc -Value “$content not in the list of doctors”
}
} else {

String not found

}
}
}

Set-Content $csvPath -Value $dateTime
Add-Content $csvPath -value “DRName,TOTAL,RASLO,AGCH,FH,MRMC,SVH,SCDC”

$completeTotal = 0
foreach ($doctor in $docList)
{
$curdoc = $Doc[$doctor]
$total = ($curdoc[$sites] | % { $_ }).Count
$completeTotal += $total
$firstname = $doctor.Split(’ ')
$firstname = $firstname[0] + $firstname[-1][0]

Add-Content $csvPath -Value “$firstname,$total,$($curdoc.rdc.count),$($curdoc.cdc.count),$($curdoc.fh.count),$($curdoc.mrmc.count),$($curdoc.svh.count),$($curdoc.scdc.count)”
}

Add-Content $csvPath -Value “Total Doctor Count,$completeTotal”
Send-MailMessage -From ‘email’ -to ‘email’ -Subject “Radiologist Daily Read Count for $date” -Body “Radiologist counts for $date” -SmtpServer ‘aspmx.l.google.com’ -port ‘25’ -Attachments $csvPath
Send-MailMessage -From ‘email’ -to ‘email’ -Subject “Error - NON Rads for $date” -Body “Error - Rads not in rad array for $date” -SmtpServer ‘aspmx.l.google.com’ -port ‘25’ -Attachments $errordoc

$sortedContent = Get-Content $doctorLog | Sort-Object
Set-Content -path $doctorLog -Value $sortedContent

} catch {
Add-Content $errorlog -Value “<font color = ‘red’>Error Content</font> -<br><br>Exception: $($.Exception)<br><br> CategoryInfo: $($.CategoryInfo)<br><br> StackLocation: $($.ScriptStackTrace)<hr>"
Send-MailMessage -From ‘email’ -to ‘email’ -Subject ‘Error found with Doctor Count:’ -Body "<font color = ‘red’>Error Content</font> -<br><br>Exception: $($
.Exception)<br><br> CategoryInfo: $($.CategoryInfo)<br><br> StackLocation: $($.ScriptStackTrace)<hr>” -SmtpServer ‘aspmx.l.google.com’ -port ‘25’ -BodyAsHtml
}
[/PRE]

 

Gina,

I’ve updated your code, let me know if this still works. Once we verify this we can start on the next piece. I highly recommend you stop using one liners and aliases in your script. This is not best practice when creating a script.

<script src=“https://gist.github.com/JasonRobertson/9cada63bb4d7c6688f5fc66ba444d2b5.js”></script>