I’ve started on a script to find Personally Identifiable Information (PII) in a fixed list of file types. For now I’m only testing .docx files. The object returned by the Find-PIIWord helper function isn’t displayed when I expect it to be. As shown at the very bottom, the objects are all displayed together at the very end, after all the verbose and warning output, rather than each time the function is called.
Should I be returning the object from Find-PIIWord back to the main function and outputting the object from there?
Is a function call from a switch statement really the right approach here?
Any other critiques would be greatly appreciated.
    Function Find-PII {
        [cmdletbinding()]
        Param (
            [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
            [Alias("FilePath")]
            [string[]]
            $Path = $PWD
        )

        Begin {
            #Converts relative path to absolute path
            $Path = Convert-Path $Path

            #Has 9 digits, may be split as xxx-xx-xxxx by dashes or spaces
            $patternSocial = '(\d{3}[-| ]\d{2}[-| ]\d{4})|(\d{9})'

            #Starts with a 4 and have 16 digits, may be split as xxxx-xxxx-xxxx-xxxx by dashes or spaces
            $patternVisa = '(4\d{3}[-| ]\d{4}[-| ]\d{4}[-| ]\d{4})|(4\d{15})'

            #Starts with 51-55 and have 16 digits, may be split as xxxx-xxxx-xxxx-xxxx by dashes or spaces
            $patternMC = '(5[1-5]\d{2}[-| ]\d{4}[-| ]\d{4}[-| ]\d{4})|(5[1-5]\d{14})'

            #Starts with 34 or 37 and have 15 digits, may be split as xxxx-xxxxxx-xxxxx by dashes or spaces
            $patternAMEX = '(3[47]\d{2}[-| ]\d{6}[-| ]\d{5})|(3[47]\d{13})'

            #Start with 6011 or 65 and have 16 digits, may be split as xxxx-xxxx-xxxx-xxxx by dashes or spaces
            $patternDiscover = '(6(?:011|5\d{2})[-| ]\d{4}[-| ]\d{4}[-| ]\d{4})|(6(?:011|5\d{2})\d{12})'

            New-PIITempFolder
            $PIITemp = Get-Item -Path "$env:TEMP\FindPII"
        }

        Process {
            $files = Get-ChildItem -Path $Path -Include '*.docx' -Recurse
            #$files = Get-ChildItem -Path $Path -Include '*.docx', '*.xlsx', '*.pdf', '*.pptx', '*.txt' -Recurse

            foreach ($file in $files) {
                switch ($file.Extension) {
                    .docx {Find-PIIWord -InputObject $file}
                    #.xlsx {Find-PIIExcel}
                    #.pptx {Find-PIIPowerPoint}
                    #.pdf {Find-PIIPdf}
                    #.txt {Find-PIITxt}
                    #default {break}
                }
            }
        }

        End {
            #Remove-PIITempFolder
        }
    }

    Function Find-PIIWord {
        param (
            [Parameter(ValueFromPipeline = $true)]
            [System.IO.FileInfo]
            $InputObject
        )

        Write-Verbose "Looking for PII in $($InputObject.Name)"

        $docxTemp = "$PIITemp\$($InputObject.Name)"
        New-Item -Path "$PIITemp\docx" -ItemType Directory -Force | Out-Null
        Copy-Item -Path $InputObject.FullName -Destination "$docxTemp.zip" -Force
        Expand-Archive -Path "$docxTemp.zip" -DestinationPath "$PIITemp\docx\" -Force | Out-Null

        [xml] $docx = Get-Content -Path "$PIITemp\docx\word\document.xml"
        $PIIFound = $docx.document.body.p.r.t | Select-String -Pattern $patternSocial, $patternVisa -Quiet

        if ($PIIFound) {
            $obj = [pscustomobject] @{
                'Name' = $InputObject.Name;
                'Length' = $InputObject.Length;
                'LastWriteTime' = $InputObject.LastWriteTime;
                'FullName' = $InputObject.FullName
            }
            Write-Warning "PII found in $($InputObject.name)"
            Write-Output $obj
        }

        #Remove-Item -Path "$PIITemp\docx" -Recurse -Force
    }

    Function New-PIITempFolder {
        if (-not (Test-Path -Path "$env:TEMP\FindPII")) {
            New-Item -Path $env:TEMP -Name FindPII -ItemType Directory | Out-Null
        }
    }

    Function Remove-PIITempFolder {
        if (Test-Path -Path "$env:TEMP\FindPII") {
            Remove-Item -Path "$env:TEMP\FindPII" -Recurse -Force
        }
    }
The output of my test is below:
    PS G:\Microsoft\Powershell> Find-PII -Path . -Verbose
    VERBOSE: Looking for PII in Resume.docx
    WARNING: PII found in Resume.docx
    VERBOSE: Looking for PII in test.docx
    WARNING: PII found in test.docx

    Name        Length LastWriteTime          FullName
    ----        ------ -------------          --------
    Resume.docx  27689 12/12/2017 12:22:58 AM G:\Microsoft\Powershell\Resume.docx
    test.docx    11970 12/12/2017 12:26:21 AM G:\Microsoft\Powershell\test.docx
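Separately, in case it helps with the review of the patterns themselves, here is a minimal standalone sketch of how the two patterns that Find-PIIWord currently passes to Select-String can be exercised without any .docx files. The sample strings and the `$samples` and `$results` variable names are made up for illustration and are not part of the script.

```powershell
# Patterns copied verbatim from the Begin block of Find-PII
$patternSocial = '(\d{3}[-| ]\d{2}[-| ]\d{4})|(\d{9})'
$patternVisa   = '(4\d{3}[-| ]\d{4}[-| ]\d{4}[-| ]\d{4})|(4\d{15})'

# Made-up sample strings for illustration only (not real data)
$samples = 'SSN 123-45-6789', 'Card 4111 1111 1111 1111', 'No numbers here'

$results = foreach ($s in $samples) {
    # With -Quiet, Select-String returns $true on a match; the [bool] cast
    # turns a missing result into $false so every row has a value
    $hit = [bool]($s | Select-String -Pattern $patternSocial, $patternVisa -Quiet)
    [pscustomobject]@{ Sample = $s; Match = $hit }
}

$results
```

This only checks the same Select-String call that Find-PIIWord makes against plain strings. It does not touch the temp folder or the unzipped document.xml.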