Workflow work scope question

To preface this: I have a working script as is, but because it is searching 300,000 to 400,000 lines across 4 separate files, the time to complete the operation is not desirable. What I would like as an end result is for the foreach match operation to scan all 4 files simultaneously, and I'm not sure whether that should be termed parallel operation via a workflow or threading via scriptblock and job operations. I tinkered with both, and the workflow sample below is the closest I've come to what I'm trying to achieve, except I cannot seem to get the results back for additional work. Going by the operation time I can presume the work is being done, since it's comparable to the original operation if I limit it to just one of the four files, but it would seem I need some direction to move forward on this.
[pre]

$SawComp = Get-Content "C:\Log\Complete.log"
$SawComp1 = Get-Content "C:\Log\Complete.log.1"
$SawComp2 = Get-Content "C:\Log\Complete.log.2"
$SawComp3 = Get-Content "C:\Log\Complete.log.3"
$PrinterReport = "C:\WIP\PrinterReport.htm"
$PrintErrors = @()
$SawCombinedLogs = $SawComp + $SawComp1 + $SawComp2 + $SawComp3

If (Test-Path $PrinterReport) {Remove-Item -Path $PrinterReport -Force}
workflow test
{
param($SawCombinedLogs)

foreach -parallel ($_ in $Using:SawCombinedLogs )
{
if ($_ -match "Part( Started -| Complete)|PRT(0004|0005|0009)|ENG(0029|0037)|OPR0078|WaitingForPrinterTrigger") { $Output = $_}
$PrintErrors += $Output | Out-String
}
}

$PrintErrors

[/pre]

 

You're not executing the workflow; try something like this:

workflow test {
param($files)

    $results = foreach -parallel ($_ in $files ) {
        $content = Get-Content $_

        if ($content -match 'file') {$_}
    }

    $results
}

$files = 'C:\Scripts\file1.txt', 'C:\Scripts\file2.txt'

test $files

I assume the last line where you call "test" is like calling a function, which I have tinkered with, but I am still not getting any results in $PrintErrors. Also, why are you putting the file list variable after test? I'm afraid I'm not following how to apply this; can you explain better what you're doing in this example that's different, besides calling the workflow by name?

If you have 2 files, we’ll say 100k rows for each file, and you do this:

$file1 = Get-Content 'C:\Scripts\file1.txt'
$file2 = Get-Content 'C:\Scripts\file2.txt'

$file1 + $file2

Get-Content creates an array of lines, and using + joins the $file1 and $file2 content into a single 200k-element array, which means you are processing rows from both files in one big array. Let's start with a typical/normal loop: process each file, then process each row.

One file at a time:

$files = 'C:\Scripts\file1.txt', 'C:\Scripts\file2.txt'

$results = foreach ($file in $files ) {
    $content = Get-Content $file

    foreach ($row in $content) {
        if ($row -match 'file') {$row}
    }
}

$results

Parallel processing:

You want to process multiple files at once. The process is the same, you're just adding -parallel:

workflow test {
param($files)

    $results = foreach -parallel ($file in $files ) {
        $content = Get-Content $file

        foreach ($row in $content) {
            if ($row -match 'file') {$row}
        }
    }

    $results
}

$files = 'C:\Scripts\file1.txt', 'C:\Scripts\file2.txt'

test $files
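
The array after the workflow name is just the first positional parameter, exactly like calling a function; test -files $files does the same thing. To get the matches back for additional work, assign the call to a variable. Something like this (untested) with the four log paths from your original post, once you've swapped your regex in for the 'file' pattern inside the workflow:

[pre]
# The four logs from the original post
$files = 'C:\Log\Complete.log',
         'C:\Log\Complete.log.1',
         'C:\Log\Complete.log.2',
         'C:\Log\Complete.log.3'

# Positional call, same as: test -files $files
# The workflow's output ends up in the variable for further processing
$PrintErrors = test $files

$PrintErrors.Count
[/pre]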

Use Measure-Command to see how long each approach takes. There is a lot of information out there on using PowerShell to process large files, so I recommend you do some research to find the best approach for reading a single file. Even processing these 4 huge files is going to take a lot of memory, so look at approaches like this:

https://stackoverflow.com/questions/9439210/how-can-i-make-this-powershell-script-parse-large-files-faster

Once you have found the best approach for a single file, then you can look at doing parallel operations.
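
For example (untested, and just one option that commonly comes up in those discussions), you could use Measure-Command to compare plain Get-Content against the .NET [System.IO.File]::ReadLines() streaming reader on a single log, using the pattern from your original post, before deciding what to feed the parallel version:

[pre]
$log = 'C:\Log\Complete.log'
$pattern = 'Part( Started -| Complete)|PRT(0004|0005|0009)|ENG(0029|0037)|OPR0078|WaitingForPrinterTrigger'

# Load the whole file into memory, then filter
Measure-Command {
    Get-Content $log | Where-Object { $_ -match $pattern }
}

# Stream the file one line at a time instead of loading it all at once
Measure-Command {
    foreach ($line in [System.IO.File]::ReadLines($log)) {
        if ($line -match $pattern) { $line }
    }
}
[/pre]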