Batch Processing a Pile of Word Documents

Hey Folks,

I’ve been looking into a problem for a friend of mine. He has a group of people filling out a form (Word, .doc/.docx), and the completed forms end up in a folder. We need to automate processing hundreds of these documents, grab specific parameters from each one, and write them into a .csv file.

So far I can read the files and collect them, but fishing out the specific strings is where it starts to get tough.

Can anybody help out here?

$docPath = $args[0]
Write-Host "Processing documents from:" $docPath
# Content controls are a .docx (Word 2007+) feature, so only .docx files are picked up
$all_docs = Get-ChildItem $docPath -Filter "*.docx"

$word = New-Object -ComObject "Word.Application"
$word.Visible = $False

# Now, open each document and read its "ContentControls"
$all_items = @()
foreach ($file in $all_docs) {
    Write-Host "Processing:" $file.FullName
    # Use a separate variable so the FileInfo in $file is not overwritten
    $doc = $word.Documents.Open($file.FullName)
    Write-Host "Found:" $doc.ContentControls.Count "content controls"

    # Build one custom object per document to hold its data
    $item = New-Object PSObject
    foreach ($control in $doc.ContentControls) {
        # -Force overwrites a repeated Title instead of throwing an error
        $item | Add-Member -MemberType NoteProperty -Name $control.Title -Value $control.Range.Text -Force
    }
    $all_items += $item

    # Close each document so Word does not keep hundreds of files open
    $doc.Close()
}
$word.Quit()

# Last, save the collection to a CSV file named with today's date
$all_items | Export-Csv "exportData_$(Get-Date -Format 'yyyy-MM-dd').csv" -NoTypeInformation

Yeah, Word is about the worst-case scenario for this, unfortunately: you’re stuck working with a decade-old COM object. And this isn’t really PowerShell; it’s COM programming against Word. I mention this only because, historically, we’ve had very low turnout on this type of question, and I didn’t want you hanging around waiting for an answer that might not be forthcoming. Sorry :(.

But, if you figure it out, maybe you’ll consider dropping by now and again to answer these kinds of questions when they come up :).

It looks like he may have already figured out the COM part but doesn’t know how to get the returned data into the format he wants. It might help to fill out a form with test data, show us the output, and then show us what you want it to be. This might actually be more of a regex question.
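If it does turn out to be a regex question, here is a minimal sketch of the idea. It assumes the form is laid out as plain "Label: value" paragraphs rather than content controls; the sample text and the labels "Name", "Department" and "Date" are made up for illustration. It also assumes Word separates paragraphs with a carriage return (`r) in the text it returns, so check that against your own test document.

```powershell
# Hypothetical sample: the plain text of one filled-in form, as you might get
# it from $doc.Content.Text. The labels "Name", "Department" and "Date" are made up.
$text = "Name: Jane Doe`rDepartment: Sales`rDate: 2014-03-01`r"

# Assumption: Word ends each paragraph with a carriage return (`r),
# so split on that and look for "Label: value" in each paragraph.
$fields = [ordered]@{}
foreach ($line in ($text -split "`r")) {
    # Label = everything before the first colon; value = the rest of the paragraph
    if ($line -match '^\s*(?<label>[^:]+?)\s*:\s*(?<value>.*)$') {
        $fields[$Matches['label']] = $Matches['value'].Trim()
    }
}

# One object per document, ready to collect into $all_items and pipe to Export-Csv
$record = [pscustomobject]$fields
$record
```

Inside the loop from the question, you would set `$text = $doc.Content.Text` for each opened document and add `$record` to `$all_items` instead of reading the content controls. The `[ordered]` hashtable keeps the CSV columns in the same order the labels appear on the form (PowerShell 3.0 or later).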