Extract header or footer from word as a table (powershell)

Hi, dear colleagues!

I would like your help getting the fields of the header or footer into a proper format.
Here is the code:


$docPath = "C:\Users\Dummy\Desktop\"
$outputfile = "C:\Users\Dummy\Desktop\header_out.csv"

if (Test-Path -Path $outputfile) { Remove-Item -Path $outputfile }

$all_docs = Get-ChildItem $docPath -Filter "n.docx"

$word = New-Object -ComObject "Word.Application"
$word.Visible = $False

$all_items = @()

foreach ($document in $all_docs) {
    $item = New-Object System.Object
    $doc = $word.Documents.Open($document.FullName)

    # Text of the primary header in the first section
    $header = $doc.Sections.Item(1).Headers.Item(1).Range.Text

    $item | Add-Member -Type NoteProperty -Name Name -Value $document.FullName
    $item | Add-Member -Type NoteProperty -Name Header -Value $header

    $all_items += $item

    $doc.Close()
}

$word.Quit()

Remove-Variable doc
Remove-Variable word

$all_items | Export-CSV $outputfile

But the CSV output returns all the data in a single field, with the field titles not matched to their values, like this:

Confidentiality Class External Confidentiality Label Document Type Page Public Non restricted Instruction 0 (1) Prepared By (Subject Responsible) Approved By (Document Responsible) Checked Powersitch Powerswitch boss Monday Document Number Revision Date Reference Number 32232 PA2 2021-06-29 WPH-1

The format I'm looking for (matching the layout of the header table) is:

Confidentiality Class External Confidentiality Label Document Type Page
Public Non restricted Instruction 1 (1)

Prepared By (Subject Responsible) Approved By (Document Responsible) Checked
Powersitch Powerswitch boss Monday

Document Number Revision Date Reference
Number 32232 PA2 2021-06-29 WPH-1

The document looks like this:

Could anybody please help me achieve this?

Thanks & BR to all!!!

This is a bit tricky, perhaps more so depending on how the data in your header is actually formatted. You can't export it straight to a CSV because you don't have a table with a single header row followed by rows of data.

Assuming that is a table and not just some funky formatting, I think the simplest approach is to read each cell and separate the cells with commas.

$docPath = 'E:\Temp\Files\header.docx'
$headerPath = 'E:\Temp\Files\header_out.csv'

$word = New-Object -ComObject 'Word.Application'

$doc = $word.Documents.Open($docPath)

# First table in the primary header of the first section
$table = $doc.Sections.Item(1).Headers.Item(1).Range.Tables(1)

$numberOfRows = $table.Rows.Count
$numberOfColumns = $table.Columns.Count

for ($i = 1; $i -le $numberOfRows; $i++) {

    $rowData = for ($j = 1; $j -le $numberOfColumns; $j++) {
        # Remove the invisible bell and carriage-return characters Word adds to cell text
        $table.Cell($i, $j).Range.Text -replace '\a' -replace '\r'
    }

    # Write one comma-separated line per table row
    $rowData -join ',' | Out-File $headerPath -Append -NoClobber
}

$doc.Close()
$word.Quit()
#Output
Confidentiality Class,External Confidentiality Label,Document Type,Page
Public,Non restricted,Instruction,1
Prepared By,,Approved By,Checked
PowerSwitch,,PowerSwitch Boss,Monday
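If the goal is the layout from the original question (each label matched to its value), the rows can also be paired up before exporting. This is a minimal sketch, assuming the header table strictly alternates a label row and a value row, as in the output above, and that the loop collects each row (for example `$rows = @()` before the loop and `$rows += ,@($rowData)` inside it) instead of writing it to the file. `ConvertTo-HeaderRecord` is a hypothetical helper name, not a built-in cmdlet:

```powershell
# Sketch: pair alternating label/value rows into one object.
# ConvertTo-HeaderRecord is a hypothetical helper; it assumes row 1 holds labels,
# row 2 their values, row 3 labels, row 4 values, and so on.
function ConvertTo-HeaderRecord {
    param([Parameter(Mandatory)] [object[]] $Rows)

    $record = [ordered]@{}
    for ($r = 0; $r + 1 -lt $Rows.Count; $r += 2) {
        # @() keeps a single-cell row from being treated as a string
        $labels = @($Rows[$r])
        $values = @($Rows[$r + 1])

        for ($c = 0; $c -lt $labels.Count; $c++) {
            # Skip empty label cells, such as the blank column left by merged cells
            if ([string]::IsNullOrWhiteSpace($labels[$c])) { continue }

            $value = ''
            if ($c -lt $values.Count) { $value = $values[$c] }
            $record[$labels[$c]] = $value
        }
    }

    [pscustomobject]$record
}

# Usage with the rows from the output above
$rows = @(
    @('Confidentiality Class', 'External Confidentiality Label', 'Document Type', 'Page'),
    @('Public', 'Non restricted', 'Instruction', '1'),
    @('Prepared By', '', 'Approved By', 'Checked'),
    @('PowerSwitch', '', 'PowerSwitch Boss', 'Monday')
)

ConvertTo-HeaderRecord -Rows $rows | ConvertTo-Csv -NoTypeInformation
```

The result is a single object whose property names are the labels, so `Export-Csv -NoTypeInformation` would write one header line and one data line (seven columns for the sample rows). A label that appears twice would keep only its last value.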

My test document had some invisible characters that were being exported and messing up the output, so I used the `-replace` operator to remove them. You might not need to do that, or you might have other control characters that need to be dealt with.
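If other control characters do turn up, or a cell itself contains a comma (which would shift the columns once the cells are joined with `,`), a small helper can deal with both. A minimal sketch, assuming each cell's text is passed in as a string; `ConvertTo-CsvField` is a hypothetical name, not a built-in cmdlet:

```powershell
# Sketch: clean one cell's text and make it safe to join into a CSV line.
# ConvertTo-CsvField is a hypothetical helper name.
function ConvertTo-CsvField {
    param([string] $Text)

    # Turn every control character (the bell/carriage-return pair Word adds to
    # cell text, vertical tabs from manual line breaks, etc.) into a space,
    # then trim the leftover whitespace from both ends.
    $clean = ($Text -replace '[\x00-\x1F]', ' ').Trim()

    # Quote the field if it contains a comma or a double quote, doubling any
    # embedded quotes as RFC 4180 expects.
    if ($clean -match '[,"]') {
        '"' + ($clean -replace '"', '""') + '"'
    }
    else {
        $clean
    }
}
```

Inside the inner loop, each cell's `.Range.Text` would be passed to this helper instead of the chained `-replace` calls. Building objects and letting `Export-Csv` write them avoids hand-quoting altogether, since it quotes the fields itself.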


Hi Matt!!!

Your code worked like a charm and was so helpful!!
If you don't mind, I would like to ask whether it would be possible to do the same with a PowerPoint document (header/footer).

Thanks & BR!!!

Glad that worked for you :+1:

The tables in a PowerPoint document work in a similar way, so it should be possible to adapt the code for a PowerPoint document. Explore the properties and methods of your document with `Get-Member` and check the documentation for the object model:

https://docs.microsoft.com/en-us/office/vba/api/powerpoint.table
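As a starting point, here is a hedged sketch of the same cell-by-cell approach using the PowerPoint object model. It assumes the table is an ordinary table shape on the slide master or on a slide; PowerPoint has no Word-style header area, so a table placed on a slide layout would also need `$pres.SlideMaster.CustomLayouts`. `Export-PptTablesToCsv` and the paths are hypothetical. Only standard PowerPoint members are used: `Presentations.Open`, `Shape.HasTable`, `Table.Cell(row, column)` and `Shape.TextFrame.TextRange.Text`.

```powershell
# Sketch: write every table on the slide master and on each slide to a CSV
# file, one table row per line. Export-PptTablesToCsv is a hypothetical name.
function Export-PptTablesToCsv {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $OutFile
    )

    # MsoTriState values used by the PowerPoint object model
    $msoTrue  = -1
    $msoFalse = 0

    $ppt  = New-Object -ComObject 'PowerPoint.Application'
    $pres = $null

    try {
        # Presentations.Open(FileName, ReadOnly, Untitled, WithWindow)
        $pres = $ppt.Presentations.Open($Path, $msoTrue, $msoFalse, $msoFalse)

        # Collect the master's shapes first, then the shapes on every slide
        $shapes = [System.Collections.Generic.List[object]]::new()
        foreach ($shape in $pres.SlideMaster.Shapes) { $shapes.Add($shape) }
        foreach ($slide in $pres.Slides) {
            foreach ($shape in $slide.Shapes) { $shapes.Add($shape) }
        }

        foreach ($shape in $shapes) {
            # HasTable is an MsoTriState, not a plain boolean
            if ($shape.HasTable -ne $msoTrue) { continue }

            $table = $shape.Table
            for ($r = 1; $r -le $table.Rows.Count; $r++) {
                $rowData = for ($c = 1; $c -le $table.Columns.Count; $c++) {
                    # PowerPoint separates paragraphs with \r and line breaks with \v
                    $table.Cell($r, $c).Shape.TextFrame.TextRange.Text -replace '[\r\v]', ' '
                }
                $rowData -join ',' | Out-File $OutFile -Append -Encoding utf8
            }
        }
    }
    finally {
        if ($pres) { $pres.Close() }
        $ppt.Quit()
    }
}
```

It would be called as `Export-PptTablesToCsv -Path 'E:\Temp\Files\header.pptx' -OutFile 'E:\Temp\Files\ppt_header_out.csv'`. If the text lives in PowerPoint's own footer placeholder rather than in a table, `$slide.HeadersFooters.Footer.Text` is the property to look at instead.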


Thanks again, Matt :grinning:!! I will follow all your suggestions!!!

BR :wink: :+1: