Hi dear colleagues!
I would like to ask for your help getting the fields of a Word document's header or footer in a proper format.
Here is the code:
$docPath = "C:\Users\Dummy\Desktop\"
$outputfile = "C:\Users\Dummy\Desktop\header_out.csv"
if(Test-Path -path $outputfile) { Remove-Item -path $outputfile }
$all_docs = Get-ChildItem $docPath -filter "n.docx"
$word = New-Object -comobject "Word.Application"
$word.Visible = $False
$all_items = @()
foreach ( $document in $all_docs)
{
$item = New-Object System.Object
$doc = $word.Documents.Open($document.FullName);
$header = $doc.Sections.Item(1).Headers.Item(1).Range.Text
$item | Add-Member -type NoteProperty -name Name -value $document.FullName
$item | Add-Member -type NoteProperty -name Header -value $header
$all_items += $item
$doc.Close()
}
$word.Quit()
Remove-Variable doc
Remove-Variable word
$all_items | Export-CSV $outputfile
But the CSV output returns all the data run together, with the field titles not matched to their values, like this:
Confidentiality Class External Confidentiality Label Document Type Page Public Non restricted Instruction 0 (1) Prepared By (Subject Responsible) Approved By (Document Responsible) Checked Powersitch Powerswitch boss Monday Document Number Revision Date Reference Number 32232 PA2 2021-06-29 WPH-1
The format I am looking for would match the table in the header:
Confidentiality Class External Confidentiality Label Document Type Page
Public Non restricted Instruction 1 (1)
Prepared By (Subject Responsible) Approved By (Document Responsible) Checked
Powersitch Powerswitch boss Monday
Document Number Revision Date Reference Number
32232 PA2 2021-06-29 WPH-1
The document is this.
Could anybody please help me achieve this?
Thanks & BR to all!
This is a bit tricky, perhaps more so depending on how that data in your header is actually formatted. You can’t export it straight to a CSV because you don’t have a table with a single header row followed by rows of data.
Assuming that is a table and not just some funky formatting, I think the simplest approach is just to get each cell and separate it with a comma.
$docPath = 'E:\Temp\Files\header.docx'
$headerPath = 'E:\Temp\Files\header_out.csv'
$word = New-Object -ComObject 'Word.Application'
$doc = $word.Documents.Open($docPath)
# First table in the primary header of the first section
$table = $doc.Sections.Item(1).Headers.Item(1).Range.Tables.Item(1)
$numberOfRows = $table.Rows.Count
$numberOfColumns = $table.Columns.Count
for ($i = 1; $i -le $numberOfRows; $i++) {
    $rowData = for ($j = 1; $j -le $numberOfColumns; $j++) {
        # Strip the end-of-cell marker (\a) and carriage returns (\r)
        $table.Cell($i,$j).Range.Text -replace '\a' -replace '\r'
    }
    # Write one comma-separated line per table row
    $rowData -join ',' | Out-File $headerPath -Append -NoClobber
}
$doc.Close()
$word.Quit()
#Output
Confidentiality Class,External Confidentiality Label,Document Type,Page
Public,Non restricted,Instruction,1
Prepared By,,Approved By,Checked
PowerSwitch,,PowerSwitch Boss,Monday
My test document had some invisible characters that were being exported and screwing up the output, so I used the -replace operator to sort that out. You might not need to do that, or you might have other control characters that need to be dealt with.
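If other control characters turn up, or a cell contains a comma, something along these lines might help. This is only a sketch: ConvertTo-CsvField is a hypothetical helper name (not part of the answer above), and the sample cell values are made up to mimic what Word returns, with `r`a at the end of each cell.

```powershell
# Hypothetical helper, not part of the original answer.
function ConvertTo-CsvField {
    param([string]$Text)
    # Replace every control character (\a end-of-cell marker, \r, \v, ...) with a space, then trim
    $clean = ($Text -replace '\p{Cc}', ' ').Trim()
    # Quote the field if it contains a comma or a double quote, doubling any embedded quotes
    if ($clean -match '[,"]') {
        '"' + ($clean -replace '"', '""') + '"'
    } else {
        $clean
    }
}

# Made-up cell values for illustration
$cells = "Prepared By (Subject Responsible)`r`a", "Powerswitch, boss`r`a", "2021-06-29`r`a"
($cells | ForEach-Object { ConvertTo-CsvField $_ }) -join ','
# Prepared By (Subject Responsible),"Powerswitch, boss",2021-06-29
```

In the loop above, the cell text could be passed through this helper instead of the two -replace calls, so a comma inside a cell does not shift the columns.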
Hi Matt!
Your code worked like a charm and was so helpful!
If you allow me, I would like to ask whether it would be possible to do the same for a PowerPoint document (header/footer).
Thanks & BR!
Glad that worked for you.
The tables in a PowerPoint document work in a similar way so it should be possible to adapt the code for a PowerPoint document. Explore the properties and methods of your document with Get-Member and check the documentation for the object model:
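As a starting point for that exploration, here is a rough, untested sketch of how the Word loop might translate. It assumes PowerPoint is installed, that the path below (hypothetical) points at your file, and that the data sits in tables placed on the slides as shapes. Slide footers themselves are exposed through the HeadersFooters property rather than as tables, so they may need a different approach.

```powershell
# Sketch only: the path is hypothetical and the table layout is an assumption.
$pptPath = 'E:\Temp\Files\slides.pptx'
$ppt = New-Object -ComObject 'PowerPoint.Application'
$presentation = $ppt.Presentations.Open($pptPath)

# See which properties and methods the presentation object exposes
$presentation | Get-Member

foreach ($slide in $presentation.Slides) {
    foreach ($shape in $slide.Shapes) {
        # HasTable is -1 (msoTrue) when the shape holds a table, 0 otherwise
        if ($shape.HasTable) {
            $table = $shape.Table
            for ($r = 1; $r -le $table.Rows.Count; $r++) {
                $rowData = for ($c = 1; $c -le $table.Columns.Count; $c++) {
                    # Cell text lives on the cell's shape; strip paragraph (\r) and line (\v) breaks
                    $table.Cell($r, $c).Shape.TextFrame.TextRange.Text -replace '\r' -replace '\v'
                }
                # Emit one comma-separated line per table row
                $rowData -join ','
            }
        }
    }
}
$presentation.Close()
$ppt.Quit()
```

The output lines can be piped to Out-File in the same way as in the Word example.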
Thanks again, Matt! I will follow all your suggestions!
BR
