Filtering input data from word file tables

Hello all,
I have a script that allows the user to select a word file and it will ingest the tables from the file, my end goal is to export it to a csv file.

However, I need to remove all rows that contain the word ‘resolved’ and I need to remove one of the columns with the heading of ‘changed’.

Looking for some guidance.

$date = Get-Date -Format "Y"
$currentFolder = (Get-Location).Path
$tempoutputfile = $currentFolder+"\"+"out_temp.csv"
$outputfile = $date + " "+$name +" " + " Monthly Metrics.csv"
$header = "Security Criteria,Date,Number of Findings,Severity"


########################################################################
### This function lists a menu for the user to choose which organization
########################################################################
# Define an array of names to choose from
$names = @("co1", "co2", "co3", "co4")

# Display the menu and get the user's choice
Write-Host "Which organization are you exporting for?"
for ($i=0; $i -lt $names.Length; $i++) {
    Write-Host ("{0}. {1}" -f ($i+1), $names[$i])
}
$org = Read-Host "Enter the number of your choice"

##########################
# Get the chosen selection
$name = $names[$org-1]

#####################################################################################
### This function opens a Windows Dialog Box for the selection of the source file ###
#####################################################################################

Write-Output "################################################"
Write-Output " "
Write-Output "Select the source file."
Write-Output "The starting directory is  C:\."
Write-Output " "
Write-Output "################################################"

Function Get-FileName($initialDirectory)
{
    [System.Reflection.Assembly]::LoadWithPartialName("System.windows.forms") | Out-Null
   
    $OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
    $OpenFileDialog.initialDirectory = $initialDirectory
    $OpenFileDialog.filter = "docx (*.docx)| *.docx"
    $OpenFileDialog.ShowDialog() | Out-Null
    $OpenFileDialog.filename
}
$inputfile = Get-FileName "C:\"

$Word     = New-Object -Com Word.Application
$Document = $Word.Documents.Open($inputfile)

#######################
### End of function ###


# Ask the user how many tables need importing and save it to choice variable
$choice = Read-Host "Enter the number of tables in the word document that need to be imported"

# Select the choice amount of tables
$Tables = $Document.Tables | Select-Object -First $choice


foreach($Table in $Tables)
{
$TableCols = $Table.Columns.Count
$TableRows = $Table.Rows.Count
# this gets the list separator character your local Excel expects
$delimiter = [cultureinfo]::CurrentCulture.TextInfo.ListSeparator



for($r = 2; $r -le $TableRows; $r++) {
    # capture an array of cell contents
    $content = for($c = 1; $c -le $TableCols; $c++) {
        # surround each value with quotes to prevent fields that contain the delimiter character would ruin the csv,
        # double any double-quotes the value may contain,
        # remove the control characters (0x0D 0x07) Word appends to the cell text
        # trim the resulting value from leading or trailing whitespace characters
        '"{0}"' -f ($Table.Cell($r,$c).Range.Text -replace '"', '""' -replace '[\x00-\x1F\x7F]').Trim()
		($Table.Cell($r,$c).Range.Text -replace '[\x00-\x1F\x7F]').Trim()
    }
    # output this array joined with the delimiter, both on screen and to file
   $content -join $delimiter | Add-Content -Path $tempoutputfile -PassThru
}
}
$Document.Close()
$Word.Quit()
# Stop Winword Process
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Document)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()




Does this affect the export?

If you can get the data out of the Word file OK, then it’s going to be way easier to work with the CSV file than to try and filter before export:

# $data represents the contents of the exported CSV file
$data = @'
column1,column2,column3,changed
dog,resolved,badger,no
fox,rabbit,muppet,yes
turnip,carrot,cake,resolved
resolved,computer,rice,maybe
paper,rock,elephant,pen
'@

# This simulates Import-CSV .\exportedData.csv
$csv = $data | ConvertFrom-Csv

# Filter out rows containing resolved, and select the required columns
$csv | Where-Object { $_ -notmatch 'resolved' } | Select-Object column1,column2,column3

Thanks Matt, I was thinking the same.

I am currently exporting the data via add-content to a temporary file $tempoutputfile. However, I am stripping the column headings because I wanted to change the header names. Should I do that first, before exporting the data to the csv? How could I edit / change / add new column headers?

Again, for ease, I think I’d do it afterwards. You can use calculated properties:

# $data represents the contents of the exported CSV file
$data = @'
column1,column2,column3,changed
dog,resolved,badger,no
fox,rabbit,muppet,yes
turnip,carrot,cake,resolved
resolved,computer,rice,maybe
paper,rock,elephant,pen
'@

# This simulates Import-CSV .\exportedData.csv
$csv = $data | ConvertFrom-Csv

# Filter out rows containing resolved, and select the required columns
$calculatedProperties = @(
    @{l='NewColumn1';e={$_.column1}}
    @{l='NewColumn2';e={$_.column2}}
    @{l='NewColumn3';e={$_.column3}}
)

$csv | Where-Object { $_ -notmatch 'resolved' } | Select-Object -Property $calculatedProperties

Here is where I am at now (I ended up adding the headers early on in the script)

BUT, It seems the searching / finding I am doing in the word file is generating duplicate data?
The source word file has col1, col2, col3, col4 for example with rows of findings.

My output is showing

col1, col1, col2, col2, col3, col3, col4, col4
row 1, row1, row2, row2, row3, row3, row4, row4

$date = Get-Date -Format "Y"
$currentFolder = (Get-Location).Path
$tempoutputfile = $currentFolder+"\"+"out_temp.csv"
$outputfile = $date + " "+$name +" " + "Monthly Metrics.csv"
$header = "Security Criteria,Date,Number of Findings,Severity"


########################################################################
### This function lists a menu for the user to choose which organization
########################################################################
# Define an array of names to choose from
$names = @("co1", "co2", "co3", "co4")

# Display the menu and get the user's choice
Write-Host "Which organization are you exporting for?"
for ($i=0; $i -lt $names.Length; $i++) {
    Write-Host ("{0}. {1}" -f ($i+1), $names[$i])
}
$org = Read-Host "Enter the number of your choice"

##########################
# Get the chosen selection
$name = $names[$org-1]

#####################################################################################
### This function opens a Windows Dialog Box for the selection of the source file ###
#####################################################################################

Write-Output "################################################"
Write-Output " "
Write-Output "Select the source file."
Write-Output "The starting directory is  C:\."
Write-Output " "
Write-Output "################################################"

Function Get-FileName($initialDirectory)
{
    [System.Reflection.Assembly]::LoadWithPartialName("System.windows.forms") | Out-Null
   
    $OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
    $OpenFileDialog.initialDirectory = $initialDirectory
    $OpenFileDialog.filter = "docx (*.docx)| *.docx"
    $OpenFileDialog.ShowDialog() | Out-Null
    $OpenFileDialog.filename
}
$inputfile = Get-FileName "C:\"

$Word     = New-Object -Com Word.Application
$Document = $Word.Documents.Open($inputfile)

#######################
### End of function ###


# Ask the user how many tables need importing and save it to choice variable
$choice = Read-Host "Enter the number of tables in the word document that need to be imported"

# Select the choice amount of tables
$Tables = $Document.Tables | Select-Object -First $choice

$header = @'
Security Criteria,Date,Number of Findings,Severity
'@
$header | Add-Content -Path $tempoutputfile -PassThru

foreach($Table in $Tables)
{
$TableCols = $Table.Columns.Count
$TableRows = $Table.Rows.Count
# this gets the list separator character your local Excel expects
$delimiter = [cultureinfo]::CurrentCulture.TextInfo.ListSeparator



for($r = 1; $r -le $TableRows; $r++) {
    # capture an array of cell contents
    $content = for($c = 1; $c -le $TableCols; $c++) {
        # surround each value with quotes to prevent fields that contain the delimiter character would ruin the csv,
        # double any double-quotes the value may contain,
        # remove the control characters (0x0D 0x07) Word appends to the cell text
        # trim the resulting value from leading or trailing whitespace characters
        '"{0}"' -f ($Table.Cell($r,$c).Range.Text -replace '"', '""' -replace '[\x00-\x1F\x7F]').Trim()
		($Table.Cell($r,$c).Range.Text -replace '[\x00-\x1F\x7F]').Trim()
    }
    # output this array joined with the delimiter, both on screen and to file
    $content -join $delimiter | Add-Content -Path $tempoutputfile -PassThru
}
}
$Document.Close()
$Word.Quit()
# Stop Winword Process
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Document)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()



$csv = Import-CSV $tempoutputfile `
| Where-Object { $_ -notmatch 'high' } | `
Select-Object 'Security Criteria','Date','Number of Findings','Severity' | Export-CSV $outputfile






The output is caused by duplicate code:

'"{0}"' -f ($Table.Cell($r,$c).Range.Text -replace '"', '""' -replace '[\x00-\x1F\x7F]').Trim()
($Table.Cell($r,$c).Range.Text -replace '[\x00-\x1F\x7F]').Trim()

I would suggest you do a separate temporary file for each table. It’s going to be difficult to process if all the data is in one file.

That makes sense, thanks Matt.
I have one last ask, inserting a column into the new csv and filling it with the date in format:

2/29/2024 0:00

Ideally… I want it to be the last day of the current month, but not sure if that’s very simple?

Add it to the calculated properties:

# $data represents the contents of the exported CSV file
$data = @'
column1,column2,column3,changed
dog,resolved,badger,no
fox,rabbit,muppet,yes
turnip,carrot,cake,resolved
resolved,computer,rice,maybe
paper,rock,elephant,pen
'@

# This simulates Import-CSV .\exportedData.csv
$csv = $data | ConvertFrom-Csv

$LastDay = (Get-Date (Get-Date -format 'y')).AddMonths(1).AddDays(-1).ToString('M/d/yyyy h:mm')

# Filter out rows containing resolved, and select the required columns
$calculatedProperties = @(
    @{l='NewColumn1';e={$_.column1}}
    @{l='NewColumn2';e={$_.column2}}
    @{l='NewColumn3';e={$_.column3}}
    @{l='Date';e={$LastDay}}
)

$csv | Where-Object { $_ -notmatch 'resolved' } | Select-Object -Property $calculatedProperties

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.