Multiple HTML files > single PDF: automating wkhtmltopdf (using PowerShell)

Hello,

 

I have been trying to automate converting a batch of HTML documents (generated by PowerShell) into a single PDF document as a report.

I started with wkhtmltopdf, and it works fine for the task, but I can’t seem to automate it (at least not multiple-in, single-out).

The issue I’m having is passing the selected files to the wkhtmltopdf CLI as a “list”.

## wkhtmltopdf syntax (run from the \bin folder of the wkhtmltopdf install): wkhtmltopdf [global options] [input documents/HTML] [full path of the output file]


## PowerShell script that takes the HTML document paths and converts them into a single PDF file

$OutputFile = '$HOME\Documents\TempPDFReport\reporttest6.pdf'
$wkhtmltopdfRootDir = 'C:\Program Files\wkhtmltopdf\bin'
$GetChildItems = (Get-ChildItem -Path $HOME'\documents\TempHTMLConvert' -recurse |`
where {$_.extension -eq ".html"} |`
Select-Object -Property FullName).FullName -join ' '


&'c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $GetChildItems $OutputFile

What I expected was that it would generate a list of all the child items, join them with spaces (the files and paths have no spaces in their names, so no quotation marks are needed), and act as a “list” of all the input files for the wkhtmltopdf.exe CLI.

I am, however, getting errors. When I add a Set-Clipboard pipe to the $GetChildItems pipeline and paste the result into the wkhtmltopdf.exe CLI with an output location added, it works just fine. But passing it from the script doesn’t work: wkhtmltopdf.exe throws the error “unknown protocol c”, which means it thinks the C in the first file path (C:…) is a protocol, but I can’t figure out what in the output format is causing that. There is no space after the C:.

If anyone knows a good way to “pipe” PowerShell objects into other CLI tools, I’d greatly appreciate the help. I’m not great with PowerShell, so I imagine you could use an array or something, or maybe a more complex -join?

Thanks in advance,

-Mackling101

Hi,

It looks like you need to specify the protocol when using wkhtmltopdf.exe. For files, this is usually file:/// rather than http://.

This appears to work:

$files = Get-ChildItem E:\temp\html\ -Include *.html -Recurse | Select -ExpandProperty FullName

$fileList = @()

foreach ($file in $files) {

    $file = "file:///$file"
    $fileList += $file
}

& E:\Temp\wkhtmltox\bin\wkhtmltopdf.exe $fileList output.pdf
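The same list can also be built in one pipeline instead of a foreach loop that grows the array with += (each += creates a new copy of the array). A minimal sketch; the two paths are hypothetical stand-ins for the strings that the first line of the example above returns, and the @() around the pipeline keeps $fileList an array even when only one file is found:

```powershell
# Hypothetical stand-in paths; in the real script $files comes from the
# Get-ChildItem ... | Select -ExpandProperty FullName line above
$files = 'E:\temp\html\page1.html', 'E:\temp\html\page2.html'

# Prefix every path in one pipeline; @() keeps the result an array
# even if only a single file was found
$fileList = @($files | ForEach-Object { "file:///$_" })

# Each element is later passed to wkhtmltopdf.exe as its own argument
$fileList
```

Passing $fileList to wkhtmltopdf.exe then works exactly as in the example above.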

Hey Matt,

Thanks for the advice. Sadly, the big issue I’m having is dropping the added crap that PowerShell puts on the objects: @{FullName=

That sticks in front of the filenames, and I’m not sure how to get rid of it. If I could grab only the filename and nothing else, I think it would be okay. Also, the file:/// prefix is not working either, as I end up with an object that looks like this:

file:///@{FullName=C:\Users…

So each filename ends up with all of that stuff in front of the C:. The join I had above got me just the C:\ paths, but it still doesn’t seem to work. Maybe at this point I should just give up and admit that passing filenames to wkhtmltopdf from PowerShell isn’t possible. Or maybe if I write the objects to a temporary CSV file and then read them back, they won’t have all the added stuff?

Thanks for the help.

 

-Mackling101

Well it is possible. The code I posted works. Did you try it?

You need to make sure you use -ExpandProperty to get the name as a String object.
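To see the difference between -Property and -ExpandProperty, here is a small sketch. It creates a throwaway file in the temp folder (the name expandproperty-demo.html is made up) so it runs anywhere. With -Property you get a new object that only has a FullName property, and when that object is put into a string it shows up as @{FullName=...}, which matches the file:///@{FullName=C:\Users… result described above. With -ExpandProperty you get the path itself as a plain [string]:

```powershell
# Throwaway sample file so the sketch runs anywhere (hypothetical name)
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'expandproperty-demo.html'
Set-Content -Path $sample -Value '<p>demo</p>'

# -Property builds a new object that merely HAS a FullName property;
# inside a string it turns into "@{FullName=...}"
$wrapped = Get-Item -Path $sample | Select-Object -Property FullName

# -ExpandProperty returns the property's value itself, a plain [string]
$plain = Get-Item -Path $sample | Select-Object -ExpandProperty FullName

"file:///$plain"               # only the path follows the prefix
$wrapped.FullName -eq $plain   # the wrapper still holds the same path inside
```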

I am really not sure what you are after here, but Get-ChildItem returns file objects, not file content.

Get-Content returns the content of the individual file it is called on.

You are using a third-party external app, wkhtmltopdf.exe, which I’d never heard of before, to create the PDF, when you could just use the PDF printer in Windows, but I digress.

Using Set-Clipboard with Get-ChildItem doesn’t really mean anything relative to the file content. You’ll only be sending the full file object to the clipboard, and that makes little sense.

So, do you just want to send the full filenames to a single PDF, or the actual file content?

If it is the latter, you have more work to do. You have to:

  • Loop to read each file
  • Get its content
  • Add that to a single file or variable
  • Then convert that to PDF

If you get results back from just the above, good; if not, you need to figure out why.
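If you really do want one combined HTML file, the steps above can be sketched like this. It is a minimal sketch under a few assumptions: Join-HtmlFile is a hypothetical helper name, the files are combined in alphabetical order of their full paths, and simply concatenating complete HTML documents gives technically invalid markup (several html elements in one file), which browser engines generally render anyway. Keep the combined file outside the source folder, or the next run will pick it up as an input. Note that wkhtmltopdf.exe can also take several input files in one call, as in the earlier reply, so this step is optional.

```powershell
function Join-HtmlFile {
    # Hypothetical helper: concatenates every .html file under SourceDir
    # into one file at Destination
    param(
        [Parameter(Mandatory)][string]$SourceDir,
        [Parameter(Mandatory)][string]$Destination
    )

    # Loop over each file (sorted so the page order is predictable)
    # and read its whole content as one string
    $parts = Get-ChildItem -Path $SourceDir -Filter *.html -Recurse -File |
        Sort-Object -Property FullName |
        ForEach-Object { Get-Content -Path $_.FullName -Raw }

    # Add everything to a single file, ready to be converted to PDF
    Set-Content -Path $Destination -Value ($parts -join [Environment]::NewLine)
}
```

Usage, with the folder from the question: Join-HtmlFile -SourceDir "$HOME\Documents\TempHTMLConvert" -Destination "$HOME\Documents\combined.html", then pass the combined file to wkhtmltopdf.exe.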

This is wrong, because you are using a variable that needs to be expanded.
Single quotes are for literal strings; variables inside them are not expanded.

$OutputFile = '$HOME\Documents\TempPDFReport\reporttest6.pdf'

It should be this instead.
Double quotes allow variable expansion:

$OutputFile = "$HOME\Documents\TempPDFReport\reporttest6.pdf"
$wkhtmltopdfRootDir = 'C:\Program Files\wkhtmltopdf\bin'
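A quick way to see the difference, using the output path from the question. Join-Path at the end is an extra option, not something from the posts above; it builds the same kind of path without any quoting concerns:

```powershell
# Single quotes: taken literally, so the text really starts with "$HOME"
$literal = '$HOME\Documents\TempPDFReport\reporttest6.pdf'

# Double quotes: $HOME is replaced by the current user's profile path
$expanded = "$HOME\Documents\TempPDFReport\reporttest6.pdf"

# Join-Path sidesteps the quoting question entirely
$joined = Join-Path -Path $HOME -ChildPath 'Documents\TempPDFReport\reporttest6.pdf'

$literal
$expanded
$joined
```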

This, by itself, would only send the filenames to the PDF file, not the content.

$GetChildItems = (Get-ChildItem -Path $HOME'\documents\TempHTMLConvert' -recurse |`
where {$_.extension -eq ".html"} |`
Select-Object -Property FullName).FullName -join ' '

Don’t use the backtick after the pipe for line continuation; the pipe is a natural line continuation, and there are many others in PowerShell. The backtick has its place, and I do use it, but not here. Others just malign it, always.

This is a good article on the topic, though I feel the author convolutes things to justify some of his points, and there I disagree with him. Still, most of what’s there is on the money.

Bye Bye Backtick: Natural Line Continuations in PowerShell
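Applied to the pipeline from the question, a backtick-free version might look like the sketch below. Get-HtmlFilePath is a hypothetical helper name. Dropping -join ' ' matters as much as dropping the backticks: the function returns one string per file, which is the form an external program needs in order to receive separate arguments.

```powershell
function Get-HtmlFilePath {
    # Hypothetical helper: returns the full path of every .html file
    param([Parameter(Mandatory)][string]$Path)

    # A line ending in | continues on the next line, so no backticks
    # are needed. Without -join, the caller gets one string per file.
    Get-ChildItem -Path $Path -Recurse -File |
        Where-Object { $_.Extension -eq '.html' } |
        Select-Object -ExpandProperty FullName
}
```

Usage: $htmlFiles = @(Get-HtmlFilePath -Path "$HOME\documents\TempHTMLConvert"). The @() keeps $htmlFiles an array even when only one file matches.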

# Get the full name and the full content of every target file, and collect them in a variable
$HtmlContent = Get-ChildItem -Path "$HOME\documents\TempHTMLConvert\*.html" -recurse | 
ForEach{ 
    $PSItem.FullName
    Get-Content -Path $PSItem.FullName 
}

Running external commands from PowerShell requires special attention.

Resources on using external commands, and their parameters or switches, from PowerShell:

PowerShell: Running Executables
https://social.technet.microsoft.com/wiki/contents/articles/7703.powershell-running-executables.aspx

Solve Problems with External Command Lines in PowerShell
https://devblogs.microsoft.com/scripting/solve-problems-with-external-command-lines-in-powershell

Top 5 tips for running external commands in Powershell
https://powershelleverydayfaq.blogspot.com/2012/04/top-5-tips-for-running-external.html

Using Windows PowerShell to run old command line tools (and their weirdest parameters)
https://blogs.technet.microsoft.com/josebda/2012/03/03/using-windows-powershell-to-run-old-command-line-tools-and-their-weirdest-parameters

Execution of external commands in PowerShell done right
https://mnaoumov.wordpress.com/2015/01/11/execution-of-external-commands-in-powershell-done-right
https://mnaoumov.wordpress.com/2015/03/31/execution-of-external-commands-native-applications-in-powershell-done-right-part-2
https://mnaoumov.wordpress.com/2015/04/05/execution-of-external-commands-native-applications-in-powershell-done-right-part-3

http://edgylogic.com/blog/powershell-and-external-commands-done-right

Quoting specifics
https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_quoting_rules
https://trevorsullivan.net/2016/07/20/powershell-quoting

& 'c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $HtmlContent $OutputFile
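The underlying difference in the original script is how PowerShell hands arguments to an external program: each element of an array variable becomes a separate argument, while a string joined with spaces is passed as one single argument (PowerShell wraps it in quotes because it contains spaces), so wkhtmltopdf receives one long input name instead of several. A small PowerShell function makes the difference visible; note that a function needs explicit splatting (@fileList) to get the one-argument-per-element behavior that native programs get automatically. Show-ArgumentCount is a hypothetical name and the paths are stand-ins.

```powershell
# Hypothetical helper: reports how many arguments it received
function Show-ArgumentCount { $args.Count }

# Stand-in paths; in the real script they come from Get-ChildItem
$fileList = 'C:\reports\a.html', 'C:\reports\b.html', 'C:\reports\c.html'
$joined = $fileList -join ' '

Show-ArgumentCount @fileList   # splatted array: one argument per file
Show-ArgumentCount $joined     # joined string: one single argument

# For a native .exe an array variable is expanded automatically, so with
# real paths in $fileList the call is simply:
# & 'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $fileList $OutputFile
```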

Sorry for the late reply. I did try exactly what you posted, Matt, but it was still throwing the same errors.

Postanote, thank you for your reply. I will read through this and see if I can get it working.

Thank you both for the replies; I appreciate the assistance here.

 

-Mackling101

Can you post the code you tested after my post, so I can try to re-test it?
The sample I posted worked fine for me, generating a single multi-page PDF file from three HTML files.

[pre]

$OutputFile = 'c:\users—\Documents\TempPDFReport\reporttest8.pdf'
$Files = Get-ChildItem C:\Users—\Documents\TempHTMLConvert -Include *.html -Recurse | Select -ExpandProperty FullName

$FilesList = @()

Foreach ($File in $Files) {

    $File = "File:///$File"
    $FileList += $File

}

&'c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe' $FileList $OutputFile

[/pre]

 

Here is what I am using. Sorry for the late reply, again. Since I couldn’t get this working, I was more or less forced to move on, but I’m still passively working on it.

I get output like this: “Failed to load file:///C:\Users”, which is an error from wkhtmltopdf.exe.

Thanks in advance.

-Mackling101

 

 

If that’s an exact copy/paste, then the problem is that your array is called $FilesList (with an ‘s’ in the middle), but in the loop and in the argument it’s called $FileList. Instead of passing a list of files, you’re passing one big string that looks like this:

file:///filename1.htmlfile:///filename2.htmlfile:///filename3.html
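A small reproduction of that typo, using two hypothetical stand-in paths. Because $FileList was never set, it starts out as $null; $null plus a string gives a string, so every later += concatenates text instead of adding array elements. Once both names match, += appends to the array as intended:

```powershell
$Files = 'C:\r\a.html', 'C:\r\b.html'   # hypothetical stand-in paths

# Reproduction of the typo: the array is $FilesList, the loop uses $FileList
$FilesList = @()
foreach ($File in $Files) {
    # $FileList starts as $null, so += builds one long string
    $FileList += "file:///$File"
}
$broken = $FileList   # one [string]: file:///C:\r\a.htmlfile:///C:\r\b.html

# Fix: use the same name everywhere, so += appends to the array
$FileList = @()
foreach ($File in $Files) {
    $FileList += "file:///$File"
}
# $FileList now holds two separate strings, one argument per file
```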

Oh. My. God. LUL

I can’t believe I missed that… seriously. So sorry, that’s pretty bad.

That worked, and I now have a better understanding of arrays and passing variables. Thank you both for the help, and Matt, thank you a ton. I feel terrible that I missed that. Yikes…

1000 thank-yous.

You’re very welcome, and don’t feel too bad about it; these things are easily overlooked. You may want to consider using VS Code for developing your scripts; one of its features is that it tells you which variables are assigned but never used.