Can anyone help me convert a C# .NET program to a PowerShell cmdlet?
Has anyone used the ConvertFrom-PDF cmdlet?
I need to scan PDFs, and I can't seem to get the source code in this blog post into a working cmdlet.
Any help would be greatly appreciated.
Hi Dave, thanks for getting back to me.
I tried the Get-ReferencesFromPdf cmdlet. It didn't return any data, and there were no errors either. Any suggestions for troubleshooting this?
I do have iTextSharp.dll and created the same directory structure as in the post.
That function was written specifically for the question posted on that thread, looking for section numbers followed by some number of lines matching ABC-*. It’s not meant for you to be able to run it directly.
However, the code does show you how to use the PdfReader and PdfTextExtractor classes to pull text out of a PDF into a .NET String variable. From there, you can split it by line as in the example, or just work with the whole page text as one string; that’s up to you.
Here's a more trimmed-down example that extracts all of the text from the PDF and outputs it as a single string, which you can manipulate however you want:
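# The function name is a placeholder; iTextSharp.dll must already be loaded with Add-Type.
function Get-PdfText {
    [CmdletBinding()]
    param ([Parameter(Mandatory = $true)] [string] $Path)
    $Path = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)
    $reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $Path
    $stringBuilder = New-Object System.Text.StringBuilder
    for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
        $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
        $null = $stringBuilder.AppendLine($text)
    }
    $reader.Close()
    $stringBuilder.ToString()
}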
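Once the text is in a string, splitting it by line and filtering is plain PowerShell. The following is a minimal sketch of that step only. The variable `$pdfText` and its sample content are invented for illustration and stand in for whatever the extraction code returns; the `ABC-*` pattern is borrowed from the earlier thread mentioned above.

```powershell
# $pdfText is a stand-in for the extracted text; this sample content is made up.
$pdfText = "1.1 Introduction`r`nABC-100`r`nABC-200`r`n1.2 Scope`r`nXYZ-300"

# Split on Windows or Unix line endings and drop blank lines.
$lines = $pdfText -split '\r?\n' | Where-Object { $_.Trim() -ne '' }

# Keep only the lines that match the wildcard pattern.
$abcLines = @($lines | Where-Object { $_ -like 'ABC-*' })

$abcLines
```

If you would rather work with the whole page text as one string, skip the split and apply `-match` or `Select-String` to the string directly.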
OK, I tried it, and it's still not returning a string. Am I still supposed to be using
Add-Type -Path .\PdfToText\itextsharp.dll
I feel like I'm not placing this .dll correctly.
Any other suggestions?
Try something like this
(Copying the file there as well of course)
Use fully qualified paths BTW.
Then test Dave’s function. Worked good for me.
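For anyone following along, here is a rough sketch of loading the DLL from a fully qualified path and checking that it actually loaded. The path `C:\Tools\PdfToText\itextsharp.dll` is a hypothetical example, not the location used in this thread, and the `Unblock-File` step is only an assumption about one common cause of load failures (a file downloaded from the internet and still marked as blocked).

```powershell
# Hypothetical location; replace with the real, fully qualified path to your copy.
$dllPath = 'C:\Tools\PdfToText\itextsharp.dll'

# Fail early with a clear message if the file is not where we expect it.
if (-not (Test-Path -LiteralPath $dllPath)) {
    throw "iTextSharp DLL not found at $dllPath"
}

# Downloaded files may be marked as blocked; Unblock-File (PowerShell 3.0+) clears that mark.
Unblock-File -LiteralPath $dllPath

# Load the assembly into the current session.
Add-Type -Path $dllPath

# If this prints a type name instead of an error, the PdfReader class is available.
[iTextSharp.text.pdf.PdfReader].FullName
```

A relative path such as `.\PdfToText\itextsharp.dll` is resolved against the current location, so it only works when the session happens to be in the right folder. That is one reason to prefer the full path.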
For some reason I cannot load the DLL as described.
Instead, I have to do it like this:
$bytes = [System.IO.File]::ReadAllBytes("c:\...\itextsharp.dll")
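The rest of that workaround is not shown above, so the following is only a sketch of how loading an assembly from a byte array usually continues, using the .NET `Assembly.Load(byte[])` overload. The path here is hypothetical, since the one in the post is elided.

```powershell
# Hypothetical path; the real path in the post above is elided.
$dllPath = 'C:\Tools\PdfToText\itextsharp.dll'

# Read the raw bytes of the DLL into memory.
$bytes = [System.IO.File]::ReadAllBytes($dllPath)

# Load the assembly from the in-memory bytes instead of from the file.
$assembly = [System.Reflection.Assembly]::Load($bytes)

# Show which assembly was loaded as a quick sanity check.
$assembly.FullName
```

Loading from bytes does not keep the DLL file locked, and it can succeed where a path-based load is refused, but the loaded assembly then has no file location associated with it.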