ConvertFrom-PDF PowerShell Cmdlet | convert a c# .net program to powershell cmd

imessage357_h · November 20, 2014, 5:37am

Can anyone help me convert a C# .NET program to a PowerShell cmdlet?

Has anyone used the ConvertFrom-PDF cmdlet?

http://www.beefycode.com/post/ConvertFrom-PDF-Cmdlet.aspx

I need to scan PDFs, and I can’t seem to get the source code in this blog post into a working cmdlet.

Any help would be greatly appreciated.

system · November 20, 2014, 5:50am

I’ve done some work with the iTextSharp libraries directly in PowerShell before. You can see an example in the thread “Search a PDF and return specific text”. You will need to download a copy of iTextSharp.dll.

imessage357_h · November 20, 2014, 6:50am

Hi Dave, thanks for getting back to me.

I tried the Get-ReferencesFromPdf cmdlet. It didn’t return any data, and there were no errors either. Any suggestions for troubleshooting this?

I do have iTextSharp.dll, and I created the same directory structure as in the post.

system · November 20, 2014, 6:59am

That function was written specifically for the question posted on that thread, looking for section numbers followed by some number of lines matching ABC-*. It’s not meant for you to be able to run it directly.

However, the code does show you how to use the PdfReader and PdfTextExtractor classes to pull text out of a PDF into a .NET String variable. From there, you can split it by line as in the example, or just work with the whole page text as one string; that’s up to you.

Here’s a more trimmed down example that just extracts all of the text from the PDF and outputs it as a single string, that you can manipulate however you want:

function Get-PdfText
{
    [CmdletBinding()]
    [OutputType([string])]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )

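    # Resolve relative and provider-qualified paths to a full file system path.
    $Path = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)

    try
    {
        $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    }
    catch
    {
        # Rethrow so that a failure to open the PDF stops the function here.
        throw
    }

    $stringBuilder = New-Object System.Text.StringBuilder

    # iTextSharp page numbers start at 1.
    for ($page = 1; $page -le $reader.NumberOfPages; $page++)
    {
        $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
        $null = $stringBuilder.AppendLine($text)
    }

    # Release the underlying file.
    $reader.Close()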

    return $stringBuilder.ToString()
}
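
As a minimal sketch of the "split it by line" option mentioned above: the sample string here is a hypothetical stand-in for the output of Get-PdfText, so the snippet runs without iTextSharp loaded. The `ABC-*` pattern comes from the earlier thread; the variable names are only illustrative.

```powershell
# Hypothetical sample text standing in for: $text = Get-PdfText -Path <your file>
$text = "Section 1`r`nABC-100`r`nSome other line`r`nABC-200`r`n"

# Split on Windows or Unix line endings and drop empty lines.
$lines = $text -split '\r?\n' | Where-Object { $_ -ne '' }

# Keep only the lines that match a wildcard pattern.
$hits = @($lines | Where-Object { $_ -like 'ABC-*' })

$hits
```

With a real PDF, replace the sample string with a call to the function and adjust the pattern to whatever you are searching for.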

imessage357_h · November 20, 2014, 7:21am

OK, I tried it, but it’s still not returning a string. Should I still be using:

Add-Type -Path .\PdfToText\itextsharp.dll

I feel like I’m not putting this .dll in the right place.

Any other suggestions?

Thanks

_timpringle · November 20, 2014, 5:50pm

Try something like this:

[System.Reflection.Assembly]::LoadFrom('C:\Data\iTextSharp.DLL')

(Copy the file there as well, of course.)

Use fully qualified paths, BTW.

Then test Dave’s function. It worked well for me.
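
A minimal sketch of that check, assuming the DLL was copied to C:\Data as in the line above; the `$dllPath` and `$loaded` variable names are only illustrative. It loads the assembly from a fully qualified path and then confirms that the PdfReader type resolves before you call the function:

```powershell
# Assumed location of the DLL; adjust to wherever you copied it.
$dllPath = 'C:\Data\iTextSharp.dll'
$loaded = $false

if (Test-Path -LiteralPath $dllPath)
{
    # A fully qualified path does not depend on the current location.
    Add-Type -LiteralPath $dllPath

    # -as [type] returns $null instead of throwing if the type is unknown.
    $loaded = $null -ne ('iTextSharp.text.pdf.PdfReader' -as [type])
}
else
{
    Write-Warning "iTextSharp.dll not found at $dllPath"
}

"PdfReader type available: $loaded"
```

If this reports False, the function will fail at New-Object, so the DLL location is the first thing to fix.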

guldmann · August 14, 2017, 4:56am

For some reason I cannot load the DLL as described.
Instead, I have to do it like this:

$bytes = [System.IO.File]::ReadAllBytes("c:\...\itextsharp.dll")
[System.Reflection.Assembly]::Load($bytes)
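
One possible cause, which is an assumption and not something confirmed in this thread, is that Windows marked the downloaded DLL as blocked. That can make file-based loading fail while loading from a byte array still works. A sketch of clearing the mark, using the hypothetical path C:\Data\iTextSharp.dll and an illustrative `$status` variable:

```powershell
# Hypothetical location of the DLL; adjust to your own path.
$dllPath = 'C:\Data\iTextSharp.dll'

if (Test-Path -LiteralPath $dllPath)
{
    # Remove the "downloaded from the internet" mark, if present.
    Unblock-File -LiteralPath $dllPath

    # Then try the normal file-based load again.
    Add-Type -LiteralPath $dllPath
    $status = 'loaded'
}
else
{
    $status = 'missing'
}

"iTextSharp.dll status: $status"
```

It is probably best to try this in a fresh PowerShell session, since an earlier failed load attempt may affect the result.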
