ConvertFrom-PDF PowerShell Cmdlet | Convert a C# .NET program to a PowerShell cmdlet

Can anyone help me convert a C# .NET program to a PowerShell cmdlet?

Has anyone used the ConvertFrom-PDF cmdlet?

http://www.beefycode.com/post/ConvertFrom-PDF-Cmdlet.aspx

I need to scan PDFs, and I can’t seem to get the source code in this blog post into a working cmdlet.

Any help would be greatly appreciated.

I’ve done some work with the iTextSharp libraries directly in PowerShell before. You can see an example at “Search a PDF and return specific text”. You will need to download a copy of iTextSharp.dll.

Hi Dave, thanks for getting back to me.

I tried the Get-ReferencesFromPdf cmdlet. It didn’t return any data, and there were no errors either. Any suggestions for troubleshooting this?

I do have iTextSharp.dll and created the same directory structure as in the post.

That function was written specifically for the question posted on that thread, looking for section numbers followed by some number of lines matching ABC-*. It’s not meant for you to be able to run it directly.

However, the code does show you how to use the PdfReader and PdfTextExtractor classes to pull text out of a PDF into a .NET String variable. From there, you can split it by line as in the example, or just work with the whole page text as one string; that’s up to you.
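To make the two options concrete, here is a minimal sketch of splitting extracted text by line versus searching it as one string. The sample text and the ABC- pattern are made up for illustration; in practice $pageText would hold whatever PdfTextExtractor returned.

```powershell
# Sample text standing in for extracted PDF text (invented for this example).
$pageText = "Section 1.2`nABC-100 First item`nSome other line`nABC-200 Second item"

# Option 1: split into lines (handles both LF and CRLF), then filter line by line.
$lines = $pageText -split '\r?\n'
$abcLines = @($lines | Where-Object { $_ -like 'ABC-*' })

# Option 2: treat the whole text as one string and pull out matches with a regex.
$abcIds = @([regex]::Matches($pageText, 'ABC-\d+') | ForEach-Object { $_.Value })

$abcLines
$abcIds
```

Splitting by line is easier when each record sits on its own line; the regex approach is handier when the values you want can appear anywhere in the text.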

Here’s a more trimmed-down example that just extracts all of the text from the PDF and outputs it as a single string that you can manipulate however you want:

function Get-PdfText
{
    [CmdletBinding()]
    [OutputType([string])]
    param (
        [Parameter(Mandatory = $true)]
        [string]
        $Path
    )

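    # Resolve relative and provider paths against PowerShell's current location,
    # since .NET classes resolve relative paths against the process directory.
    $Path = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($Path)

    try
    {
        $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    }
    catch
    {
        # Rethrow so a failed open stops the function instead of continuing.
        throw
    }

    $stringBuilder = New-Object System.Text.StringBuilder

    try
    {
        # iTextSharp page numbers start at 1.
        for ($page = 1; $page -le $reader.NumberOfPages; $page++)
        {
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
            $null = $stringBuilder.AppendLine($text)
        }
    }
    finally
    {
        # Release the file even if extraction fails partway through.
        $reader.Close()
    }

    return $stringBuilder.ToString()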
}
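A rough sketch of how the function might be called, assuming Get-PdfText is already defined in the session. Both paths are hypothetical placeholders, and the empty-result check is only a guess at one possible cause: if a PDF contains nothing but scanned page images, there may be no text layer for PdfTextExtractor to read.

```powershell
# Hypothetical paths; replace them with real, fully qualified paths.
$dllPath = 'C:\Data\iTextSharp.dll'
$pdfPath = 'C:\Data\sample.pdf'

if ((Test-Path -LiteralPath $dllPath) -and (Test-Path -LiteralPath $pdfPath))
{
    # Load iTextSharp before calling the function so its types can be found.
    Add-Type -Path $dllPath

    $text = Get-PdfText -Path $pdfPath

    if ([string]::IsNullOrWhiteSpace($text))
    {
        # Assumption: an empty result may mean the pages are images without a text layer.
        Write-Warning "No text was extracted from $pdfPath."
    }
    else
    {
        "Extracted $($text.Length) characters."
    }
}
else
{
    Write-Warning 'The DLL or the PDF was not found at the paths given.'
}
```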

OK, I tried it, and it’s still not returning a string. Am I still supposed to be using

Add-Type -Path .\PdfToText\itextsharp.dll

I feel like I’m not placing this .dll correctly.

Any other suggestions?

Thanks

Try something like this

[System.Reflection.Assembly]::LoadFrom('C:\Data\iTextSharp.DLL')

(Copying the file there as well of course)

Use fully qualified paths BTW.

Then test Dave’s function. It worked fine for me.
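On the fully qualified path point: .NET methods such as LoadFrom resolve relative paths against the process working directory, which can differ from PowerShell's current location. The sketch below uses a throwaway temp folder and a dummy file named example.dll (both invented for the example) to show one way of turning a relative path into a full one with Convert-Path.

```powershell
# Create a throwaway folder and an empty placeholder file so this runs anywhere.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
$null = New-Item -ItemType File -Path (Join-Path $dir 'example.dll')

Push-Location $dir
try
{
    # These two are tracked separately and may not match.
    $psLocation  = (Get-Location).Path
    $netLocation = [System.Environment]::CurrentDirectory

    # Convert-Path resolves against PowerShell's location and returns a full path string.
    $fullPath = Convert-Path -Path (Join-Path '.' 'example.dll')
}
finally
{
    Pop-Location
    Remove-Item -Path $dir -Recurse -Force
}

"PowerShell location: $psLocation"
"Process directory:   $netLocation"
"Full path:           $fullPath"
```

Passing the value from Convert-Path to LoadFrom or Add-Type -Path sidesteps the question of which directory a relative path is resolved against.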

For some reason I cannot load the DLL as described.
Instead, I have to do it like this:

$bytes = [System.IO.File]::ReadAllBytes("c:\...\itextsharp.dll")
[System.Reflection.Assembly]::Load($bytes)
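One possible explanation, offered as a guess rather than a diagnosis: a DLL downloaded from the internet can be marked as blocked by Windows, which may stop LoadFrom from loading it, while loading from a byte array avoids the file-based check. The sketch below tries Unblock-File before LoadFrom and then checks whether the PdfReader type is visible in the session. The path is the one suggested earlier in the thread; adjust it to wherever the file actually lives.

```powershell
# Path taken from the earlier suggestion; substitute your own fully qualified path.
$dllPath = 'C:\Data\iTextSharp.dll'

if (Test-Path -LiteralPath $dllPath)
{
    # Assumption: the file may carry a "downloaded from the internet" mark.
    # Unblock-File removes it and does nothing if the file is not blocked.
    Unblock-File -LiteralPath $dllPath

    $assembly = [System.Reflection.Assembly]::LoadFrom($dllPath)
    "Loaded: $($assembly.FullName)"
}
else
{
    Write-Warning "File not found: $dllPath"
}

# -as [type] returns $null instead of throwing when the type is not loaded.
$readerType = 'iTextSharp.text.pdf.PdfReader' -as [type]
$typeAvailable = $null -ne $readerType
"PdfReader type available: $typeAvailable"
```

If the type shows as available, the assembly loaded and any remaining problem is more likely in the PDF or the path passed to the function.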