
How to merge/append PDFs with iTextSharp in PowerShell?

I have seen a few ways to do this in C#, but the syntax differences are giving me a hard time. Could someone please help me with a PowerShell way of doing this?

What I am trying to accomplish is this: I have a large PDF containing customer account statements, and each customer's statement has a different number of pages. I am already parsing the text of the PDF to find each account number (I have this done). For each account, I create a PDF with its first page, then check whether there are more pages with the same account number. If there are, the script should (this is where I need help) take each such page and append it to the PDF I just created, until there are no more pages with that account number. In the end I will have one PDF per account, named after the account number and containing that account's pages.

I am stuck on appending pages after the first one has been created.

Thank you very much! Mark
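On the specific question of appending pages to a PDF that already exists: one straightforward approach, using only the PdfCopy calls that also appear in the answer below, is to write a new file containing the existing pages followed by the new ones, then swap it in for the original. A minimal sketch, assuming iTextSharp 5.x has already been loaded with Add-Type; Add-PdfPages and its parameter names are hypothetical, not part of iTextSharp:

```powershell
# Hypothetical helper: append pages of $SourcePath to the end of an existing PDF.
# Assumes itextsharp.dll (5.x) is already loaded; use absolute paths.
function Add-PdfPages {
    param(
        [Parameter(Mandatory)][string]$TargetPath,   # existing PDF to extend
        [Parameter(Mandatory)][string]$SourcePath,   # PDF to take pages from
        [Parameter(Mandatory)][int[]]$Pages          # 1-based page numbers in $SourcePath
    )
    # Write to a temporary file; writing into the file being read is not safe.
    $tempPath = [System.IO.Path]::GetTempFileName()
    $target   = New-Object iTextSharp.text.pdf.PdfReader($TargetPath)
    $source   = New-Object iTextSharp.text.pdf.PdfReader($SourcePath)
    $stream   = New-Object System.IO.FileStream($tempPath, [System.IO.FileMode]::Create)
    $document = New-Object iTextSharp.text.Document
    $pdfCopy  = New-Object iTextSharp.text.pdf.PdfCopy($document, $stream)
    try {
        $document.Open()
        # Copy every page already in the target...
        for ($p = 1; $p -le $target.NumberOfPages; $p++) {
            $pdfCopy.AddPage($pdfCopy.GetImportedPage($target, $p))
        }
        # ...then append the requested pages from the source.
        foreach ($p in $Pages) {
            $pdfCopy.AddPage($pdfCopy.GetImportedPage($source, $p))
        }
    }
    finally {
        $document.Close()   # finishes the new PDF
        $target.Close()     # release both input files before replacing the target
        $source.Close()
        $stream.Dispose()
    }
    # Replace the original file with the extended copy.
    Move-Item -LiteralPath $tempPath -Destination $TargetPath -Force
}
```

For the workflow described in the question, you would create an account's file from its first page and then call, for example, Add-PdfPages -TargetPath C:\out\12345.pdf -SourcePath C:\in\statements.pdf -Pages 2 for each further page (placeholder paths). Because every call rewrites the whole file, collecting an account's full page range first and writing its file once is cheaper when an account has many pages.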

Assuming you have already done the hard part you describe, parsing the text to find the page numbers for each account number, here's a working example that shows how to copy pages from your large PDF of customer account statements into a new PDF:

$workingDirectory = Split-Path -Parent $MyInvocation.MyCommand.Path;
[void] [System.Reflection.Assembly]::LoadFrom(
    [System.IO.Path]::Combine($workingDirectory, 'itextsharp.dll')
);

$output = [System.IO.Path]::Combine($workingDirectory, 'output.pdf');
$statements = [System.IO.Path]::Combine($workingDirectory, 'statements.pdf');
# FileMode Create truncates an existing output.pdf; OpenOrCreate would leave stale bytes at the end
$fileStream = New-Object System.IO.FileStream($output, [System.IO.FileMode]::Create);
$document = New-Object iTextSharp.text.Document;
$pdfCopy = New-Object iTextSharp.text.pdf.PdfCopy($document, $fileStream);
$reader = New-Object iTextSharp.text.pdf.PdfReader($statements);
$document.Open();
$pageCount = $reader.NumberOfPages;
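# $i is 0-based, but iTextSharp page numbers are 1-based, hence $i + 1.
# "$i % 2 -eq 0" is only a demonstration filter (it copies pages 1, 3, 5, ...);
# replace it with your own test for the pages that belong to one account.
for ($i = 0; $i -lt $pageCount; $i++) {
    if ($i % 2 -eq 0) {
        $pdfCopy.AddPage(
            $pdfCopy.GetImportedPage($reader, $i + 1)
                                             # ^^^^^
                                             # your page number here
        );
    }
}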
$pdfCopy.FreeReader($reader);
$reader.Dispose();
$document.Dispose();
$fileStream.Dispose();
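The loop above selects pages with a fixed filter. To pass one account's pages explicitly, the same PdfCopy calls can be wrapped in a function that takes a list of page numbers. A sketch, assuming itextsharp.dll (5.x) is already loaded; Copy-PdfPageList is a hypothetical helper name:

```powershell
# Hypothetical helper: copy an explicit list of 1-based page numbers from one PDF
# into a new PDF with PdfCopy. Assumes itextsharp.dll (5.x) is already loaded.
function Copy-PdfPageList {
    param(
        [Parameter(Mandatory)][string]$SourcePath,
        [Parameter(Mandatory)][int[]]$Pages,
        [Parameter(Mandatory)][string]$OutputPath
    )
    $reader   = New-Object iTextSharp.text.pdf.PdfReader($SourcePath)
    $stream   = New-Object System.IO.FileStream($OutputPath, [System.IO.FileMode]::Create)
    $document = New-Object iTextSharp.text.Document
    $pdfCopy  = New-Object iTextSharp.text.pdf.PdfCopy($document, $stream)
    try {
        $document.Open()
        foreach ($p in $Pages) {
            # GetImportedPage expects a 1-based page number
            $pdfCopy.AddPage($pdfCopy.GetImportedPage($reader, $p))
        }
        $pdfCopy.FreeReader($reader)
    }
    finally {
        $document.Close()   # closes the PdfCopy writer and, by default, the stream
        $reader.Close()
        $stream.Dispose()   # no-op if the writer already closed it
    }
}
```

For example: Copy-PdfPageList -SourcePath C:\in\statements.pdf -Pages (3..5) -OutputPath C:\out\12345.pdf (placeholder paths). Full paths are safer than relative ones here, because .NET resolves relative paths against the process working directory, which is not necessarily the current PowerShell location.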

Add a separate loop that creates one output file, with its own set of pages, for each individual account number.
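For that outer loop, the bookkeeping of which consecutive pages belong together can be kept apart from the PDF code. A minimal sketch in plain PowerShell, assuming the account ID of every page has already been extracted into an array in page order; Get-AccountPageRanges is a hypothetical helper name, not part of iTextSharp:

```powershell
# Hypothetical helper: turn per-page account IDs into one object per run of
# consecutive pages that share the same ID.
function Get-AccountPageRanges {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string[]]$PageAccountIds   # account ID per page; index 0 is page 1
    )
    $start = 1
    for ($page = 1; $page -le $PageAccountIds.Count; $page++) {
        $id = $PageAccountIds[$page - 1]
        # A run ends on the last page or when the next page has a different ID
        if ($page -eq $PageAccountIds.Count -or $PageAccountIds[$page] -ne $id) {
            [pscustomobject]@{ AccountId = $id; StartPage = $start; EndPage = $page }
            $start = $page + 1
        }
    }
}
```

With IDs 'A','A','B','C','C','C' this yields A on pages 1 to 2, B on page 3 and C on pages 4 to 6. Each object then drives one output file: copy pages $r.StartPage..$r.EndPage into a file named after $r.AccountId.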

This doesn't exactly answer what I was originally trying to do, but I was able to accomplish what I needed for this task with the script below. I would still like to know how to append a page to an existing PDF.

Add-Type -Path D:\FlavinHOA\itext\itextsharp.dll
$saveDir = 'D:\FlavinHOA\accountid'
$pageSpread = 0
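# Copies pages $StartPage..$EndPage of $pdfFile into "$saveDir\$accountID.pdf".
# $saveDir is a script-level variable; $accountID is not a parameter but is read
# from the calling function's scope (PowerShell dynamic scoping), so this function
# only works when called from Get-CustomerPages.
function Copy-PDFPages {
    param(
        [Parameter(Mandatory)]
        [string]$pdfFile,
        [int]$StartPage,
        [int]$EndPage
    )
    Write-Host "Reading $pdfFile"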
    $inputPdf = New-Object iTextSharp.text.pdf.PdfReader $pdfFile
    $PageCount = $inputPdf.NumberOfPages
    if ($EndPage -lt $StartPage -or $EndPage -gt $PageCount) {
        $EndPage = $PageCount
    }

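    # Output document (despite its name), sized like the first source page
    $inputDoc = New-Object iTextSharp.text.Document($inputPdf.GetPageSizeWithRotation(1))

    # One output file per account; FileMode Create overwrites an existing file
    $fs = New-Object System.IO.FileStream("$saveDir\$accountID.pdf", [System.IO.FileMode]::Create)

    $outputWriter = [iTextSharp.text.pdf.PdfWriter]::GetInstance($inputDoc, $fs)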

    $inputDoc.Open()
    $cb1 = $outputWriter.DirectContent

    ForEach($targetPage in ($StartPage..$EndPage)) {
        [void]$inputDoc.SetPageSize($inputPdf.GetPageSizeWithRotation($targetPage))
        [void]$inputDoc.NewPage()
        $page = $outputWriter.GetImportedPage($inputPdf, $targetPage);
        $rotation = $inputPdf.GetPageRotation($targetPage)

        if ($rotation -eq 90 -or $rotation -eq 270) {
            $cb1.AddTemplate($page, 0, -1, 1, 0, 0, $inputPdf.GetPageSizeWithRotation($targetPage).Height)
        } else {
            $cb1.AddTemplate($page, 1, 0, 0, 1, 0, 0)
        }
    }

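    $inputDoc.Close()    # finishes the output PDF; the writer also closes $fs
    $inputPdf.Close()    # release the source PDF; a new reader is opened on every call

    $fs.Close()
}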
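The two AddTemplate calls in Copy-PDFPages pass the six entries (a, b, c, d, e, f) of a PDF transformation matrix, which maps a point (x, y) of the imported page to (a·x + c·y + e, b·x + d·y + f). Working this through for the rotated branch, with h the value of GetPageSizeWithRotation($targetPage).Height:

```latex
\begin{aligned}
(a, b, c, d, e, f) &= (0,\ -1,\ 1,\ 0,\ 0,\ h) \\
x' &= a\,x + c\,y + e = 0\cdot x + 1\cdot y + 0 = y \\
y' &= b\,x + d\,y + f = (-1)\cdot x + 0\cdot y + h = h - x
\end{aligned}
```

So (x, y) is turned 90° clockwise to (y, -x) and then shifted up by h. For a page with rotation 90, GetPageSizeWithRotation reports width and height swapped, so h equals the unrotated page width w, and 0 ≤ x ≤ w lands on 0 ≤ y' ≤ h, keeping the content on the page. The else branch, (1, 0, 0, 1, 0, 0), is the identity. Note that the function applies the same clockwise matrix to pages rotated 270°, which would presumably turn them the opposite way from how a viewer shows them; PdfCopy should avoid the manual transform altogether, since it copies each page's dictionary, including its /Rotate entry.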


function Get-CustomerPages {
    param(
        [Parameter(Mandatory)]
        [string]$pdfFile
    )

    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $pdfFile
    $count = 0
    $pageSpread = 0   # extra pages seen so far for the current account

    for ($page = 1; $page -le $reader.NumberOfPages; $page++) {
        $nextPage = $page + 1
        # Use a fresh strategy per extraction: SimpleTextExtractionStrategy accumulates text
        $strategy = New-Object iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
        $strategy2 = New-Object iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
        $currentText = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page, $strategy)

        if ($page -lt $reader.NumberOfPages) {
            $nextPageText = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $nextPage, $strategy2)
        } else {
            # Sentinel: on the last page, force the current account to be exported
            $nextPageText = 'Customer'
        }

        # [regex]::Match returns only the first match, and its .Value is '' (not $null)
        # when nothing matches, so .Replace() does not fail on a page without an ID
        $accountID = [regex]::Match($currentText, 'Customer.*').Value.Replace('Customer Account ID: ', '')
        $nextAccountID = [regex]::Match($nextPageText, 'Customer.*').Value.Replace('Customer Account ID: ', '')

        if ($nextAccountID -eq $accountID) {
            Write-Host "More than 1"
            $pageSpread++
        } else {
            Write-Host "-----------Start Export-------------"
            $startExportPage = $page - $pageSpread
            Write-Host "Account ID fed into CopyPDF $accountID"
            Write-Host "Pages exported from $startExportPage to $page"
            Copy-PDFPages -pdfFile $pdfFile -StartPage $startExportPage -EndPage $page
            Write-Host "------------End Export--------------"
            $pageSpread = 0
        }

        $accountID
        $count++
    }
    $reader.Close()
    $count
}
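The account-number lookup in Get-CustomerPages matches 'Customer.*' and then strips the label, which can also pick up any other text containing the word "Customer". A somewhat tighter alternative is a capture group that returns only the token after the label. A sketch, assuming the statements print the label exactly as "Customer Account ID:" followed by an ID without spaces; Get-CustomerAccountId is a hypothetical name:

```powershell
# Hypothetical helper: return the ID that follows "Customer Account ID:" in a
# page's extracted text, or $null when the label is not on the page.
function Get-CustomerAccountId {
    param([string]$PageText)
    # Group 1 captures the first run of non-whitespace characters after the label
    $result = [regex]::Match($PageText, 'Customer Account ID:\s*(\S+)')
    if ($result.Success) { return $result.Groups[1].Value }
    return $null
}
```

Inside the loop it would replace the two regex lines, e.g. $accountID = Get-CustomerAccountId $currentText. Returning $null for a page without the label makes such pages easy to detect, instead of their IDs silently becoming an empty string.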

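Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM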