Splitting a PDF into smaller PDFs based on size using iTextSharp

Question

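We have some very inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. In other words, if the maximum size is 10 MB, an 8 MB file is skipped, while a 16 MB file is split according to its page count.

This is code I inherited, and I feel there must be a more efficient way to do this, using a single method and fewer object instantiations.

We call the methods with the following code: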

        List<int> splitPoints = null;
        List<byte[]> documents = null;

        splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
        documents = this.SplitPDF(currentDocument, maxSize, splitPoints);

The methods:

    private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize)
    {
        List<int> splitPoints = new List<int>();
        PdfReader reader = null;
        Document document = null;
        int pagesRemaining = currentDocument.Pages;

        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));

            using (MemoryStream ms = new MemoryStream())
            {
                PdfCopy copy = new PdfCopy(document, ms);
                PdfImportedPage page = null;

                document.Open();

                //Add pages until we run out from the original
                for (int i = 0; i < currentDocument.Pages; i++)
                {
                    int currentPage = currentDocument.Pages - (pagesRemaining - 1);

                    if (pagesRemaining == 0)
                    {
                        //The whole document has been traversed
                        break;
                    }

                    page = copy.GetImportedPage(reader, currentPage);
                    copy.AddPage(page);

                    //If the current collection of pages exceeds the maximum size, we save off the index and start again
                    if (copy.CurrentDocumentSize > maxSize)
                    {
                        if (i == 0)
                        {
                            //One page is greater than the maximum size
                            throw new Exception("one page is greater than the maximum size and cannot be processed");
                        }

                        //We have gone one page too far, save this split index   
                        splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1));
                        break;
                    }
                    else
                    {
                        pagesRemaining--;
                    }
                }

                page = null;

                document.Close();
                document.Dispose();
                copy.Close();
                copy.Dispose();
                copy = null;
            }
        }

        if (reader != null)
        {
            reader.Close();
            reader = null;
        }

        document = null;

        return splitPoints;
    }

    private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints)
    {
        var documents = new List<byte[]>();
        PdfReader reader = null;
        Document document = null;
        MemoryStream fs = null;
        int pagesRemaining = currentDocument.Pages;

        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));

            fs = new MemoryStream();
            PdfCopy copy = new PdfCopy(document, fs);
            PdfImportedPage page = null;

            document.Open();

            //Add pages until we run out from the original
            for (int i = 0; i <= currentDocument.Pages; i++)
            {
                int currentPage = currentDocument.Pages - (pagesRemaining - 1);
                if (pagesRemaining == 0)
                {
                    //We have traversed all pages
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }

                page = copy.GetImportedPage(reader, currentPage);
                copy.AddPage(page);
                pagesRemaining--;

                if (splitPoints.Contains(currentPage + 1) == true)
                {
                    //Need to start a new document
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }
            }

            copy = null;
            page = null;

            fs.Dispose();
        }

        if (reader != null)
        {
            reader.Close();
            reader = null;
        }

        if (document != null)
        {
            document.Close();
            document.Dispose();
            document = null;
        }

        if (fs != null)
        {
            fs.Close();
            fs.Dispose();
            fs = null;
        }

        return documents;
    }

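As far as I can tell, the only code I can find online is in VB, and it does not necessarily address the size issue.

Update:

We are getting OutOfMemory exceptions, which I believe is a Large Object Heap problem. One idea is to reduce the code's footprint, which might reduce the number of large objects on the heap.

Basically, this is part of a loop that goes through any number of PDFs, splits them, and stores them in the database. Right now we have had to change the process from doing all of them at once (the last run was 97 PDFs of various sizes) to running 5 PDFs through the system every 5 minutes. This is not ideal, and it will not scale well as we roll the tool out to more clients.

(We are dealing with 50-100 MB PDFs, but they could be larger.)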

Answer 1

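I inherited this exact code as well, and it appears to have a major flaw. In the GetPDFSplitPoints method, it checks the total size of the copied pages against maxSize to decide at which page to split the file.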
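In the SplitPDF method, when it reaches the page where the split occurs, the MemoryStream at that point is indeed below the maximum allowed size, and one more page would push it over the limit. But after document.Close() executes, much more content is added to the MemoryStream (in one sample PDF I worked with, the Length of the MemoryStream went from 9 MB before document.Close() to 19 MB after it). My understanding is that all the resources needed by the copied pages are added on Close.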
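To make that observation concrete, here is a minimal sketch of a helper that copies a page range and returns the bytes only after the document has been closed, so the length it reports is the final one. It uses only the iTextSharp calls that already appear in the question; the class name PdfChunkBuilder and the method BuildChunk are hypothetical, and passing in one shared PdfReader (rather than creating one per chunk) is an assumption that should be checked against the iTextSharp version in use.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper, not part of iTextSharp.
public static class PdfChunkBuilder
{
    // Copies pages firstPage..lastPage (1-based, inclusive) from reader
    // into a new PDF and returns the finished bytes.
    public static byte[] BuildChunk(PdfReader reader, int firstPage, int lastPage)
    {
        using (var ms = new MemoryStream())
        {
            var document = new Document(reader.GetPageSizeWithRotation(firstPage));
            var copy = new PdfCopy(document, ms);
            document.Open();

            for (int p = firstPage; p <= lastPage; p++)
            {
                copy.AddPage(copy.GetImportedPage(reader, p));
            }

            // Closing the document finalizes the copy. Anything still
            // pending is written here, so the size is only meaningful
            // after this call.
            document.Close();

            // MemoryStream.ToArray() still works once the stream is closed.
            return ms.ToArray();
        }
    }
}
```

Comparing the Length of the returned array against the limit, instead of CurrentDocumentSize while pages are still being added, avoids the 9 MB versus 19 MB surprise described above.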
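I suspect I will have to rewrite this code completely to make sure the maximum size is never exceeded while keeping the original pages intact.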
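One possible shape for such a rewrite is sketched below. It is not tested against real PDFs and is only an illustration: it grows each chunk one page at a time and accepts a page only if the finished chunk still fits. The names SizeBasedSplitter and Split are hypothetical, and buildChunk is an assumed delegate that returns the final bytes for a 1-based, inclusive page range, for example by copying the pages with PdfCopy and reading the stream after Close.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the PDF library is hidden behind buildChunk.
public static class SizeBasedSplitter
{
    public static List<byte[]> Split(
        int pageCount, long maxSize, Func<int, int, byte[]> buildChunk)
    {
        var chunks = new List<byte[]>();
        int first = 1;

        while (first <= pageCount)
        {
            // Start each chunk with a single page.
            byte[] best = buildChunk(first, first);
            if (best.LongLength > maxSize)
            {
                throw new InvalidOperationException(
                    "Page " + first + " alone exceeds the maximum size.");
            }

            // Keep adding pages while the finished chunk still fits.
            int last = first;
            while (last < pageCount)
            {
                byte[] candidate = buildChunk(first, last + 1);
                if (candidate.LongLength > maxSize)
                {
                    break;
                }
                best = candidate;
                last++;
            }

            chunks.Add(best);
            first = last + 1;
        }

        return chunks;
    }
}
```

This trades speed for correctness: every candidate chunk is rebuilt from scratch, so the number of builds grows quickly with the page count. It does replace the two passes with one, and it holds at most two chunk buffers at a time besides the results. If the finished size never shrinks when a page is added, a binary search over the last page would cut the number of builds.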
