Why does my PDF file size increase after splitting and merging back? (Using PDFsharp, C#)

I am splitting a PDF document into multiple documents containing one page each. After splitting, I perform some operations on the pages and then merge the documents back into a single PDF. I am using PDFsharp in C# to do this. The problem is that after splitting the document and merging the pages back, the file size increases from 1.96 MB to 12.2 MB. After thorough testing, I have established that the problem lies not in the operations I perform after splitting but in the splitting and merging of the PDF documents themselves. These are the functions I have written:

    public static List<Stream> SplitPdf(Stream PdfDoc)
    {
        System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
        List<Stream> outputStreamList = new List<Stream>();
        PdfSharp.Pdf.PdfDocument inputDocument = PdfReader.Open(PdfDoc, PdfDocumentOpenMode.Import);

        for (int idx = 0; idx < inputDocument.PageCount; idx++)
        {
            PdfSharp.Pdf.PdfDocument outputDocument = new PdfSharp.Pdf.PdfDocument();
            outputDocument.Version = inputDocument.Version;
            outputDocument.Info.Title =
              String.Format("Page {0} of {1}", idx + 1, inputDocument.Info.Title);
            outputDocument.Info.Creator = inputDocument.Info.Creator;

            outputDocument.AddPage(inputDocument.Pages[idx]);
            MemoryStream stream = new MemoryStream();
            outputDocument.Save(stream);
            outputStreamList.Add(stream);
        }
        return outputStreamList;
    }

    public static Stream MergePdfs(List<Stream> PdfFiles)
    {
        System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
        PdfSharp.Pdf.PdfDocument outputPDFDocument = new PdfSharp.Pdf.PdfDocument();
        foreach (Stream pdfFile in PdfFiles)
        {
            PdfSharp.Pdf.PdfDocument inputPDFDocument = PdfReader.Open(pdfFile, PdfDocumentOpenMode.Import);
            outputPDFDocument.Version = inputPDFDocument.Version;
            foreach (PdfSharp.Pdf.PdfPage page in inputPDFDocument.Pages)
            {
                outputPDFDocument.AddPage(page);
            }
        }
        Stream compiledPdfStream = new MemoryStream();
        outputPDFDocument.Save(compiledPdfStream);
        return compiledPdfStream;
    }

My questions are:

  1. Why am I getting this behaviour?
  2. Is there a solution that lets me split and merge and still end up with a file of the same size? (Any open-source C# library is fine.)

Replying to question 1:
When the file is split, every output file contains all the resources required by the pages it contains, so a font or image shared by several pages is copied into each single-page file.

When the files are merged again with PDFsharp, these resources are not merged, so the final document may contain duplicated resources (fonts, images), which leads to a larger file.
This is by design.
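Regarding question 2, one possible workaround is to avoid the round trip through single-page files: import every page from the one original document into a single output document and run the per-page operations on the imported pages. The sketch below assumes that, as far as I understand PDFsharp's import behaviour, objects imported from the same source document are copied only once, so shared fonts and images should not be duplicated. I have not measured the resulting file size. `PdfPageTools`, `RebuildFromSingleSource` and `perPageOperation` are hypothetical names introduced for illustration; they are not part of PDFsharp.

```csharp
using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfPageTools
{
    // Hypothetical helper (not a PDFsharp API): copies all pages from one
    // source document into one output document and applies an optional
    // operation to each imported page.
    public static Stream RebuildFromSingleSource(Stream pdfDoc, Action<PdfPage> perPageOperation)
    {
        // Kept from the question: registers code page encodings on .NET Core.
        System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

        // Make sure the reader starts at the beginning of the stream.
        if (pdfDoc.CanSeek)
        {
            pdfDoc.Position = 0;
        }

        PdfDocument input = PdfReader.Open(pdfDoc, PdfDocumentOpenMode.Import);
        PdfDocument output = new PdfDocument();
        output.Version = input.Version;

        for (int idx = 0; idx < input.PageCount; idx++)
        {
            // AddPage returns the page object that now belongs to 'output'.
            // Assumption: all pages come from the same 'input' document, so
            // PDFsharp can reuse resources it has already imported.
            PdfPage imported = output.AddPage(input.Pages[idx]);
            perPageOperation?.Invoke(imported);
        }

        // Save without closing the stream, then rewind it for the caller.
        MemoryStream result = new MemoryStream();
        output.Save(result, false);
        result.Position = 0;
        return result;
    }
}
```

Usage would look like `Stream merged = PdfPageTools.RebuildFromSingleSource(inputStream, page => { /* per-page work */ });`. If the pages really must exist as separate files in between, this approach does not apply, and the duplicated resources would have to be removed by some other tool afterwards.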
