How to merge all PDF files from a PDF Portfolio into a normal PDF file with C# iText7?

Question

I took this C# example and tried to get the attachments as a PdfDocument, but I couldn't figure out how to do it.

In the end, I would like to simply merge every PDF file contained in a portfolio into a single "normal" PDF file. Every non-PDF attachment should be ignored.

Edit:

(Okay, sorry for being too vague. By stating what I want to achieve, I simply wanted to make it easier for you to help me; I did not want you to write the program for me.)

So, here's part of the code from the linked example:

protected void ManipulatePdf(String dest)
{
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));

    PdfDictionary root = pdfDoc.GetCatalog().GetPdfObject();
    PdfDictionary names = root.GetAsDictionary(PdfName.Names);
    PdfDictionary embeddedFiles = names.GetAsDictionary(PdfName.EmbeddedFiles);
    PdfArray namesArray = embeddedFiles.GetAsArray(PdfName.Names);
    
    // Remove the description of the embedded file
    namesArray.Remove(0);

    // Remove the reference to the embedded file.
    namesArray.Remove(0);

    pdfDoc.Close();
}

Instead of removing anything from the source document, I would like to know how to get the PdfDocument object(s) out of the PdfArray, if possible.
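For context, a rough sketch of how such a /Names array is laid out: in a leaf node of a name tree the array holds alternating keys and values, [key1 value1 key2 value2 ...]. That is why the snippet above calls namesArray.Remove(0) twice to drop a single attachment: the first call removes the key (the entry's name string), the second the value (the file specification). The values are file specification dictionaries, not PdfDocument objects; the PDF bytes themselves live in the stream under the file specification's /EF /F entry. The Java sketch below models the array as a plain List (a hypothetical stand-in for iText's PdfArray, with made-up names) and pairs it up. Real name trees may also spread their entries over /Kids nodes, which iText's PdfNameTree.getNames() flattens for you.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NamesArrayPairs {
    // Hypothetical model of a leaf node's /Names array: even indices hold the keys,
    // odd indices hold the corresponding values (file specifications in a portfolio).
    static Map<String, Object> pairUp(List<?> namesArray) {
        if (namesArray.size() % 2 != 0) {
            throw new IllegalArgumentException("A /Names array must have an even number of elements");
        }
        // LinkedHashMap keeps the entries in array order
        Map<String, Object> entries = new LinkedHashMap<>();
        for (int i = 0; i < namesArray.size(); i += 2) {
            String key = (String) namesArray.get(i);   // the entry's name, e.g. a file name
            Object value = namesArray.get(i + 1);      // the file specification it maps to
            entries.put(key, value);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<Object> names = List.of("a.pdf", "filespec-1", "notes.txt", "filespec-2");
        System.out.println(pairUp(names)); // {a.pdf=filespec-1, notes.txt=filespec-2}
    }
}
```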

Sample file: http://www.mediafire.com/file/c4tw07wci8swdx9/NPort_5000.pdf/file

Solution by mkl, ported to C#:

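// Assumed usings: System.Collections.Generic, iText.IO.Source, iText.Kernel.Pdf, iText.Kernel.Utils
// pdfDocument is the portfolio, opened for reading (e.g. new PdfDocument(new PdfReader(SRC)))
PdfNameTree embeddedFilesTree = pdfDocument.GetCatalog().GetNameTree(PdfName.EmbeddedFiles);
// 'var' because the key type returned by GetNames() may differ between iText 7 releases
var embeddedFilesMap = embeddedFilesTree.GetNames();
List<PdfStream> embeddedPdfs = new List<PdfStream>();
foreach (PdfObject pdfObject in embeddedFilesMap.Values)
{
    // Each value should be a file specification dictionary
    if (!(pdfObject is PdfDictionary))
        continue;
    PdfDictionary filespecDict = (PdfDictionary)pdfObject;
    PdfDictionary embeddedFileDict = filespecDict.GetAsDictionary(PdfName.EF);
    if (embeddedFileDict == null)
        continue;
    PdfStream embeddedFileStream = embeddedFileDict.GetAsStream(PdfName.F);
    if (embeddedFileStream == null)
        continue;
    // Keep only attachments marked application/pdf; Equals (unlike CompareTo) tolerates a missing /Subtype
    PdfName subtype = embeddedFileStream.GetAsName(PdfName.Subtype);
    if (!PdfName.ApplicationPdf.Equals(subtype))
        continue;
    embeddedPdfs.Add(embeddedFileStream);
}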

if (embeddedPdfs.Count > 0)
{
    PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf", new WriterProperties().SetFullCompressionMode(true));
    PdfDocument flatPdfDocument = new PdfDocument(pdfWriter);
    PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
    RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
    foreach (PdfStream pdfStream in embeddedPdfs)
    {
        // Open the decoded bytes of the embedded file as a standalone PDF
        PdfReader embeddedReader = new PdfReader(sourceFactory.CreateSource(pdfStream.GetBytes()), new ReaderProperties());
        PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader);
        pdfMerger.Merge(embeddedPdfDocument, 1, embeddedPdfDocument.GetNumberOfPages());
        // Close each source after merging (the Java version does this via try-with-resources)
        embeddedPdfDocument.Close();
    }
    flatPdfDocument.Close();
}

Answer 1

To merge all PDF files from a PDF Portfolio into a normal PDF file, you have to walk the name tree of EmbeddedFiles, retrieve the streams of all PDFs therein, and then merge all these PDFs.

You can do this as follows for a portfolio loaded in a PdfDocument pdfDocument (Java version; the OP edited a port to C# into his question body):

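import com.itextpdf.io.source.RandomAccessSourceFactory;
import com.itextpdf.kernel.pdf.*;
import com.itextpdf.kernel.utils.PdfMerger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.junit.Assert;

PdfNameTree embeddedFilesTree = pdfDocument.getCatalog().getNameTree(PdfName.EmbeddedFiles);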
Map<String, PdfObject> embeddedFilesMap = embeddedFilesTree.getNames();
List<PdfStream> embeddedPdfs = new ArrayList<PdfStream>();
for (Map.Entry<String, PdfObject> entry : embeddedFilesMap.entrySet()) {
    PdfObject pdfObject = entry.getValue();
    if (!(pdfObject instanceof PdfDictionary))
        continue;
    PdfDictionary filespecDict = (PdfDictionary) pdfObject;
    PdfDictionary embeddedFileDict = filespecDict.getAsDictionary(PdfName.EF);
    if (embeddedFileDict == null)
        continue;
    PdfStream embeddedFileStream = embeddedFileDict.getAsStream(PdfName.F);
    if (embeddedFileStream == null)
        continue;
    PdfName subtype = embeddedFileStream.getAsName(PdfName.Subtype);
    if (!PdfName.ApplicationPdf.equals(subtype))
        continue;
    embeddedPdfs.add(embeddedFileStream);
}

Assert.assertFalse("No embedded PDFs found", embeddedPdfs.isEmpty());

try (   PdfWriter pdfWriter = new PdfWriter("NPort_5000-flat.pdf", new WriterProperties().setFullCompressionMode(true));
        PdfDocument flatPdfDocument = new PdfDocument(pdfWriter)    ) {
    PdfMerger pdfMerger = new PdfMerger(flatPdfDocument);
    RandomAccessSourceFactory sourceFactory = new RandomAccessSourceFactory();
    for (PdfStream pdfStream : embeddedPdfs) {
        try (   PdfReader embeddedReader = new PdfReader(sourceFactory.createSource(pdfStream.getBytes()), new ReaderProperties());
                PdfDocument embeddedPdfDocument = new PdfDocument(embeddedReader)) {
            pdfMerger.merge(embeddedPdfDocument, 1, embeddedPdfDocument.getNumberOfPages());
        }
    }
}

(FlattenPortfolio test testFlattenNPort_5000)
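One caveat with the /Subtype filter used in both versions: in the PDF specification the /Subtype entry of an embedded file stream is optional, so an embedded PDF stored without it would be skipped. A possible fallback, which is an assumption and not part of mkl's answer, is to sniff the decoded bytes for the %PDF- header. The sketch below scans the first 1024 bytes, mirroring the tolerance many readers apply to leading junk; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class PdfHeaderSniffer {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Assumption: accept a header that starts anywhere in the first 1024 bytes
    private static final int SCAN_LIMIT = 1024;

    // Returns true if the complete "%PDF-" marker occurs within the first SCAN_LIMIT bytes
    static boolean looksLikePdf(byte[] data) {
        int lastStart = Math.min(data.length, SCAN_LIMIT) - MAGIC.length;
        for (int start = 0; start <= lastStart; start++) {
            boolean match = true;
            for (int j = 0; j < MAGIC.length; j++) {
                if (data[start + j] != MAGIC[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        // Also reached for inputs shorter than the marker (lastStart < 0)
        return false;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7\n%\u00e2\u00e3".getBytes(StandardCharsets.ISO_8859_1);
        byte[] text = "just a text attachment".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdf));  // true
        System.out.println(looksLikePdf(text)); // false
    }
}
```

In the Java loop of the answer, the filter could then become `PdfName.ApplicationPdf.equals(subtype) || PdfHeaderSniffer.looksLikePdf(embeddedFileStream.getBytes())`. Note that getBytes() decodes the whole stream, so this costs a full decode for every non-PDF attachment without a matching /Subtype.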

(Answer 1: accepted, score 0, 2020-08-27 17:56:47)