
iTextSharp generates corrupted PDF file

I am trying to generate a PDF file from an HTML string and external CSS files and save the PDF to disk. As you can see from this example, I am using very simple HTML. I know the CSS files are getting read into the cssResolver by viewing IntelliSense.

Here is the code I am using:

internal string Create(PdfDocumentDefinition documentDefinition)
{
    MemoryStream output = new MemoryStream();
    MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body>Hello, World!</body></html>"));

    string pathName = @WebConfigurationManager.AppSettings["StagingPath"] + documentDefinition.DocumentName + ".pdf";
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter writer = PdfWriter.GetInstance(document, output);

    using (output)
    {
        using (document)
        {
            document.Open();

            CssResolverPipeline pipeline = SetCssResolver(documentDefinition.CssFiles, document, writer);

            XMLWorker worker = new XMLWorker(pipeline, true);

            XMLParser parser = new XMLParser(worker);
            parser.Parse(input);

            output.Position = 0;
        }

        Byte[] data = output.ToArray();
        File.WriteAllBytes(pathName, data);
    }

    return pathName;
}

private CssResolverPipeline SetCssResolver(List<String> cssFiles, Document document, PdfWriter writer)
{
    var htmlContext = new HtmlPipelineContext(null);
    htmlContext.SetTagFactory(iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory());
    ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
    if (cssFiles != null)
    {
        foreach (String cssFile in cssFiles)
        {
             //cssResolver.AddCssFile(cssFile, true);
        }
    }

    return new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));            
}

Here is the output as viewed in Notepad++:

2 0 obj
<</Length 117/Filter/FlateDecode>>stream
xœ+ä*ä2гP€á¢t.c 256U0·0R(JåJã
ĪÊÜÒXÏÔHÁÌBÏÌBÁÐPÏ¢Ø@!¨¤Å)¤ÌÂÐH!$(¬khbè»*€„Ò¸4<RsròuÂó‹rR5C²€Š@J\C€ú¼i!*
endstream
endobj
4 0 obj
<</Type/Page/MediaBox[0 0 595 842]/Resources<</Font<</F1 1 0 R>>>>/Contents 2 0 R/Parent 3 0 R>>
endobj
1 0 obj
<</Type/Font/Subtype/Type1/BaseFont/Helvetica/Encoding/WinAnsiEncoding>>
endobj
3 0 obj
<</Type/Pages/Count 1/Kids[4 0 R]>>
endobj
5 0 obj
<</Type/Catalog/Pages 3 0 R>>
endobj
6 0 obj
<</Producer(iTextSharp’ 5.5.7 ©2000-2015 iText Group NV \(AGPL-version\))/CreationDate(D:20151026102026-05'00')/ModDate(D:20151026102026-05'00')>>
endobj
xref
0 7
0000000000 65535 f 
0000000311 00000 n 
0000000015 00000 n 
0000000399 00000 n 
0000000199 00000 n 
0000000450 00000 n 
0000000495 00000 n 
trailer
<</Size 7/Root 5 0 R/Info 6 0 R/ID [<055082e8139638e35ce08dedae069690><055082e8139638e35ce08dedae069690>]>>
%iText-5.5.7
startxref
657
%%EOF

I've been working on this for about 4 hours now. Can anyone see why it is not generating a valid PDF?

Trying it

I simplified the OP's original code to this:

[Test]
public void ResetStreamPositionAtEndOfUsing()
{
    string outputFilePath = @"test-results\misc\resetStreamPosition.pdf";
    Directory.CreateDirectory(@"test-results\misc\");

    MemoryStream output = new MemoryStream();

    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter writer = PdfWriter.GetInstance(document, output);

    using (output)
    {
        using (document)
        {
            document.Open();
            document.Add(new Paragraph("Test"));
            output.Position = 0;
        }

        Byte[] data = output.ToArray();
        File.WriteAllBytes(outputFilePath, data);
    }
}

Running it produced an invalid PDF file nearly identical to the one the OP pasted into the question. In particular, the PDF header was missing.

As recommended by Chris Haas, I then removed the spurious line

            output.Position = 0;

And indeed, now the output PDF is valid; in particular, it has its header.

Analysis

What happens in the MemoryStream output?

    MemoryStream output = new MemoryStream();

output is created empty.

    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter writer = PdfWriter.GetInstance(document, output);

The new PdfWriter is merely instantiated; nothing is written, so output is still empty.

    using (output)
    {
        using (document)
        {
            document.Open();

document informs writer that document construction has started, so writer begins by writing the PDF prologue, i.e. the header line and a "binary" comment; output now contains %PDF-1.4\n%âãÏÓ\n, with the current stream position at the end.
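As a quick sanity check of the byte count, the prologue described above is 15 bytes long when each character is written as a single byte, which matches the offset 15 that the xref table in the question's output records for object 2, the first object written after the prologue. The sketch below assumes the binary comment consists of the four characters âãÏÓ encoded as ISO-8859-1, as shown in the text; the class name is made up for illustration and this is not iTextSharp code.

```csharp
using System;
using System.Text;

public static class PrologueLength
{
    // The PDF prologue as described above: header line plus "binary" comment.
    // Assumption: every character is written as one byte (ISO-8859-1).
    public const string Prologue = "%PDF-1.4\n%\u00e2\u00e3\u00cf\u00d3\n";

    public static int ByteCount() =>
        Encoding.GetEncoding("ISO-8859-1").GetByteCount(Prologue);

    public static void Main()
    {
        // 9 bytes for "%PDF-1.4\n" plus 6 bytes for "%âãÏÓ\n"
        Console.WriteLine(ByteCount()); // prints 15
    }
}
```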

            document.Add(new Paragraph("Test"));

A new paragraph is added to the current (first) page, but only in memory: the objects constituting the content of the current page are only written when a new page is started or the document is finished. output still contains %PDF-1.4\n%âãÏÓ\n, and the current stream position is still at the end.

            output.Position = 0;

The stream position is reset. output still contains %PDF-1.4\n%âãÏÓ\n, but the current stream position is now at the start!

        }

This is the end of the using (document) code block, so the Dispose method of document is called. Therein document tells writer that document creation is finished. writer therefore now writes all document objects still held in memory and then adds the PDF file epilogue (cross references, trailer, ...).

As the stream position is now at the start of the stream, the existing content is overwritten! output now contains 2 0 obj...%%EOF, i.e. the complete PDF missing merely the PDF prologue. The cross-reference table, however, still records the offsets writer counted, which include the 15 prologue bytes: object 2, for example, is listed at offset 15 but actually starts at offset 0.
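The overwrite can be reproduced without iTextSharp at all. In the following sketch, two plain writes stand in for what writer emits at document.Open() and during Dispose(); the second string is a shortened placeholder, not real PDF content, and the class and method names are made up for illustration. Resetting Position between the two writes reproduces the defect.

```csharp
using System;
using System.IO;
using System.Text;

public static class OverwriteDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Writes a string at the stream's current Position.
    static void Write(Stream s, string text)
    {
        byte[] bytes = Latin1.GetBytes(text);
        s.Write(bytes, 0, bytes.Length);
    }

    // Simulates the question's sequence; resetPosition mimics "output.Position = 0;".
    public static string Run(bool resetPosition)
    {
        using (var output = new MemoryStream())
        {
            Write(output, "%PDF-1.4\n%\u00e2\u00e3\u00cf\u00d3\n"); // document.Open(): prologue
            if (resetPosition)
                output.Position = 0;                              // the spurious line
            Write(output, "2 0 obj\n<<...>>\nendobj\n%%EOF\n");   // Dispose(): objects + epilogue (placeholder)
            return Latin1.GetString(output.ToArray());
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run(true).StartsWith("%PDF", StringComparison.Ordinal));  // False
        Console.WriteLine(Run(false).StartsWith("%PDF", StringComparison.Ordinal)); // True
    }
}
```

Because the 29-byte placeholder body is longer than the 15-byte prologue, the prologue is overwritten completely, just as with the real PDF in the question.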

Thanks to mkl's hint I was able to solve this, but it doesn't seem right that it has to be done this way; there must be a better way. The solution was to copy the output into one array to get the first 15 bytes, then close the document and copy the output into another array to get everything after the first 15 bytes (as far as I can see, the output stream never contains all of the bytes), and then create a third array and copy the first two into it. Here is the complete code:

internal string Create(PdfDocumentDefinition documentDefinition)
{
    string pathName = @WebConfigurationManager.AppSettings["StagingPath"] + documentDefinition.DocumentName + ".pdf";

    MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(documentDefinition.Source));

    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    MemoryStream output = new MemoryStream();
    using (output)
    { 
        PdfWriter writer = PdfWriter.GetInstance(document, output);
        document.Open();

        CssResolverPipeline pipeline = SetCssResolver(documentDefinition.CssFiles, document, writer);

        XMLWorker worker = new XMLWorker(pipeline, true);

        XMLParser parser = new XMLParser(worker);
        parser.Parse(input);

        output.Position = 0;

        Byte[] firstBytes = output.ToArray();

        document.Close();

        Byte[] lastBytes = output.ToArray();
        Byte[] allBytes = new Byte[firstBytes.Length + lastBytes.Length];

        firstBytes.CopyTo(allBytes, 0);
        lastBytes.CopyTo(allBytes, firstBytes.Length);
        File.WriteAllBytes(pathName, allBytes);
    }

    return pathName;
}

private CssResolverPipeline SetCssResolver(List<String> cssFiles, Document document, PdfWriter writer)
{
    var htmlContext = new HtmlPipelineContext(null);
    htmlContext.SetTagFactory(iTextSharp.tool.xml.html.Tags.GetHtmlTagProcessorFactory());
    ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
    if (cssFiles != null)
    {
        foreach (String cssFile in cssFiles)
        {
            cssResolver.AddCssFile(cssFile, true);
        }
    }
    return new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));            
}
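The array splicing works around the damage done by output.Position = 0, but the simpler route is the fix from the answer above: delete that line and call output.ToArray() only after the document has been closed. That relies on two properties of MemoryStream: ToArray() copies the whole buffer regardless of the current Position, and it still works after the stream has been closed (which matters because PdfWriter, as far as I know, closes its target stream on document.Close() unless writer.CloseStream is set to false). The sketch below checks both properties with plain bytes standing in for the PDF; it uses no iTextSharp code, and the strings and names are placeholders. It also suggests why the output stream seemed never to contain all of the bytes: they were not missing but had been overwritten.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAfterCloseDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Mimics the corrected flow: write everything without touching Position,
    // let the stream be closed, and only then copy the buffer out.
    public static string WriteCloseThenRead()
    {
        var output = new MemoryStream();
        byte[] prologue = Latin1.GetBytes("%PDF-1.4\n");      // stands in for document.Open()
        byte[] rest = Latin1.GetBytes("2 0 obj ... %%EOF\n"); // stands in for document.Close()
        output.Write(prologue, 0, prologue.Length);
        output.Write(rest, 0, rest.Length);
        output.Dispose();                                   // e.g. the writer closing its target stream
        return Latin1.GetString(output.ToArray());          // ToArray() still works on a closed MemoryStream
    }

    // ToArray() ignores Position: moving the position does not shorten the copy.
    public static int ToArrayLengthAfterRewind()
    {
        using (var s = new MemoryStream())
        {
            s.Write(new byte[] { 1, 2, 3 }, 0, 3);
            s.Position = 1;
            return s.ToArray().Length;
        }
    }

    public static void Main()
    {
        Console.Write(WriteCloseThenRead());           // both lines, starting with "%PDF-1.4"
        Console.WriteLine(ToArrayLengthAfterRewind()); // 3
    }
}
```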

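Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or the original URL. For any questions, contact yoyou2525@163.com.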

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM