Word/PDF generation in C# with hundreds of pages is too slow

I'm having a speed issue generating documents in C#.

I am basically trying to create documents with 600+ pages, but the tools I have used handle this very slowly.

I first tried DocX by Novacode. Creating this 600+ page document takes upwards of 3 minutes. I learned that there could be an issue with the function "InsertDocument", so I tried to find a different solution.

I started looking into opening an HTML document in Word. While this is a fast solution, images are not embedded into the document. The HTML syntax (src="") is not supported in MS Word.

I could use URLs to the images, but then if the internet connection is down, the images would not display.

I then started looking into an HTML->PDF solution. iTextSharp is a little faster than the DocX solution, but it still takes 1-2 minutes to generate this document.

I am simply out of ideas. I'm not sure a commercial product would be better, and I don't want to shell out that kind of cash just to have the same speed issue.

Has anyone had experience creating Word/PDF documents with 600+ pages in C# fairly quickly (1-5 seconds)?

You should be able to create a richly formatted DOCX file with 600+ pages in that time frame, but for a PDF file I'm not sure; it will probably depend on your document content.

Anyway, I'm able to create a rather large DOCX file with GemBox.Document in just a few seconds (0-4 sec), and a PDF file as well, though PDF does take a bit more time than DOCX output.

You can also convert HTML to DOCX or HTML to PDF really fast, but that can depend on the HTML content itself.

If possible, you should prefer well-written HTML content that's "printer-friendly", doesn't have too many nesting levels, has optimized images, has a single CSS file, etc. Also, if you're providing a URL as the input path, then I think it's better to have embedded base64 images than links, in order to avoid making additional web requests.
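To illustrate the embedded base64 images mentioned above, here is a minimal sketch that turns image bytes into a data URI and wraps it in an img tag. It uses only the .NET base class library; the `DataUriHelper` class and its method names are made up for this example, and whether a particular converter accepts data URIs is an assumption you would need to verify.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of any library mentioned in this thread).
public static class DataUriHelper
{
    // Builds a data URI from raw image bytes and a MIME type such as "image/png".
    public static string ToDataUri(byte[] imageBytes, string mimeType)
    {
        if (imageBytes == null) throw new ArgumentNullException(nameof(imageBytes));
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);
    }

    // Returns an <img> tag whose src carries the image inline,
    // so no separate web request is needed to fetch it.
    public static string ToImgTag(string imagePath, string mimeType)
    {
        byte[] bytes = File.ReadAllBytes(imagePath);
        return "<img src=\"" + ToDataUri(bytes, mimeType) + "\" />";
    }
}
```

For example, `DataUriHelper.ToImgTag("logo.png", "image/png")` would produce a tag you can place directly in the HTML before handing it to the converter. Keep in mind that base64 makes the HTML roughly a third larger than the raw image bytes.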

Last, I don't think there is much difference between Flat OPC XML and DOCX. Basically they both contain the same content; it's just that a DOCX file is additionally zipped, which is a negligible performance penalty.
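If you want to see the zipped parts for yourself, a rough sketch like the following lists the entries of a DOCX package using `System.IO.Compression` from the .NET base class library. The `DocxPackageInspector` class is a made-up name for this example, and the sketch only reads the ZIP container without interpreting the document XML.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only.
public static class DocxPackageInspector
{
    // Lists each part in the package with its uncompressed and compressed size.
    public static List<string> ListParts(Stream docxStream)
    {
        var parts = new List<string>();
        // leaveOpen: true so the caller stays responsible for the stream.
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                parts.Add(entry.FullName + " (" + entry.Length + " bytes, "
                          + entry.CompressedLength + " compressed)");
            }
        }
        return parts;
    }
}
```

Calling it with `File.OpenRead("report.docx")` (the file name is just a placeholder) should show parts such as the main document XML alongside any embedded images.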

If you are trying to do this from a web server, you should be careful about the resource consumption of this process, since you may quite easily run out of memory, for example.

If at some point you decide to consider commercial libraries, maybe you could give Amyuni PDF Creator .Net a try. Amyuni PDF Creator .Net provides a "page by page" mode that saves resources when processing exceptionally long PDF documents. The idea is to save each page to the output file as soon as it is generated, maybe keeping a few pages in memory in case they need to be modified.
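The general idea of writing each page out as soon as it is generated can be sketched in plain C# as below. This is not the Amyuni API; the `PageStreamingSketch` class, its methods, and the plain-text "pages" are all assumptions made purely to show the pattern of lazy generation plus immediate output.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical illustration of the pattern, not any vendor's API.
public static class PageStreamingSketch
{
    // Lazily yields one page at a time, so the full document is never held in memory.
    public static IEnumerable<string> GeneratePages(int pageCount)
    {
        for (int i = 1; i <= pageCount; i++)
        {
            yield return "Content of page " + i;
        }
    }

    // Writes each page to the output as soon as it is produced;
    // returns the number of pages written.
    public static int WritePages(IEnumerable<string> pages, TextWriter output)
    {
        int written = 0;
        foreach (string page in pages)
        {
            output.WriteLine(page);
            output.Write('\f'); // form feed as a stand-in page separator
            written++;
        }
        output.Flush();
        return written;
    }
}
```

With a `StreamWriter` over a file as the output, memory use stays roughly constant regardless of page count, which is the property the "page by page" mode is described as providing.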

Take a look at these links for more details:

Usual disclaimer applies.
