Issue converting to .pdf a merged .docx file that opens fine in Word

So, I have the following scenario.

I am working on a system for academic papers. I have several inputs for things like author name, coauthors, title, type of paper, introduction, objectives and so on. I store all that information in a database. The user has a Preview button which, when clicked, generates a Word file asynchronously and sends the file location back to the user; that file is then shown to the user in an iframe using Google Doc Viewer.

There's a specific use case where the user/author of the paper can attach a .docx file with a table, or a .jpeg file for a figure. That table/figure has to be included inside the final .docx file.

For the .docx generation process I am using PHPWord.

Up until this point everything works fine, but my issues start when I try to mix everything and put together the .docx file.

Approach Number One

My first approach was to do everything with PHPWord. I create the file, add the texts where required and, in the case of the image, just insert the image followed by the figure caption below it.

Things get tricky, though, when I try doing the same thing with the .docx table file. My only option was to get the table XML using this. It did the trick, but the problem I ran into was that when I opened the resulting Word file, the table was there but had lost all of its styling and had transparent borders. Because of those transparent borders, when I later converted the file to PDF the borders were ignored and the table contents came out as scrambled text.

Approach Number Two (current one)

After fighting with Approach Number One and only complicating things further, I decided to do something different. Since I had already generated one docx file with the main paper information and I needed to add another docx file, I decided to use the DocX Merge Library.

So, what I basically did was generate three Word files: one for the main paper information, one for the table and one for the table caption (that last one is mainly to avoid overcomplicating the order of information). Also, that data is not in the table .docx file.

Then I run this:

$dm->merge( [
    'paper-info.docx',
    'attached-table.docx',
    'attached-table-caption.docx'
], 'complete-file.docx');

Afterwards, I check and the Word file is generated just as I need it, with the table maintaining its original styles and dimensions.

If I open it in LibreOffice, though, I get this error message:

[Image: LibreOffice error message]

If I continue and open the file anyway, it opens correctly with all the data, the only exception being that it no longer respects the fonts of the file as they appear in Word.

The problem comes in the next step, since I need to present a preview of the file with Google Doc Viewer using this syntax:

<iframe src="https://docs.google.com/gview?embedded=true&hl=es_LA&url=https://usersite.net/complete-file.docx?pid=explorer&efh=false&a=v&chrome=false&embedded=true" width="100%" height="600" style="border: none;"></iframe>

The document loads fine, but when I review it I see that it only shows the content of the first file, paper-info.docx, and ends right where the table and table caption should appear. I open the exact same file in Word and it shows the table and caption.
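One thing that may be worth ruling out, independently of the truncated preview, is that the document address in the iframe is passed in the url parameter without percent-encoding, so its own "?" and "&" characters run into the viewer's query string. A minimal sketch of building that address in PHP follows; buildViewerUrl is a hypothetical helper name, and only the embedded and url parameters from the iframe above are used. It is not expected to fix the missing table by itself.

```php
<?php
// Hypothetical helper: builds the Google Doc Viewer address with an encoded "url" value.
function buildViewerUrl(string $documentUrl): string
{
    // http_build_query percent-encodes each value, so "?", "&" and "/" inside
    // $documentUrl cannot be mistaken for parts of the viewer's own query string.
    return 'https://docs.google.com/gview?' . http_build_query([
        'embedded' => 'true',
        'url'      => $documentUrl,
    ]);
}

// Escape the result once more when writing it into an HTML attribute.
$src = htmlspecialchars(buildViewerUrl('https://usersite.net/complete-file.docx'), ENT_QUOTES);
echo '<iframe src="' . $src . '" width="100%" height="600" style="border: none;"></iframe>', PHP_EOL;
```

If the document address needs query parameters of its own, they would go inside the string passed to buildViewerUrl, so that they are encoded together with the rest of the address.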

The other issue appears when I try to convert the file to PDF.

If I use PHPWord's conversion method in combination with DomPDF, I get the exact same issue as with Google Doc Viewer: I only get the content of the first file. This is the code:

$phpWordPDF = \PhpOffice\PhpWord\IOFactory::load('complete-file.docx');
$xmlWriterPDF = \PhpOffice\PhpWord\IOFactory::createWriter($phpWordPDF, 'PDF');
$xmlWriterPDF->save('complete-file.pdf');

So my only other viable route was to use LibreOffice from the command line with this command:

soffice --headless --convert-to pdf complete-file.docx

This converts the file correctly, but it has the same issue mentioned above when opening the .docx file in LibreOffice: the font styles are not preserved.

The other weird part is that if I try to run this in my PHP script:

shell_exec('soffice --headless --convert-to pdf complete-file.docx');

Nothing happens.

I am running Apache 2.4.25, PHP 7.4.11 on Windows 10 x64.
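A silent shell_exec call like the one above usually hides an error message rather than doing nothing, so one way to investigate is to capture the exit code and the combined output. The sketch below is only a debugging aid under stated assumptions: buildConvertCommand and convertToPdf are hypothetical helper names, and the soffice path passed in has to be replaced with wherever LibreOffice is actually installed, since the Apache service account may not have soffice on its PATH.

```php
<?php
// Hypothetical helper: builds the soffice command line with quoted arguments.
function buildConvertCommand(string $soffice, string $input, string $outDir): string
{
    // "2>&1" merges stderr into stdout so error messages are not lost.
    return sprintf(
        '%s --headless --convert-to pdf --outdir %s %s 2>&1',
        escapeshellarg($soffice),
        escapeshellarg($outDir),
        escapeshellarg($input)
    );
}

// Runs the conversion and returns [exit code, output lines] for inspection.
function convertToPdf(string $soffice, string $input, string $outDir): array
{
    $output = [];
    $exitCode = -1;
    exec(buildConvertCommand($soffice, $input, $outDir), $output, $exitCode);
    return [$exitCode, $output];
}
```

Calling convertToPdf with absolute paths for the executable, the input file and the output directory, and then logging both returned values, should at least show whether the process started and what it complained about. The working directory and the permissions of the web server user are the usual suspects, though that is an assumption and not something confirmed in this case.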

Conclusion

So far my best result has come from merging the files, but it also caused this issue, so maybe the problem comes from the merging process I am using. Ideally I would just insert the table, styles and all, using PHPWord, but I haven't been able to and haven't found any examples of how to do that.

Another option that I've seen is this library, but the merge feature is only in the license that costs $599 USD, and since I am pretty close to solving this, I am not sure if it would solve my issue. If it does, I'd invest in it since I need to get this done ASAP, but I wanted to check with you guys what your recommendations would be for this case. Maybe another merging library, or doing everything via PHPWord.

Help is appreciated!

After a lot of attempts to fix it, I wasn't able to achieve what I wanted with PHPWord and the merging library I mentioned.

Since I needed to fix this, I decided to invest in the paid library I mentioned in my question. It was an expensive purchase, but for those who are interested, it does exactly what was required and it does it perfectly.

The two main functions I required were document merging and importing content into a .docx file.

So I had to purchase the Premium package. Once there, the library literally does everything for you.

Example of code for merging docx files:

require_once 'classes/MultiMerge.php';

$merge = new MultiMerge();

$merge->mergeDocx('document.docx', array('second.docx', 'other.docx'), 'output.docx', array());

Example of how to import a table from another docx file:

require_once 'classes/CreateDocx.php';

$docx = new CreateDocxFromTemplate('document.docx');

// import tables
$referenceNode = array(
    'type' => 'table',
);

$docx->importContents('document_1.docx', $referenceNode);

$docx->createDocx('output');

As you can see, it is pretty easy. This answer is by no means an ad for this library, but for those who have the same problem as me, this is a lifesaver.
