Convert PDF to Word using Aspose
Please note that by default every visually grouped block of text in the original PDF file is converted into a textbox in the resulting document. This gives the output document the closest possible resemblance to the original PDF file. The output will look good, but it will consist entirely of textboxes, which can make further editing in Microsoft Word quite difficult.
Please use the Flow recognition mode to get output that is not wrapped in textboxes:
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

// dataDir is the folder that contains input.pdf (defined elsewhere)
// Load source PDF file
Document doc = new Document(dataDir + "input.pdf");
// Instantiate DocSaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Set output file format as DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Set recognition mode to Flow (the default is Textbox)
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
// Save resultant DOCX file
doc.save(dataDir + "output.docx", saveOptions);
In this mode the engine performs grouping and multi-level analysis to restore the original document author's intent and produce a maximally editable document. The downside is that the output document might look different from the original PDF file.
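The trade-off between the two modes can be written down as a small decision helper. This is only a sketch: RecognitionModeChooser, its Mode enum and both methods are hypothetical names made up for illustration and are not part of Aspose.PDF. The enum is a local stand-in, so the snippet compiles without the Aspose library; in real code the two values would correspond to the Textbox and Flow recognition modes set through saveOptions.setMode(...).

```java
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper, not part of Aspose.PDF: it only models the choice
// between the two recognition modes described above.
final class RecognitionModeChooser {

    // Local stand-in for the two modes (assumption: TEXTBOX is the default
    // layout-preserving mode, FLOW is the editable mode).
    enum Mode { TEXTBOX, FLOW }

    // Prefer FLOW when the DOCX will be edited afterwards, and TEXTBOX
    // when visual fidelity to the PDF matters more.
    static Mode choose(boolean willBeEdited) {
        return willBeEdited ? Mode.FLOW : Mode.TEXTBOX;
    }

    // Build an output path next to the input and tag the file name with
    // the mode, e.g. "input.pdf" -> "input-flow.docx".
    static Path outputPath(Path inputPdf, Mode mode) {
        String name = inputPdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String suffix = mode.name().toLowerCase(Locale.ROOT);
        return inputPdf.resolveSibling(base + "-" + suffix + ".docx");
    }
}
```

Saving the same PDF once per mode under such tagged names makes it easy to open both results in Microsoft Word and compare layout fidelity against editability.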
We hope this will be helpful. Please feel free to contact us if you need any further assistance.
PS: I work with Aspose as a Developer Evangelist.