PDF to XML conversion in MarkLogic

We are trying to convert a PDF to XML using the following query:

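    xquery version "1.0-ml";
    (: Convert the PDF read from the filesystem :)
    let $results := xdmp:pdf-convert(
            xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
            "CFR-2010-title48-vol1.xml" ),
        (: The first item is the manifest listing the generated parts :)
        $manifest := $results[1]
    return $results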

But it didn't generate XML output for the PDF. Instead, it generated the following output files:

    <parts xmlns="xdmp:pdf-convert">
      <part>CFR-2010-title48-vol1_xml.xhtml</part>
      <part>CFR-2010-title48-vol1_xml_parts/01_00.jpg</part>
      <part>CFR-2010-title48-vol1_xml_parts/01_01.jpg</part>
      <part>CFR-2010-title48-vol1_xml_parts/conv.css</part>
      <part>CFR-2010-title48-vol1_xml_parts/toc.txt</part>
    </parts>

Can you please suggest how to generate XML output for a given PDF file?

Thanks

Venkat

The first document returned, the `.xhtml` part, is XML: XHTML is an XML format.
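To make that concrete, here is a minimal sketch that returns only the converted XHTML document. It assumes that the result sequence is the manifest followed by the parts in manifest order, which is what the output in the question suggests, so the XHTML would be the second item. Check this against your own results before relying on it.

```xquery
xquery version "1.0-ml";

(: Sketch only: return just the converted XHTML document.
   Assumption: $results[1] is the <parts> manifest and the remaining
   items follow manifest order, so the XHTML is the second item. :)
let $results := xdmp:pdf-convert(
    xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
    "CFR-2010-title48-vol1.xml")
return $results[2]
```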

Were you looking to get DocBook? For that you need to run the entire upconversion process. The easiest way to do that is to run the document through the CPF conversion application, which applies a series of steps and inferences to get to that point.

Or: Are you wondering why the name in the part doesn't match the name passed as the second parameter to xdmp:pdf-convert? The second parameter is only used to adjust the generated hrefs to images; it is not used for the conversion output itself.
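If the goal is to keep the converted output, one option is to store each returned part under the name the manifest gives it, so the relative image and CSS hrefs in the XHTML keep resolving. The following is a sketch under assumptions, not a definitive recipe: the `/converted/` URI prefix is hypothetical, and the code assumes the parts come back in the same order as the manifest entries.

```xquery
xquery version "1.0-ml";
declare namespace pdf = "xdmp:pdf-convert";

(: Sketch only: store every converted part in the database.
   Assumptions: parts follow manifest order; "/converted/" is a
   hypothetical URI prefix chosen for illustration. :)
let $results  := xdmp:pdf-convert(
    xdmp:document-get("d:\CFR-2010-title48-vol1.pdf"),
    "CFR-2010-title48-vol1.xml")
let $manifest := $results[1]
let $parts    := fn:subsequence($results, 2)
for $name at $i in $manifest//pdf:part
return
  (: Pair the i-th manifest entry with the i-th returned part :)
  xdmp:document-insert(
    fn:concat("/converted/", fn:string($name)),
    $parts[$i])
```

Keeping the manifest names as URI suffixes means the XHTML and its `_parts` directory stay side by side in the database.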

Or: If you want to target some other kind of XML (not XHTML) directly from the format conversion in xdmp:pdf-convert, you can apply a different configuration file. See the documentation for that function for more details.
