
Python LXML catalog lookup

I'm writing an SCons file for building DocBook documentation. To track dependencies, I'd like some way to resolve catalog lookups to the absolute path of a file.

So say I have a bit of DocBook XML:

<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude">

  <info> 
    <title>Docbook example document</title>

    <xi:include href="file:///common/logo.xml"
        xpointer="logo"/>

  </info>
  <xi:include href="chap1/chap1.xml"/>
  <xi:include href="chap2/chap2.xml"/>
  <xi:include href="chap3/chap3.xml"/>
  <xi:include href="chap4/chap4.xml"/>

</book>

and a catalog.xml file:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <rewriteURI
    uriStartString="file:///stylesheet/"
    rewritePrefix="file:///home/kst/svn/TOOLS/Docbook/stylesheet/" />

  <rewriteURI
    uriStartString="file:///common/"
    rewritePrefix="file:///home/kst/svn/TOOLS/Docbook/common/" />


  <nextCatalog  catalog="/etc/xml/catalog" />

</catalog>

Getting the xi:include href strings with lxml is no problem, but I'm stuck after that. What I need is a way to get, from the catalog file, the absolute filename that file:///common/logo.xml resolves to (in this case /home/kst/svn/TOOLS/Docbook/common/logo.xml). It needs to be Python code so I can use it in my SConstruct file without too much hassle.
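For reference, the part that already works (collecting the xi:include targets) might look like the sketch below. It uses the standard library's ElementTree so it has no dependencies; lxml.etree accepts the same parse/iter/get calls. Relative hrefs such as chap1/chap1.xml are joined to the document's own URI, which is how XInclude resolves them, so every result is an absolute URI that can then be handed to a catalog lookup. The function name xinclude_uris is illustrative, and xml:base attributes are ignored for simplicity.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urljoin

# Clark-notation name of the <xi:include> element.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def xinclude_uris(xml_path):
    """Return the absolute URI of every xi:include href in xml_path."""
    # The document's own URI is the base that relative hrefs resolve against.
    base = Path(os.path.abspath(xml_path)).as_uri()
    root = ET.parse(xml_path).getroot()
    uris = []
    for inc in root.iter(XI_INCLUDE):  # walks nested elements in document order
        href = inc.get("href")
        if href:  # an xi:include without href points back into this document
            uris.append(urljoin(base, href))
    return uris
```

For the book above this yields file:///common/logo.xml unchanged (it is already absolute) followed by file:///…/chap1/chap1.xml and so on, relative to wherever the book file lives.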

Any help is appreciated.

lxml uses libxml2's catalog support. Use the environment variable XML_CATALOG_FILES to provide a list of catalogs (you can also set it from Python using os.environ); if this variable is not set, libxml2 checks for /etc/xml/catalog instead (which of course isn't available on Windows).
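A minimal sketch of setting that variable from an SConstruct, assuming the catalog.xml from the question sits next to it. XML_CATALOG_FILES is a space-separated list, so each entry is written as a file:// URI, which percent-encodes any spaces inside a path. The helper name use_catalogs is hypothetical.

```python
import os
from pathlib import Path


def use_catalogs(*catalog_paths):
    """Point libxml2 (and therefore lxml) at the given XML catalog files."""
    # Entries are separated by spaces; as_uri() escapes spaces within a path,
    # so a directory like "My Docs" does not split into two entries.
    value = " ".join(Path(os.path.abspath(p)).as_uri() for p in catalog_paths)
    os.environ["XML_CATALOG_FILES"] = value
    return value
```

Call it near the top of the SConstruct, e.g. use_catalogs("catalog.xml"), before anything parses XML with lxml. The assumption here is that libxml2 reads the variable once, when it first initializes its catalogs, so changing it later in the same process may have no effect.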

An alternative is to use a custom URI resolver. You can find more information in the lxml documentation.
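A sketch of such a resolver, built on lxml's etree.Resolver class: it mirrors the catalog's rewriteURI entries by hand (longest matching prefix wins, as in a catalog) instead of reading catalog.xml, and records every URL the parser asks for, which could double as a dependency list. The class name PrefixRewriteResolver and the loaded attribute are illustrative, not part of lxml. URLs that match no prefix return None, so lxml falls back to its normal loading.

```python
from lxml import etree


class PrefixRewriteResolver(etree.Resolver):
    """Rewrite URL prefixes like a catalog's rewriteURI and record each load."""

    def __init__(self, rewrites):
        super().__init__()
        # (uriStartString, rewritePrefix) pairs, longest start string first.
        self.rewrites = sorted(rewrites, key=lambda r: len(r[0]), reverse=True)
        self.loaded = []  # every URL the parser asked this resolver for

    def resolve(self, url, pubid, context):
        for start, prefix in self.rewrites:
            if url.startswith(start):
                target = prefix + url[len(start):]
                self.loaded.append(target)
                return self.resolve_filename(target, context)
        self.loaded.append(url)
        return None  # not ours: let lxml's default loader handle it
```

Usage: create parser = etree.XMLParser(), call parser.resolvers.add(PrefixRewriteResolver([("file:///common/", "file:///home/kst/svn/TOOLS/Docbook/common/")])), parse the book with that parser and call xinclude() on the tree; the resolver's loaded list then holds the rewritten URLs.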

EDIT: Apparently the question was not about the XInclude processing itself, which works, but about a way to "query" the catalog, i.e. to ask it for the actual filenames that would be used for the inclusions.

lxml (at least currently) has no API for that. The underlying libxml2 library does support it, however, and the "original" libxml2 Python bindings let you do it. Good documentation for those bindings is lacking, but the docstrings in the libxml2 source code help. So although this module is not nearly as nice to use as lxml, it seems to be your best bet. An example that seems to work:

>>> import libxml2
>>> libxml2.loadCatalog('catalog.xml')
>>> print(libxml2.catalogResolveURI('file:///common/logo.xml'))
file:///home/kst/svn/TOOLS/Docbook/common/logo.xml
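If installing the libxml2 bindings is a hassle inside an SCons build, a small pure-Python resolver can cover a catalog as simple as the one in the question. This is a swapped-in technique, not something lxml or libxml2 provides, and it implements only a subset of the OASIS XML Catalogs rules: an exact uri entry wins, otherwise the rewriteURI with the longest matching uriStartString, otherwise each nextCatalog in order. It ignores delegateURI, uriSuffix, system/public entries, xml:base and group attributes, and only follows nextCatalog entries that are local files. The names resolve_uri and uri_to_path are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import url2pathname

CAT_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def resolve_uri(uri, catalog_path, _seen=None):
    """Resolve uri against an XML catalog (subset: uri, rewriteURI, nextCatalog)."""
    _seen = set() if _seen is None else _seen  # guards against nextCatalog cycles
    catalog_path = os.path.abspath(catalog_path)
    if catalog_path in _seen or not os.path.exists(catalog_path):
        return None  # missing catalog or already visited: nothing to offer
    _seen.add(catalog_path)
    base = Path(catalog_path).as_uri()  # relative entries resolve against this
    root = ET.parse(catalog_path).getroot()

    # 1. An exact <uri name="..."> match wins outright.
    for e in root.iter(CAT_NS + "uri"):
        if e.get("name") == uri:
            return urljoin(base, e.get("uri", ""))

    # 2. Otherwise the <rewriteURI> with the longest matching prefix wins.
    best = None
    for e in root.iter(CAT_NS + "rewriteURI"):
        start = e.get("uriStartString", "")
        if start and uri.startswith(start) and (best is None or len(start) > len(best[0])):
            best = (start, e.get("rewritePrefix", ""))
    if best:
        return urljoin(base, best[1]) + uri[len(best[0]):]

    # 3. Fall through to each <nextCatalog>, in document order.
    for e in root.iter(CAT_NS + "nextCatalog"):
        nxt = urljoin(base, e.get("catalog", ""))
        found = resolve_uri(uri, uri_to_path(nxt), _seen)
        if found:
            return found
    return None


def uri_to_path(uri):
    """Turn a file:// URI into a local filesystem path."""
    return url2pathname(urlparse(uri).path)
```

With the question's catalog, resolve_uri('file:///common/logo.xml', 'catalog.xml') should give file:///home/kst/svn/TOOLS/Docbook/common/logo.xml, and uri_to_path() turns that into /home/kst/svn/TOOLS/Docbook/common/logo.xml, the form an SConstruct would need for a dependency.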
