Using XPath and LXML in Python

Question

I have a Python script that parses XML files and exports certain elements of interest into a CSV file. I have now tried to change the script so that it filters an XML file by a criterion. The equivalent XPath query would be:

/DC/Events/Confirmation[contains(TransactionId,"GTEREVIEW")]

When I try to do this with lxml, my code is:

xml_file = lxml.etree.parse(xml_file_path)
namespace = "{" + xml_file.getroot().nsmap[None] + "}"
node_list = xml_file.findall(namespace + "Events/" + namespace + "Confirmation[TransactionId='*GTEREVIEW*']")

But this doesn't seem to work. Can anyone help? Example XML file:

<Events>
  <Confirmation>
    <TransactionId>GTEREVIEW2012</TransactionId>
  </Confirmation>    
  <Confirmation>
    <TransactionId>GTEDEF2012</TransactionId>
  </Confirmation>    
</Events>

So I want all "Confirmation" nodes that contain a TransactionId which includes the string "GTEREVIEW". Thanks

Answer 1

findall() doesn't support XPath expressions, only ElementPath (see https://web.archive.org/web/20200504162744/http://effbot.org/zone/element-xpath.htm). ElementPath cannot search for elements whose text contains a certain substring.
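
To see why the question's findall() call returns nothing, here is a small sketch using the sample XML from the question (parsed from a string rather than a file, for self-containment). ElementPath's [tag='text'] predicate is an exact equality test on the child's text, so the * characters in '*GTEREVIEW*' are compared literally, not treated as wildcards.

```python
from lxml import etree

# Sample document from the question, parsed from a string instead of a file
SAMPLE = b"""<Events>
  <Confirmation>
    <TransactionId>GTEREVIEW2012</TransactionId>
  </Confirmation>
  <Confirmation>
    <TransactionId>GTEDEF2012</TransactionId>
  </Confirmation>
</Events>"""

root = etree.fromstring(SAMPLE)

# Exact match: ElementPath compares the child's full text for equality
exact = root.findall("Confirmation[TransactionId='GTEREVIEW2012']")

# "*" is not a wildcard inside an ElementPath predicate, so nothing matches
wildcard = root.findall("Confirmation[TransactionId='*GTEREVIEW*']")

print(len(exact), len(wildcard))  # 1 0
```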

Why don't you use XPath instead? Assuming that the file test.xml contains your sample XML, the following works:

> python
Python 2.7.9 (default, Jun 29 2016, 13:08:31) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from lxml import etree
>>> tree=etree.parse("test.xml")
>>> tree.xpath("Confirmation[starts-with(TransactionId, 'GTEREVIEW')]")
[<Element Confirmation at 0x7f68b16c3c20>]
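
starts-with() happens to work for the sample data, but the question's criterion is contains(), which matches the substring anywhere in the text. A minimal sketch with contains(), again parsing the question's sample XML from a string instead of test.xml:

```python
from lxml import etree

SAMPLE = b"""<Events>
  <Confirmation>
    <TransactionId>GTEREVIEW2012</TransactionId>
  </Confirmation>
  <Confirmation>
    <TransactionId>GTEDEF2012</TransactionId>
  </Confirmation>
</Events>"""

root = etree.fromstring(SAMPLE)

# Relative path, evaluated against the root <Events> element;
# contains() tests for the substring anywhere in TransactionId's text
matches = root.xpath("Confirmation[contains(TransactionId, 'GTEREVIEW')]")

# Pull out the matching IDs, e.g. to write them to a CSV file afterwards
ids = [c.findtext("TransactionId") for c in matches]
print(ids)  # ['GTEREVIEW2012']
```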

If you insist on using findall(), the best you can do is get the list of all Confirmation elements that have a TransactionId child node:

>>> tree.findall("Confirmation[TransactionId]")
[<Element Confirmation at 0x7f68b16c3c20>, <Element Confirmation at 0x7f68b16c3ea8>]

You then need to filter this list manually, e.g.:

>>> [e for e in tree.findall("Confirmation[TransactionId]")
...  if e.findtext("TransactionId").startswith("GTEREVIEW")]
[<Element Confirmation at 0x7f68b16c3c20>]

If your document uses namespaces, the following will get you all Confirmation elements that have a TransactionId child node, provided that the elements are in the default namespace (I used xmlns="file:xyz" as the default namespace):

>>> tree.findall(".//{{{0}}}Confirmation[{{{0}}}TransactionId]".format(tree.getroot().nsmap[None]))
[<Element {file:xyz}Confirmation at 0x7f534a85d1b8>, <Element {file:xyz}Confirmation at 0x7f534a85d128>]
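
Instead of formatting the {uri} into the path string, findall() and findtext() also accept a namespaces dictionary that maps a prefix of your choosing to the URI. A sketch assuming the same file:xyz default namespace; the prefix ns is an arbitrary name picked for illustration. ElementPath still has no substring test, so the filtering is done in Python.

```python
from lxml import etree

# Sample document with a default namespace, as in the answer above
SAMPLE = b"""<Events xmlns="file:xyz">
  <Confirmation>
    <TransactionId>GTEREVIEW2012</TransactionId>
  </Confirmation>
  <Confirmation>
    <TransactionId>GTEDEF2012</TransactionId>
  </Confirmation>
</Events>"""

root = etree.fromstring(SAMPLE)

# Bind an arbitrary prefix ("ns") to the document's default namespace URI
ns = {"ns": root.nsmap[None]}

# ElementPath resolves "ns:" through the mapping, including inside predicates
candidates = root.findall("ns:Confirmation[ns:TransactionId]", namespaces=ns)

# Substring test done in Python, since ElementPath cannot express it
matches = [c for c in candidates
           if "GTEREVIEW" in c.findtext("ns:TransactionId", namespaces=ns)]
print(len(candidates), len(matches))  # 2 1
```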

And there is, of course, etree.ETXPath:

>>> find=etree.ETXPath("//{{{0}}}Confirmation[starts-with({{{0}}}TransactionId, 'GTEREVIEW')]".format(tree.getroot().nsmap[None]))
>>> find(tree)
[<Element {file:xyz}Confirmation at 0x7f534a85d1b8>]

ETXPath accepts ElementTree-style {namespace}tag names inside a full XPath expression, so it lets you combine XPath functions such as starts-with() with namespaces.
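
Another common option is the plain xpath() method with a namespaces mapping. XPath 1.0 has no notion of a default namespace, and lxml does not accept None as a prefix in that mapping, so you bind the URI to a prefix of your own (here the arbitrary ns). A minimal sketch, assuming the same file:xyz namespace as above:

```python
from lxml import etree

SAMPLE = b"""<Events xmlns="file:xyz">
  <Confirmation>
    <TransactionId>GTEREVIEW2012</TransactionId>
  </Confirmation>
  <Confirmation>
    <TransactionId>GTEDEF2012</TransactionId>
  </Confirmation>
</Events>"""

root = etree.fromstring(SAMPLE)

# XPath needs an explicit prefix for namespaced elements
ns = {"ns": root.nsmap[None]}

# Every element name in the expression must carry the prefix
matches = root.xpath(
    "//ns:Confirmation[contains(ns:TransactionId, 'GTEREVIEW')]",
    namespaces=ns,
)
ids = [m.findtext("ns:TransactionId", namespaces=ns) for m in matches]
print(ids)  # ['GTEREVIEW2012']
```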

Answer 2

//Confirmation[TransactionId[contains(.,'GTEREVIEW')]]


parent_tag[child_tag]  # select parent_tag elements that have a child_tag child
[child_tag[filter]]    # keep only those whose child_tag matches filter
[filter]               # the test itself, e.g. contains(., 'GTEREVIEW')
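
The expression above can be passed straight to lxml's xpath(). A sketch on the question's un-namespaced sample, comparing the nested-predicate form with the flat contains(TransactionId, ...) form. They agree here because each Confirmation has a single TransactionId; in general the nested form tests every TransactionId child, while the flat form only looks at the first one.

```python
from lxml import etree

SAMPLE = b"""<Events>
  <Confirmation>
    <TransactionId>GTEREVIEW2012</TransactionId>
  </Confirmation>
  <Confirmation>
    <TransactionId>GTEDEF2012</TransactionId>
  </Confirmation>
</Events>"""

root = etree.fromstring(SAMPLE)

# Nested predicate: keep Confirmation elements that have any TransactionId
# child whose own text (".") contains the substring
nested = root.xpath("//Confirmation[TransactionId[contains(., 'GTEREVIEW')]]")

# Flat form: contains() uses the string value of the first TransactionId
flat = root.xpath("//Confirmation[contains(TransactionId, 'GTEREVIEW')]")

nested_ids = [c.findtext("TransactionId") for c in nested]
flat_ids = [c.findtext("TransactionId") for c in flat]
print(nested_ids, flat_ids)  # ['GTEREVIEW2012'] ['GTEREVIEW2012']
```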

Answer 1: 6 votes, accepted, 2016-11-16 08:09:33

Answer 2: 0 votes, 2016-11-16 08:33:11
