Python lxml XPath: preceding keyword does not give the expected results

Question

I am trying to parse an XML document as follows:

import re
from lxml.html.soupparser import fromstring

inString = """
<doc>

<q></q>

<p1>
    <p2 dd="ert" ji="pp">

        <p3>1</p3>
        <p3>2</p3>
        <p3>ABC</p3>
        <p3>3</p3>

     </p2>

     <p2 dd="ert" ji="pp">

        <p3>4</p3>
        <p3>5</p3>
        <p3>ABC</p3>
        <p3>6</p3>

     </p2>

</p1>
<r></r>
<p1>
    <p2 dd="ert" ji="pp">

        <p3>7</p3>
        <p3>8</p3>
        <p3>ABC</p3>
        <p3>9</p3>

     </p2>

     <p2 dd="ert" ji="pp">

        <p3>10</p3>
        <p3>11</p3>
        <p3>ABC</p3>
        <p3>12</p3>

     </p2>

</p1>
</doc>
"""
root = fromstring(inString)

nodes = root.xpath("./doc//p1/p2/p3[contains(text(),'ABC')]//preceding::p2//p3")

# Python 3: print is a function and para.text is already a str, so no encode() is needed
print(" ".join([re.sub(r'[\s+]', ' ', para.text.strip()) for para in nodes]))

So, for each <p1> tag, I want to get to the <p3> tags inside <p2>. Then I only want the <p3> tags up to the tag whose text contains ABC. However, if I run the above code, I get:

1 2 ABC 3 4 5 ABC 6 7 8 ABC 9

The desired output is:

1 2 4 5 7 8 10 11

Also, if I make this change:

nodes = root.xpath("./doc//p1/p2/p3[contains(text(),'ABC')]")

I get:

ABC ABC ABC ABC

So it looks like the second approach is able to get all the matching <p3> nodes from the entire document as per the XPath, which is fine. Why doesn't my first query work?

How do I get the desired output?

Answer 1

Once you've located the p3 containing ABC, you don't need to go back up the tree. Just go "sideways" using the preceding-sibling axis:

./doc//p1/p2/p3[contains(text(),'ABC')]/preceding-sibling::p3

Prints 1 2 4 5 7 8 10 11.
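
As for why the first query misbehaves: preceding::p2 selects every <p2> that ends before the matched <p3> starts, anywhere in the document (ancestors excluded, so never the <p3>'s own parent), and the trailing //p3 then collects all <p3> descendants of those earlier <p2> elements, including ABC and whatever follows it. The sketch below compares the two axes side by side. It is only an illustration under a few assumptions: it uses lxml.etree.fromstring instead of the question's soupparser, so the root is <doc> itself and the paths start at p1, the document is a compacted copy of the question's input, and texts is a hypothetical helper introduced here.

```python
from lxml import etree

# Compacted copy of the question's document (attributes and empty tags dropped).
XML = """
<doc>
  <p1>
    <p2><p3>1</p3><p3>2</p3><p3>ABC</p3><p3>3</p3></p2>
    <p2><p3>4</p3><p3>5</p3><p3>ABC</p3><p3>6</p3></p2>
  </p1>
  <p1>
    <p2><p3>7</p3><p3>8</p3><p3>ABC</p3><p3>9</p3></p2>
    <p2><p3>10</p3><p3>11</p3><p3>ABC</p3><p3>12</p3></p2>
  </p1>
</doc>
"""

# Assumption: plain etree parsing, so root is the <doc> element.
root = etree.fromstring(XML)


def texts(expr):
    """Hypothetical helper: run an XPath expression and return each node's text."""
    return [node.text for node in root.xpath(expr)]


# The <p3> elements whose text contains ABC, one per <p2>.
marker = "p1/p2/p3[contains(text(), 'ABC')]"

# preceding:: looks at earlier <p2> elements anywhere in the document,
# then //p3 takes every <p3> inside them.
via_preceding = texts(marker + "/preceding::p2//p3")

# preceding-sibling:: stays under the same parent <p2> as the ABC element.
via_sibling = texts(marker + "/preceding-sibling::p3")

print(" ".join(via_preceding))
print(" ".join(via_sibling))
```

With this input, the first line should reproduce the unexpected output from the question (the last <p2> is missing because nothing after it can have it as a preceding node), and the second line should give the desired 1 2 4 5 7 8 10 11.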

Answer 1: accepted, 2015-12-02 17:51:27
