Python lxml: How to traverse back up a tree

I have the following Python code:

import lxml.etree

root = lxml.etree.parse("../../xml/test.xml")

path="./pages/page/paragraph[contains(text(),'ash')]"
para = root.xpath(path)

Once I reach the para node, I don't want to go any further. Now I want to travel back up to the root and look at all of the <paragraph> nodes. Is there a way to travel back up the tree?

Or look at it this way: I want the subtree between root and para. How would I do that?

For reference, here is the XML:

<document>
    <pages>
        <page>
            <paragraph>XBV</paragraph>
            <paragraph>GFH</paragraph>
        </page>
        <page>
            <paragraph>ash</paragraph>
            <paragraph>lplp</paragraph>
        </page>
    </pages>
</document>

In this case, I want the nodes XBV and GFH. How is that possible?

The XPath step .. takes you one level up the tree, to the parent of the current node.
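As a minimal sketch of moving up one level, the snippet below uses the XML from the question and compares the XPath `..` step with lxml's `getparent()` and `iterancestors()` methods. The variable names are only illustrative.

```python
import lxml.etree

data = """
<document>
    <pages>
        <page>
            <paragraph>XBV</paragraph>
            <paragraph>GFH</paragraph>
        </page>
        <page>
            <paragraph>ash</paragraph>
            <paragraph>lplp</paragraph>
        </page>
    </pages>
</document>
"""

tree = lxml.etree.fromstring(data)
para = tree.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# XPath: ".." selects the parent of the context node.
parent_via_xpath = para.xpath("..")[0]

# Element API: getparent() returns the parent element directly.
parent_via_api = para.getparent()

# iterancestors() walks from the parent all the way up to the root element.
ancestor_tags = [el.tag for el in para.iterancestors()]

print(parent_via_xpath.tag)  # page
print(parent_via_api.tag)    # page
print(ancestor_tags)         # ['page', 'pages', 'document']
```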

However, I think the preceding axis is what you are looking for:

The preceding axis indicates all the nodes that precede the context node in the document except any ancestor, attribute and namespace nodes.

./pages/page/paragraph[contains(text(),'ash')]/preceding::paragraph

Sample code:

import lxml.etree


data = """
<document>
    <pages>

    <page>
       <paragraph>XBV</paragraph>

       <paragraph>GFH</paragraph>
    </page>

    <page>
       <paragraph>ash</paragraph>

       <paragraph>lplp</paragraph>
    </page>

    </pages>
</document>
"""

tree = lxml.etree.fromstring(data)
print([item.text for item in tree.xpath("./pages/page/paragraph[contains(text(),'ash')]/preceding::paragraph")])

Prints:

['XBV', 'GFH']
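Note that preceding selects every earlier paragraph in the document, including earlier ones on the same page. The result above matches the question only because 'ash' is the first paragraph on its page. The following sketch, which starts from the 'lplp' paragraph instead (a choice made purely for illustration), contrasts preceding with preceding-sibling:

```python
import lxml.etree

data = """
<document>
    <pages>
        <page>
            <paragraph>XBV</paragraph>
            <paragraph>GFH</paragraph>
        </page>
        <page>
            <paragraph>ash</paragraph>
            <paragraph>lplp</paragraph>
        </page>
    </pages>
</document>
"""

tree = lxml.etree.fromstring(data)
start = tree.xpath("./pages/page/paragraph[text()='lplp']")[0]

# preceding:: matches every earlier paragraph, wherever it sits in the document.
everything_before = [p.text for p in start.xpath("preceding::paragraph")]

# Go up to the page, then take only the paragraphs of earlier sibling pages.
earlier_pages_only = [
    p.text for p in start.xpath("../preceding-sibling::page/paragraph")
]

# preceding-sibling:: stays inside the current page.
same_page_only = [p.text for p in start.xpath("preceding-sibling::paragraph")]

print(everything_before)   # ['XBV', 'GFH', 'ash']
print(earlier_pages_only)  # ['XBV', 'GFH']
print(same_page_only)      # ['ash']
```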

Go up, select every preceding page node (page nodes only), then take the paragraph nodes inside them and extract their text:

>>> # root is assumed to be the tree parsed in the question
>>> expression = "./pages/page/paragraph[contains(text(),'ash')]/preceding::page//paragraph"
>>> [i.text for i in root.xpath(expression)]
['XBV', 'GFH']
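The same traversal can also be written with lxml's element API instead of a single XPath expression. This is only a sketch of one possible equivalent, using `getparent()`, `itersiblings()` and `iter()`. The XML is the document from the question, written compactly.

```python
import lxml.etree

data = (
    "<document><pages>"
    "<page><paragraph>XBV</paragraph><paragraph>GFH</paragraph></page>"
    "<page><paragraph>ash</paragraph><paragraph>lplp</paragraph></page>"
    "</pages></document>"
)

tree = lxml.etree.fromstring(data)
para = tree.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# Step up from the matched paragraph to its enclosing page.
page = para.getparent()

texts = []
# itersiblings(preceding=True) yields earlier siblings, nearest one first,
# so with several earlier pages the pages come out in reverse order.
for earlier_page in page.itersiblings("page", preceding=True):
    texts.extend(p.text for p in earlier_page.iter("paragraph"))

print(texts)  # ['XBV', 'GFH']
```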
