I have the following Python code:
import lxml.etree
root = lxml.etree.parse("../../xml/test.xml")
path="./pages/page/paragraph[contains(text(),'ash')]"
para = root.xpath(path)
Once I reach the para node, I don't want to go any further. Now I want to travel back up to the root and look at all of the <paragraph> nodes. Is there a way to travel back up the tree?
Or, to look at it another way: I want the subtree between root and para. How would I do that?
For reference, here is the XML:
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
In this case, I want the nodes XBV and GFH. How is that possible?
.. would bring you one level up the tree. But I think the preceding axis is what you are looking for:
The preceding axis indicates all the nodes that precede the context node in the document except any ancestor, attribute and namespace nodes.
./pages/page/paragraph[contains(text(),'ash')]/preceding::paragraph
Sample code:
import lxml.etree
data = """
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
"""
tree = lxml.etree.fromstring(data)
print([item.text for item in tree.xpath("./pages/page/paragraph[contains(text(),'ash')]/preceding::paragraph")])
Prints:
['XBV', 'GFH']
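To make the difference between .. and the various axes concrete, here is a small sketch against the same sample document. The variable names (match, parent_tags and so on) are only illustrative, and the snippet assumes lxml is installed.

```python
import lxml.etree

data = """
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
"""

tree = lxml.etree.fromstring(data)
# Illustrative name: the expression that selects the matching paragraph.
match = "./pages/page/paragraph[contains(text(),'ash')]"

# ".." steps up exactly one level: the <page> that contains the match.
parent_tags = [el.tag for el in tree.xpath(match + "/..")]

# "ancestor" walks all the way up to the root element.
ancestor_tags = [el.tag for el in tree.xpath(match + "/ancestor::*")]

# "preceding" collects every element that ended before the match,
# skipping the ancestors themselves.
preceding_tags = [el.tag for el in tree.xpath(match + "/preceding::*")]

# "preceding-sibling" only looks inside the same parent, so it finds
# nothing here: 'ash' is the first paragraph of its page.
preceding_siblings = tree.xpath(match + "/preceding-sibling::paragraph")

print(parent_tags)         # ['page']
print(ancestor_tags)       # ['document', 'pages', 'page']
print(preceding_tags)      # ['page', 'paragraph', 'paragraph']
print(preceding_siblings)  # []
```

So .. and ancestor move straight up, while preceding looks sideways at everything earlier in the document, which is why it returns the paragraphs of the first page.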
Go up and select all preceding page nodes (only page), then the paragraph nodes inside them, and extract their text. Here tree is the parsed document:
>>> expression = "./pages/page/paragraph[contains(text(),'ash')]/preceding::page//paragraph"
>>> [i.text for i in tree.xpath(expression)]
['XBV', 'GFH']
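If you would rather walk the tree from the node you already have instead of writing a longer XPath, lxml elements also expose navigation methods such as getparent(), iterancestors() and itersiblings(). The following is only a sketch of that alternative, assuming the same sample document; names like earlier_pages are made up for illustration.

```python
import lxml.etree

data = """
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
"""

tree = lxml.etree.fromstring(data)
para = tree.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# Walk upwards from the match; the nearest ancestor comes first.
ancestors = [a.tag for a in para.iterancestors()]

# The <page> that holds the match.
page = para.getparent()

# Earlier <page> siblings, nearest first; reverse to restore document order.
earlier_pages = list(page.itersiblings("page", preceding=True))
texts = [
    p.text
    for pg in reversed(earlier_pages)
    for p in pg.iter("paragraph")
]

print(ancestors)  # ['page', 'pages', 'document']
print(texts)      # ['XBV', 'GFH']
```

This gives the same paragraphs as the preceding::page expression, but keeps each step (go up, look at earlier pages, read their paragraphs) explicit in Python.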