
Extract a value with XPath, etree and Python

I'm trying to extract a value with XPath, Python and etree. I have no control over the .xml file I receive, and I suspect it may be invalid.

My method already extracts the text node object I want to examine.

# This is the tag. xpath() returns a list, so take the first match.
textTag = lastExportTree.xpath("//TEXT_NODE[@PROPERTY = '%s']/TEXT[@ID = '%s']" % (key, id[1]))[0]
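A side note on building the query: instead of formatting values into the XPath string with `%`, lxml lets you pass them as XPath variables (`$name`) via keyword arguments, which avoids quoting problems if a value ever contains an apostrophe. The sketch below assumes a hypothetical `<EXPORT>` root and a `PROPERTY` value of `"greeting"`; the real file's layout is not shown in the question.

```python
from lxml import etree

# Hypothetical document shaped like the question's export; the <EXPORT>
# root and the PROPERTY value "greeting" are assumptions for illustration.
xml = b"""<EXPORT>
  <TEXT_NODE PROPERTY="greeting">
    <TEXT ID="1001" STATE="5" LOCKED="false"><SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>
    <TEXT ID="1003" STATE="5" LOCKED="false">Stack</TEXT>
  </TEXT_NODE>
</EXPORT>"""
lastExportTree = etree.ElementTree(etree.fromstring(xml))

# $prop and $tid are XPath variables; lxml binds them from the keyword
# arguments, so the values need no quoting or escaping.
matches = lastExportTree.xpath(
    "//TEXT_NODE[@PROPERTY = $prop]/TEXT[@ID = $tid]",
    prop="greeting", tid="1003")

# xpath() always returns a list, even when only one element matches.
textTag = matches[0]
print(textTag.text)  # Stack
```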

# This is a part of the xml. I already have the text node I want to examine.
<TEXT ID="1001" STATE="5" LOCKED="false"><SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>
<TEXT ID="1002" STATE="1" LOCKED="false"/>
<TEXT ID="1003" STATE="5" LOCKED="false">Stack</TEXT>
<TEXT ID="1004" STATE="1" LOCKED="false">Overflow</TEXT>

If I want to access the content of ID="1003" I only have to type:

print(textTag.text)  # Will print 'Stack'
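To see why `.text` works for ID="1003" but not for ID="1001", here is a small sketch that parses the four elements from the question, wrapped in a hypothetical `<TEXT_NODE>` parent so they form one document. Parsing succeeds, which also shows the snippet is well-formed: `etree.fromstring()` would raise `etree.XMLSyntaxError` otherwise.

```python
from lxml import etree

# The question's four TEXT elements inside a hypothetical TEXT_NODE parent.
snippet = b"""<TEXT_NODE PROPERTY="demo">
<TEXT ID="1001" STATE="5" LOCKED="false"><SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>
<TEXT ID="1002" STATE="1" LOCKED="false"/>
<TEXT ID="1003" STATE="5" LOCKED="false">Stack</TEXT>
<TEXT ID="1004" STATE="1" LOCKED="false">Overflow</TEXT>
</TEXT_NODE>"""
node = etree.fromstring(snippet)

# .text holds only the text *before* the first child element, so it is
# None for 1001 (which starts with <SYSTEMMESSAGE>) and for the empty 1002.
texts = {t.get("ID"): t.text for t in node.iter("TEXT")}
print(texts)  # {'1001': None, '1002': None, '1003': 'Stack', '1004': 'Overflow'}
```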

But the tag with ID="1001" also contains a SYSTEMMESSAGE tag. How can I access the content 'Hiho'? (textTag.text won't work!) Is the XML I receive invalid?
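For what it's worth, this is valid XML: an element holding both text and child elements is ordinary "mixed content". In lxml's ElementTree model, text that follows a child element is stored in that child's `.tail`, not in the parent's `.text`. A minimal sketch using only the ID="1001" element:

```python
from lxml import etree

# The ID="1001" element from the question: text and a child element mixed.
textTag = etree.fromstring(
    b'<TEXT ID="1001" STATE="5" LOCKED="false">'
    b'<SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>')

# Text after a child element lives in that child's .tail.
message = textTag.find("SYSTEMMESSAGE")
print(textTag.text)  # None: nothing precedes <SYSTEMMESSAGE>
print(message.text)  # CALBUY
print(message.tail)  # Hiho

# XPath text() on the element returns only its direct text nodes,
# so the text inside <SYSTEMMESSAGE> is skipped.
print("".join(textTag.xpath("text()")))  # Hiho
```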

Thanks a lot for your answers!

I've encountered this problem before as well, and this is what we ended up with. In our case we were interested in finding the text in all the non-script and non-style children of an element.

from lxml import etree

# Just to pre-compile our XPath. This will get all the text from this element and
# from each of its descendant elements that aren't 'script' or 'style'
textXpath = etree.XPath(
    '(.|.//*[not(name()="script")][not(name()="style")])/text()')

# If instead you don't want to include the current element:
# textXpath = etree.XPath(
#   './/*[not(name()="script")][not(name()="style")]/text()')

results = ''.join(textXpath(textTag))

It might not be the prettiest chunk of code, but it's what we've resorted to.
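Applied to the question's ID="1001" element, the expression above collects the SYSTEMMESSAGE text as well as 'Hiho', because XPath returns the text nodes in document order. The sketch below shows both variants; it suits "all text under this element" rather than "just Hiho".

```python
from lxml import etree

# Both expressions from the answer above.
textXpath = etree.XPath(
    '(.|.//*[not(name()="script")][not(name()="style")])/text()')
childOnlyXpath = etree.XPath(
    './/*[not(name()="script")][not(name()="style")]/text()')

textTag = etree.fromstring(
    b'<TEXT ID="1001" STATE="5" LOCKED="false">'
    b'<SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>')

# Document order: the child's "CALBUY" precedes the parent's trailing "Hiho".
print(''.join(textXpath(textTag)))       # CALBUYHiho
# Excluding the current element drops its own direct text, i.e. "Hiho".
print(''.join(childOnlyXpath(textTag)))  # CALBUY
```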

Assuming you are showing us the nodes under lastExportTree, this should do it:

lastExportTree.xpath('TEXT[@STATE="5" and @LOCKED="false" and SYSTEMMESSAGE]/text()')[0]

That says to find all child nodes named TEXT that have the given STATE and LOCKED attributes and a SYSTEMMESSAGE child element, and to return their direct text children; `[0]` picks the first one, 'Hiho'.
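A quick check of this expression, assuming (as the answer does) that the TEXT elements are direct children of the tree's root; the `<TEXT_NODE>` root here is hypothetical. When `xpath()` is called on an lxml ElementTree, a relative path is evaluated against the root element. ID="1003" also has STATE="5" but no SYSTEMMESSAGE child, so it is filtered out.

```python
from lxml import etree

# Hypothetical root directly wrapping the question's TEXT elements.
root = etree.fromstring(b"""<TEXT_NODE PROPERTY="demo">
<TEXT ID="1001" STATE="5" LOCKED="false"><SYSTEMMESSAGE>CALBUY</SYSTEMMESSAGE>Hiho</TEXT>
<TEXT ID="1002" STATE="1" LOCKED="false"/>
<TEXT ID="1003" STATE="5" LOCKED="false">Stack</TEXT>
<TEXT ID="1004" STATE="1" LOCKED="false">Overflow</TEXT>
</TEXT_NODE>""")
lastExportTree = etree.ElementTree(root)

# Relative path: evaluated against the root element (TEXT_NODE).
hits = lastExportTree.xpath(
    'TEXT[@STATE="5" and @LOCKED="false" and SYSTEMMESSAGE]/text()')
print(hits[0])  # Hiho
```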

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM