Get inner xml from lxml

I have the following string, which is part of a bigger XML document:

content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'

I want to access Rathaus. My current approach is to parse the string with lxml and read the text of the 'odvNameElem' element:

from lxml import etree
content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = etree.fromstring(content)
print(root.text)

This however results in None. What am I doing wrong?

etree.__version__ = '4.2.5'

I am not sure why root.xpath("string()") works, while root.xpath("//test") only returns an empty list. Can somebody please explain this?

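root.text only holds the text that appears before the first child element, so it is None here. The "Rathaus" string follows the itdMapItemList element, which makes it the value of that element's tail property. Examples: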

root.xpath("itdMapItemList")[0].tail
root.find("itdMapItemList").tail

See https://lxml.de/tutorial.html#elements-contain-text .
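
To make the difference between text and tail concrete, here is a minimal sketch that reuses the string from the question. The variable names (child, root_text, child_text, child_tail) are only illustrative.

```python
from lxml import etree

content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = etree.fromstring(content)
child = root.find("itdMapItemList")

# .text is the text between an element's start tag and its first child.
root_text = root.text      # nothing precedes <itdMapItemList/>, so None
child_text = child.text    # <itdMapItemList/> is empty, so None

# .tail is the text that directly follows an element's end tag.
child_tail = child.tail    # 'Rathaus'

print(root_text, child_text, child_tail)
```

Note that find() returns None when no matching child exists, so real code should check the result before reading .tail.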


root.xpath("string()") returns the string-value of the context node. For an element, that is the concatenation of all of its descendant text nodes in document order, which indeed is "Rathaus" in this case.

See https://www.w3.org/TR/xpath-10/#function-string .
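
As a rough illustration, string() can be compared with joining the result of itertext(), which walks the same text content in document order. The second input (with_leading_text) is a made-up variation used only to show the concatenation.

```python
from lxml import etree

content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
# Hypothetical variation with text before the child element as well.
with_leading_text = '<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>'

root = etree.fromstring(content)
other = etree.fromstring(with_leading_text)

# string() yields one string: all descendant text, concatenated.
string_value = root.xpath("string()")
other_string_value = other.xpath("string()")

# itertext() yields the same pieces one by one, in document order.
joined = "".join(root.itertext())
other_joined = "".join(other.itertext())

print(string_value, other_string_value)
```

If only the text directly attached to one element is needed, reading .text and .tail is more precise than string(), which also pulls in text from nested descendants.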


root.xpath("//test") does not make sense (there is no test element). Did you mean root.xpath("//text()") ?

root.xpath("//text()") returns a list of all text nodes, which in this case is ['Rathaus'] .

If the input XML is changed to

<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>

then the result is ['ABC', 'Rathaus']
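
Both results can be checked with a short sketch. It also uses the is_text and is_tail attributes and the getparent() method that lxml attaches to XPath string results, to show which element each text node belongs to. The variable names are illustrative only.

```python
from lxml import etree

original = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
modified = '<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>'

root_original = etree.fromstring(original)
root_modified = etree.fromstring(modified)

# //text() selects every text node in the document, in document order.
texts_original = root_original.xpath("//text()")
texts_modified = root_modified.xpath("//text()")

# Each result records whether it came from .text or .tail, and of which element.
origins = [
    (str(t), "tail" if t.is_tail else "text", t.getparent().tag)
    for t in texts_modified
]

print(texts_original)
print(texts_modified)
print(origins)
```

Here 'ABC' is reported as the text of odvNameElem, while 'Rathaus' is reported as the tail of itdMapItemList.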
