Get inner xml from lxml

Question

I have the following string, which is part of a bigger XML document:

content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'

I want to access Rathaus. My current approach is to parse the string with lxml and read the text of the odvNameElem element:

from lxml import etree
content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = etree.fromstring(content)
print(root.text)

This however results in None. What am I doing wrong?

etree.__version__ = '4.2.5'

I am not sure why the following works: root.xpath("string()") but root.xpath("//text()") only returns an empty list. Can somebody please explain this?

Answer 1

The "Rathaus" string is the value of the tail property of the itdMapItemList element. Examples:

root.xpath("itdMapItemList")[0].tail
root.find("itdMapItemList").tail

See https://lxml.de/tutorial.html#elements-contain-text .
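As a minimal sketch of the text/tail distinction described above, using the string from the question (the variable name child is only illustrative):

```python
from lxml import etree

content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = etree.fromstring(content)

# .text only holds text that comes before the first child element;
# here the first thing inside odvNameElem is a child, so it is None.
print(root.text)  # None

# Text that follows a child element is stored on that child as .tail.
child = root.find("itdMapItemList")
print(child.tail)  # Rathaus
```

So root.text is None not because the text is lost, but because lxml attaches it to the preceding sibling element rather than to the parent.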

root.xpath("string()") returns the string value of the context node (here the root element). For an element, that is the concatenation of all its descendant text nodes in document order, which indeed is "Rathaus" in this case.

See https://www.w3.org/TR/xpath-10/#function-string .
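A short sketch of this, again with the string from the question. The itertext() comparison is an addition for illustration; it is one other way to collect the same text and is not something the question used:

```python
from lxml import etree

content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = etree.fromstring(content)

# XPath string(): the string value of the context node, i.e. all
# descendant text of the root element joined together.
via_xpath = root.xpath("string()")

# itertext() walks the same text pieces (text and tails) in document order.
via_itertext = "".join(root.itertext())

print(via_xpath)     # Rathaus
print(via_itertext)  # Rathaus
```

Both approaches pick up the tail text of itdMapItemList, which is why they return "Rathaus" while root.text does not.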

root.xpath("//test") does not make sense (there is no test element). Did you mean root.xpath("//text()") ?

root.xpath("//text()") returns a list of all text nodes, which in this case is ['Rathaus'] .

If the input XML is changed to

<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>

then the result is ['ABC', 'Rathaus']
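The two cases can be checked side by side with a sketch like the following. The names original, modified and results are illustrative. The is_tail attribute is a property of the string results lxml returns from XPath, shown here only to make visible which pieces are tail text:

```python
from lxml import etree

original = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
modified = '<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>'

results = []
for xml in (original, modified):
    root = etree.fromstring(xml)
    # //text() selects every text node in the document, in document order.
    texts = root.xpath("//text()")
    results.append([str(t) for t in texts])
    for t in texts:
        # is_tail is True when the text follows an element's end tag.
        print(repr(str(t)), "tail" if t.is_tail else "text")

print(results)  # [['Rathaus'], ['ABC', 'Rathaus']]
```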

solution1
2 ACCEPTED 2019-01-05 06:57:43
