
lxml.etree not returning proper xpath value

I have an XML string like this:

<description> asdasdasd <a> Item1 </a><a> Price </a></description>

I'm using lxml.etree as follows:

import lxml.etree as le
doc = le.fromstring("<description>asdasdasd <a>Item1</a> <a>Price</a> </description>")
desc = doc.xpath("//description")[0]
print(desc.text)

But desc.text returns only asdasdasd. I was expecting asdasdasd Item1 Price. Is there an issue with my code?

Here's one way to do it:

print(desc.text + ' '.join(child.text for child in desc))

prints:

asdasdasd Item1 Price
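Note that the join above only reads each child's .text, so any text that follows a closing </a> tag (stored in .tail) is skipped. A minimal sketch of an alternative using itertext(), which yields every text fragment in document order; the fallback import is an assumption added so the snippet also runs without lxml, since the standard library's ElementTree exposes the same method:

```python
try:
    import lxml.etree as le
except ImportError:
    # Assumption: fall back to the stdlib module, which shares this part of the API.
    import xml.etree.ElementTree as le

doc = le.fromstring("<description>asdasdasd <a>Item1</a> <a>Price</a> </description>")

# itertext() yields the element's text plus every descendant's text and tail.
full_text = "".join(doc.itertext())

# Collapse runs of whitespace and trim the ends.
normalized = " ".join(full_text.split())
print(normalized)
```

Whether to normalize whitespace depends on how the text is used afterwards; the raw concatenation keeps the spacing exactly as it appears in the XML.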

Another option is to use the descendant-or-self XPath axis:

desc = doc.xpath("//description/descendant-or-self::*")
print(' '.join(child.text for child in desc))

prints:

asdasdasd  Item1 Price
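One caveat with joining .text values directly: an element without text has .text set to None, and str.join raises a TypeError on None. A hedged sketch that skips such elements, using a hypothetical variant of the sample where the second <a> is empty; iter() walks the element and its descendants, roughly like descendant-or-self::*, and the fallback import is an assumption so the snippet runs without lxml:

```python
try:
    import lxml.etree as le
except ImportError:
    # Assumption: fall back to the stdlib module, which shares this part of the API.
    import xml.etree.ElementTree as le

# Hypothetical variant of the sample: the second <a> is empty, so its .text is None.
doc = le.fromstring("<description>asdasdasd <a>Item1</a> <a/> </description>")

# Keep only elements that carry non-blank text, stripped of surrounding whitespace.
texts = [el.text.strip() for el in doc.iter() if el.text and el.text.strip()]
joined = " ".join(texts)
print(joined)
```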

No. You have to think of the document as a tree (hence the name lxml.etree).

An XML node can, by definition, have text, attributes, and other nodes inside it. desc.text holds only the text that comes before the first child element.

|--> description
      |--> a
      |--> a

Maybe this helps you understand:

import lxml.etree as le
doc = le.fromstring("<description>asdasdasd <a>Item1</a> <a>Price</a> </description>")
desc = doc.xpath("//description")[0]
print(desc.text)
# Iterating over an element yields its direct children.
for child in desc:
    print(child.text)

That outputs:

asdasdasd 
Item1
Price
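To make the tree model concrete, here is a small sketch that prints both .text and .tail for each node. The sample string is hypothetical (it adds text after each closing </a> tag), and the fallback import is an assumption so the snippet runs without lxml:

```python
try:
    import lxml.etree as le
except ImportError:
    # Assumption: fall back to the stdlib module, which shares this part of the API.
    import xml.etree.ElementTree as le

# Hypothetical sample with text after each closing </a> tag.
doc = le.fromstring(
    "<description>intro <a>Item1</a> middle <a>Price</a> end</description>"
)

# .text is the text before the first child; .tail is the text after the closing tag.
parts = [(doc.tag, doc.text, doc.tail)]
for child in doc:
    parts.append((child.tag, child.text, child.tail))

for tag, text, tail in parts:
    print("%s: text=%r tail=%r" % (tag, text, tail))
```

The strings " middle " and " end" belong to the <a> elements as tails, not to description, which is why reading desc.text alone never returns them.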

The idea behind XML is to model instances (more or less). In your case, you have a description object with two a objects inside it (they could be a list, for instance).
