I have an XML file that looks like this:
xml = '''<?xml version="1.0"?>
<root>
<item>text</item>
<item2>more text</item2>
<targetroot>
<targetcontainer>
<target>text i want to get</target>
</targetcontainer>
<targetcontainer>
<target>text i want to get</target>
</targetcontainer>
</targetroot>
...more items
</root>
'''
With lxml I'm trying to access the text in the <target> element. I've found a solution, but I'm sure there is a better, more efficient way to do this. My solution:
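from lxml import etree

target = etree.XML(xml)
for x in target.getiterator('root'):
    item1 = x.findtext('item')
    # A separate loop variable avoids overwriting the parsed document in `target`.
    for targetroot in x.iterchildren('targetroot'):
        for t in targetroot.iterchildren('targetcontainer'):
            targetText = t.findtext('target')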
Although this works, since it gives me access to all the elements in root as well as the target elements, I'm having a hard time believing this is the most efficient solution.
So my question is this: is there a more efficient way to access the text of the <target> elements while staying in the loop over root, because I also need access to the other elements?
You can use XPath:
for x in target.xpath('/root/targetroot/targetcontainer/target'):
    print(x.text)
We ask for all elements that match a path. In this case, the path is /root/targetroot/targetcontainer/target, which means all the <target> elements that are inside a <targetcontainer> element, inside a <targetroot> element, inside a <root> element. Also, the <root> element must be the document root, because the path begins with /, which denotes the beginning of the document.
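To make the absolute path concrete, here is a minimal, self-contained sketch. The sample document is a trimmed version of the one in the question, and the texts `first` and `second` are placeholders chosen for illustration. It also shows that appending `text()` to the path returns the strings directly instead of the elements.

```python
from lxml import etree

# Sample document modelled on the one in the question (trimmed; texts are placeholders).
xml = b"""<?xml version="1.0"?>
<root>
  <item>text</item>
  <targetroot>
    <targetcontainer><target>first</target></targetcontainer>
    <targetcontainer><target>second</target></targetcontainer>
  </targetroot>
</root>"""

root = etree.XML(xml)

# Absolute path: every step must match, starting from the document root.
elements = root.xpath('/root/targetroot/targetcontainer/target')
texts = [el.text for el in elements]

# Appending text() makes XPath return the text nodes themselves.
same_texts = root.xpath('/root/targetroot/targetcontainer/target/text()')

print(texts)
print(same_texts)
```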
Also, your XML document had two problems. First, the <?xml version="1.0"?> declaration should be the very first thing in the document, and in this example it was preceded by a newline and some spaces. Second, it is not a tag and should not be closed, so the </xml> at the end of your string should be removed. I have already edited your question accordingly.
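The first problem can be reproduced with a short sketch. The two byte strings below are made up for illustration and differ only in the whitespace placed before the declaration; the parser is expected to accept the first and reject the second.

```python
from lxml import etree

# Hypothetical minimal documents: identical except for leading whitespace.
good = b'<?xml version="1.0"?>\n<root><target>ok</target></root>'
bad = b'\n   ' + good

# The declaration is the first thing in the document, so this parses.
print(etree.XML(good).findtext('target'))

# Whitespace before the declaration makes the document ill-formed.
try:
    etree.XML(bad)
    parsed_bad = True
except etree.XMLSyntaxError as exc:
    parsed_bad = False
    print('rejected:', exc)
```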
EDIT: this solution can be improved further. You do not need to give the full path; you can simply ask for all <target> elements in the document. This is done by preceding the tag name with two slashes. Since you want all the <target> texts, regardless of where they are, this can be a better solution. So the loop above can be written simply as:
for x in target.xpath('//target'):
    print(x.text)
I tried this at first, but it did not work. The problem, however, was the syntax errors in the XML, not the XPath; I then tried the other, longer path and forgot to retry this one. Sorry! Anyway, I hope I have shed some light on XPath nonetheless :)
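Since the question also asks to keep access to the other elements of <root>, one possible approach is to run a relative query from the element you already hold. The sketch below assumes the same trimmed sample document with placeholder texts. A path starting with `.//` searches only below the current element, whereas `//` always searches the whole document.

```python
from lxml import etree

# Sample document modelled on the one in the question (texts are placeholders).
xml = b"""<?xml version="1.0"?>
<root>
  <item>text</item>
  <item2>more text</item2>
  <targetroot>
    <targetcontainer><target>first</target></targetcontainer>
    <targetcontainer><target>second</target></targetcontainer>
  </targetroot>
</root>"""

root = etree.XML(xml)

# Direct children stay reachable from the same element.
item1 = root.findtext('item')

# './/target' is relative: it looks only at descendants of `root`.
target_texts = [t.text for t in root.xpath('.//target')]

# findall() also accepts a simple relative path (ElementPath, not full XPath).
same = [t.text for t in root.findall('targetroot/targetcontainer/target')]

print(item1, target_texts)
```

Whether this is measurably faster than the nested loops depends on the document; it is mainly shorter and easier to read.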