I am extracting text from specific tags and need the results as a list grouped by p tag. I have this XPath expression:
find = etree.XPath("//w:p//.//*[local-name() = 'ins']//text()" ,namespaces={'w':"http://schemas.openxmlformats.org/wordprocessingml/2006/main"})
I want to use it in a findall expression. I tried:
inserted_list_1=[]
for p in lxml_tree.findall('.//{' + w + '}p'):
    inserted_list_1.append([t.text for t in p.findall('.//{' + w + '}ins')])
but all this returns is a list full of None values, whilst the former XPath works perfectly.
I think there's some intermediate path missing.
You cannot use that expression with findall(); the findall() method deliberately keeps compatibility with the limited XPath support of the ElementTree API.
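A note on why the loop in the question yields None: in a typical WordprocessingML document the w:ins element carries no text of its own; the text sits in w:t descendants, so .text on w:ins is None. The sketch below illustrates this with the standard library's ElementTree, whose findall() API is the one lxml stays compatible with. The sample XML fragment and the variable names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ns = {"w": W}

# Hypothetical two-paragraph fragment; written without whitespace between
# tags so that w:ins really has no direct text at all.
SAMPLE = (
    f'<w:body xmlns:w="{W}">'
    '<w:p><w:ins><w:r><w:t>added</w:t></w:r></w:ins></w:p>'
    '<w:p><w:r><w:t>plain</w:t></w:r></w:p>'
    '</w:body>'
)
root = ET.fromstring(SAMPLE)

# Same idea as the loop in the question: .text only sees direct text of w:ins.
direct = [[ins.text for ins in p.findall(".//w:ins", ns)]
          for p in root.findall(".//w:p", ns)]

# itertext() walks the descendants, so it reaches the text inside w:t.
nested = [["".join(ins.itertext()) for ins in p.findall(".//w:ins", ns)]
          for p in root.findall(".//w:p", ns)]

print(direct)
print(nested)
```

So if you have to stay with findall(), joining itertext() per w:ins is one possible workaround; it cannot express text() or local-name() the way full XPath can.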
Use the xpath() method instead:
for p in lxml_tree.xpath('.//w:p', namespaces={'w': w}):
and just use namespace prefixes for much more readable queries.
If you just wanted to extract all contained text, you can use:
p.xpath('.//w:ins//text()', namespaces={'w': w})
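Putting the pieces together, here is a minimal sketch of collecting the inserted text per paragraph with xpath(). It assumes lxml is installed; the sample fragment and the names SAMPLE, NS and inserted_per_paragraph are made up for illustration and are not from the question.

```python
from lxml import etree

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical fragment: the first paragraph has an insertion split over
# two runs, the second paragraph has no insertion.
SAMPLE = (
    f'<w:body xmlns:w="{W}">'
    '<w:p><w:ins><w:r><w:t>added </w:t></w:r><w:r><w:t>text</w:t></w:r></w:ins></w:p>'
    '<w:p><w:r><w:t>plain</w:t></w:r></w:p>'
    '</w:body>'
)
lxml_tree = etree.fromstring(SAMPLE)

# One inner list per w:p; each inner list holds the text nodes found
# under any w:ins inside that paragraph (relative path, note the leading dot).
inserted_per_paragraph = [
    p.xpath(".//w:ins//text()", namespaces=NS)
    for p in lxml_tree.xpath(".//w:p", namespaces=NS)
]

print(inserted_per_paragraph)
```

Paragraphs without a w:ins element simply contribute an empty list, which keeps the outer list aligned with the paragraphs.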