How to match a text node then follow parent nodes using XPath
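I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string 'Text 1', then grab the contents of the associated content node.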
<doc>
<block>
<title>Text 1</title>
<content>Stuff I want</content>
</block>
<block>
<title>Text 2</title>
<content>Stuff I don't want</content>
</block>
</doc>
My Python code throws an error:
>>> from lxml import etree
>>>
>>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
>>>
>>> # get all titles
... tree.xpath('//title/text()')
['Text 1', 'Text 2']
>>>
>>> # match 'Text 1'
... tree.xpath('//title/text()="Text 1"')
True
>>>
>>> # Follow parent from selected nodes
... tree.xpath('//title/text()/../..//text()')
['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
>>>
>>> # Follow parent from selected node
... tree.xpath('//title/text()="Text 1"/../..//text()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
  File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
  File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
  File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
lxml.etree.XPathEvalError: Invalid type
Is this possible in XPath? Do I need to express what I want to do differently?
Is this what you want?
//title[text()='Text 1']/../content/text()
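For illustration, here is a minimal sketch of that expression run against the document from the question, assuming lxml is installed. The variable names are mine, not from the original post. The attempt in the question fails because the = comparison produces a boolean rather than a node-set, and path steps such as /.. only apply to node-sets. A predicate in square brackets filters the title elements instead, so the path can carry on from the match.

```python
from lxml import etree

# Same document as in the question, built from adjacent string literals.
tree = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# The bare comparison evaluates to a boolean, not a node-set.
is_present = tree.xpath('//title/text()="Text 1"')

# The predicate keeps only the matching title element; from there, step up
# to its parent block and down to the content element next to it.
wanted = tree.xpath("//title[text()='Text 1']/../content/text()")

print(is_present)
print(wanted)
```

Here wanted is a list of text nodes, so a title that matches nothing gives an empty list rather than an error.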
Use:
string(/*/*/title[. = 'Text 1']/following-sibling::content)
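This represents at least two improvements over the currently accepted solution by Johannes Weiß:
It avoids the very expensive abbreviation "//", which typically causes the whole XML document to be scanned. This should be done whenever the structure of the XML document is known in advance.
It does not step back up to the parent (the location step "/.." is avoided).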
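As a rough sketch of how this expression might be used from Python, assuming lxml is installed: the first call uses the expression exactly as written above, and the second passes the title in as an XPath variable. The variable name wanted_title and the Python names are hypothetical, chosen only for this example.

```python
from lxml import etree

# Same document as in the question.
tree = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# The expression as given: string() returns a plain string, not a list.
content = tree.xpath(
    "string(/*/*/title[. = 'Text 1']/following-sibling::content)"
)

# Same path with the title supplied as an XPath variable, which lxml
# accepts as a keyword argument (wanted_title is a made-up name).
expr = "string(/*/*/title[. = $wanted_title]/following-sibling::content)"
same_content = tree.xpath(expr, wanted_title="Text 1")

# string() of an empty node-set is the empty string.
missing = tree.xpath(expr, wanted_title="No such title")

print(content)
print(repr(missing))
```

Because string() returns a single value, the caller gets the text directly, but has to treat an empty string as "no match".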