python lxml xpath: how to get this predicate working
Good morning,
Recently I picked up Python and web scraping as a hobby...
I'm trying to get my head around an issue with Python lxml and XPath predicates, but apparently there is nothing similar on Stack Overflow. So I reproduced it in the code below, hoping someone sees what I don't.
Can somebody explain why Result3 is an empty list? I was expecting Result3 to be the same as Result1.
How can I make Result3 equal to Result1?
Versions: Python 3.7.3, lxml 4.4.0 (installed using pip, not Christoph Gohlke's binary) on an AMD Windows machine.
Thanks in advance!
Stef
import lxml.html
simple_record = """<a href="some_map/some_file.png">dododo</a>"""
tree = lxml.html.fromstring(simple_record)
simple_xpath = "@href"
found_field = tree.xpath(simple_xpath)
print("Result1 = {}".format(found_field))
simple_xpath = """contains(@href,"some_file")"""
found_field = tree.xpath(simple_xpath)
print("Result2 = {}".format(found_field))
simple_xpath = """@href[contains(@href,"some_file")]"""
found_field = tree.xpath(simple_xpath)
print("Result3 = {}".format(found_field))
Actual output:
Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = []
Expected output:
Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = ['some_map/some_file.png']
Your expression in the third example ( @href[contains(@href,"some_file")] ), translated into English, means "find an href attribute of the node in simple_record which itself has an href attribute whose value contains the string some_file". Attributes cannot have attributes of their own, so no such node exists and an empty result list is returned.
What you intended to ask, in English, is "find an href attribute of the node in simple_record whose value contains the string some_file" (thanks @DanielHaley!). Translated into XPath, you would write it as
simple_xpath = '@href[contains(.,"some_file")]'
The . now refers to the context node that is being filtered by the predicate (i.e. the @href attribute itself). That expression makes Result3 the same as Result1.
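As a quick check, here is a minimal sketch that runs the failing expression and the corrected one side by side on the record from the question. The variable names broken, fixed and alternative are only illustrative. The third expression is an assumed alternative that is not part of the original answer: it filters the a element first and then selects its href attribute.

```python
import lxml.html

simple_record = '<a href="some_map/some_file.png">dododo</a>'
tree = lxml.html.fromstring(simple_record)  # tree is the <a> element

# Original attempt: inside the predicate, @href is looked up on the
# href attribute itself, which has no attributes, so nothing matches.
broken = tree.xpath('@href[contains(@href, "some_file")]')
print("broken      = {}".format(broken))

# Corrected: "." is the href attribute being filtered by the predicate.
fixed = tree.xpath('@href[contains(., "some_file")]')
print("fixed       = {}".format(fixed))

# Assumed alternative: filter the element, then select its attribute.
alternative = tree.xpath('self::a[contains(@href, "some_file")]/@href')
print("alternative = {}".format(alternative))
```

Both the corrected and the alternative expression should return the href value, while the original attempt returns an empty list.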