Python lxml XPath: how to get this predicate working

Good morning,

Recently I picked up Python and web scraping as a hobby...

I'm trying to get my head around an issue with Python lxml and XPath predicates, but alas, I couldn't find anything similar on Stack Overflow. So I reproduced it in the code below, hoping someone sees what I don't...

Can somebody explain why Result3 is an empty list? I was expecting Result3 to be the same as Result1.

How can I make Result3 equal to Result1?

Versions: Python 3.7.3, lxml 4.4.0 (installed using pip, not Christoph Gohlke's binary) on an AMD Windows machine.

Thanks in advance!

Stef

import lxml.html

simple_record  = """<a href="some_map/some_file.png">dododo</a>"""
tree           = lxml.html.fromstring(simple_record)

simple_xpath   = "@href"
found_field    = tree.xpath(simple_xpath)
print("Result1 = {}".format(found_field))

simple_xpath   = """contains(@href,"some_file")"""
found_field    = tree.xpath(simple_xpath)
print("Result2 = {}".format(found_field))

simple_xpath   = """@href[contains(@href,"some_file")]"""
found_field    = tree.xpath(simple_xpath)
print("Result3 = {}".format(found_field))

Actual output:

Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = []

Expected output:

Result1 = ['some_map/some_file.png']
Result2 = True
Result3 = ['some_map/some_file.png']

Your predicate in the third example (@href[contains(@href,"some_file")]), translated into English, means "find a node in simple_record which has an attribute href which itself has an attribute href whose value contains the string some_file". Inside the predicate, the context node is the href attribute being tested, so the inner @href asks for an href attribute of that attribute. Attribute nodes have no attributes, so the inner @href selects an empty node-set, which contains() converts to the empty string. Since contains("", "some_file") is false, the predicate rejects the attribute and an empty result list is returned.
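A minimal sketch, reusing the question's simple_record, that checks each step of that reasoning on its own: selecting the attribute works, asking the attribute node for its own href attribute yields nothing, and contains() on that empty node-set is false, which is exactly why the Result3 predicate filters everything out.

```python
import lxml.html

simple_record = """<a href="some_map/some_file.png">dododo</a>"""
tree = lxml.html.fromstring(simple_record)  # tree is the <a> element

# Step 1: from the <a> element, @href selects the attribute node.
print(tree.xpath("@href"))                                 # ['some_map/some_file.png']

# Step 2: from that attribute node, @href finds nothing,
# because attribute nodes have no attributes of their own.
print(tree.xpath("@href/@href"))                           # []

# Step 3: contains() converts the empty node-set to "", so the test is false.
print(tree.xpath('contains(@href/@href, "some_file")'))    # False

# Hence the original predicate rejects the href attribute.
print(tree.xpath('@href[contains(@href,"some_file")]'))    # []
```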

What you intended to ask, in English, is "find a node in simple_record which has an attribute href whose value contains the string some_file" (thanks @DanielHaley!). Translated into XPath, you would write it as

simple_xpath   = '@href[contains(.,"some_file")]'

The . now refers back to the context node that is being filtered by the predicate (i.e. the @href attribute itself). With that expression, Result3 is the same as Result1.
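A short, self-contained sketch of the corrected query, assuming the same simple_record as in the question. The last expression, which filters the <a> element first and then steps to its href, is an alternative added here for comparison, not part of the answer above.

```python
import lxml.html

simple_record = """<a href="some_map/some_file.png">dododo</a>"""
tree = lxml.html.fromstring(simple_record)

result1 = tree.xpath("@href")
# Inside the predicate, "." is the href attribute node itself,
# so contains() tests the attribute's own value.
result3 = tree.xpath('@href[contains(.,"some_file")]')
print("Result1 = {}".format(result1))
print("Result3 = {}".format(result3))

# Alternative (added for comparison): keep the <a> element only if its
# href matches, then step from the element to its href attribute.
result_alt = tree.xpath('self::a[contains(@href,"some_file")]/@href')
print("Alternative = {}".format(result_alt))
```

Both forms return the href value as a one-element list. The first keeps the attribute as the selected node and tests it through ".", while the second puts the test on the element, where @href is a valid step.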
