
Python, passing data from a list into xpath, in lxml

I am unable to pass data from a list into an XPath expression in lxml.

See these lines:

formatted_xpath = str(xpath_string[x])
current_data = xpath_html.xpath("string(formatted_xpath)")
# current_data = xpath_html.xpath("string(//*[@class='title-container clearfix']/h1)")

If I use this line, with the XPath hard-coded in def brandScrapePage, it works fine:

current_data = xpath_html.xpath("string(//*[@class='title-container clearfix']/h1)")

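However, if I use the formatted_xpath variable, it does not work: nothing is found. I know formatted_xpath is correct, because it looks fine when printed. I am not sure what the problem is.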

Code:

xpath_string = ["//*[@class='title-container clearfix']/h2"]


def pageSource(line):
    response = urllib2.urlopen(line)
    html = response.read()
    html = html.decode('utf-8', 'ignore').encode('utf-8')
    return html


def brandScrapePage(brand_html, x):
    xpath_html = etree.HTML(brand_html)
    formatted_xpath = str(xpath_string[x])
    current_data = xpath_html.xpath("string(formatted_xpath)")
    # current_data = xpath_html.xpath("string(//*[@class='title-container clearfix']/h1)")

def main():

    f = open(home_dir + url_file).readlines()
    for line in f:
        for x in xrange(0, len(field_names)):
            current_html = pageSource(line)
            brandScrapePage(current_html, x)


if __name__ == '__main__':
    main()

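The problem is that "string(formatted_xpath)" contains the literal text formatted_xpath, not the value of the variable. Python does not substitute variable names inside a quoted string, so XPath receives formatted_xpath as an element name, matches no nodes, and returns an empty string. What you meant was a placeholder that is filled in with the actual value through string formatting: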

current_data = xpath_html.xpath("string(%s)" % formatted_xpath)

Alternatively, use format():

current_data = xpath_html.xpath("string({})".format(formatted_xpath))
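To see the difference side by side, here is a minimal sketch. It assumes lxml is installed and uses a small made-up HTML snippet (SAMPLE_HTML) and a helper named scrape, neither of which comes from the original question. It evaluates both the literal string and the formatted one against the same tree.

```python
from lxml import etree

# Hypothetical stand-in for a downloaded page (not from the original question).
SAMPLE_HTML = """
<html><body>
  <div class="title-container clearfix"><h1>Example Brand</h1></div>
</body></html>
"""

xpath_string = ["//*[@class='title-container clearfix']/h1"]


def scrape(html, index):
    """Return the string value of the index-th XPath in xpath_string."""
    tree = etree.HTML(html)
    formatted_xpath = xpath_string[index]
    # Substitute the value of the variable into the XPath expression.
    return tree.xpath("string({})".format(formatted_xpath))


tree = etree.HTML(SAMPLE_HTML)

# The name inside the quotes is plain text to Python, so XPath sees a test
# for child elements named <formatted_xpath>, which matches nothing.
literal_result = tree.xpath("string(formatted_xpath)")

# Here the actual path from the list is placed inside string(...).
formatted_result = scrape(SAMPLE_HTML, 0)

print(repr(literal_result))
print(repr(formatted_result))
```

With this sample input, the literal version should print an empty string and the formatted version should print the heading text, which matches the behavior described in the question.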

