Python Scrapy spider crawls the URL but returns nothing

Question

I'm trying to parse a site. It's my first project with Scrapy and I'm a beginner in Python. Following this article, I crawled one URL and didn't get any data from it.

I tried several different XPath queries and changed the USER_AGENT in settings, but it still returns nothing.

This is the part of the code that shows what I'm trying to parse:

    def parse(self, response):
        SET_SELECTOR = '.set'
        for brickset in response.css(SET_SELECTOR):

            TITLE_SELECTOR = '//head//title/text'
            DATE_SELECTOR = '//table/tbody[2]//td[2]//text()'
            TEMP_SELECTOR = '//table/tbody[2]/tr[1]/td[1]//text()'
            yield {
                'title': brickset.xpath(TITLE_SELECTOR).extract_first(),
                'date': brickset.xpath(DATE_SELECTOR).extract_first(),
                'temp1': brickset.xpath(TEMP_SELECTOR).extract_first(),
            }

This is the output from the command line:

 DEBUG: Crawled (200) <GET https://www.gismeteo.ru/diary/4368/2019/6/> (referer: None)

Answer 1

You just set the wrong selectors. Most likely nothing on that page matches the '.set' selector, so the for loop never runs and nothing is yielded. I've tested this for you:

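    def parse(self, response):
        # Query the response directly; there is no outer '.set' loop.
        TITLE_SELECTOR = '//div[@id="page_title"]//text()'
        DATE_SELECTOR = '//table//tbody[1]//text()'

        yield {
            # extract_first() returns the first matching text node (or None).
            'title': response.xpath(TITLE_SELECTOR).extract_first(),
            # extract() returns every matching text node as a list.
            'date': response.xpath(DATE_SELECTOR).extract(),
        }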

Answer 1 (0, accepted) 2019-06-15 17:49:13
