简体   繁体   中英

Python Scrapy spider is crawling the url, but returns nothing

I'm trying to parse site . It's my first project with scrapy and i'm a beginner in python. Using this article , I crawled one url and didn't get any data from it.

I tried some different xpath queries and changed the USER_AGENT in settings, but it still return nothing.

This is the part of code that describes what i'm trying to parse:

        def parse(self, response):
    SET_SELECTOR = '.set'
    for brickset in response.css(SET_SELECTOR):

        TITLE_SELECTOR= '//head//title/text'
        DATE_SELECTOR= '//table/tbody[2]//td[2]//text()'
        TEMP_SELECTOR= '//table/tbody[2]/tr[1]/td[1]//text()'
        yield {
            'title': brickset.xpath(TITLE_SELECTOR).extract_first(),
            'date': brickset.xpath(DATE_SELECTOR).extract_first(),
            'temp1':brickset.xpath(TEMP_SELECTOR).extract_first(),
        }

This is the data from the command line:

 DEBUG: Crawled (200) <GET https://www.gismeteo.ru/diary/4368/2019/6/> (referer: None)

You just set the wrong selector. I've tested it for you:

    def parse(self, response):
        TITLE_SELECTOR= '//div[@id="page_title"]//text()'
        DATE_SELECTOR= '//table//tbody[1]//text()'

        yield {
            'title': response.xpath(TITLE_SELECTOR).extract_first(),
            'date': response.xpath(DATE_SELECTOR).extract(),
        }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM