scrapy.Request appears unable to callback a url

I've modified my code to narrow down where the error occurs. I'm using scrapy: in the first `def parse` I try to request a URL, and then in the next `def` I try to scrape that URL.

However, I can't seem to get `scrapy.Request` to work; it doesn't scrape the URL.

import scrapy
#from urllib.parse import urljoin

from CharlesChurch.items import CharleschurchItem

class charleschurchSpider(scrapy.Spider):
    name = "charleschurch"
    allowed_domains = ["charleschurch.com"]
    start_urls = ["https://www.charleschurch.com/sitemap"]

    def parse(self, response):
        # for href in response.xpath('//*[@class="contacts-item"]/ul/li/a/@href'):
        #     url = urljoin('https://www.charleschurch.com/',href.extract())
        #     yield scrapy.Request(url, callback=self.parse_dir_contents)
        url = 'https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923'
        yield scrapy.Request(url, self.parse_dir_contents)

            
    def parse_dir_contents(self, response):
#    def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = CharleschurchItem()
            item['name'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/span[1]/b/text()').extract()
            item['address'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/div/*[@itemprop="postalCode"]/text()').extract()
            plotnames = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/text()').extract()
            plotnames = [plotname.strip() for plotname in plotnames]
            plotids = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/@href').extract()
            plotids = [plotid.strip() for plotid in plotids]
            plotprices = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__price"]/text()').extract()
            plotprices = [plotprice.strip() for plotprice in plotprices]
            result = zip(plotnames, plotids, plotprices)
            for plotname, plotid, plotprice in result:
                item['plotname'] = plotname
                item['plotid'] = plotid
                item['plotprice'] = plotprice
                yield item

The error I get is:

2020-09-08 22:12:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-09-08 22:12:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.charleschurch.com/sitemap> (referer: None)
2020-09-08 22:12:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923> (referer: https://www.charleschurch.com/sitemap)
2020-09-08 22:12:08 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923> (referer: https://www.charleschurch.com/sitemap)
Traceback (most recent call last):
  File "C:\Users\andre\Anaconda3\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: <200 https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
    result = f(*args, **kw)
  File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 60, in process_spider_input
    return scrape_func(response, request, spider)
  File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\core\scraper.py", line 152, in call_spider
    warn_on_generator_with_return_value(spider, callback)
  File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\misc.py", line 218, in warn_on_generator_with_return_value
    if is_generator_with_return_value(callable):
  File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\misc.py", line 203, in is_generator_with_return_value
    tree = ast.parse(dedent(inspect.getsource(callable)))
  File "C:\Users\andre\Anaconda3\lib\ast.py", line 47, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 1
    def parse_dir_contents(self, response):
    ^
IndentationError: unexpected indent

It seems the line `yield scrapy.Request(url, self.parse_dir_contents)` doesn't work, and I don't know why?

從你的日志:

    def parse_dir_contents(self, response):
    ^
IndentationError: unexpected indent

You have an indentation error:

    def parse_dir_contents(self, response):
#    def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = CharleschurchItem()
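The traceback shows the crash happens inside scrapy's `warn_on_generator_with_return_value` check, which runs `ast.parse(dedent(inspect.getsource(callback)))` on your callback. Because the commented line `#    def parse(self, response):` starts at column 0, `textwrap.dedent` finds no whitespace prefix common to all lines and strips nothing, so `ast.parse` sees an indented `def` and raises. The snippet below is a minimal sketch of that mechanism using a stub method body rather than your full spider:

```python
import ast
from textwrap import dedent

# Method source as inspect.getsource() would return it: indented inside
# a class, with a stray column-0 comment in the middle.
bad_src = (
    "    def parse_dir_contents(self, response):\n"
    "#    def parse(self, response):\n"
    "        pass\n"
)

# dedent() only removes whitespace common to ALL non-blank lines; the
# column-0 comment makes that common prefix empty, so nothing is removed.
assert dedent(bad_src) == bad_src

try:
    ast.parse(dedent(bad_src))
except IndentationError as e:
    # Same failure as in your traceback: "unexpected indent" on the def line.
    print(e)
```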

Delete the commented-out line, or fix its indentation.
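Once the comment is indented to the body's level (or deleted), every line of the method shares the class-level indentation, `dedent` strips it, and scrapy's source check parses cleanly. A quick sketch verifying this with the same stub body as above:

```python
import ast
from textwrap import dedent

# Same method, but with the leftover comment indented with the body.
fixed_src = (
    "    def parse_dir_contents(self, response):\n"
    "        # def parse(self, response):\n"
    "        pass\n"
)

# Now all lines share a 4-space prefix, dedent removes it, and
# ast.parse succeeds instead of raising IndentationError.
tree = ast.parse(dedent(fixed_src))
print(type(tree).__name__)  # Module
```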
