scrapy.Request appears unable to callback a url
I have modified my code to narrow down where the error is arising. I am using Scrapy, and in the first "def parse" I am trying to request a URL, and then in the next "def" I am trying to crawl that URL.
But I seem unable to make the scrapy.Request work; it won't crawl the URL.
import scrapy
#from urllib.parse import urljoin
from CharlesChurch.items import CharleschurchItem

class charleschurchSpider(scrapy.Spider):
    name = "charleschurch"
    allowed_domains = ["charleschurch.com"]
    start_urls = ["https://www.charleschurch.com/sitemap"]

    def parse(self, response):
        # for href in response.xpath('//*[@class="contacts-item"]/ul/li/a/@href'):
        #     url = urljoin('https://www.charleschurch.com/', href.extract())
        #     yield scrapy.Request(url, callback=self.parse_dir_contents)
        url = 'https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923'
        yield scrapy.Request(url, self.parse_dir_contents)

    def parse_dir_contents(self, response):
# def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = CharleschurchItem()
            item['name'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/span[1]/b/text()').extract()
            item['address'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/div/*[@itemprop="postalCode"]/text()').extract()
            plotnames = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/text()').extract()
            plotnames = [plotname.strip() for plotname in plotnames]
            plotids = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/@href').extract()
            plotids = [plotid.strip() for plotid in plotids]
            plotprices = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__price"]/text()').extract()
            plotprices = [plotprice.strip() for plotprice in plotprices]
            result = zip(plotnames, plotids, plotprices)
            for plotname, plotid, plotprice in result:
                item['plotname'] = plotname
                item['plotid'] = plotid
                item['plotprice'] = plotprice
                yield item
The error I get is:
2020-09-08 22:12:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-09-08 22:12:07 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.charleschurch.com/sitemap> (referer: None)
2020-09-08 22:12:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923> (referer: https://www.charleschurch.com/sitemap)
2020-09-08 22:12:08 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923> (referer: https://www.charleschurch.com/sitemap)
Traceback (most recent call last):
File "C:\Users\andre\Anaconda3\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
StopIteration: <200 https://www.charleschurch.com/north-yorkshire_harrogate/kingsley-park-10923>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
result = f(*args, **kw)
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 60, in process_spider_input
return scrape_func(response, request, spider)
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\core\scraper.py", line 152, in call_spider
warn_on_generator_with_return_value(spider, callback)
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\misc.py", line 218, in warn_on_generator_with_return_value
if is_generator_with_return_value(callable):
File "C:\Users\andre\Anaconda3\lib\site-packages\scrapy\utils\misc.py", line 203, in is_generator_with_return_value
tree = ast.parse(dedent(inspect.getsource(callable)))
File "C:\Users\andre\Anaconda3\lib\ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
def parse_dir_contents(self, response):
^
IndentationError: unexpected indent
So it seems the line yield scrapy.Request(url, self.parse_dir_contents) is not working, and I am not sure why.
From your logs:
def parse_dir_contents(self, response):
^
IndentationError: unexpected indent
You have an indentation error:
    def parse_dir_contents(self, response):
# def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = CharleschurchItem()
Remove the commented line, or indent it to the same level as the method body.
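To see why the spider itself never gets to run, note what the traceback shows: Scrapy's warn_on_generator_with_return_value() calls ast.parse(dedent(inspect.getsource(callback))), and textwrap.dedent() only strips whitespace that is common to every line. A comment indented less than the "def" line means there is no common prefix, so nothing is stripped and ast.parse() fails on the still-indented first line. A minimal stdlib-only sketch of that check (no Scrapy required, using hand-written source strings to stand in for inspect.getsource() output):

```python
import ast
import textwrap

# Scrapy's is_generator_with_return_value() does roughly:
#     ast.parse(textwrap.dedent(inspect.getsource(callback)))
# Simulated getsource() output for a method whose stray comment sits
# at column 0, i.e. less indented than the "def" line itself:
broken = (
    "    def parse_dir_contents(self, response):\n"
    "# def parse(self, response):\n"
    "        pass\n"
)

try:
    ast.parse(textwrap.dedent(broken))
    outcome = "parsed"
except IndentationError:
    # dedent() found no whitespace common to all lines, so line 1
    # keeps its 4-space indent and ast.parse() raises
    # "unexpected indent" -- exactly the error in the logs above.
    outcome = "IndentationError"

# With the comment indented to match the def line, dedent() strips
# the common 4 spaces and parsing succeeds:
fixed = (
    "    def parse_dir_contents(self, response):\n"
    "    # def parse(self, response):\n"
    "        pass\n"
)
ast.parse(textwrap.dedent(fixed))  # no exception

print(outcome)  # IndentationError
```

So once the stray comment is removed or re-indented, Scrapy's source inspection succeeds and the parse_dir_contents callback runs normally.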