Scrapy randomly crashing with Celery in Django
I am running my Scrapy project within Django on an Ubuntu server. The problem is that Scrapy crashes randomly, even when only one spider is running.
Below is a snippet of the traceback. I am not an expert; I searched Google for
_SIGCHLDWaker Scrapy
but couldn't understand the solutions I found for the snippet below:
--- <exception caught here> ---
File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 602, in _doReadOrWrite
why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'
I am not familiar with Twisted, and despite trying to understand it, I find it very unfriendly.
Below is the full traceback:
[2015-10-10 14:17:13,652: INFO/Worker-4] Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, RandomUserAgentMiddleware, ProxyMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
[2015-10-10 14:17:13,655: INFO/Worker-4] Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
[2015-10-10 14:17:13,656: INFO/Worker-4] Enabled item pipelines: MadePipeline
[2015-10-10 14:17:13,656: INFO/Worker-4] Spider opened
[2015-10-10 14:17:13,657: INFO/Worker-4] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
Unhandled Error
Traceback (most recent call last):
File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/log.py", line 101, in callWithLogger
return callWithContext({"system": lp}, func, *args, **kw)
File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/log.py", line 84, in callWithContext
return context.call({ILogContext: newCtx}, func, *args, **kw)
File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
--- <exception caught here> ---
File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 602, in _doReadOrWrite
why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'
Here is how I have implemented my task, following the Scrapy documentation:
from celery import shared_task
from celery.result import AsyncResult
from django.utils import timezone
from scrapy.crawler import CrawlerProcess, CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

# Project, SearchTerm, Source and Bot (Django models) and the spider classes
# come from the project's own modules, which are not shown here.


@shared_task
def run_spider(**kwargs):
    task_id = run_spider.request.id
    status = AsyncResult(str(task_id)).status
    source = kwargs.get("source")
    # Record the crawl job in the database before starting the spider.
    pro, created = Project.objects.get_or_create(name="b2b")
    query, _ = SearchTerm.objects.get_or_create(term=kwargs['query'])
    src, _ = Source.objects.get_or_create(term=query, engine=kwargs['source'])
    b, _ = Bot.objects.get_or_create(project=pro, query=src, spiderid=str(task_id), status=status, start_time=timezone.now())
    process = CrawlerRunner(get_project_settings())
    if source == "amazon":
        d = process.crawl(ComberSpider, query=kwargs['query'], job_id=task_id)
        d.addBoth(lambda _: reactor.stop())
    else:
        d = process.crawl(MadeSpider, query=kwargs['query'], job_id=task_id)
        d.addBoth(lambda _: reactor.stop())
    # Blocks until the crawl finishes and the callback stops the reactor.
    reactor.run()
I have also tried an approach like the one in this tutorial, but it resulted in a different problem for which I couldn't get a traceback.
For completeness, here is a snippet of my spider:
class ComberSpider(CrawlSpider):
    name = "amazon"
    allowed_domains = ["amazon.com"]
    rules = (
        Rule(LinkExtractor(allow=r'corporations/.+/-*50/[0-9]+\.html', restrict_xpaths="//a[@class='next']"),
             callback="parse_items", follow=True),
    )

    def __init__(self, *args, **kwargs):
        super(ComberSpider, self).__init__(*args, **kwargs)
        self.query = kwargs.get('query')
        self.job_id = kwargs.get('job_id')
        SignalManager(dispatcher.Any).connect(self.closed_handler, signal=signals.spider_closed)
        self.start_urls = (
            "http://www.amazon.com/corporations/%s/------------"
            "--------50/1.html" % self.query.strip().replace(" ", "_").lower(),
        )
This is a known Scrapy issue. See the issue report thread for details and possible workarounds.
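One general point that may help: a Twisted reactor cannot be started again in the same process once it has been stopped, and the task above calls reactor.run() and reactor.stop() inside a long-lived Celery worker process. A common way to sidestep reactor state in the worker is to run each crawl in its own child process. The following is only a minimal sketch of that idea, not necessarily the workaround discussed in the linked thread. It assumes that the scrapy command is on the worker's PATH and that project_dir points at the directory containing scrapy.cfg; the helper names build_crawl_command and run_spider_in_subprocess are hypothetical.

```python
import subprocess


def build_crawl_command(spider_name, query, job_id):
    """Build the argv for `scrapy crawl`, passing spider arguments with -a.

    Assumes the `scrapy` executable is on PATH (for example, from the
    active virtualenv).
    """
    return [
        "scrapy", "crawl", spider_name,
        "-a", "query=%s" % query,
        "-a", "job_id=%s" % job_id,
    ]


def run_spider_in_subprocess(spider_name, query, job_id, project_dir):
    """Run one crawl in a fresh process and return its exit code.

    `project_dir` is assumed to be the Scrapy project root (the directory
    that holds scrapy.cfg). The child process gets its own reactor, so the
    Celery worker process never starts or stops one itself.
    """
    cmd = build_crawl_command(spider_name, query, job_id)
    return subprocess.call(cmd, cwd=project_dir)
```

Inside the Celery task, the CrawlerRunner and reactor calls would then be replaced by something like run_spider_in_subprocess("amazon", kwargs['query'], task_id, project_dir), with project_dir set to wherever the Scrapy project lives. The spider still receives query and job_id as keyword arguments, since scrapy crawl passes each -a name=value pair to the spider's constructor.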