Scrapy Error: Couldn't bind: 24: Too many open files
I'm running Scrapy on a list of domains, and a lot of the pages are getting this error: Couldn't bind: 24: Too many open files.
I was not getting this error on my Linux machine, but I am now getting it on my Mac. I'm not sure if this is to do with running on Sierra or if perhaps I left out a Scrapy configuration. I checked ulimit and it returns unlimited, so I don't think it is that.
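One thing worth double-checking: a bare ulimit reports the file-size limit, not the open-file-descriptor limit, so "unlimited" there doesn't rule this out. The limit the "Too many open files" error hits is ulimit -n, and the macOS default soft limit is quite low (often 256). You can also read it from Python, e.g.:

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors --
# this is the limit that errno 24 ("Too many open files") refers to,
# not the file-size limit that a bare `ulimit` reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit: soft=%s hard=%s' % (soft, hard))
```

If the soft limit is low, raising it (e.g. ulimit -n 4096 before running the crawl) may reduce these errors, though fixing the file handling in the spider is the more robust solution.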
In case it is to do with my spider, here is that:
import csv
from urllib.parse import urlparse

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class JakeSpider(CrawlSpider):
    name = 'jake'
    allowed_domains = allowedDomains
    start_urls = startUrls
    rules = (
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    def parse_page(self, response):
        page = response.url
        domain = urlparse(page).netloc
        domain = domain.replace('www.', '')
        #print(domain, 'is domain and page is', page)
        linksToGet = getHotelUrlsForDomain(domain)
        #if(len(linksToGet) == 0):
        #    print('\n ... links to get was zero \n')
        #print('linksToGet = ', linksToGet)
        links = response.xpath('//a/@href').extract()
        for link in links:
            if link in linksToGet:
                print('\n\n\n found one! ', link, 'is on', domain, ' and the page is', page, '\n\n\n')
                with open('hotelBacklinks.csv', 'a') as csvfile:
                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                    writer.writerow({'hotelURL': link, 'targetDomain': domain})
Edit: here is the full error line for one of them. It isn't causing the scrape to crash, but there are a lot of lines like this, so I think I'm not getting as many pages as I otherwise would. The error line:

2017-09-24 14:21:29 [scrapy.core.scraper] ERROR: Error downloading <GET https://alabamatheatre.com/>: Couldn't bind: 24: Too many open files.
Thanks in advance for any tips.
Answer: use a pipeline for saving all scraped data, rather than opening the file inside parse_page. As written, every call to parse_page tries to open and write to the same file, so the crawl repeatedly reopens hotelBacklinks.csv while many download connections are also open.