
Scrapy Error: Couldn't bind: 24: Too many open files

I'm running Scrapy on a list of domains, and a lot of the pages are getting this error: Couldn't bind: 24: Too many open files.

I was not getting this error on my Linux machine, but I am now getting it on my Mac. I'm not sure whether this has to do with running on Sierra or whether I left out a Scrapy configuration. I checked ulimit and it returns unlimited, so I don't think it is that.
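A note on that check: plain ulimit usually reports the maximum file size (often unlimited), not the open-file limit; the relevant value is ulimit -n, which defaults to 256 on macOS. A minimal sketch for checking the per-process descriptor limit from Python, using the standard resource module (available on Unix-like systems):

import resource

# getrlimit returns (soft limit, hard limit) for the per-process
# cap on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit: soft =', soft, 'hard =', hard)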

In case it has to do with my spider, here it is:

from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse
import csv

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# allowedDomains, startUrls, getHotelUrlsForDomain, and fieldnames are
# defined elsewhere in my project.

class JakeSpider(CrawlSpider):
    name = 'jake'
    allowed_domains = allowedDomains
    start_urls = startUrls
    rules = (
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    def parse_page(self, response):
        page = response.url
        domain = urlparse(page).netloc
        domain = domain.replace('www.', '')
        linksToGet = getHotelUrlsForDomain(domain)
        links = response.xpath('//a/@href').extract()
        for link in links:
            if link in linksToGet:
                print('\n\n\n   found one! ', link, 'is on', domain, ' and the page is', page, '\n\n\n')
                # The output file is reopened here for every matching link.
                with open('hotelBacklinks.csv', 'a') as csvfile:
                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                    writer.writerow({'hotelURL': link, 'targetDomain': domain})

Edit: here is the full error line for one of them. It isn't causing the scrape to crash, but there are a lot of lines like this, so I think I'm not getting as many pages as I otherwise would. The error line: 2017-09-24 14:21:29 [scrapy.core.scraper] ERROR: Error downloading <GET https://alabamatheatre.com/>: Couldn't bind: 24: Too many open files.

Thanks in advance for any tips.

  1. You should use an item pipeline for saving all scraped data (a sketch follows after this list).
  2. You are getting this error because parse_page is called many times, and every call tries to open and write to the same file. Writing to a file is a blocking operation. Here are the Scrapy docs on item pipelines: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
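A minimal sketch of the pipeline approach; the class name, item field names, and settings path are illustrative, not from the original post. The file is opened once when the spider starts and closed when it finishes, instead of once per matched link:

import csv

class HotelBacklinkPipeline(object):
    def open_spider(self, spider):
        # Open the output file once per crawl.
        self.csvfile = open('hotelBacklinks.csv', 'a')
        self.writer = csv.DictWriter(self.csvfile,
                                     fieldnames=['hotelURL', 'targetDomain'])

    def close_spider(self, spider):
        self.csvfile.close()

    def process_item(self, item, spider):
        self.writer.writerow(item)
        return item

The spider's parse_page would then yield the data instead of writing it:

    for link in links:
        if link in linksToGet:
            yield {'hotelURL': link, 'targetDomain': domain}

and the pipeline would be enabled in settings.py (assuming the project module is named myproject):

ITEM_PIPELINES = {
    'myproject.pipelines.HotelBacklinkPipeline': 300,
}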
