
Scrapy Error: Couldn't bind: 24: Too many open files

I'm running Scrapy on a list of domains, and a lot of the pages are getting this error: Couldn't bind: 24: Too many open files.

I was not getting this error on my Linux machine, but I am now getting it on my Mac. I'm not sure whether this has to do with running on Sierra or whether I left out a Scrapy configuration. I checked ulimit and it returns unlimited, so I don't think it is that.
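A note on that check: plain ulimit usually reports the maximum file size (often unlimited), not the open-file limit; the relevant value is ulimit -n, which defaults to 256 on macOS. A minimal sketch for checking the per-process descriptor limit from Python, using the standard resource module (available on Unix-like systems):

import resource

# getrlimit returns (soft limit, hard limit) for the per-process
# cap on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('open-file limit: soft =', soft, 'hard =', hard)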

In case it has to do with my spider, here it is:

from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse
import csv

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# allowedDomains, startUrls, getHotelUrlsForDomain, and fieldnames are
# defined elsewhere in my project.

class JakeSpider(CrawlSpider):
    name = 'jake'
    allowed_domains = allowedDomains
    start_urls = startUrls
    rules = (
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    def parse_page(self, response):
        page = response.url
        domain = urlparse(page).netloc
        domain = domain.replace('www.', '')
        linksToGet = getHotelUrlsForDomain(domain)
        links = response.xpath('//a/@href').extract()
        for link in links:
            if link in linksToGet:
                print('\n\n\n   found one! ', link, 'is on', domain, ' and the page is', page, '\n\n\n')
                # The output file is reopened here for every matching link.
                with open('hotelBacklinks.csv', 'a') as csvfile:
                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                    writer.writerow({'hotelURL': link, 'targetDomain': domain})

Edit: here is the full error line for one of them. It isn't causing the scrape to crash, but there are a lot of lines like this, so I think I'm not getting as many pages as I otherwise would. The error line: 2017-09-24 14:21:29 [scrapy.core.scraper] ERROR: Error downloading <GET https://alabamatheatre.com/>: Couldn't bind: 24: Too many open files.

Thanks in advance for any tips.

  1. You should use an item pipeline for saving all scraped data (a sketch follows after this list).
  2. You are getting this error because parse_page is called many times, and every call tries to open and write to the same file. Writing to a file is a blocking operation. Here are the Scrapy docs on item pipelines: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
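A minimal sketch of the pipeline approach; the class name, item field names, and settings path are illustrative, not from the original post. The file is opened once when the spider starts and closed when it finishes, instead of once per matched link:

import csv

class HotelBacklinkPipeline(object):
    def open_spider(self, spider):
        # Open the output file once per crawl.
        self.csvfile = open('hotelBacklinks.csv', 'a')
        self.writer = csv.DictWriter(self.csvfile,
                                     fieldnames=['hotelURL', 'targetDomain'])

    def close_spider(self, spider):
        self.csvfile.close()

    def process_item(self, item, spider):
        self.writer.writerow(item)
        return item

The spider's parse_page would then yield the data instead of writing it:

    for link in links:
        if link in linksToGet:
            yield {'hotelURL': link, 'targetDomain': domain}

and the pipeline would be enabled in settings.py (assuming the project module is named myproject):

ITEM_PIPELINES = {
    'myproject.pipelines.HotelBacklinkPipeline': 300,
}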
