
Scrapy Error: Couldn't bind: 24: Too many open files

I'm running Scrapy on a list of domains, and a lot of the pages are getting this error: Couldn't bind: 24: Too many open files.

I was not getting this error on my Linux machine, but I am now getting it on my Mac. I'm not sure if this is related to running on Sierra or if I left out a Scrapy configuration. I checked ulimit and it returns unlimited, so I don't think it is that.
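(Note: plain `ulimit` reports the maximum file size, not the open-file limit; the descriptor limit is shown by `ulimit -n`, which defaults to 256 on macOS. A quick check:)

```shell
ulimit          # file-size limit; often "unlimited" even when descriptors are capped
ulimit -n       # maximum open file descriptors for this shell (macOS default: 256)
ulimit -n 4096  # raise the limit for the current shell session
```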

In case it is to do with my spider, here is that:

import csv
from urllib.parse import urlparse

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class JakeSpider(CrawlSpider):
    name = 'jake'
    allowed_domains = allowedDomains  # defined elsewhere in my project
    start_urls = startUrls            # defined elsewhere in my project
    rules = (
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    def parse_page(self, response):
        page = response.url
        domain = urlparse(page).netloc
        domain = domain.replace('www.', '')
        linksToGet = getHotelUrlsForDomain(domain)  # helper defined elsewhere
        links = response.xpath('//a/@href').extract()
        for link in links:
            if link in linksToGet:
                print('found one!', link, 'is on', domain, 'and the page is', page)
                with open('hotelBacklinks.csv', 'a') as csvfile:
                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                    writer.writerow({'hotelURL': link, 'targetDomain': domain})

Edit: here is the full error line for one of them. It isn't causing the scrape to crash, but there are a lot of lines like this, so I think I'm not getting as many pages as I otherwise would. The error line: 2017-09-24 14:21:29 [scrapy.core.scraper] ERROR: Error downloading <GET https://alabamatheatre.com/>: Couldn't bind: 24: Too many open files.

Thanks in advance for any tips.

  1. You should use an item pipeline to save all scraped data.
  2. You get this error because parse_page is called many times, and every call opens and writes to the same file. Writing to a file is a blocking operation, and each open() consumes a file descriptor. See the Scrapy item pipeline docs: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
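A minimal sketch of the pipeline approach, assuming the spider yields dicts with the 'hotelURL' and 'targetDomain' keys used in the question (the class name CsvWriterPipeline is a placeholder). The pipeline opens the CSV once per crawl instead of once per matching link, so only a single file descriptor is used for the output:

```python
import csv


class CsvWriterPipeline:
    def open_spider(self, spider):
        # One open() for the whole crawl, not one per parsed page.
        self.file = open('hotelBacklinks.csv', 'a', newline='')
        self.writer = csv.DictWriter(
            self.file, fieldnames=['hotelURL', 'targetDomain'])

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        self.writer.writerow(item)
        return item
```

In parse_page you would then yield {'hotelURL': link, 'targetDomain': domain} instead of writing to the file directly, and enable the pipeline in settings.py with ITEM_PIPELINES = {'myproject.pipelines.CsvWriterPipeline': 300} (the module path is a placeholder for your project layout).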
