
Scrapy :: How to get requests with exceptions to export to .csv?

I'm fairly new to using Scrapy and have been coding for about 2 years now (sorry if this is a dumb question).

I am currently attempting to scrape generic information, like whether or not a website has a 'privacy policy' link or an 'about us' link, across a list of websites. I've been able to scrape this information from websites whose URLs support HTTPS or have live links.

I've been getting exceptions for websites that don't load or have issues with HTTPS vs. HTTP:

  • twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]

  • twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]>]

Based on multiple crawls of the spider, I find that the resulting .csv excludes these failed links.

I was wondering how to get the spider to include these failed links in the output, with preset values for each column, if possible.

In the Request constructor, besides callback there is also an errback argument (see the Scrapy Request documentation).

You can write a function for processing requests that generate errors.

So you use:

yield Request(url="http://www.example.com", callback=self.mycallback, errback=self.myerrback)

And define:

def myerrback(self, failure):
    # your processing here

Check the errback usage example in the Scrapy documentation ("Using errbacks to catch exceptions in request processing").
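
To make the failed URLs show up in the exported .csv with preset column values, the errback can yield a placeholder item the same way a callback yields a normal one. Below is a minimal sketch of that idea; the field names (has_privacy_policy, has_about_us, error), the spider name, and the start URL are hypothetical placeholders to adapt to your own columns:

import scrapy


class PolicySpider(scrapy.Spider):
    name = "policy_spider"
    # hypothetical list of sites to check
    start_urls = ["http://www.example.com"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, errback=self.myerrback)

    def parse(self, response):
        # normal case: record whether the links exist
        yield {
            "url": response.url,
            "has_privacy_policy": bool(response.xpath('//a[contains(text(), "Privacy")]')),
            "has_about_us": bool(response.xpath('//a[contains(text(), "About")]')),
            "error": "",
        }

    def myerrback(self, failure):
        # failure.request is the Request that failed, so the URL is still available
        yield {
            "url": failure.request.url,
            # preset values for sites that never responded
            "has_privacy_policy": None,
            "has_about_us": None,
            "error": repr(failure.value),
        }

Running scrapy crawl policy_spider -o output.csv should then produce rows for the unreachable sites as well, with the preset values and the captured exception in the error column.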
