
scrapy.Spider subclass: unable to invoke an instance method

First of all, I'm new to Python and to the world of web scraping. I just want to call an instance method from inside a Spider subclass.

Code:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }
        print("*** call next page function")
        self.parse_next_page(response)    

    def parse_next_page(self, response):
        print("*** parse next page function invoked")
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

Output:

2020-07-29 09:30:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/tag/humor/> (referer: None)
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Jane Austen', 'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Steve Martin', 'text': '“A day without sunshine is like, you know, night.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Garrison Keillor', 'text': '“Anyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Jim Henson', 'text': '“Beauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Charles M. Schulz', 'text': "“All you need is love. But a little chocolate now and then doesn't hurt.”"}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Suzanne Collins', 'text': "“Remember, we're madly in love, so it's all right to kiss me anytime you feel like it.”"}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Charles Bukowski', 'text': '“Some people never go crazy. What truly horrible lives they must lead.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Terry Pratchett', 'text': '“The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'Dr. Seuss', 'text': '“Think left and think right and think low and think high. Oh, the thinks you can think up if only you try!”'}
2020-07-29 09:30:39 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/tag/humor/>
{'author': 'George Carlin', 'text': '“The reason I talk to myself is because I’m the only one whose answers I accept.”'}
*** call next page function
2020-07-29 09:30:39 [scrapy.core.engine] INFO: Closing spider (finished)

As you can see, the instance method "parse_next_page" is never invoked. Please let me know what I am doing wrong here.

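Try this for pagination; it works fine for me. Because parse_next_page contains a yield, it is a generator function: calling it only creates a generator object, and its body never runs unless something iterates over it. That is why the print inside it never appears and no follow-up request reaches Scrapy. Doing the pagination directly inside parse avoids the problem: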

def parse(self, response):
    for quote in response.css('div.quote'):
        yield {
            'author': quote.xpath('span/small/text()').get(),
            'text': quote.css('span.text::text').get(),
        }
    next_page = response.css('li.next a::attr("href")').get()
    if next_page is not None:
        yield response.follow(next_page, self.parse)
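If you would rather keep pagination in its own method, delegating with `yield from` should also work. The sketch below is plain Python with no Scrapy involved, just to show the generator behaviour. `MiniSpider`, `parse_broken`, `parse_fixed` and the string values are hypothetical stand-ins, not part of the original spider or of Scrapy's API.

```python
class MiniSpider:
    """Hypothetical stand-in for a spider; it is NOT a real scrapy.Spider."""

    def __init__(self):
        self.calls = 0  # how many times the body of parse_next_page actually ran

    def parse_next_page(self):
        # The yield below makes this a generator function, so the body
        # only runs when the returned generator is iterated.
        self.calls += 1
        yield "request for next page"

    def parse_broken(self):
        yield "item"
        # Same pattern as in the question: a generator object is created
        # and discarded, so the body of parse_next_page never executes.
        self.parse_next_page()

    def parse_fixed(self):
        yield "item"
        # Delegate to the sub-generator and re-yield everything it produces.
        yield from self.parse_next_page()


spider = MiniSpider()

broken = list(spider.parse_broken())
calls_after_broken = spider.calls

fixed = list(spider.parse_fixed())
calls_after_fixed = spider.calls

print(broken, calls_after_broken)  # ['item'] 0
print(fixed, calls_after_fixed)    # ['item', 'request for next page'] 1
```

Applied to the spider in the question, the equivalent change would be to replace the bare call `self.parse_next_page(response)` with `yield from self.parse_next_page(response)`, so that the request it yields is passed back to Scrapy through parse.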

