
Scrapy CrawlSpider Output while Crawling

I'm trying to learn the Scrapy framework, and I'm able to write a spider and crawl around the web and so forth. I'm also able to save the desired data, but not in the way I would like.

Example Code:

    import scrapy
    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    class ExampleSpider(CrawlSpider):
        name = 'examplecrawler'
        allowed_domains = ['example.com']
        start_urls = ['https://www.example.com/']
        rules = [
            # Avoid "parse" as the callback name: CrawlSpider uses parse()
            # internally to implement its rule-following logic.
            Rule(LinkExtractor(unique=True), follow=True, callback='parse_item')
        ]

        def parse_item(self, response):
            # Yield each visited URL as an item.
            yield {'link': response.url}
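
For context, a spider like this is typically run through Scrapy's built-in feed export, e.g. (assuming the command-line invocation the question implies; the output file name is illustrative):

    scrapy crawl examplecrawler -o output.json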

Current Result: the spider runs recursively, but the item exporters only write output when I stop it with Ctrl+C.

Desired Result: the spider runs recursively and writes output while running, so I don't have to stop it to get the output.

I have read through the documentation and can see that I could possibly write a custom pipeline to save the data, but I was wondering whether this is possible with the current item exporters, i.e. CSV and JSON.

To change how your current crawler reports its status in real time, you would have to modify the existing code of the base class or write a crawler yourself. Since you are importing an existing module, you have no real way of changing how it works, so your best (if not only) bet is to create your own crawler with customized output.
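As a minimal sketch of the custom-pipeline route the question itself mentions (the module path, class name, and output.jl file name below are my own illustrative choices, not anything Scrapy prescribes): the pipeline writes each item as one JSON line and flushes after every write, so the file stays readable while the spider is still running.

    # pipelines.py (hypothetical module; names are illustrative)
    import json

    class StreamingJsonLinesPipeline:
        def open_spider(self, spider):
            # Open the output file once, when the spider starts.
            self.file = open('output.jl', 'w', encoding='utf-8')

        def close_spider(self, spider):
            self.file.close()

        def process_item(self, item, spider):
            # Write one JSON object per line and flush immediately,
            # so the output is visible without stopping the spider.
            self.file.write(json.dumps(dict(item)) + '\n')
            self.file.flush()
            return item

Enable it in settings.py:

    # settings.py
    ITEM_PIPELINES = {
        'myproject.pipelines.StreamingJsonLinesPipeline': 300,
    }

JSON Lines is the natural format here: a single JSON array cannot be finalized until the crawl ends, whereas line-per-item output can be flushed continuously as items are scraped.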
