I'm new to Scrapy and can't get it to do anything. Eventually I want to scrape all the HTML comments from a website by following internal links.
For now I'm just trying to scrape the internal links and add them to a list.
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class comment_spider(CrawlSpider):
    name = 'test'
    allowed_domains = ['https://www.andnowuknow.com/']
    start_urls = ["https://www.andnowuknow.com/"]

    rules = (Rule(LinkExtractor(), callback='parse_start_url', follow=True),)

    def parse_start_url(self, response):
        return self.parse_item(response)

    def parse_item(self, response):
        urls = []
        for link in LinkExtractor(allow=()).extract_links(response):
            urls.append(link)
        print(urls)
I'm just trying to get it to print something at this point, but nothing I've tried so far works.
It finishes with an exit code of 0, yet prints nothing, so I can't tell what's happening.
What am I missing?
Your log output should give some hints, but one problem is visible right away: your allowed_domains
contains a full URL instead of a domain. With a URL there, Scrapy's offsite filter rejects every
followed link, so your callbacks never run. Set it like this instead:
allowed_domains = ["andnowuknow.com"]
(See the allowed_domains entry in the official documentation.)
Hope it helps.