
Scrapy start_urls in text file

I'm trying to crawl a list of URLs and retrieve the h1 of each page. The URLs are stored in a text file. The code is:

from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector

class MySpider(CrawlSpider):
    name = "sitemaplocation"
    allowed_domains = ["xyz.nl"]
    f = open("locationlist.txt", "r")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()

    def parse(self, response):
        sel = Selector(response)
        title = sel.xpath("//h1[@class='no-bd']/text()").extract()
        print(title)

The code crawls through the site but doesn't print anything. Any help would be useful.

Try moving this block:

name = "sitemaplocation"
allowed_domains = ["xyz.nl"]
f = open("locationlist.txt",'r')
start_urls = [url.strip() for url in f.readlines()]
f.close()

into the __init__ method of your MySpider class.
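As a minimal sketch of that suggestion, the file handling can be isolated in a small helper (stdlib only; the helper name is hypothetical, the filename is the one from the question) and called from __init__ instead of running in the class body:

```python
def load_start_urls(path="locationlist.txt"):
    """Read one URL per line, skipping blank lines; the context
    manager guarantees the file is closed even if reading fails."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Hypothetical use inside the spider:
#     def __init__(self, *args, **kwargs):
#         super().__init__(*args, **kwargs)
#         self.start_urls = load_start_urls()
```

Using `with` also avoids leaving `f` behind as a class attribute, which is what the original class-body version does.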

Also, where do you call the parse function?

Try inheriting your spider from Spider instead of CrawlSpider. As the Scrapy docs warn:

When writing crawl spider rules, avoid using parse as callback, since the CrawlSpider uses the parse method itself to implement its logic. So if you override the parse method, the crawl spider will no longer work.
