Scrapy xpath iterate (shell works)

Question

I am trying to scrape some information from the UK Companies House website using Scrapy. I connected to the website through the shell with the command

 scrapy shell https://beta.companieshouse.gov.uk/search?q=a

and with

response.xpath('//*[@id="results"]').extract()

I managed to get the results back.

I tried to put this into a program so I could export it to CSV or JSON, but I am having trouble getting it to work. This is what I have:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "gov2"

    def start_requests(self):
        start_urls = ['https://beta.companieshouse.gov.uk/search?q=a']

    def parse(self, response):
        products = response.xpath('//*[@id="results"]').extract()
        print(products)

It is very simple, but I have tried a lot of things. Any insight would be appreciated!

Answer 1

These lines of code are the problem:

def start_requests(self):
    start_urls = ['https://beta.companieshouse.gov.uk/search?q=a']

The start_requests method should return an iterable of Request objects; yours only assigns a local variable and therefore returns None.

The default start_requests creates this iterable from the URLs specified in start_urls, so simply defining start_urls as a class variable (outside of any function) and not overriding start_requests will work as you want.

Answer 2

Try this, with start_urls defined as a class attribute and start_requests not overridden:

import scrapy


class QuotesSpider(scrapy.Spider):

    name = "gov2"
    start_urls = ["https://beta.companieshouse.gov.uk/search?q=a"]

    def parse(self, response):
        products = response.xpath('//*[@id="results"]').extract()
        print(products)

solution1
2 ACCEPTED 2019-03-13 19:55:34

solution2
0 2019-03-13 20:08:40
