Scrapy xpath iterate (shell works)
I am trying to scrape some information from the UK's Companies House using Scrapy. I connected to the website through the Scrapy shell with the command
scrapy shell https://beta.companieshouse.gov.uk/search?q=a
and with
response.xpath('//*[@id="results"]').extract()
I managed to get the results back.
I tried to put this into a program so I could export the results to CSV or JSON, but I am having trouble getting it to work. This is what I have:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "gov2"

    def start_requests(self):
        start_urls = ['https://beta.companieshouse.gov.uk/search?q=a']

    def parse(self, response):
        products = response.xpath('//*[@id="results"]').extract()
        print(products)
It is very simple, but I have tried a lot of things. Any insight would be appreciated!
These lines of code are the problem:
    def start_requests(self):
        start_urls = ['https://beta.companieshouse.gov.uk/search?q=a']
The start_requests method should return an iterable of Request objects; yours only assigns a local variable and therefore returns None.
The default start_requests creates this iterable from the URLs specified in start_urls, so simply defining start_urls as a class variable (outside of any function) and not overriding start_requests will work as you want.
Try this:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "gov2"
    start_urls = ["https://beta.companieshouse.gov.uk/search?q=a"]

    def parse(self, response):
        products = response.xpath('//*[@id="results"]').extract()
        print(products)