How to pass two user-defined arguments to a scrapy spider
Following "How to pass a user-defined argument in scrapy spider", I wrote the following simple spider:

    import scrapy

    class Funda1Spider(scrapy.Spider):
        name = "funda1"
        allowed_domains = ["funda.nl"]

        def __init__(self, place='amsterdam'):
            self.start_urls = ["http://www.funda.nl/koop/%s/" % place]

        def parse(self, response):
            filename = response.url.split("/")[-2] + '.html'
            with open(filename, 'wb') as f:
                f.write(response.body)

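This seems to work; for example, if I run it from the command line with

    scrapy crawl funda1 -a place=rotterdam

it generates a rotterdam.html that looks like http://www.funda.nl/koop/rotterdam/. Next, I would like to extend this so that a subpage can be specified, for example http://www.funda.nl/koop/rotterdam/p2/. I tried the following: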

    import scrapy

    class Funda1Spider(scrapy.Spider):
        name = "funda1"
        allowed_domains = ["funda.nl"]

        def __init__(self, place='amsterdam', page=''):
            self.start_urls = ["http://www.funda.nl/koop/%s/p%s/" % (place, page)]

        def parse(self, response):
            filename = response.url.split("/")[-2] + '.html'
            with open(filename, 'wb') as f:
                f.write(response.body)

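However, if I try to run this with

    scrapy crawl funda1 -a place=rotterdam page=2

I get the following error:

    crawl: error: running 'scrapy crawl' with more than one spider is no longer supported

I don't really understand this error message, as I'm not trying to crawl two spiders, but simply trying to pass two keyword arguments to modify the start_urls. How can I make this work?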
When providing multiple arguments, you need to prefix each one with -a.

The correct line for your case would be:

    scrapy crawl funda1 -a place=rotterdam -a page=2
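
To make the error message less mysterious, here is a minimal sketch of how a crawl-style command line could be split into spider names and spider arguments. It is a simplified model written for illustration, not Scrapy's actual parser, and the function name `split_crawl_args` is hypothetical. It assumes that only the single token following `-a` is read as a NAME=VALUE pair, and that every other token is taken as a spider name.

```python
def split_crawl_args(argv):
    """Split tokens into (spider names, spider keyword arguments).

    Simplified model, not Scrapy's real implementation: each -a consumes
    exactly one NAME=VALUE token; anything else counts as a spider name.
    """
    spiders = []
    spider_args = {}
    tokens = iter(argv)
    for token in tokens:
        if token == "-a":
            # Only the next token belongs to this -a flag.
            name, _, value = next(tokens).partition("=")
            spider_args[name] = value
        else:
            # A bare token (including "page=2") looks like a spider name.
            spiders.append(token)
    return spiders, spider_args


# Failing form: "page=2" is not preceded by -a, so it lands among the spiders.
bad_spiders, bad_args = split_crawl_args(
    ["funda1", "-a", "place=rotterdam", "page=2"]
)

# Working form: each argument has its own -a.
good_spiders, good_args = split_crawl_args(
    ["funda1", "-a", "place=rotterdam", "-a", "page=2"]
)

# The values arrive as strings, which suits the %s formatting in the spider.
url = "http://www.funda.nl/koop/%s/p%s/" % (good_args["place"], good_args["page"])
```

Under this model the failing command yields two "spiders" (`funda1` and `page=2`), which is consistent with the "more than one spider" error, while the corrected command yields one spider and both keyword arguments.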