
Passing list as arguments in Scrapy

I am trying to build an application using Flask and Scrapy. I have to pass a list of URLs to the spider. I tried the following:

In the spider's __init__:
self.start_urls = ["http://www.google.com/patents/" + x for x in u]

In the Flask method:
u = ["US6249832", "US20120095946"]
os.system("rm static/s.json; scrapy crawl patents -d u=%s -o static/s.json" % u)

I know something similar can be done by reading the required URLs from a file, but can I pass the list of URLs for crawling directly?

Override the spider's __init__() method:

class MySpider(Spider):
    name = 'my_spider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

        endpoints = kwargs.get('start_urls').split(',')
        self.start_urls = ["http://www.google.com/patents/" + x for x in endpoints]

And pass the list of endpoints through the -a command line argument:

scrapy crawl patents -a start_urls="US6249832,US20120095946" -o static/s.json
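On the Flask side, the command can be assembled from the Python list instead of interpolating the list's repr into a shell string as the original os.system call does. A minimal sketch (the patents spider name and the static/s.json output path are taken from the question):

```python
import subprocess

u = ["US6249832", "US20120095946"]

# Join the IDs with commas so the spider's __init__ can split them back
# apart, and pass the command as a list to avoid shell-quoting issues.
cmd = [
    "scrapy", "crawl", "patents",
    "-a", "start_urls=" + ",".join(u),
    "-o", "static/s.json",
]
print(cmd)
# To actually run the crawl: subprocess.run(cmd, check=True)
```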

Note that you can also run Scrapy from a script.
