Scrapy: how do I change spider settings after crawling has started?

Question

I can't change the spider settings inside the parse method, but there must be a way to do it.

For example:

class SomeSpider(BaseSpider):
    name = 'mySpider'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com']
    settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.FirstPipeline']
    print settings['ITEM_PIPELINES'][0]
    #printed 'myproject.pipelines.FirstPipeline'
    def parse(self, response):
        #...some code
        settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.SecondPipeline']
        print settings['ITEM_PIPELINES'][0]
        # printed 'myproject.pipelines.SecondPipeline'
        item = Myitem()
        item['name'] = 'Name for SecondPipeline'

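But the item is still processed by FirstPipeline. The new ITEM_PIPELINES value has no effect. How can I change the settings after crawling has started? Thanks in advance!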

Answer 1

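If you want different spiders to use different pipelines, you can give each spider a pipelines attribute that lists the pipelines for that spider. Then, in each pipeline, check whether it is listed there: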

class MyPipeline(object):

    def process_item(self, item, spider):
        if self.__class__.__name__ not in getattr(spider, 'pipelines',[]):
            return item
        ...
        return item

class MySpider(CrawlSpider):
    pipelines = set([
        'MyPipeline',
        'MyPipeline3',
    ])
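The check can be tried without running a crawl. The following is a minimal sketch that uses plain stand-in classes instead of real Scrapy spiders; `OptedInSpider`, `OptedOutSpider`, `TaggingPipeline`, and `run_demo` are hypothetical names introduced only for illustration.

```python
class OptedInSpider:
    """Stand-in for a spider that lists the pipeline by class name."""
    name = "opted_in"
    pipelines = {"TaggingPipeline"}


class OptedOutSpider:
    """Stand-in for a spider with no `pipelines` attribute at all."""
    name = "opted_out"


class TaggingPipeline:
    def process_item(self, item, spider):
        # Pass the item through untouched unless the spider opted in.
        if self.__class__.__name__ not in getattr(spider, "pipelines", []):
            return item
        item["processed_by"] = self.__class__.__name__
        return item


def run_demo():
    pipeline = TaggingPipeline()
    kept = pipeline.process_item({"name": "a"}, OptedInSpider())
    skipped = pipeline.process_item({"name": "b"}, OptedOutSpider())
    return kept, skipped
```

With this approach every pipeline stays enabled in ITEM_PIPELINES for the whole project, and each one decides per spider whether to act, so nothing has to change once the crawl is running.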

If you want different items to be processed by different pipelines, you can do the following:

    class MyPipeline2(object):
        def process_item(self, item, spider):
            if isinstance(item, MyItem):
                ...
                return item
            return item
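The same idea can be sketched in isolation. This example assumes two hypothetical item types, `ProductItem` and `ReviewItem`, written as plain dict subclasses so that it runs without Scrapy; `ProductOnlyPipeline` is likewise an illustrative name.

```python
class ProductItem(dict):
    """Hypothetical item type; a dict subclass stands in for a Scrapy item."""


class ReviewItem(dict):
    """Second hypothetical item type that the pipeline should ignore."""


class ProductOnlyPipeline:
    def process_item(self, item, spider):
        # Only act on the item type this pipeline is responsible for.
        if isinstance(item, ProductItem):
            item["handled_by"] = "ProductOnlyPipeline"
        # Always return the item so later pipelines still receive it.
        return item


def dispatch_demo():
    pipeline = ProductOnlyPipeline()
    product = pipeline.process_item(ProductItem(title="lamp"), spider=None)
    review = pipeline.process_item(ReviewItem(text="nice"), spider=None)
    return product, review
```

Returning the item in both cases matters here: an item that one pipeline ignores still has to reach the next one.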

Answer 2

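Based on the informative issue #4196, combined with the telnet console, this can be done even after execution has started.

Attach a telnet client to the port (e.g. 1234) using the password that is logged when the scrapy crawl command starts, and issue the following interactive Python statements to modify the currently running downloader: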

$ telnet  127.0.0.1  6023  # Read the actual port from logs.
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Username: scrapy
Password: <copy-from-logs>

>>> engine.downloader.total_concurrency
8
>>> engine.downloader.total_concurrency = 32
>>> est()
Execution engine status

time()-engine.start_time                        : 14226.62803554535
engine.has_capacity()                           : False
len(engine.downloader.active)                   : 28
engine.scraper.is_idle()                        : False
engine.spider.name                              : <foo>
engine.spider_is_idle(engine.spider)            : False
engine.slot.closing                             : False
len(engine.slot.inprogress)                     : 32
len(engine.slot.scheduler.dqs or [])            : 531
len(engine.slot.scheduler.mqs)                  : 0
len(engine.scraper.slot.queue)                  : 0
len(engine.scraper.slot.active)                 : 0
engine.scraper.slot.active_size                 : 0
engine.scraper.slot.itemproc_size               : 0
engine.scraper.slot.needs_backout()             : False
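The session above only reassigns one attribute on the live downloader object. The sketch below does the same from code. It assumes the same object is reachable as `crawler.engine.downloader` (for example through a spider's `crawler` attribute), which has not been checked here against any particular Scrapy version. `set_total_concurrency` and `make_fake_crawler` are hypothetical helpers, and the stand-in crawler exists only so the example runs without Scrapy.

```python
from types import SimpleNamespace


def set_total_concurrency(crawler, new_limit):
    """Change the running downloader's concurrency cap; return the old value."""
    if new_limit < 1:
        raise ValueError("new_limit must be at least 1")
    downloader = crawler.engine.downloader
    previous = downloader.total_concurrency
    downloader.total_concurrency = new_limit
    return previous


def make_fake_crawler(total_concurrency=8):
    """Build a stand-in with the same attribute path as the telnet session."""
    downloader = SimpleNamespace(total_concurrency=total_concurrency)
    return SimpleNamespace(engine=SimpleNamespace(downloader=downloader))
```

As in the telnet session, this changes only the running process. The value stored in the settings is left as it was.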

2 solutions

Solution 1
3 2015-09-26 21:23:39

Solution 2
0 2020-08-26 20:48:33
