Running Selenium headless with multiple spiders

Question

I have many Scrapy spiders that run in parallel using scrapyd. What I am doing is something like the following code.

My question is: do I really need to start a display for every spider, and how does the driver know which display to use? Should I just start one display globally and start multiple webdriver instances within the same display?

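from pyvirtualdisplay import Display
from scrapy import Spider, signals
from scrapy.selector import Selector
from selenium import webdriver


class MySpider(Spider):  # hypothetical spider name
    name = "my_spider"
    display = None  # set in parse(); None until then
    driver = None

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Register the cleanup handler for the spider_closed signal
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        if self.driver:
            self.driver.quit()

        if self.display:
            self.display.stop()

    def parse(self, response):
        # Start a virtual X display, then launch Firefox on it
        self.display = Display(visible=0, size=(1024, 768))
        self.display.start()
        self.driver = webdriver.Firefox()

        self.driver.get(response.url)
        page = Selector(text=self.driver.page_source)

        # doing all parsing etc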
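On the "one global display" idea: X clients such as Firefox connect to whichever display the DISPLAY environment variable names, and pyvirtualdisplay's Display.start() by default points DISPLAY at the new virtual display for the current process. Several webdriver instances can therefore run on the same display. Because scrapyd launches each job in its own process, a single module-level display can only be shared by the spiders or drivers inside one process. The sketch below is a hypothetical reference-counted wrapper (SharedDisplay, acquire and release are illustrative names, not part of Scrapy or pyvirtualdisplay). The display factory is injected, so the logic can be exercised without an X server.

```python
import threading
from typing import Any, Callable, Optional


class SharedDisplay:
    """Hypothetical helper: lets several spiders in one process share a
    single virtual display instead of each starting its own.

    `factory` must return an object with start() and stop() methods, e.g.
    lambda: Display(visible=0, size=(1024, 768)) from pyvirtualdisplay.
    """

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory
        self._display: Optional[Any] = None
        self._users = 0
        self._lock = threading.Lock()  # harmless under Scrapy's single thread

    def acquire(self) -> Any:
        # The first caller creates and starts the display; later callers reuse it.
        with self._lock:
            if self._users == 0:
                self._display = self._factory()
                self._display.start()
            self._users += 1
            return self._display

    def release(self) -> None:
        # The last caller to release stops the display.
        with self._lock:
            if self._users == 0:
                raise RuntimeError("release() called more times than acquire()")
            self._users -= 1
            if self._users == 0:
                self._display.stop()
                self._display = None

    @property
    def users(self) -> int:
        return self._users
```

In a spider, you would call acquire() before creating webdriver.Firefox() and call release() in spider_closed after driver.quit(). Reference counting keeps the display alive until the last driver in the process is gone.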

Answer 1

I suggest using the splinter browser handler instead; it is a wrapper around Selenium. It solves your problem exactly, because the package handles the display for you.

With a few more package installations, you can also remove the need for a display altogether, meaning splinter runs headless (no browser window opens, and it is much faster). Check out the Splinter docs to see how to make it headless. I personally suggest the PhantomJS driver, even though you'll have to install the non-Python PhantomJS program.

3 2016-03-11 18:28:11