Python parallel execution with Selenium

Question

I'm confused about parallel execution in Python using Selenium. There seem to be several ways to go about it, but some seem outdated.

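There's a Python module called python-wd-parallel that seems to have some functionality for this, but it's from 2013. Is it still useful now? I also found this example.
There's concurrent.futures, which seems a lot newer, but it isn't as easy to implement. Does anyone have a working example of parallel execution with Selenium?
There's also just using threads and executors to get the job done, but I feel this will be slower, because it doesn't use all the cores and still runs serially.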

What is the most up-to-date way to do parallel execution with Selenium?

Answer 1

Use joblib's Parallel module to do this. It's a great library for parallel execution.

Say we have a list of URLs called urls and we want to take a screenshot of each one in parallel.

First, let's import the necessary libraries:

from selenium import webdriver
from joblib import Parallel, delayed

Now let's define a function that returns a screenshot as base64:

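def take_screenshot(url):
    # The original used webdriver.PhantomJS('/path/to/phantomjs'); PhantomJS support
    # was removed in Selenium 4, so headless Chrome is used here instead.
    options = webdriver.ChromeOptions()
    options.add_argument('--headless=new')
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.get_screenshot_as_base64()
    finally:
        # quit() ends the browser process; close() would only close the window
        driver.quit()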

Now, to run it in parallel, all you need is:

screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)

When this line finishes executing, screenshots will hold the results from all the workers, in the same order as urls.

Notes on Parallel

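Parallel(n_jobs=-1) means "use all available CPU cores".
joblib's delayed(function)(input) is how you package the function you want to run in parallel together with its input, so Parallel can dispatch the call to a worker.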

More information can be found in the joblib documentation.
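To make the delayed(function)(input) idea concrete, here is a minimal standard-library sketch of the same pattern: each call is first captured as plain data and only executed later by a pool. delayed_call and run_parallel are hypothetical helpers written for illustration (similar in spirit to joblib, not its actual implementation), and fake_screenshot is a placeholder for take_screenshot so the sketch runs without a browser.

```python
from concurrent.futures import ThreadPoolExecutor


def delayed_call(func):
    """Hypothetical helper: return a wrapper that records a call instead of running it."""
    def capture(*args, **kwargs):
        return (func, args, kwargs)  # the call, stored as plain data
    return capture


def run_parallel(tasks, max_workers=None):
    """Hypothetical helper: execute captured calls in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(func, *args, **kwargs) for func, args, kwargs in tasks]
        return [future.result() for future in futures]


def fake_screenshot(url):
    # Placeholder for take_screenshot(url); no browser is started.
    return "screenshot-of-" + url


urls = ["https://example.com/a", "https://example.com/b"]
screenshots = run_parallel(delayed_call(fake_screenshot)(url) for url in urls)
print(screenshots)  # ['screenshot-of-https://example.com/a', 'screenshot-of-https://example.com/b']
```

One difference to keep in mind: joblib's default backend runs the calls in worker processes, whereas this sketch uses threads, which is usually enough for Selenium's I/O-bound work.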

Answer 2

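Python Parallel Wd seems to be dead, judging by its GitHub (last commit 9 years ago). It also implements an obsolete protocol for Selenium. Finally, the code is proprietary to Sauce Labs.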

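Generally it's better to use SeleniumBase, a Python test framework based on Selenium and pytest. It's very complete and supports performance boosts, parallel threads and much more. If that's not your case... keep reading.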

Selenium performance boost (concurrent.futures)

Short answer

Both threads and processes will give you a considerable speed-up on your Selenium code.

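Short examples are given below. The Selenium work is done by the selenium_title function, which returns the page title. These examples don't handle exceptions raised during each thread's or process's execution; for that, see Long answer, Handling exceptions.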

Pool of threads: concurrent.futures.ThreadPoolExecutor.

from selenium import webdriver  
from concurrent import futures

def selenium_title(url):  
  wdriver = webdriver.Chrome() # chrome webdriver
  wdriver.get(url)  
  title = wdriver.title  
  wdriver.quit()
  return title

links = ["https://www.amazon.com", "https://www.google.com"]

with futures.ThreadPoolExecutor() as executor: # default/optimized number of threads
  titles = list(executor.map(selenium_title, links))

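Pool of process workers: concurrent.futures.ProcessPoolExecutor. Just replace ThreadPoolExecutor with ProcessPoolExecutor in the code above; both derive from the Executor base class. You must also guard the main entry point, as shown below (on platforms that start workers with spawn, each worker re-imports the main module).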

if __name__ == '__main__':
  with futures.ProcessPoolExecutor() as executor: # default/optimized number of processes
    titles = list(executor.map(selenium_title, links))

Long answer

Why do `Threads` work despite the Python GIL?

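Even though Python limits threads because of the GIL, and even though threads get context-switched, Selenium's implementation details still give you a performance boost. Selenium works by sending commands such as POST and GET (HTTP requests) to the browser driver server. As you may already know, I/O-bound tasks (HTTP requests) release the GIL while they wait, hence the performance gain.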
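The effect can be reproduced without a browser. The sketch below assumes each Selenium command behaves like a blocking I/O wait and simulates it with time.sleep, which (like a blocking socket read) releases the GIL. fake_selenium_title is a hypothetical stand-in for selenium_title; exact timings depend on the machine, so the comments only describe the expected trend.

```python
import time
from concurrent import futures

DELAY = 0.2  # simulated round-trip to the browser driver server, in seconds


def fake_selenium_title(url):
    """Hypothetical stand-in for selenium_title: waits like an HTTP request, then returns a title."""
    time.sleep(DELAY)  # blocking wait; the GIL is released while sleeping
    return "Title of " + url


links = ["https://example.com/%d" % i for i in range(5)]

start = time.perf_counter()
serial_titles = [fake_selenium_title(link) for link in links]
serial_time = time.perf_counter() - start  # roughly len(links) * DELAY

start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=len(links)) as executor:
    threaded_titles = list(executor.map(fake_selenium_title, links))
threaded_time = time.perf_counter() - start  # roughly DELAY, since the waits overlap

print("serial: %.2fs, threaded: %.2fs" % (serial_time, threaded_time))
```

A CPU-bound function in place of time.sleep would not show this speed-up with threads, which is why the argument hinges on Selenium's work being mostly waiting on HTTP.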

Handling exceptions

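We can make a small change to the example above to handle Exceptions raised in the spawned threads. Instead of executor.map, we use executor.submit, which returns the title wrapped in a Future instance.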

To access a returned title we can use future_titles[index].result(), where index ranges over len(links), or simply use a for loop like below.

with futures.ThreadPoolExecutor() as executor:
  future_titles = [executor.submit(selenium_title, link) for link in links]
  for future_title, link in zip(future_titles, links):
    try:
      title = future_title.result() # can pass `timeout` to wait at most N seconds for this result
    except Exception as exc: # this thread might have raised an exception
      print('url {0} generated an exception: {1}'.format(link, exc))
    else:
      print('url {0} has title {1!r}'.format(link, title))

Note that besides iterating over future_titles we also iterate over links, so if an Exception occurs in some thread we know which url (link) was responsible for it.
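To see this exception handling in action without a browser, the sketch below swaps selenium_title for a hypothetical flaky_title function that raises for one URL, and collects results and errors per link instead of printing them. The "bad" URL and the RuntimeError are made up for illustration.

```python
from concurrent import futures


def flaky_title(url):
    """Hypothetical stand-in for selenium_title that fails on one URL."""
    if "bad" in url:
        raise RuntimeError("page failed to load")
    return "Title of " + url


links = ["https://example.com/ok", "https://example.com/bad", "https://example.com/also-ok"]
titles, errors = {}, {}

with futures.ThreadPoolExecutor() as executor:
    future_titles = [executor.submit(flaky_title, link) for link in links]
    for future_title, link in zip(future_titles, links):
        try:
            titles[link] = future_title.result(timeout=30)  # wait at most 30 s for this result
        except Exception as exc:  # the worker's exception is re-raised here
            errors[link] = exc

print(titles)
print({link: repr(exc) for link, exc in errors.items()})
```

Because each future is paired with its link by zip, the failing URL is identified even though the exception was raised in another thread.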

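The futures.Future class is neat because it gives you control over the result received from each thread: whether it completed correctly, whether it raised an exception, and more; see the Future documentation for details.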
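As an illustration of that control, a Future can also be inspected without a try/except: done() reports whether the call has finished, and exception() returns the raised exception (or None) instead of re-raising it. The fetch function below is a hypothetical placeholder for the Selenium work.

```python
from concurrent import futures


def fetch(url):
    """Hypothetical placeholder for the Selenium work."""
    if url.endswith("/404"):
        raise ValueError("not found: " + url)
    return url.upper()


with futures.ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(fetch, "https://example.com/")
    failed = executor.submit(fetch, "https://example.com/404")
    futures.wait([ok, failed])  # block until both calls have finished

print(ok.done(), ok.exception(), ok.result())  # True None HTTPS://EXAMPLE.COM/
print(failed.done(), repr(failed.exception()))  # True ValueError('not found: https://example.com/404')
```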

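It's also worth knowing that if you don't care about the order in which the threads return their results, futures.as_completed is a better fit. But since the exception-handling syntax with it is a bit uglier, I've left it out here.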
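For completeness, here is one way to write the futures.as_completed variant. It maps each future back to its link with a dict, so results arrive in completion order while errors can still be attributed to a URL. slow_title is a hypothetical placeholder, and its per-URL delays are made up to show that the completion order can differ from the submission order.

```python
import time
from concurrent import futures

# Hypothetical per-URL delays, chosen so the first URL submitted finishes last.
DELAYS = {"https://example.com/slow": 0.5, "https://example.com/fast": 0.05,
          "https://example.com/error": 0.15}


def slow_title(url):
    """Hypothetical placeholder for selenium_title."""
    time.sleep(DELAYS[url])
    if "error" in url:
        raise RuntimeError("driver crashed")
    return "Title of " + url


links = list(DELAYS)
completed = []  # (link, title or exception), in the order the futures finished

with futures.ThreadPoolExecutor(max_workers=len(links)) as executor:
    future_to_link = {executor.submit(slow_title, link): link for link in links}
    for future in futures.as_completed(future_to_link):
        link = future_to_link[future]
        try:
            completed.append((link, future.result()))
        except Exception as exc:  # keep the exception next to the URL that caused it
            completed.append((link, exc))

for link, outcome in completed:
    print(link, "->", outcome)
```

The dict lookup future_to_link[future] plays the role that zip(future_titles, links) plays in the ordered version.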

Performance boost and threads

First, why I always use threads to speed up my Selenium code:

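On I/O-bound tasks, my experience with Selenium shows minimal or no difference between using a pool of processes (Process) and a pool of threads (Threads). Similar conclusions about Python threads vs. processes on I/O-bound tasks are also drawn here.
We also know that each process uses its own memory space, which means more memory consumption. Processes are also somewhat slower to spawn than threads.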

Answer 3

I created a project to do this; it reuses webdriver instances for better performance:

https://github.com/testlabauto/local_selenium_pool

https://pypi.org/project/local-selenium-pool/

Answer 4

For running Python tests in parallel, you might consider pytest-xdist, which handles multiple processes for you: https://github.com/pytest-dev/pytest-xdist. It's a plugin for the pytest framework.

And for running Python Selenium tests in parallel with pytest, there's a framework that may simplify Selenium test parallelism for you, SeleniumBase: https://github.com/seleniumbase/SeleniumBase. It works as a pytest plugin, so you can use the parallelism arguments provided by pytest-xdist and run all your Selenium Python tests in parallel as needed. For example: pytest -n 4 runs 4 parallel workers (pytest-xdist runs each worker as a separate process).

Python parallel execution with Selenium

Question

Answers

Answer 1
9 2017-05-10 13:19:54

Answer 2
4 2021-09-19 23:30:48

Selenium performance boost (concurrent.futures)

Short answer

Long answer

Why do `Threads` work despite the Python GIL?

Handling exceptions

Performance boost and threads

Answer 3
3 2018-06-25 14:55:22

Answer 4
0 2022-09-12 14:23:49