Python parallel execution with selenium
I'm confused about parallel execution in Python using Selenium. There seem to be a few ways to go about it, but some seem out of date.
There's a Python module called python-wd-parallel which seems to have some functionality to do this, but it's from 2013. Is it still useful now? I also found this example.
There's concurrent.futures, which seems a lot newer but not so easy to implement. Does anyone have a working example of parallel execution in Selenium?
There's also the option of using just threads and executors to get the job done, but I feel this will be slower, because it's not using all the cores and is still running in serial formation.
What is the latest way to do parallel execution using Selenium?
Use joblib's Parallel module to do that; it's a great library for parallel execution.
Let's say we have a list of URLs named urls and we want to take a screenshot of each one in parallel.
First, let's import the necessary libraries:
from selenium import webdriver
from joblib import Parallel, delayed
Now let's define a function that takes a screenshot as base64:
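def take_screenshot(url):
    # PhantomJS support was removed in Selenium 4, so this line needs an older Selenium
    phantom = webdriver.PhantomJS('/path/to/phantomjs')
    phantom.get(url)
    screenshot = phantom.get_screenshot_as_base64()
    phantom.quit()  # quit() also ends the driver process; close() only closes the window
    return screenshot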
Now, to execute that in parallel, you would do:
screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)
When this line finishes executing, screenshots will hold the data from all of the processes that ran.
Explanation about Parallel
Parallel(n_jobs=-1) means use all of the CPU cores available.
delayed(function)(input) is joblib's way of creating the input for the function you are trying to run in parallel.
More information can be found in the joblib docs.
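As a small sketch of what those two pieces do, the block below runs a plain function through Parallel. square is a hypothetical stand-in for take_screenshot so that no browser is needed. Passing backend="threading" is an assumption about what may suit Selenium rather than something the answer above states: the work is mostly waiting on the browser, so threads may be enough.

```python
from joblib import Parallel, delayed

def square(x):
    # Hypothetical stand-in for take_screenshot
    return x * x

# delayed(square)(3) does not call square; it packs up the call for Parallel to run later
task = delayed(square)(3)
print(task)

# n_jobs=2 limits the run to two workers; the result list follows the input order
results = Parallel(n_jobs=2, backend="threading")(delayed(square)(x) for x in [1, 2, 3])
print(results)
```

With the screenshot function, the same call shape applies: swap square for take_screenshot and the numbers for urls.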
Generally it's better to use SeleniumBase, a Python test framework based on Selenium and pytest. It is very complete, with support for performance boosts, parallel threads and much more. If that's not your case, keep reading.
Both threads and processes will give you a considerable speed-up on your Selenium code. Short examples are given below. The Selenium work is done by the selenium_title function, which returns the page title. The examples don't deal with exceptions raised during each thread/process execution. For that, see "Long Answer - Dealing with exceptions".
Thread pool with concurrent.futures.ThreadPoolExecutor:
from selenium import webdriver
from concurrent import futures
def selenium_title(url):
    wdriver = webdriver.Chrome()  # chrome webdriver
    wdriver.get(url)
    title = wdriver.title
    wdriver.quit()
    return title

links = ["https://www.amazon.com", "https://www.google.com"]

with futures.ThreadPoolExecutor() as executor:  # default/optimized number of threads
    titles = list(executor.map(selenium_title, links))
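A small sketch of two details of executor.map that may be useful here: results come back in the order of the inputs, and max_workers caps how many workers (browsers, in the real case) run at once. To keep it runnable without a browser, fake_title is a hypothetical stand-in for selenium_title that only sleeps and returns a string; the URLs and delays are made up.

```python
import time
from concurrent import futures

def fake_title(url):
    # Hypothetical stand-in for selenium_title: sleep instead of driving a browser.
    # The shorter URL finishes first, so completion order differs from input order.
    time.sleep(0.01 * len(url))
    return "title of " + url

links = ["https://www.example.com/a/long/path", "https://example.org"]

# max_workers caps the number of concurrent workers
with futures.ThreadPoolExecutor(max_workers=2) as executor:
    titles = list(executor.map(fake_title, links))

# executor.map yields results in input order, not completion order
for link, title in zip(links, titles):
    print(link, "->", title)
```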
Process pool workers with concurrent.futures.ProcessPoolExecutor:
Just replace ThreadPoolExecutor with ProcessPoolExecutor in the code above. They are both derived from the Executor base class. You must also protect the entry point with an if __name__ == '__main__' guard, like below.
if __name__ == '__main__':
    with futures.ProcessPoolExecutor() as executor:  # default/optimized number of processes
        titles = list(executor.map(selenium_title, links))
Why do threads work despite the Python GIL?
Python has limitations on threads due to the GIL, and threads are context switched, but a performance gain still comes from the implementation details of Selenium. Selenium works by sending commands as HTTP requests (POST, GET) to the browser driver server. As you might already know, I/O-bound tasks such as HTTP requests release the GIL, hence the performance gain.
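The effect can be seen without a browser in a rough sketch like the one below. wait_for_driver is a hypothetical stand-in for one Selenium command: it blocks in time.sleep, which releases the GIL much as a thread waiting on an HTTP response does. The 0.2 second delay is an arbitrary assumption.

```python
import time
from concurrent import futures

def wait_for_driver(delay):
    # Hypothetical stand-in for one Selenium command: the thread blocks while waiting.
    # time.sleep releases the GIL, so other threads can run in the meantime.
    time.sleep(delay)
    return delay

delays = [0.2] * 4

# Serial run: the waits add up
start = time.perf_counter()
serial = [wait_for_driver(d) for d in delays]
serial_time = time.perf_counter() - start

# Threaded run: the waits overlap
start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
    threaded = list(executor.map(wait_for_driver, delays))
threaded_time = time.perf_counter() - start

print("serial:   {:.2f}s".format(serial_time))
print("threaded: {:.2f}s".format(threaded_time))
```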
We can make small modifications to the example above to deal with exceptions in the spawned threads. Instead of using executor.map we use executor.submit, which returns each title wrapped in a Future instance.
To access a returned title we can use future_titles[index].result(), where index runs from 0 to len(links) - 1, or simply use a for loop like below.
with futures.ThreadPoolExecutor() as executor:
    future_titles = [executor.submit(selenium_title, link) for link in links]
    for future_title, link in zip(future_titles, links):
        try:
            title = future_title.result()  # can use `timeout` to wait max seconds for each thread
        except Exception as exc:  # this thread might have had an exception
            # {0} and {1} are positional indexes; '{:1}' would be a format spec and fail on exc
            print('url {0} generated an exception: {1}'.format(link, exc))
Note that besides iterating over future_titles we also iterate over links, so if an Exception occurs in some thread we know which URL (link) was responsible for it.
The futures.Future class is cool because it gives you control over the result received from each thread, such as whether it completed correctly or raised an exception, among other things. More about it here.
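A minimal sketch of that kind of inspection, using only the documented Future methods done(), exception() and result(). flaky_title is a hypothetical stand-in for selenium_title that fails for one made-up URL, so the block runs without a browser.

```python
from concurrent import futures

def flaky_title(url):
    # Hypothetical stand-in for selenium_title that fails for one URL
    if "bad" in url:
        raise ValueError("could not load " + url)
    return "title of " + url

links = ["https://example.com", "https://bad.example.com"]

with futures.ThreadPoolExecutor() as executor:
    future_titles = [executor.submit(flaky_title, link) for link in links]

# Leaving the with-block waits for all submitted calls, so every future is done here
report = {}
for link, future in zip(links, future_titles):
    error = future.exception()  # None if the call returned normally
    if error is None:
        report[link] = ("ok", future.result())
    else:
        report[link] = ("failed", str(error))

for link, outcome in report.items():
    print(link, outcome)
```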
Also important to mention is that futures.as_completed is better if you don't care about the order in which the threads return items. But since the syntax to control exceptions with it is a little ugly, I omitted it here.
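For reference, one common shape for the as_completed variant is sketched below: a dict maps each future back to its URL, since completion order no longer matches input order. flaky_title is a hypothetical stand-in for selenium_title (slow for one made-up URL, failing for another) so the block runs without a browser.

```python
import time
from concurrent import futures

def flaky_title(url):
    # Hypothetical stand-in for selenium_title: slow for one URL, failing for another
    if "slow" in url:
        time.sleep(0.2)
    if "bad" in url:
        raise ValueError("could not load " + url)
    return "title of " + url

links = ["https://slow.example.com", "https://example.com", "https://bad.example.com"]

titles, errors = {}, {}
with futures.ThreadPoolExecutor() as executor:
    # Map each future back to its URL, since as_completed does not keep input order
    future_to_link = {executor.submit(flaky_title, link): link for link in links}
    for future in futures.as_completed(future_to_link):
        link = future_to_link[future]
        try:
            titles[link] = future.result()
        except Exception as exc:
            errors[link] = exc

print(titles)
print(errors)
```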
First, why I've always used threads to speed up my Selenium code:
On I/O-bound tasks, my experience with Selenium shows minimal or no difference between using a pool of processes (Process) or a pool of threads (Threads). Similar conclusions about Python threads vs processes on I/O-bound tasks are also reached here.
I created a project to do this, and it reuses webdriver instances for better performance:
https://github.com/testlabauto/local_selenium_pool
https://pypi.org/project/local-selenium-pool/
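One way to get this kind of reuse by hand, as a rough sketch and not necessarily how local_selenium_pool does it, is to keep one driver per worker thread in threading.local storage. FakeDriver is a hypothetical stand-in so the block runs without a browser; in real code get_driver would create webdriver.Chrome() instead, and each driver would still need quit() called once the work is over.

```python
import threading
import time
from concurrent import futures

class FakeDriver:
    """Hypothetical stand-in for a webdriver instance."""
    created = 0
    _lock = threading.Lock()

    def __init__(self):
        # Count how many drivers get created across all threads
        with FakeDriver._lock:
            FakeDriver.created += 1

    def get(self, url):
        time.sleep(0.05)  # pretend to load the page
        self.title = "title of " + url

thread_state = threading.local()

def get_driver():
    # Create the driver on first use in this thread, then reuse it
    if not hasattr(thread_state, "driver"):
        thread_state.driver = FakeDriver()
    return thread_state.driver

def title_with_reuse(url):
    driver = get_driver()
    driver.get(url)
    return driver.title

links = ["https://example.com/page/{}".format(i) for i in range(8)]

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    titles = list(executor.map(title_with_reuse, links))

print(len(links), "pages loaded with", FakeDriver.created, "drivers")
```

The saving comes from paying the browser start-up cost once per worker thread instead of once per URL.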
For running Python tests in parallel, you may consider using pytest-xdist to handle the multiple processes for you: https://github.com/pytest-dev/pytest-xdist . That's a plugin for the pytest framework.
And for running Python Selenium tests in parallel with pytest, there's a framework that may simplify running them concurrently for you, SeleniumBase: https://github.com/seleniumbase/SeleniumBase . It functions as a pytest plugin, so you can use the parallel-run args provided by pytest-xdist and run all your Selenium Python tests in parallel as needed. E.g. pytest -n 4 for 4 parallel workers (pytest-xdist runs each worker as a separate process).