
How to use Multiprocessing with Selenium in Python

I am trying to use multiprocessing with Selenium in Python. My code is as follows:

from selenium import webdriver
from multiprocessing import Pool
import xlwings as xw

driver = webdriver.Chrome('chromedriver.exe')
driver.get("https://example.com")

wb = xw.Book('my_file.xlsm')
sht = wb.sheets["Sheet1"]
final_list = []

search = driver.find_element_by_id("ContentPlaceHolder1_txtByName")
for item in search:
    z = item.find_element_by_class_name("valuetext")
    info = z.find_element_by_tag_name("span")
    final_list.append(info.text)

def automate(num):
    col = num
    list_item = final_list[num]
    sht.range(1, col).value = list_item


if __name__ == '__main__':

    p = Pool(processes=4)
    data = p.map(automate,range(1,20))        

The issue I'm having is that the web page is re-opened for each of the 4 processes, and I don't understand why. If p.map only targets the automate function, why is the rest of the code run in every process?

I'm still new to multiprocessing, so I'm not sure whether that's just how it works. Is there another way to ensure the processes only run the function itself, or is there a way I could use threading?

In the examples in the multiprocessing docs, they suggest using Pool with a context manager, i.e.,

with Pool(processes=4) as pool:
    print(pool.map(f, range(10)))

That's the most obvious difference I see between your code and the docs. More importantly, I would infer from your observation that the framework is "re-importing" (so to speak) your module in each process it spawns, and that results in the behavior you report: namely, multiple browsers opening. This matches the spawn start method, which is the default on Windows: each child process starts a fresh interpreter and imports the main module, so any code outside the if __name__ == '__main__': guard runs again in every worker.
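A minimal sketch to see the re-import for yourself, without Selenium. The square function and the printed messages are made up for illustration. Under the spawn start method I would expect the module-level line to print once for the parent and again for each worker; under fork (the usual default on Linux) it should print only once.

```python
import multiprocessing as mp
import os

# Module-level code: under the spawn start method this line runs again
# in every worker, because each child imports this module from scratch.
print(f"module-level code running in PID {os.getpid()}")


def square(n):
    """Trivial, hypothetical worker used only to give the pool something to do."""
    return n * n


if __name__ == "__main__":
    # Only the parent process gets past this guard.
    print("start method:", mp.get_start_method())
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))
```

In your script, the webdriver.Chrome(...) call sits where that module-level print is, which would explain one browser per process.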

To prevent that, I would recommend putting the initialization code inside a function that is only called from under the if __name__ == '__main__': guard, so it runs once in the parent process. If you want to share final_list, you should probably do so with a queue or another data structure supported by multiprocessing.
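Here is a rough sketch of that restructuring, under a few assumptions. collect_items is a hypothetical stand-in for your Selenium scraping (it just returns dummy strings so the example runs without a browser), and the upper() call is a placeholder for whatever per-item work you need. Instead of a queue, this version simply passes each item to the worker as an argument of pool.map, which is often enough when the data is a list of plain strings.

```python
from multiprocessing import Pool


def collect_items():
    """Hypothetical stand-in for the Selenium step.

    In the real script this is where the driver would be created and the
    page scraped. It returns plain strings, which can be pickled and sent
    to worker processes.
    """
    return [f"item {i}" for i in range(20)]


def automate(task):
    """Worker: gets its data as an argument instead of reading a global."""
    col, text = task
    return col, text.upper()  # placeholder for the real per-item work


def main():
    final_list = collect_items()  # runs once, in the parent only
    # Pair each item with a 1-based column number.
    tasks = list(enumerate(final_list, start=1))
    with Pool(processes=4) as pool:
        return pool.map(automate, tasks)


if __name__ == "__main__":
    for col, text in main():
        # Assumption: the workbook is written here, in the parent,
        # e.g. sht.range(1, col).value = text
        print(col, text)
```

I kept the Excel write in the parent on the assumption that the driver and workbook handles should not be shared across processes; the workers only return values.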
