
How to use a multiprocessing Pool with Python Selenium

I have some code that needs to grab data from hundreds of web pages, and I would like to speed this up by running multiple instances of the Selenium Chrome browser. For example, I have this code:

from selenium import webdriver
from multiprocessing import Pool
from tkinter import *

#initiate browser
def browser():
    global driver
    driver = webdriver.Chrome(r"C:\Users\areed\Desktop\p\chromedriver.exe")
    return driver

#test link
def test():
    links = [link1.com, link2.com, link3.com, link4.com]
    browser()
    for l in links:
        driver.get(l)
        dostuff(driver)

#Scrape Data
def dostuff(driver):
    print('doing Stuff')

#multiprocess Function      
def multip():
    pool = Pool(processes=4)
    pool.map(test())

#tkinter Window
if __name__ == "__main__":  
    win = Tk()
    win.title("test")
    win.geometry('300x200')
    btn = Button(win, text="Tester", command=multip)
    btn.pack()
    win.mainloop()

How can I make this code run multiple Selenium Chrome browsers? It works fine without multiprocessing. Can someone please explain how to fix this? Thanks!

Here is sample multiprocessing code.

You can pass each link as an argument to the worker function (test_func in the code that follows).

Each browser will navigate to a different link.

from selenium import webdriver
from multiprocessing import Pool

# No global driver: a WebDriver instance cannot be shared between processes.
def browser():
    driver = webdriver.Chrome()
    return driver

def test_func(link):
    driver = browser()  # Each process creates its own driver.
    try:
        driver.get(link)
    finally:
        driver.quit()  # Close the browser once this link is done.

def multip():
    links = ["https://stackoverflow.com/", "https://signup.microsoft.com/"]
    pool = Pool(processes=3)
    for link in links:
        # args should be a tuple; note the trailing comma.
        pool.apply_async(test_func, args=(link,))

    pool.close()  # No more tasks will be submitted.
    pool.join()   # Wait for all workers to finish.

if __name__ == '__main__':
    multip()

I have tried the code above and it works.
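Since the question mentions hundreds of pages, starting one browser per link may be slow. As a possible variation (a sketch, not part of the original answer), each worker process could open a single browser and reuse it for a batch of links. The helper names split_links, scrape_batch, and scrape_all are hypothetical, reading driver.title is only a placeholder for the real scraping step, and the sketch assumes Selenium can locate a Chrome driver on its own.

```python
from multiprocessing import Pool


def split_links(links, n_workers):
    """Deal links round-robin into at most n_workers non-empty batches."""
    batches = [links[i::n_workers] for i in range(n_workers)]
    return [batch for batch in batches if batch]


def scrape_batch(batch):
    """Run in a worker process: one browser handles the whole batch."""
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: the Chrome driver is discoverable
    titles = []
    try:
        for link in batch:
            driver.get(link)
            titles.append(driver.title)  # placeholder for the real scraping
    finally:
        driver.quit()  # always close the browser, even after an error
    return titles


def scrape_all(links, n_workers=4):
    """Scrape all links using up to n_workers browsers in parallel."""
    batches = split_links(links, n_workers)
    if not batches:
        return []  # nothing to do; Pool(processes=0) would raise
    with Pool(processes=len(batches)) as pool:
        results = pool.map(scrape_batch, batches)
    # Flatten; results come back grouped by batch, not in the input order.
    return [title for batch in results for title in batch]
```

As in the answer, scrape_all(links) should be called from under the if __name__ == '__main__': guard (or from a function reached through it), since multiprocessing on Windows re-imports the main module in each worker.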

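Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please cite this site's URL or the original source. For any questions, contact yoyou2525@163.com.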

 