Python Selenium Open Multiple Browsers With Multithreading

I am trying to cycle through a list of about 20000 URLs using Selenium and Chrome. Doing this in one browser will of course take a long time. So I am trying to set it up to open in 5 browsers in this test case. I looked at a few tutorials but I am still struggling to figure it out. Here is my code so far:

def check_all_urls(urls):
    options = Options()
    options.headless = False
    driver = webdriver.Chrome(options=options)

    for url in urls:
        my_urls = ('\n'.join(''.join(el) for el in url))
        driver.get(my_urls)


number_of_threads = 5

threads = []

for number in range(number_of_threads):
    t = threading.Thread(target=check_all_urls(get_all_gdc_urls()), args=(number,))
    t.start()

The list of URLs is pulled in by a function I pass in there called get_all_gdc_urls().

As it is now, it opens one browser and starts cycling through the list of URLs. What do I need to add to get it to open more browsers?

All help greatly appreciated.
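A note on why only one browser opens: `target=check_all_urls(get_all_gdc_urls())` calls the function right away in the main thread and hands its return value (None) to `Thread`, so the thread itself has nothing to run. The sketch below shows the difference with a stand-in for the Selenium loop that only records which thread ran it. The placeholder URLs and the thread name `worker-0` are made up for illustration.

```python
import threading

calls = []  # (url, name of the thread that handled it)

def check_all_urls(urls):
    # Stand-in for the Selenium loop in the question: no browser is opened,
    # the function only records which thread processed each URL.
    for url in urls:
        calls.append((url, threading.current_thread().name))

urls = ["https://example.com/a", "https://example.com/b"]  # placeholder URLs

# As in the question: check_all_urls(urls) runs here, in the main thread,
# and its return value (None) becomes the thread's target.
t = threading.Thread(target=check_all_urls(urls))
t.start()
t.join()
wrong_threads = {name for _, name in calls}

calls.clear()

# Fixed: pass the function object itself and its arguments separately.
t = threading.Thread(target=check_all_urls, args=(urls,), name="worker-0")
t.start()
t.join()
right_threads = {name for _, name in calls}

print(wrong_threads)  # work was done by the main thread
print(right_threads)  # work was done by the new thread
```

With the call inside `target=`, the first pass through the `for` loop blocks until every URL has been visited, which matches the single browser described in the question.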

I think you can use a queue here. Each worker thread takes the next URL off the queue and opens it in its own browser. I hope this works for you:

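import sys
from queue import Queue  # Python 3 name; the module was called Queue in Python 2
from threading import Thread

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

concurrent = 5  # number of worker threads, and so of browsers open at once

def doWork():
    # Each worker keeps pulling URLs off the shared queue until the program exits.
    while True:
        url = q.get()
        try:
            crawl(url)
        except Exception as exc:
            # Report the failure and keep the worker alive for the next URL.
            print("failed:", url, exc)
        finally:
            q.task_done()  # always mark the item done so q.join() can return

def crawl(my_url):
    options = Options()  # Chrome opens a visible window unless told otherwise
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(my_url)
    finally:
        driver.quit()  # quit() closes the browser and stops chromedriver

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    # urls.txt holds one host per line, without the scheme.
    with open("urls.txt") as infile:
        for line in infile:
            lin = "https://" + line
            q.put(lin.strip())
    q.join()  # block until every queued URL has been processed
except KeyboardInterrupt:
    sys.exit(1)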

This is just a brief example that opens multiple Chrome instances with multithreading, one thread per URL. You will need to adapt it to your own code. Hopefully it helps :)

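import threading

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def task1(url):
    chrome_options = Options()
    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.get(url)
        print('task completed')
    finally:
        driver.quit()  # close the browser once the page has loaded

url_list = ['http://www.google.com', 'http://www.spacex.com']
# One thread, and therefore one browser, per URL.
thread_list = [threading.Thread(target=task1, args=[url]) for url in url_list]
for t in thread_list:
    t.start()
for t in thread_list:
    t.join()  # wait for every thread to finish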
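One thread per URL would not scale to the roughly 20000 URLs in the question. A possible adaptation, sketched here under assumptions, is to split the list into 5 chunks and let each thread reuse a single browser for its chunk. The names `split_round_robin`, `worker`, `run`, `make_chrome` and `DryRunDriver` are hypothetical helpers, not part of Selenium. `DryRunDriver` is a stand-in that has only the two methods used here, so the threading logic can be tried without opening Chrome.

```python
import threading

def split_round_robin(urls, n):
    # Worker i gets urls[i], urls[i + n], urls[i + 2n], ...
    return [urls[i::n] for i in range(n)]

def worker(urls, make_driver, visited):
    driver = make_driver()  # one browser per thread, reused for all its URLs
    try:
        for url in urls:
            driver.get(url)
            visited.append(url)  # list.append is safe across threads in CPython
    finally:
        driver.quit()

def run(urls, make_driver, number_of_threads=5):
    visited = []
    threads = [
        threading.Thread(target=worker, args=(chunk, make_driver, visited))
        for chunk in split_round_robin(urls, number_of_threads)
        if chunk  # skip empty chunks when there are fewer URLs than threads
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return visited

def make_chrome():
    # Imported lazily so the helpers above can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    return webdriver.Chrome(options=Options())

class DryRunDriver:
    # Stand-in with the same get()/quit() methods; it opens no browser.
    instances = []

    def __init__(self):
        DryRunDriver.instances.append(self)

    def get(self, url):
        pass

    def quit(self):
        pass

demo_urls = ["https://example.com/%d" % i for i in range(7)]  # placeholder URLs
visited = run(demo_urls, DryRunDriver, number_of_threads=3)
print(len(visited), "URLs visited by", len(DryRunDriver.instances), "drivers")
```

For the real thing the call would be something like `run(get_all_gdc_urls(), make_chrome)`, assuming `get_all_gdc_urls()` returns a list of plain URL strings.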
