Python Selenium Open Multiple Browsers With Multithreading
I am trying to cycle through a list of about 20000 URLs using Selenium and Chrome. Doing this in one browser will of course take a long time, so I am trying to set it up to open 5 browsers in this test case. I looked at a few tutorials, but I am still struggling to figure it out. Here is my code so far:
import threading

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def check_all_urls(urls):
    options = Options()
    options.headless = False
    driver = webdriver.Chrome(options=options)
    for url in urls:
        my_urls = ('\n'.join(''.join(el) for el in url))
        driver.get(my_urls)


number_of_threads = 5
threads = []
for number in range(number_of_threads):
    t = threading.Thread(target=check_all_urls(get_all_gdc_urls()), args=(number,))
    t.start()
The list of URLs is pulled in by a function I pass in there called get_all_gdc_urls().
As it is now, it opens one browser and starts cycling through the list of URLs. What do I need to add to get it to open more browsers?
All help greatly appreciated.
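A likely reason only one browser appears: `target=check_all_urls(get_all_gdc_urls())` calls the function right away, in the main thread, and passes its return value (`None`) to `Thread`. The whole URL list is therefore processed before the first thread even starts. The minimal sketch below shows the difference without Selenium; `worker`, `run_wrong`, and `run_right` are hypothetical names used only for illustration.

```python
import threading


def worker(name, log):
    # Record which thread actually executed the function.
    log.append((name, threading.current_thread().name))


def run_wrong(log):
    # Same pattern as in the question: worker(...) is called here, in the
    # calling thread, and its return value (None) becomes the target.
    t = threading.Thread(target=worker("wrong", log), name="child")
    t.start()
    t.join()


def run_right(log):
    # Pass the function itself and supply its arguments through args.
    t = threading.Thread(target=worker, args=("right", log), name="child")
    t.start()
    t.join()


log = []
run_wrong(log)
run_right(log)
print(log)  # the "wrong" entry was not run by the thread named "child"
```

Applied to the code above, this would presumably mean `target=check_all_urls, args=(chunk,)`, where `chunk` is the slice of URLs that thread should handle (an assumed variable, not in the original).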
I think we can use a queue. I hope this works for you.
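import sys
from queue import Queue  # the module is named "Queue" only in Python 2
from threading import Thread

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

concurrent = 5  # number of worker threads, i.e. browsers open at once


def doWork():
    # Each worker pulls URLs off the shared queue until the program exits.
    while True:
        url = q.get()
        try:
            crawl(url)
        except Exception as exc:
            print(f"{url} failed: {exc}")
        finally:
            q.task_done()  # always mark the item done so q.join() can return


def crawl(my_url):
    options = Options()
    options.headless = False
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(my_url)
    finally:
        driver.quit()  # end the browser and its chromedriver process


q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()

try:
    with open("urls.txt") as infile:
        for line in infile:
            lin = "https://" + line
            q.put(lin.strip())
    q.join()  # block until every queued URL has been processed
except KeyboardInterrupt:
    sys.exit(1)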
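The code above starts a fresh Chrome for every URL, which is probably expensive for 20000 URLs. A possible variant, sketched below, keeps one browser per worker thread and reuses it for every URL that thread takes off the queue. The names `make_chrome`, `worker`, and `crawl_all`, and the `None` sentinel used to stop the workers, are assumptions for illustration and not part of the original answer.

```python
import queue
import threading


def make_chrome():
    # Imported lazily so the threading pattern can be tried without Selenium.
    from selenium import webdriver
    return webdriver.Chrome()


def worker(q, make_driver, visited):
    driver = make_driver()  # one browser per thread, reused for every URL
    try:
        while True:
            url = q.get()
            if url is None:  # sentinel: no more work for this thread
                break
            try:
                driver.get(url)
                visited.append(url)
            except Exception as exc:
                print(f"{url} failed: {exc}")
    finally:
        driver.quit()  # close the browser when the thread is done


def crawl_all(urls, n_workers=5, make_driver=make_chrome):
    q = queue.Queue()
    visited = []
    threads = [
        threading.Thread(target=worker, args=(q, make_driver, visited))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for url in urls:
        q.put(url)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return visited
```

With this sketch, something like `crawl_all(get_all_gdc_urls())` would open 5 browsers and share the list between them, assuming `get_all_gdc_urls()` returns an iterable of URL strings. The order of `visited` depends on thread scheduling.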
This is just a brief example that opens multiple Chrome instances with multithreading. You will need to adapt this code to your own needs. Hopefully it helps :)
import threading

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def task1(url):
    # Each thread starts its own Chrome instance and loads one URL.
    chrome_options = Options()
    driver = webdriver.Chrome(options=chrome_options)
    driver.get(url)
    print('task completed')


url_list = ['http://www.google.com', 'http://www.spacex.com']
thread_list = []
for i in range(len(url_list)):
    thread_list.append(threading.Thread(target=task1, args=[url_list[i]]))
for i in range(len(thread_list)):
    thread_list[i].start()
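Note that the example above starts one thread, and so one browser, per URL, which would not scale to a list of 20000. One way to cap the number of browsers open at once is a thread pool, sketched below with the standard library's `ThreadPoolExecutor`. The helper names `make_chrome`, `visit`, and `visit_all` are hypothetical, and returning the page title is just a placeholder for whatever check you actually need.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chrome():
    # Imported lazily so the pooling pattern can be tried without Selenium.
    from selenium import webdriver
    return webdriver.Chrome()


def visit(url, make_driver=make_chrome):
    driver = make_driver()
    try:
        driver.get(url)
        return url, driver.title
    finally:
        driver.quit()  # always close the browser, even if get() fails


def visit_all(urls, max_workers=5, make_driver=make_chrome):
    # At most max_workers browsers are open at any one time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input URLs.
        return list(pool.map(lambda u: visit(u, make_driver), urls))
```

Calling `visit_all(url_list)` would then process the list with no more than 5 browsers at a time. An exception raised for any URL is re-raised when the results are collected, so wrap the body of `visit` in your own error handling if one bad URL should not stop the run.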