Multithreading / Multiprocessing in Selenium
I wrote a Python script that reads URLs from a text file, loads each one, and prints the href of an element on the page. My goal is to make it faster and able to run at a larger scale using multiprocessing or multithreading.
In the workflow I have in mind, each browser process would get the href from the current URL and then load the next link from the queue in the same browser instance (let's say there are 5 instances). Of course, each link should be scraped exactly once.
Example input file: HNlinks.txt
https://news.ycombinator.com/user?id=ingve
https://news.ycombinator.com/user?id=dehrmann
https://news.ycombinator.com/user?id=thanhhaimai
https://news.ycombinator.com/user?id=rbanffy
https://news.ycombinator.com/user?id=raidicy
https://news.ycombinator.com/user?id=svenfaw
https://news.ycombinator.com/user?id=ricardomcgowan
Code:
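from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

# Read one URL per line, dropping trailing newlines and blank lines.
with open("HNlinks.txt", "r") as input1:
    urls1 = [line.strip() for line in input1 if line.strip()]

for url in urls1:
    driver.get(url)
    # The find_elements_by_* helpers were removed in Selenium 4; use find_elements(By...).
    links = driver.find_elements(By.CLASS_NAME, "athing")
    for link in links:
        print(link.find_element(By.CSS_SELECTOR, "a").get_attribute("href"))

driver.quit()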
Note: I have not test-run this answer locally. Please try it and give feedback:
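from multiprocessing import Pool

from selenium import webdriver
from selenium.webdriver.common.by import By


def load_url(url):
    # Each call starts its own browser, so close it once the page is done.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        links = driver.find_elements(By.CLASS_NAME, "athing")
        for link in links:
            print(link.find_element(By.CSS_SELECTOR, "a").get_attribute("href"))
    finally:
        driver.quit()


if __name__ == "__main__":
    # Read the URLs inside the main guard so worker processes do not re-read the file.
    with open("HNlinks.txt", "r") as input1:
        urls1 = [line.strip() for line in input1 if line.strip()]

    # How many concurrent processes do you want to spawn? In practice this is
    # also limited by the number of cores (and the memory) your computer has.
    processes = len(urls1)
    p = Pool(processes)
    p.map(load_url, urls1)
    p.close()
    p.join()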
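The Pool version above starts a new browser for every URL. The question instead describes a fixed number of browsers (say 5), each pulling the next link from a queue and reusing the same instance. A minimal, untested-against-a-real-browser sketch of that workflow with threads might look like the following. The names `make_chrome_driver`, `extract_hrefs`, `worker`, and `scrape_all` are hypothetical helpers introduced here for illustration, not part of Selenium, and the `athing` selector is simply carried over from the question.

```python
import queue
import threading


def make_chrome_driver():
    # Hypothetical factory; Selenium is imported lazily so the queue logic
    # can be exercised with a stand-in driver when Selenium is not installed.
    from selenium import webdriver
    return webdriver.Chrome()


def extract_hrefs(driver, url):
    # Same lookup as in the question, written with the Selenium 4 locator API.
    from selenium.webdriver.common.by import By
    driver.get(url)
    hrefs = []
    for row in driver.find_elements(By.CLASS_NAME, "athing"):
        hrefs.append(row.find_element(By.CSS_SELECTOR, "a").get_attribute("href"))
    return hrefs


def worker(url_queue, results, lock, make_driver, scrape):
    driver = make_driver()  # one browser per worker, reused for every URL it takes
    try:
        while True:
            try:
                # Each queue item is handed to exactly one worker.
                url = url_queue.get_nowait()
            except queue.Empty:
                break
            try:
                hrefs = scrape(driver, url)
            except Exception as exc:
                # Record the failure and keep the worker alive for the next URL.
                print(f"failed on {url}: {exc}")
                hrefs = []
            with lock:
                results[url] = hrefs
    finally:
        driver.quit()


def scrape_all(urls, num_workers=5, make_driver=make_chrome_driver, scrape=extract_hrefs):
    url_queue = queue.Queue()
    for url in urls:
        url_queue.put(url)

    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(url_queue, results, lock, make_driver, scrape))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

To use it with the input file from the question, read the stripped, non-empty lines of HNlinks.txt into a list, call `scrape_all(urls, num_workers=5)`, and print the hrefs from the returned dictionary. Threads are used rather than processes on the assumption that the work is dominated by waiting on the browser, and because each WebDriver instance stays confined to the one thread that created it.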