I am learning web scraping for a hobby project (using Selenium in Python).
I'm trying to get the information on product listings. There are about 100 web pages, and I am sequentially loading each one, processing its data, then moving on to the next.
This process takes over 5 minutes, with the major bottleneck being the loading time of each page.
As I am only "reading" from the pages (not interacting with any of them), I would like to know: is it possible to send requests for all the pages together (as opposed to waiting for one page to load before requesting the next) and process the data as they arrive?
PS: Please tell me if there are other solutions to reduce the loading time.
Since you are only reading the pages, you don't need a full browser: you can fetch them with the Python requests module, and use the built-in threading module to download them concurrently. For example:
import threading
import requests

list_of_links = [
    # All your links here
]

threads = []
all_html = {
    # This will map each link to the HTML it returns
}

def get_html(link):
    r = requests.get(link)
    all_html[link] = r.text

# Start one thread per link; the downloads run in parallel
for link in list_of_links:
    thread = threading.Thread(target=get_html, args=(link,))
    thread.start()
    threads.append(thread)

# Wait for every download to finish
for t in threads:
    t.join()

print(all_html)
print("DONE")
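If you'd rather not manage the threads yourself, the standard library's concurrent.futures module can do it for you, and a worker cap keeps you from firing 100 simultaneous requests at the server. A minimal sketch (the URL list and the worker count of 10 are placeholders to adjust for your site):

```python
from concurrent.futures import ThreadPoolExecutor

import requests

list_of_links = [
    # All your links here
]

def get_html(link):
    # timeout stops one slow page from hanging the whole run
    return link, requests.get(link, timeout=10).text

# max_workers limits how many pages are fetched at once
with ThreadPoolExecutor(max_workers=10) as pool:
    all_html = dict(pool.map(get_html, list_of_links))

print(all_html)
```

pool.map preserves the order of list_of_links and re-raises any exception from a worker, so a failed download surfaces immediately instead of silently leaving a gap in the dictionary.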