
Simultaneous Requests in Selenium Python

I am learning web scraping for a hobby project (using Selenium in Python).

I'm trying to get the information on product listings. There are about 100 web pages, and I am sequentially loading each one, processing its data, and then moving to the next.
This process takes over 5 minutes, with the main bottleneck being the loading time of each page.

As I am only "reading" the pages (not interacting with any of them), I would like to know: is it possible to request all the pages together (as opposed to waiting for one page to load before requesting the next) and process the data as it arrives?

PS: Please also tell me if there are other ways to reduce the loading time.

You could use the Python requests library together with the built-in threading module to make it faster. For example:

import threading
import requests

list_of_links = [
    # All your links here
]

threads = []

all_html = {
    # Maps each link (key) to the HTML it returned (value);
    # filled in by the worker threads below
}


def get_html(link):
    r = requests.get(link)
    all_html[link] = r.text


for link in list_of_links:
    thread = threading.Thread(target=get_html, args=(link,))
    thread.start()
    threads.append(thread)
   
for t in threads:
    t.join()
    
print(all_html)

print("DONE")
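Starting one thread per link is fine for ~100 pages, but an unbounded thread count doesn't scale well and the snippet above silently ignores failed downloads. A common alternative is `concurrent.futures.ThreadPoolExecutor`, which caps concurrency and lets you handle each page as soon as it finishes. Below is a minimal sketch using only the standard library (`urllib` in place of requests, so there is no extra dependency); the `fetch_all` name, the `max_workers` value, and the injectable `fetch` parameter are illustrative choices, not anything from the original answer:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_all(links, max_workers=10, fetch=None):
    # fetch is injectable (handy for testing); by default it downloads the page body
    fetch = fetch or (lambda url: urlopen(url, timeout=10).read().decode("utf-8", "replace"))
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every link, remembering which future belongs to which link
        futures = {pool.submit(fetch, link): link for link in links}
        # as_completed yields each future as soon as its download finishes,
        # so pages can be processed in arrival order, not submission order
        for future in as_completed(futures):
            link = futures[future]
            try:
                results[link] = future.result()
            except Exception:
                results[link] = None  # download failed; log or retry as needed
    return results
```

If you want to process the data "as it arrives" (as the question asks), move your parsing code inside the `as_completed` loop instead of collecting everything into a dictionary first.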
