Python Web Scraping | How to scrape data from multiple URLs by choosing the page number as a range with Beautiful Soup and Selenium?

from selenium import webdriver
import time
from bs4 import BeautifulSoup as Soup
driver = webdriver.Firefox(executable_path='C://Downloads//webdrivers//geckodriver.exe')
a = 'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page='
for c in range(8):

    #a = f'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page={c}'

    cd = driver.get(a+str(c))

    page_source = driver.page_source
    bs = Soup(page_source, 'html.parser')

    fetch_data = bs.find_all('div', {'class': 's-expand-height.s-include-content-margin.s-latency-cf-section.s-border-bottom'})

    for f_data in fetch_data:
        product_name = f_data.find('span', {'class': 'a-size-medium.a-color-base.a-text-normal'})
        print(product_name + '\n')

The problem is that the WebDriver successfully visits every page in the range but produces neither output nor an error.

I don't know where I am going wrong.

Any suggestions, or a reference to an article that solves this problem, are welcome.

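You are not selecting the right div tag to fetch the products with BeautifulSoup, so find_all returns an empty list and the loop prints nothing. The dot-joined class string in the question is CSS selector syntax, which select() understands; find_all treats it as a single literal class value, which matches no tag.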

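Try the following snippet:

# Assumes `driver` and `Soup` are set up as in the question.
base_url = 'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page='
# range of pages
for page in range(1, 20):
    driver.get(base_url + str(page))
    page_source = driver.page_source
    bs = Soup(page_source, 'html.parser')
    # get search results; the 'data-component-type' attribute is an assumption
    # about the page's markup, so inspect the page and adjust it if needed
    products = bs.find_all('div', {'data-component-type': 's-search-result'})
    # for each product in the search results, print the product name
    for product in products:
        product_name = product.find('span', class_='a-size-medium a-color-base a-text-normal')
        if product_name is not None:
            print(product_name.get_text(strip=True))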

To debug, print bs or fetch_data: an empty list for fetch_data means the selector matched nothing.
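
As a rough illustration of the selector problem, here is a minimal offline sketch. The HTML fragment is made up for the example and is not Amazon's real markup; only the span class names come from the question. It compares the dot-joined class string passed to find_all with the same string used as a CSS selector in select, and with the space-separated form.

```python
from bs4 import BeautifulSoup as Soup

# Hypothetical fragment for illustration only (not the real page markup).
html = '''
<div class="s-result item">
  <span class="a-size-medium a-color-base a-text-normal">Phone A</span>
</div>
<div class="s-result item">
  <span class="a-size-medium a-color-base a-text-normal">Phone B</span>
</div>
'''
bs = Soup(html, 'html.parser')

# find_all treats the dotted string as one literal class value, so nothing matches.
dotted = bs.find_all('span', {'class': 'a-size-medium.a-color-base.a-text-normal'})

# select() parses the same text as a CSS selector: a span carrying all three classes.
css = bs.select('span.a-size-medium.a-color-base.a-text-normal')

# A space-separated string matches the exact value of the class attribute.
exact = bs.find_all('span', class_='a-size-medium a-color-base a-text-normal')

# Extract the text; concatenating a Tag with '\n' as in the question would fail.
names = [span.get_text(strip=True) for span in css]

print(len(dotted), len(css), len(exact))
print(names)
```

If the dotted form is more convenient, switching the question's find_all calls to select with a tag-plus-classes selector is one option; keeping find_all with the space-separated class string is another.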


In my opinion, you can use requests or urllib instead of Selenium to get the page source.
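
A minimal sketch of that idea with urllib from the standard library might look like the following. The helper names build_page_url and fetch_page_source are hypothetical, and the User-Agent value is a placeholder assumption. This approach only returns the HTML the server sends, without running JavaScript, and the site may respond differently to non-browser clients, so treat it as a starting point rather than a drop-in replacement.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = 'https://www.amazon.com/s'  # search endpoint taken from the question


def build_page_url(page, keyword='Mobile', index='amazon-devices'):
    """Return the search URL for one result page (hypothetical helper)."""
    query = urlencode({'k': keyword, 'i': index, 'page': page})
    return f'{BASE_URL}?{query}'


def fetch_page_source(url, timeout=10):
    """Download a page and return its HTML as text; no JavaScript is executed."""
    # The User-Agent value below is a placeholder assumption.
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')


# Example usage (performs network requests, so it is left commented out):
# for page in range(1, 8):
#     page_source = fetch_page_source(build_page_url(page))
#     ...parse page_source with Soup(page_source, 'html.parser') as above...
```

The returned string can be passed to Soup in place of driver.page_source, which keeps the parsing code unchanged.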
