[英]Python Web Scraping | How to scrape data from multiple urls by choosing page number as a range with Beautiful Soup and selenium?
from selenium import webdriver
import time
from bs4 import BeautifulSoup as Soup
driver = webdriver.Firefox(executable_path='C://Downloads//webdrivers//geckodriver.exe')
a = 'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page='
for c in range(8):
#a = f'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page={c}'
cd = driver.get(a+str(c))
page_source = driver.page_source
bs = Soup(page_source, 'html.parser')
fetch_data = bs.find_all('div', {'class': 's-expand-height.s-include-content-margin.s-latency-cf-section.s-border-bottom'})
for f_data in fetch_data:
product_name = f_data.find('span', {'class': 'a-size-medium.a-color-base.a-text-normal'})
print(product_name + '\n')
现在这里的问题是,Webdriver 成功访问了 7 个页面,但没有提供任何输出或错误。
现在我不知道 M 哪里出错了。
任何建议,参考提供有关此问题的解决方案的文章将始终受到欢迎。
您没有选择正确的 div 标签来使用 BeautifulSoup 获取产品,导致没有输出。
尝试以下代码段:-
#range of pages
for i in range(1,20):
driver.get(f'https://www.amazon.com/s?k=Mobile&i=amazon-devices&page={i}')
page_source = driver.page_source
bs = Soup(page_source, 'html.parser')
#get search results
products=bs.find_all('div',{'data-component-type':"s-search-result"})
#for each product in search result print product name
for i in range(0,len(products)):
for product_name in products[i].find('span',class_="a-size-medium a-color-base a-text-normal"):
print(product_name)
您可以打印 bs 或 fetch_data 进行调试。
反正
在我看来,您可以使用requests
或urllib
来获取 page_source 而不是 selenium
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.