
Getting an empty list when I try to extract data from a website using BeautifulSoup (web scraping)

I was trying to extract the reviewers' profile names from the reviews on this page: https://www.amazon.in/Samsung-Midnight-Storage-6000mAh-Battery/dp/B0B4F52B5X/?_encoding=UTF8&pd_rd_w=4JKBg&content-id=amzn1.sym.e0e8ce89-ede3-4c51-b6ad-44989efc8536&pf_rd_p=e0e8ce89-ede3-4c51-b6ad-44989efc8536&pf_rd_r=NEBBF38XJRRBGK0BZBX3&pd_rd_wg=qFxtB&pd_rd_r=0f156162-4690-4ef5-9a8b-8b03e82e194b&ref_=pd_gw_ci_mcx_mr_hp_d&th=1

The names are inside span elements with class "a-profile-name", but when I print the result of find_all(), I just get an empty list.

Below is my code:

    
from bs4 import BeautifulSoup as bs
import requests

link = 'https://www.amazon.in/Adidas-Unisex-Sogold-cblack-Football/dp/B096NC52HY/ref=sr_1_3_sspa?crid=1HCHWT6Y1WFYU&keywords=football%2Bshoes&qid=1660709102&sprefix=foot%2Caps%2C246&sr=8-3-spons&th=1&psc=1'

soup = bs(requests.get(link).text, "html.parser")
name = soup.find_all("span", class_="a-profile-name")
print(name)

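It is always a good idea to send some headers with your request, e.g. a User-Agent. Without a browser-like User-Agent, Amazon most likely returns a page that does not contain the review markup, which is why find_all() comes back empty: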

requests.get(link, headers={'User-Agent': 'Mozilla/5.0'})
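To see what the header actually changes, you can prepare the request without sending it. The sketch below assumes only that requests is installed (it makes no network call); prepared_user_agent is a hypothetical helper name. It prepares the GET the same way requests.get() does internally, once with the default headers and once with the header from this answer:

```python
import requests

link = 'https://www.amazon.in/Adidas-Unisex-Sogold-cblack-Football/dp/B096NC52HY/ref=sr_1_3_sspa?crid=1HCHWT6Y1WFYU&keywords=football%2Bshoes&qid=1660709102&sprefix=foot%2Caps%2C246&sr=8-3-spons&th=1&psc=1'


def prepared_user_agent(headers=None):
    """Hypothetical helper: the User-Agent requests.get(link, headers=headers) would send."""
    with requests.Session() as session:
        # Session.prepare_request merges per-request headers over the session
        # defaults; nothing is sent over the network here.
        prepared = session.prepare_request(
            requests.Request("GET", link, headers=headers)
        )
    return prepared.headers["User-Agent"]


default_ua = prepared_user_agent()
custom_ua = prepared_user_agent({"User-Agent": "Mozilla/5.0"})
print(default_ua)  # python-requests/<installed version>
print(custom_ua)
```

By default requests identifies itself as python-requests/<version>, which a site can easily recognise as a script; passing headers replaces that value for the request.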

Note: Amazon really does not like being scraped, so sooner or later it will detect your activity and may block you.

Example

from bs4 import BeautifulSoup as bs
import requests

link = 'https://www.amazon.in/Adidas-Unisex-Sogold-cblack-Football/dp/B096NC52HY/ref=sr_1_3_sspa?crid=1HCHWT6Y1WFYU&keywords=football%2Bshoes&qid=1660709102&sprefix=foot%2Caps%2C246&sr=8-3-spons&th=1&psc=1'

# Send a browser-like User-Agent instead of the python-requests default
soup = bs(requests.get(link, headers={'User-Agent': 'Mozilla/5.0'}).text, "html.parser")

name = soup.find_all("span", class_="a-profile-name")
print(name)
Output
[<span class="a-profile-name">Amazon Customer</span>, <span class="a-profile-name">Shubam Kadam</span>, <span class="a-profile-name">Aditi Sharma</span>, <span class="a-profile-name">Moris lopez</span>, <span class="a-profile-name">tana tubin</span>]
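If you want the names as plain strings rather than Tag objects, a small sketch along these lines should work, assuming bs4 is installed. profile_names and the two HTML snippets are hypothetical, made up for illustration rather than copied from Amazon. It also shows that find_all() simply returns an empty list when the markup is missing, which is what happens when the server sends back a different page:

```python
from bs4 import BeautifulSoup


def profile_names(html):
    """Hypothetical helper: text of every <span class="a-profile-name"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a (possibly empty) list of Tag objects;
    # get_text(strip=True) turns each one into a plain string.
    return [span.get_text(strip=True)
            for span in soup.find_all("span", class_="a-profile-name")]


# Made-up snippets: one shaped like the review markup, one without it.
review_html = (
    '<div><span class="a-profile-name">Amazon Customer</span>'
    '<span class="a-profile-name">Aditi Sharma</span></div>'
)
other_html = "<html><body><p>No reviews here</p></body></html>"

names = profile_names(review_html)
print(names)  # ['Amazon Customer', 'Aditi Sharma']
if not profile_names(other_html):
    print("No profile names found: the page probably lacks the review markup.")
```

Checking for an empty result like this makes it obvious when the request was answered with something other than the product page.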
