Different data after scraping with Python and bs4
I'm trying to get the number of reviews for a product on Amazon. However, the value I scrape differs from the one shown on the site: the scraped page reports 131, while Amazon shows 655. I attach screenshots of the page and of the result after scraping.
import bs4
import requests

# Product page whose review count is being scraped
url3 = "https://www.amazon.it/dp/B076S8NSCD"
# Browser-like User-Agent header sent with the request
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"}
res = requests.get(url3, headers=headers)
soup = bs4.BeautifulSoup(res.text, "html.parser")
# The review count is the text of the element with id "acrCustomerReviewText"
reviews = soup.find(id="acrCustomerReviewText").get_text()
print(reviews)
If you aren't using premium rotating residential proxies to scrape Amazon reviews, there's a good chance this is a cloaking measure: your IP has been flagged for sending too many requests, so the page you receive differs from the one a regular visitor sees.
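One way to check whether this is happening is to inspect the response before trusting the number. The sketch below is only a rough diagnostic, not a confirmed description of how Amazon behaves: the helper names `looks_blocked` and `parse_review_count` are hypothetical, and the marker strings are assumptions about what a block or CAPTCHA page might contain.

```python
import re

# Assumed markers of a block/CAPTCHA page; adjust them to what you actually receive.
BLOCK_MARKERS = ("captcha", "robot check", "api-services-support@amazon.com")


def looks_blocked(status_code, html):
    """Return True if the response looks like a block page rather than a product page."""
    if status_code != 200:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def parse_review_count(text):
    """Extract the first integer from text such as '131 voti' or '1.234 voti'."""
    match = re.search(r"\d+(?:[.,]\d{3})*", text)
    if match is None:
        return None
    # Drop thousands separators ('.' or ',') before converting to int.
    return int(re.sub(r"\D", "", match.group()))
```

With the variables from the question, `looks_blocked(res.status_code, res.text)` tells you whether the page is worth parsing at all, and `parse_review_count(reviews)` gives an integer you can compare across runs or against the number shown in the browser. Saving `res.text` to a file and opening it in a browser is another quick way to see exactly which page was served.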