How to parse a span with Beautiful Soup if it's hidden by JavaScript in the final HTML?
The goal is to get a clothing item's rating (stars) via Beautiful Soup.
For more detail, this is part of the Python code, which used to work:
import requests
from bs4 import BeautifulSoup

url = "https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)
rating = soup.find('span', {'data-link': 'text{: product^star}'})
In the Google Chrome inspector I can see this HTML:
<span data-link="text{: product^star}">5</span>
But if I look at it via print (or via view-source in Chrome):
print(soup)
we see nothing like this:
<span data-link="text{: product^star}">5</span>
In the place where the HTML body should be (via print(soup)), I can see only something like React stuff:
<div id="mainContainer" class="main__container">
<div id="app">
</div>
<button class="btn-quick-nav j-quicknav" type="button">to the
top</button>
</div>
and a huge bunch of JavaScript in the footer, so I can't pull that span.
A concrete URL, for example:
https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN
The concrete element to parse:
<span data-link="text{: product^star}">4</span>
Is this some new technology that camouflages the code to protect it from parsing? Is there any way to get "old-school HTML"?
The short answer is that you can't parse and/or get that data with bs4.
As you've noticed, all of the product's data is generated dynamically, which means you need a way of running JavaScript, which bs4 doesn't do.
If you want to get the old-school HTML, use an automation tool like selenium with, for example, the Chrome driver.
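As a rough illustration of the selenium route, here is a minimal sketch. It assumes Selenium 4 with a local Chrome install, and it assumes the page still renders the span quoted in the question; the names RATING_SELECTOR and fetch_rating are made up for this example, and the selector may well be out of date.

```python
# CSS selector for the rating span quoted in the question. This is an
# assumption: the site's markup may have changed since the question was asked.
RATING_SELECTOR = 'span[data-link="text{: product^star}"]'


def fetch_rating(url: str, timeout: float = 15.0) -> str:
    """Render the page in headless Chrome and return the rating span's text."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def rating_text(drv):
        # Returns the span's text once it is non-empty, otherwise False so
        # that WebDriverWait keeps polling. A missing element raises
        # NoSuchElementException, which WebDriverWait ignores by default.
        span = drv.find_element(By.CSS_SELECTOR, RATING_SELECTOR)
        text = span.get_attribute("textContent") or ""
        return text.strip() or False

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until JavaScript has filled the span, or raise TimeoutException.
        return WebDriverWait(driver, timeout).until(rating_text)
    finally:
        driver.quit()
```

Calling fetch_rating with the product page URL should return the rating as a string if the selector still matches. If you would rather keep your existing bs4 code, the rendered markup is available as driver.page_source and can be passed to BeautifulSoup in place of response.text.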
However, you can get the data without selenium if you know the product's id.
Here's an example (the product id is the last value in the URL, nm=51728993):
import requests
url = "https://wbxcatalog-ru.wildberries.ru/nm-2-card/catalog?spp=0&pricemarginCoeff=1.0&reg=0&appType=1&emp=0&locale=ru&lang=ru&curr=rub&nm=51728993"
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0"
}
data = requests.get(url, headers=headers).json()["data"]["products"][0]
print(f"{data['name']}\n{data['rating']} stars from {data['feedbacks']} reviews.")
Output:
Смартфон Poco M4 Pro / 6.6'' / 1080x2400 / IPS / 8 ГБ / 128 ГБ / 5000 мА*ч
5 stars from 414 reviews.
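To go from a product page URL to that endpoint, something like the following sketch might help. It assumes the number in the /catalog/<id>/detail.aspx path is the same id the endpoint expects as nm, which I have not verified for every product; the helper names are made up for this example, and the endpoint itself may change over time.

```python
import re
from urllib.parse import urlencode, urlparse

# Endpoint and fixed query parameters mirror the example URL above.
API_BASE = "https://wbxcatalog-ru.wildberries.ru/nm-2-card/catalog"
API_PARAMS = {
    "spp": 0,
    "pricemarginCoeff": "1.0",
    "reg": 0,
    "appType": 1,
    "emp": 0,
    "locale": "ru",
    "lang": "ru",
    "curr": "rub",
}


def product_id_from_url(page_url: str) -> str:
    """Return the numeric id from a /catalog/<id>/... product page URL."""
    match = re.search(r"/catalog/(\d+)/", urlparse(page_url).path)
    if match is None:
        raise ValueError(f"no product id found in {page_url!r}")
    return match.group(1)


def api_url_for(page_url: str) -> str:
    """Build the JSON endpoint URL for the product shown at page_url."""
    # Assumption: the id in the page path is the value expected as "nm".
    params = {**API_PARAMS, "nm": product_id_from_url(page_url)}
    return f"{API_BASE}?{urlencode(params)}"
```

The result of api_url_for(url) can then be passed to requests.get exactly as in the example above, keeping the same User-Agent header.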