
How to parse a span with Beautiful Soup if it's hidden by JavaScript in the final HTML?

The goal is to get a clothing item's rating (stars) with Beautiful Soup.

For clarity, this is part of the Python code, which used to work:

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup)
    # Look up the rating span by its data-link attribute
    rating = soup.find('span', {'data-link': 'text{: product^star}'})

In the Google Chrome inspector I can see this HTML:

<span data-link="text{: product^star}">5</span>

But when I look at the page via print (or via view-source in Chrome):

print(soup)

there is nothing like this:

 <span data-link="text{: product^star}">5</span>

In the print(soup) output, where the body content should be, I only see something that looks like React scaffolding:

    <div id="mainContainer" class="main__container">
    
    <div id="app">
    </div>

    <button class="btn-quick-nav j-quicknav" type="button">to the 
    top</button>

    </div>

and a huge amount of JavaScript in the footer, so I can't extract that span.

A concrete URL, for example:

https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN

The concrete element to parse:

<span data-link="text{: product^star}">4</span>

Is this some new technology that camouflages the code to protect it from parsing? Is there any way to get the "old-school HTML"?

The short answer is that you can't get that data with bs4 alone.

As you've noticed, all of the product's data is generated dynamically, which means you need a way to run JavaScript, and bs4 doesn't do that.
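To make this concrete, here is a minimal sketch that parses the static shell quoted in the question and then a rendered fragment. The `RENDERED_FRAGMENT` string is an assumption for illustration: it simply places the span from the inspector inside the `#app` container, which may not match the real DOM.

```python
from bs4 import BeautifulSoup

# Static shell as returned by requests (copied from the question).
STATIC_SHELL = """
<div id="mainContainer" class="main__container">
<div id="app">
</div>
<button class="btn-quick-nav j-quicknav" type="button">to the top</button>
</div>
"""

# Hypothetical DOM after the browser has run the page's JavaScript.
RENDERED_FRAGMENT = (
    '<div id="app"><span data-link="text{: product^star}">5</span></div>'
)


def find_rating(html):
    """Return the rating span if it exists in the given HTML, else None."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("span", {"data-link": "text{: product^star}"})


print(find_rating(STATIC_SHELL))            # None: the span is not in the source
print(find_rating(RENDERED_FRAGMENT).text)  # 5
```

bs4 only sees the text it is given, so the lookup itself is fine; the span is simply absent from what the server sends.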

If you want the old-school HTML, use a browser automation tool like selenium with, for example, ChromeDriver.
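A rough sketch of that approach, assuming Selenium 4 with a local Chrome installation. The function names `fetch_rendered_html` and `rating_from_html` are made up for this example, and the selector is taken from the question, so it may no longer match the live page.

```python
from bs4 import BeautifulSoup

# Attribute value taken from the question; the live site may have changed.
DATA_LINK = "text{: product^star}"


def fetch_rendered_html(url, timeout=15):
    """Load the page in headless Chrome and return the HTML after JS has run."""
    # Imported here so the parsing helper below works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the rating span has been inserted into the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, f'span[data-link="{DATA_LINK}"]')
            )
        )
        return driver.page_source
    finally:
        driver.quit()


def rating_from_html(html):
    """Parse rendered HTML with bs4 and return the rating text, or None."""
    tag = BeautifulSoup(html, "html.parser").find("span", {"data-link": DATA_LINK})
    return tag.get_text(strip=True) if tag else None
```

With these in place, `rating_from_html(fetch_rendered_html(url))` would hand bs4 the same markup the inspector shows, at the cost of starting a full browser for every page.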

However, you can get the data without selenium if you know the product's id.

Here's an example (the product id is the last value in the URL, nm=51728993):

import requests

url = "https://wbxcatalog-ru.wildberries.ru/nm-2-card/catalog?spp=0&pricemarginCoeff=1.0&reg=0&appType=1&emp=0&locale=ru&lang=ru&curr=rub&nm=51728993"

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0"
}
data = requests.get(url, headers=headers).json()["data"]["products"][0]
print(f"{data['name']}\n{data['rating']} stars from {data['feedbacks']} reviews.")

Output:

Смартфон Poco M4 Pro / 6.6'' / 1080x2400 / IPS / 8 ГБ / 128 ГБ / 5000 мА*ч
5 stars from 414 reviews.
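The same request can be wrapped so that the product id comes from a catalog page URL and failures surface early. This is only a sketch: the endpoint and query parameters are copied from the example above and are undocumented, so they may change or stop working, and the helper names (`product_id_from_url`, `summarize`, `fetch_summary`) are invented here.

```python
import re

import requests

# Endpoint and parameters copied from the example above; not an official API.
CARD_URL = "https://wbxcatalog-ru.wildberries.ru/nm-2-card/catalog"
BASE_PARAMS = {
    "spp": 0, "pricemarginCoeff": "1.0", "reg": 0, "appType": 1,
    "emp": 0, "locale": "ru", "lang": "ru", "curr": "rub",
}
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) "
                  "Gecko/20100101 Firefox/100.0"
}


def product_id_from_url(page_url):
    """Extract the numeric id from a URL like .../catalog/18645227/detail.aspx."""
    match = re.search(r"/catalog/(\d+)/", page_url)
    return match.group(1) if match else None


def summarize(payload):
    """Format name, rating and review count from the JSON payload, or None."""
    products = payload.get("data", {}).get("products", [])
    if not products:
        return None
    product = products[0]
    return (
        f"{product['name']}\n"
        f"{product['rating']} stars from {product['feedbacks']} reviews."
    )


def fetch_summary(product_id, timeout=10):
    """Query the card endpoint for one product and return its summary."""
    response = requests.get(
        CARD_URL,
        params={**BASE_PARAMS, "nm": product_id},
        headers=HEADERS,
        timeout=timeout,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing junk
    return summarize(response.json())
```

For the page in the question, `fetch_summary(product_id_from_url(url))` would look up product 18645227, provided the endpoint still responds in the same shape.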
