How to parse a span with Beautiful Soup when the final HTML is hidden behind JavaScript?

The goal is to get a clothing item's rating (stars) via Beautiful Soup.

For more detail, this is part of my Python code, which worked in the past:

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN"
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup)
    # Look up the rating span by its data-link attribute
    rating = soup.find('span', {'data-link': 'text{: product^star}'})

In the Chrome inspector I can see this HTML:

    <span data-link="text{: product^star}">5</span>

But in the output of print(soup) (or in Chrome's view-source) there is nothing like that span. In the place where the body content should be, I only see something that looks like React scaffolding:

    <div id="mainContainer" class="main__container">
    
    <div id="app">
    </div>

    <button class="btn-quick-nav j-quicknav" type="button">to the 
    top</button>

    </div>

followed by a huge amount of JavaScript in the footer, so I can't extract that span.

Example URL:

    https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN

The element I want to parse:

    <span data-link="text{: product^star}">4</span>

Is this some new technology that camouflages the code to protect it from parsing? Is there any way to get the "old-school" HTML?

The short answer is that you can't get that data with bs4 alone.

As you've noticed, all of the product's data is generated dynamically. requests only downloads the HTML the server sends and bs4 only parses it; neither of them runs JavaScript, so the span never appears in the soup.

If you want the "old-school" HTML, use a browser automation tool such as selenium with, for example, the Chrome driver.
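A minimal sketch of the selenium route follows. It is an illustration under assumptions rather than tested scraping code for this site: it assumes selenium 4 and a local Chrome are installed, that the span from the question is still present once the page has rendered, and the names `RATING_SELECTOR` and `fetch_rendered_rating` are hypothetical.

```python
# CSS selector for the rating span quoted in the question (assumed to still
# match the rendered page).
RATING_SELECTOR = 'span[data-link="text{: product^star}"]'


def fetch_rendered_rating(url: str, timeout: float = 15.0) -> str:
    """Load the page in headless Chrome and return the text of the rating span."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the span is present in the DOM or the timeout expires.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, RATING_SELECTOR))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait timed out.
        driver.quit()
```

Calling `fetch_rendered_rating("https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN")` would start a real browser, so it is much slower than a plain HTTP request. If you prefer to keep your existing bs4 code, `driver.page_source` can be passed to `BeautifulSoup` instead of `response.text`.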

However, you can get the data without selenium if you know the product's id, because the page loads it from a JSON endpoint that you can query directly.

Here's an example (the product id is the last query parameter in the URL, nm=51728993):

    import requests

    # JSON endpoint that returns the product card; nm is the product id
    url = "https://wbxcatalog-ru.wildberries.ru/nm-2-card/catalog?spp=0&pricemarginCoeff=1.0&reg=0&appType=1&emp=0&locale=ru&lang=ru&curr=rub&nm=51728993"

    # Send a browser-like User-Agent with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0"
    }
    # The first entry of data.products holds the requested product
    data = requests.get(url, headers=headers).json()["data"]["products"][0]
    print(f"{data['name']}\n{data['rating']} stars from {data['feedbacks']} reviews.")

Output:

    Смартфон Poco M4 Pro / 6.6'' / 1080x2400 / IPS / 8 ГБ / 128 ГБ / 5000 мА*ч
    5 stars from 414 reviews.
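To go from a product page URL to the JSON endpoint, something like the sketch below might work. It assumes, without confirmation from the site, that the number in the `/catalog/<number>/` path of the page URL is the same id the endpoint expects in `nm`. The helper names `product_id_from_url` and `card_api_url` are hypothetical; the endpoint and its other query parameters are copied unchanged from the example above.

```python
import re
from urllib.parse import urlencode, urlparse

# Endpoint and fixed query parameters copied from the example URL above.
API_BASE = "https://wbxcatalog-ru.wildberries.ru/nm-2-card/catalog"
API_PARAMS = {
    "spp": "0",
    "pricemarginCoeff": "1.0",
    "reg": "0",
    "appType": "1",
    "emp": "0",
    "locale": "ru",
    "lang": "ru",
    "curr": "rub",
}


def product_id_from_url(page_url: str) -> str:
    """Extract the numeric id from a /catalog/<id>/... product page URL."""
    path = urlparse(page_url).path
    match = re.search(r"/catalog/(\d+)/", path)
    if match is None:
        raise ValueError(f"no product id found in {page_url!r}")
    return match.group(1)


def card_api_url(product_id: str) -> str:
    """Build the JSON endpoint URL with the product id as the nm parameter."""
    # Assumption: the id from the page URL is valid as the nm value.
    return f"{API_BASE}?{urlencode({**API_PARAMS, 'nm': product_id})}"


page = "https://www.wildberries.ru/catalog/18645227/detail.aspx?targetUrl=IN"
print(card_api_url(product_id_from_url(page)))
```

The resulting URL can then be passed to `requests.get` together with the same headers as in the example above.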
