Get the data-product attribute with Beautiful Soup

I'm trying to scrape data from a site with Beautiful Soup. I have the code below, and I want to get the JSON stored in the data-product attribute of the tag. How can I do this?

This code:

soup_catalog.find('a',class_="product-li")

Returns this:

<a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta 
content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>

Then I tried:

soup_catalog.find('a',class_="product-li").find('data-product')

But the data-product value is not returned. How can I get it?

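This should help. data-product is an attribute of the a tag, not a child element, so find('data-product') searches for a <data-product> tag and returns None. Index the tag like a dictionary to read the attribute: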

from bs4 import BeautifulSoup

s = """<a class="product-li" data-product='{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="https://www.magazineluiza.com.br/pes-2018-para-ps3-konami/p/0431772/ga/gpes/" itemprop="url" title="PES 2018 para PS3">\n<span class="js-wishlist-action wishlist__simple-text">\n<i class="wishlist__favorite-icon js-add-wishlist"></i>\n</span>\n<div class="alignment-image">\n<img alt="PES 2018 para PS3 - Konami" class="product-image" data-original="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" height="210" src="https://d25zlb44gqlazw.cloudfront.net/static/img/default/white1x1-e0a7e4ed.gif" title="PES 2018 para PS3 - Konami" width="210"/>\n</div>\n<noscript>\n<img alt="PES 2018 para PS3 - Konami" height="210" itemprop="image" src="https://c.mlcdn.com.br//pes-2018-para-ps3-konami/v/210x210/043177500.jpg" title="PES 2018 para PS3 - Konami" width="210"/>\n</noscript>\n<span class="product-content-other-informations">\n<span class="rating-container">\n<span class="rateing sprite-stars star-medium" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">\n<em class="sprite-stars" style="width:90.0%"></em>\n<meta content="4.5" itemprop="ratingValue">\n<meta content="78" itemprop="reviewCount">\n</meta></meta></span>\n</span>\n</span>\n<h3 class="productTitle" itemprop="name">PES 2018 para PS3 - Konami</h3>\n<meta content="0431772" itemprop="productID">\n<meta content="None" itemprop="description">\n<p itemscope="" itemtype="http://schema.org/Brand"><meta content="konami" itemprop="name"/></p>\n<span class="productPrice" itemprop="offers" itemscope="" itemtype="http://schema.org/Offer">\n<span class="priceContent color-green none-product-showcase">desconto de R$ 79,10</span>\n<meta 
content="BRL" itemprop="priceCurrency">\n<meta content="89,90" itemprop="price">\n<span class="originalPrice">de R$ 169,00</span>\n<span class="price">\n                        por R$ 89,90\n                    </span>\n<meta content="InStock" itemprop="availability"/>\n</meta></meta></span>\n</meta></meta></a>"""
soup = BeautifulSoup(s, "html.parser")
i = soup.find("a",class_="product-li")
print(i["data-product"])

Output:

{"product":"0431772", "basketId":"043177500", "type":"product", "category":"ga", "subCategory":"gpes", "webVideoUrl": "None", "brand":"konami", "title_url": "pes-2018-para-ps3-konami", "title": "PES 2018 para PS3", "reference": "Konami", "stockTypes": {"043177500": "F"}, "price": "89.9"}
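The attribute value printed above is still a plain string. As a minimal sketch, assuming the attribute always holds valid JSON, it can be decoded with the standard json module. The shortened HTML and the names html, link, product, and price are illustrative only, not taken from the real page.

```python
import json
from bs4 import BeautifulSoup

# Shortened stand-in for the real markup (illustrative only).
html = """<a class="product-li" data-product='{"product":"0431772", "title": "PES 2018 para PS3", "stockTypes": {"043177500": "F"}, "price": "89.9"}' href="#">PES 2018</a>"""

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", class_="product-li")

# The attribute value is a str; json.loads turns it into a dict.
product = json.loads(link["data-product"])

# The price is stored as a JSON string, so convert it explicitly.
price = float(product["price"])

print(product["title"], price)
```

If the attribute might contain malformed JSON, wrapping the json.loads call in a try/except for json.JSONDecodeError would be a reasonable precaution.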

You can get the data from the tag's attributes, like this:

soup_catalog.find('a', class_='product-li').get('data-product')
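The choice between .get() and square-bracket indexing matters when some tags lack the attribute: .get() returns None, while indexing raises KeyError. A small sketch with made-up markup (the two-link html string is an assumption for illustration) also shows how to select only the tags that carry the attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link has no data-product attribute.
html = """
<a class="product-li" data-product='{"product": "0431772"}' href="#a">first</a>
<a class="product-li" href="#b">second</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", class_="product-li")

# .get() returns None for a missing attribute instead of raising.
values = [link.get("data-product") for link in links]

# Passing True as the attribute value matches any tag that has the attribute.
with_data = soup.find_all("a", attrs={"data-product": True})

print(values)
print(len(with_data))
```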

