I'm using requests_html to scrape pages with product information from a website. The small piece of HTML I need is inside a <script> tag.
Here is the code that returns the JavaScript:
from requests_html import HTMLSession
link = 'https://www.rimi.lv/e-veikals/en/products/vegan-and-vegetarian-/plant-based-beverages/auzu-dzeriens-barista-kafijai-bezglut-uht-1l/p/957905'
s = HTMLSession()
r = s.get(link)
script_html = r.html.find('div.cart-layout__main', first=True).find('script')[1].html
print(script_html)
Is there a way to parse the HTML part of it and return all of its text? Specifically, I mean the HTML in tabs[0].html.
<script>
Config.product_details_page = {
texts: {
tab_loading_title: 'Loading',
tab_loading_text: 'Loading data',
},
tabs: [
{
index: 0,
identifier: 'details',
name: "About the product",
icon: '<svg class="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 48 48"><g fill="none" stroke="currentColor" stroke-width="2" stroke-miterlimit="10"><circle cx="24" cy="24" r="23"/><path d="M24 30v-1.6c0-2.1 1.1-4.1 3-5.2 2.9-1.7 3.9-5.3 2.2-8.2-1.7-2.9-5.3-3.9-8.2-2.2-1.8 1.1-3 3-3 5.2"/><circle cx="24" cy="35" r="2"/></g></svg>',
html: "<div class=\u0022product__details\u0022>\n <div class=\u0022container\u0022>\n <div class=\u0022product-details\u0022>\n <div class=\u0022product__list-wrapper\u0022>\n <ul class=\u0022list\u0022>\n <li class=\u0022item\u0022>\n <span>Country of origin<\/span>\n <p>Finland<\/p>\n <\/li>\n <li class=\u0022item\u0022>\n <span>Brand<\/span>\n <p>Valio<\/p>\n <\/li>\n <li class=\u0022item\u0022>\n <span>Producer<\/span>\n <p>VALIO OY<\/p>\n <\/li>\n <li class=\u0022item\u0022>\n <span>Amount<\/span>\n <p>1 kg<\/p>\n <\/li>\n <\/ul>\n<\/div>\n <div class=\u0022product__list-wrapper\u0022>\n <p class=\u0022heading\u0022>Ingredients<\/p>\n <ul class=\u0022list\u0022>\n <li class=\u0022item\u0022>\n <p>AUZU b\u0101ze ( \u016bdens, bezglut\u0113na AUZU milti, kalcijs, s\u0101ls ), \u016bdens, rap\u0161a e\u013c\u013ca, sk\u0101buma regul\u0113t\u0101ji ( k\u0101lija fosf\u0101ti ), jods, vitam\u012bni ( riboflav\u012bns ( B2 ), B12 un D2 ) \n\n<\/p>\n <\/li>\n <\/ul>\n<\/div>\n <div class=\u0022product__list-wrapper -simple\u0022>\n <p class=\u0022heading\u0022>Additional information<\/p>\n <ul class=\u0022list\u0022>\n <li class=\u0022item\u0022>\n <p>Auzu saturs 10%<\/p>\n <\/li>\n <li class=\u0022item\u0022>\n <p>Min storage temp.: 2\u00b0 C<\/p>\n <\/li>\n <li class=\u0022item\u0022>\n <p>Max storage temp.: 25\u00b0 C<\/p>\n <\/li>\n <\/ul>\n<\/div>\n <div class=\u0022product__list-wrapper\u0022>\n <p class=\u0022heading\u0022>Nutrition Facts<\/p>\n <ul class=\u0022list\u0022>\n <li class=\u0022item\u0022>\n <p>Amount per 100g<\/p>\n <\/li>\n <\/ul>\n<\/div>\n <div class=\u0022product__table\u0022>\n <div>\n <table>\n <thead>\n <tr>\n <th>Nutrition<\/th>\n <th>Amount per 100g\/ml<\/th>\n <\/tr>\n <\/thead>\n <tbody>\n <tr>\n <td >\n energy\n <\/td>\n <td>\n 243 kJ\/ 58 kcal\n <\/td>\n <\/tr>\n <tr>\n <td >\n fat\n <\/td>\n <td>\n 3 g\n <\/td>\n <\/tr>\n <tr>\n <td class=\u0022indent\u0022>\n of which saturates\n <\/td>\n <td>\n 0.3 g\n <\/td>\n <\/tr>\n <tr>\n 
<td >\n carbohydrate\n <\/td>\n <td>\n 6.6 g\n <\/td>\n <\/tr>\n <tr>\n <td class=\u0022indent\u0022>\n of which sugars\n <\/td>\n <td>\n 3.5 g\n <\/td>\n <\/tr>\n <tr>\n <td >\n protein\n <\/td>\n <td>\n 1.2 g\n <\/td>\n <\/tr>\n <tr>\n <td >\n salt\n <\/td>\n <td>\n 0.1 g\n <\/td>\n <\/tr>\n <\/tbody>\n <\/table>\n <\/div>\n<\/div> <div class=\u0022product__list-wrapper\u0022>\n <p class=\u0022heading\u0022>Allergens<\/p>\n <ul class=\u0022list\u0022>\n <li class=\u0022item\u0022>\n <p>Cereals<\/p>\n <\/li>\n <\/ul>\n<\/div>\n <p class=\u0022product__disclaimer\u0022>While every care has been taken to ensure product information is correct, food products are constantly being reformulated, so ingredients, nutrition content, dietary and allergens may change. You should always read the product label and not rely solely on the information provided on the website. Base price and offer may be different in other Rimi stores.<\/p><\/div>\n\n <div class=\u0022product__card\u0022>\n <div data-product-code=\u0022957905\u0022\n class=\u0022js-product-container card\n -horizontal-for-mobile\u0022\n data-gtms-banner-title=\u0022Auzu dz\u0113riens Barista kafijai bezglut. UHT 1l\u0022\n data-gtms-click-name=\u0022Auzu dz\u0113riens Barista kafijai bezglut. UHT 1l\u0022\n data-gtms-product-id=\u0022957905\u0022\n data-gtm-eec-product='{\u0022id\u0022:\u0022957905\u0022,\u0022name\u0022:\u0022Auzu dz\\u0113riens Barista kafijai bezglut. 
UHT 1l\u0022,\u0022category\u0022:\u0022SH-11-10-2\\\/SH-16\\\/SH\u0022,\u0022brand\u0022:\u0022Valio\u0022,\u0022price\u0022:2.69,\u0022currency\u0022:\u0022EUR\u0022}'\n >\n <a class=\u0022card__url js-gtm-eec-product-click\u0022 href=\u0022\/e-veikals\/en\/products\/vegan-and-vegetarian-\/plant-based-beverages\/auzu-dzeriens-barista-kafijai-bezglut-uht-1l\/p\/957905\u0022\n aria-label=\u0022Go to product page\u0022><\/a>\n <div class=\u0022card__image-wrapper\u0022>\n <div>\n <img src=\u0022https:\/\/rimibaltic-res.cloudinary.com\/image\/upload\/b_white,c_fit,f_auto,h_480,q_auto,w_480\/d_ecommerce:backend-fallback.png\/MAT_957905_PCE_LV\u0022 alt=\u0022Auzu dz\u0113riens Barista kafijai bezglut. UHT 1l\u0022>\n <span class=\u0022type-badge\u0022>\n <img src=\u0022https:\/\/rimibaltic-web-res.cloudinary.com\/image\/upload\/f_png,h_32,q_auto\/v1\/ecom-cms\/b821da9405a9fe157949ca40850238c81d90542f\u0022 title=\u0022Suitable for Vegans\u0022 >\n <img src=\u0022https:\/\/rimibaltic-web-res.cloudinary.com\/image\/upload\/f_png,h_32,q_auto\/v1\/ecom-cms\/91c5d4f7982c687e299aaf2e8c985d63f66631dd\u0022 title=\u0022Gluten Free\u0022 >\n <img src=\u0022https:\/\/rimibaltic-web-res.cloudinary.com\/image\/upload\/f_png,h_32,q_auto\/v1\/ecom-cms\/2e1c205f284be9cb954d044ffcfc33afe873ea08\u0022 title=\u0022Lactose Free\u0022 >\n <img src=\u0022https:\/\/rimibaltic-web-res.cloudinary.com\/image\/upload\/f_png,h_32,q_auto\/v1\/ecom-cms\/e94c4a7ccc9aabb3b6ce9382a536f514acf72616\u0022 title=\u0022Dairy Free\u0022 >\n <\/span> <\/div>\n <\/div>\n <div class=\u0022card__details\u0022>\n <p class=\u0022card__name\u0022>Auzu dz\u0113riens Barista kafijai bezglut. 
UHT 1l<\/p>\n <div class=\u0022card__details-inner\u0022>\n\n <div class=\u0022card__price-wrapper\u0022>\n \n <div class=\u0022price-tag card__price\u0022>\n <span>2<\/span>\n <div>\n <sup>69<\/sup>\n <sub>\u20ac\/pcs.<\/sub>\n <\/div>\n<\/div>\n <div>\n\n \n <p class=\u0022card__price-per\u0022>\n 2,69\n \u20ac\n \/kg\n <\/p>\n \n <\/div>\n <\/div>\n\n\n <form class=\u0022favorite card__favorite js-login-prompt\u0022\n action=\u0022\/e-veikals\/account\/login\/prompt\u0022>\n <input type=\u0022hidden\u0022 name=\u0022_token\u0022 value=\u002267RNG9eJsKaHhthRxGbeoL97AiwFKSkcCd6RUaoR\u0022> <input type=\u0022checkbox\u0022 name=\u0022favorite\u0022 value=\u0022\u0022 >\n <button class=\u0022js-tooltip\u0022 type=\u0022submit\u0022\n aria-label=\u0022Add to favorites\u0022\n data-title=\u0022Add to favorites\u0022\n data-add-name=\u0022Add to favorites\u0022\n data-remove-name=\u0022Add to favorites\u0022\n data-gtm-click-name=\u0022Add to favorites\u0022>\n <span><svg class=\u0022\u0022 xmlns=\u0022http:\/\/www.w3.org\/2000\/svg\u0022 viewBox=\u00220 0 48 48\u0022><path d=\u0022M24 4l5.05 16L45 19.98l-12.83 8.79L36.98 44 24 34.71 11.02 44l4.81-15.23L3 19.98l15.95.02L24 4z\u0022 fill=\u0022none\u0022 stroke=\u0022currentColor\u0022 stroke-miterlimit=\u002210\u0022 stroke-width=\u00222\u0022\/><\/svg><\/span>\n <\/button>\n<\/form>\n\n \n \n <form method=\u0022post\u0022 action=\u0022\/e-veikals\/cart\/change\u0022\n class=\u0022js-add-to-cart card__cart-btn\u0022>\n <input type=\u0022hidden\u0022 name=\u0022_token\u0022 value=\u002267RNG9eJsKaHhthRxGbeoL97AiwFKSkcCd6RUaoR\u0022> <input type=\u0022hidden\u0022 name=\u0022_method\u0022 value=\u0022put\u0022> <input type=\u0022hidden\u0022 name=\u0022product\u0022 value=\u0022957905\u0022>\n <input type=\u0022hidden\u0022 name=\u0022amount\u0022 value=\u00221\u0022>\n <button class=\u0022button -with-right-icon -cart gtm -small\u0022\n type=\u0022submit\u0022\n data-gtm-product-id=\u0022957905\u0022\n 
data-gtm-event-category=\u0022addToBasket\u0022\n >\n Add to cart\n <svg class=\u0022\u0022 xmlns=\u0022http:\/\/www.w3.org\/2000\/svg\u0022 viewBox=\u00220 0 48 48\u0022><g fill=\u0022none\u0022 stroke=\u0022currentColor\u0022 stroke-miterlimit=\u002210\u0022 stroke-width=\u00222\u0022><path d=\u0022M44 36H19.2c-3.9 0-7.2-2.8-7.9-6.6L6.5 1H0\u0022\/><path d=\u0022M8 9h39l-2.4 11.6c-.9 4.4-4.7 7.6-9.1 7.9l-24 1.5\u0022\/><circle cx=\u002215.5\u0022 cy=\u002243.5\u0022 r=\u00223.5\u0022\/><circle cx=\u002239.5\u0022 cy=\u002243.5\u0022 r=\u00223.5\u0022\/><\/g><\/svg> <\/button>\n<\/form>\n\n <form class=\u0022counter js-counter\u0022\n method=\u0022post\u0022\n action=\u0022\/e-veikals\/cart\/change\u0022\n>\n <input type=\u0022hidden\u0022 name=\u0022_method\u0022 value=\u0022put\u0022> <input type=\u0022hidden\u0022 name=\u0022_token\u0022 value=\u002267RNG9eJsKaHhthRxGbeoL97AiwFKSkcCd6RUaoR\u0022> <input type=\u0022hidden\u0022 name=\u0022amount\u0022\n value=\u00221\u0022\n min=\u00221\u0022\n max=\u002210\u0022\n data-unit=\u0022Piece\u0022\n >\n <input type=\u0022hidden\u0022 name=\u0022step\u0022 value=\u00221\u0022>\n <input type=\u0022hidden\u0022 name=\u0022product\u0022 value=\u0022957905\u0022>\n <button name=\u0022decrease\u0022\n class=\u0022counter__subtract js-subtract\u0022\n type=\u0022submit\u0022\n aria-label=\u0022Decrease\u0022\n data-gtm-ignore>\n <svg class=\u0022\u0022 xmlns=\u0022http:\/\/www.w3.org\/2000\/svg\u0022 viewBox=\u00220 0 48 48\u0022><path d=\u0022M8 24h32\u0022 fill=\u0022none\u0022 stroke=\u0022currentColor\u0022 stroke-width=\u00222\u0022 stroke-miterlimit=\u002210\u0022\/><\/svg> <\/button>\n <span class=\u0022counter__number\u0022>\n 1 <\/span>\n <button name=\u0022increase\u0022\n class=\u0022counter__add js-add\u0022\n type=\u0022submit\u0022\n aria-label=\u0022Increase\u0022\n data-gtm-ignore\n >\n <svg class=\u0022\u0022 xmlns=\u0022http:\/\/www.w3.org\/2000\/svg\u0022 viewBox=\u00220 0 48 48\u0022><path d=\u0022M6 
24h36M24 42V5.9\u0022 fill=\u0022none\u0022 stroke=\u0022currentColor\u0022 stroke-width=\u00222\u0022 stroke-miterlimit=\u002210\u0022\/><\/svg> <\/button>\n\n<\/form>\n\n <form class=\u0022js-delete-from-cart delete-form\u0022 method=\u0022post\u0022 action=\u0022\/e-veikals\/cart\/change\u0022>\n <input type=\u0022hidden\u0022 name=\u0022_method\u0022 value=\u0022put\u0022> <input type=\u0022hidden\u0022 name=\u0022_token\u0022 value=\u002267RNG9eJsKaHhthRxGbeoL97AiwFKSkcCd6RUaoR\u0022> <input type=\u0022hidden\u0022 value=\u0022957905\u0022 name=\u0022product\u0022>\n <button class=\u0022cart-card__delete js-delete js-remove-from-cart \u0022\n aria-label=\u0022Remove\u0022>\n <svg class=\u0022\u0022 xmlns=\u0022http:\/\/www.w3.org\/2000\/svg\u0022 viewBox=\u00220 0 48 48\u0022><path d=\u0022M10 10l28 28m-28 0l28-28\u0022 fill=\u0022none\u0022 stroke=\u0022currentColor\u0022 stroke-width=\u00222\u0022 stroke-miterlimit=\u002210\u0022\/><\/svg> <\/button>\n<\/form>\n \n\n <\/div>\n\n <p class=\u0022card__error\u0022>\n Maximum amount is reached\n <\/p>\n\n <\/div>\n<\/div>\n <\/div>\n <\/div>\n<\/div>\n",
},
{
index: 1,
identifier: 'recommendations',
name: "Others have also bought",
api_url: "/e-veikals/en/products/957905/recommendations",
icon: '<svg class="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 48 48"><path fill="none" stroke="currentColor" stroke-miterlimit="10" stroke-width="2" d="M8 1h32v40c0 3.3-2.7 6-6 6H14c-3.3 0-6-2.7-6-6V1zm0 26h32m-5-3v-6m0 18v-6"/></svg>',
html: null,
},
]
};
Config.product_details_page.tabs.push({
index: 2,
identifier: 'recipes',
name: "Recipes",
api_url: "/e-veikals/en/products/957905/recipes",
icon: '<svg class="" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 48 48"><path fill="none" stroke="currentColor" stroke-miterlimit="10" stroke-width="2" d="M38 47c-1.7 0-3-1.3-3-3V25.5l-1.7-1.7c-1.5-1.5-2.3-3.5-2.3-5.6V11c0-5.5 4.5-10 10-10v43c0 1.7-1.3 3-3 3zM24 1l1 13.1c0 1.9-1.2 3.7-2.4 5.1L19 23v21c0 1.7-1.3 3-3 3s-3-1.3-3-3V23l-3.6-3.8C8 17.8 7.2 16 7 14L8 1m5 0v14m6-14v14"/></svg>',
html: null,
});
</script>
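One way to get at tabs[0].html without parsing the whole JavaScript object is sketched below. It assumes the value is a double-quoted string whose escapes (\u0022, \/, \n, \u0101 and so on) are all valid JSON escapes, as they appear to be in the output above. It also assumes the first `html: "..."` in the script belongs to tabs[0], since the other tabs have `html: null`. HTML_FIELD, tab_html and html_to_text are hypothetical helper names, and the text extraction uses the standard library's html.parser rather than requests_html.

```python
import json
import re
from html.parser import HTMLParser

# Matches `html: "..."`: a double-quoted JS string that may contain backslash escapes.
# Entries written as `html: null` do not match, so the first hit should be tabs[0].
HTML_FIELD = re.compile(r'\bhtml:\s*"((?:[^"\\]|\\.)*)"', re.DOTALL)


class _TextCollector(HTMLParser):
    """Collects non-blank text nodes and skips <script>/<style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse the "\n   " indentation runs
        if text and not self._skip:
            self.parts.append(text)


def tab_html(script_text):
    """Return the decoded HTML string of the first `html: "..."` field."""
    match = HTML_FIELD.search(script_text)
    if match is None:
        raise ValueError("no double-quoted html field found")
    # Re-wrap the captured body in quotes and let json.loads decode \u0022, \/, \n.
    # strict=False tolerates raw newlines if the value is split across lines.
    return json.loads('"' + match.group(1) + '"', strict=False)


def html_to_text(html):
    """Return the visible text nodes of an HTML fragment, in document order."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return collector.parts
```

Given the script_html string from the code above, `html_to_text(tab_html(script_html))` should return the labels and values in order ("Country of origin", "Finland", "Brand", ...). To stay within requests_html instead, `HTML(html=tab_html(script_html)).text` (after `from requests_html import HTML`) should give the same text as a single string.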
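I tried loading it as text (using text[30:-2] to keep only the JavaScript object) and then decoding it with demjson.decode(). However, it seems the string has to be loaded in a particular way (as a literal), and I don't know how to do that.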
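In the script above, the object is followed by a `Config.product_details_page.tabs.push(...)` statement. A fixed slice such as text[30:-2] therefore probably captures more than one object, and a JSON-style decoder may reject that. If you want the whole object rather than just the HTML, a minimal standard-library alternative to demjson is sketched below. It is a small recursive-descent parser that starts right after `Config.product_details_page =` and stops at the end of the first complete value. It assumes the object uses only the syntax visible in this script: bare or quoted keys, single- or double-quoted strings, numbers, null/true/false, arrays and trailing commas. parse_js_literal and product_details are hypothetical names. This is not a general JavaScript parser, and JS-only escapes such as \x41 would make json.loads fail.

```python
import json
import re

IDENT = re.compile(r"[A-Za-z_$][\w$]*")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
LITERALS = {"null": None, "true": True, "false": False}
MARKER = "Config.product_details_page ="


def parse_js_literal(src):
    """Parse the first value in `src` (a small JS-literal subset); ignore the rest."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(src) and src[pos].isspace():
            pos += 1

    def parse_string():
        nonlocal pos
        quote = src[pos]
        pos += 1
        chars = []
        while src[pos] != quote:
            if src[pos] == "\\":
                pair = src[pos:pos + 2]
                # \' is legal in JS but not in JSON; other escapes go to json.loads as-is.
                chars.append("'" if pair == "\\'" else pair)
                pos += 2
            else:
                # A bare " can only occur inside a '...' string; escape it for JSON.
                chars.append('\\"' if src[pos] == '"' else src[pos])
                pos += 1
        pos += 1  # skip the closing quote
        return json.loads('"' + "".join(chars) + '"', strict=False)

    def parse_items(close, keyed):
        nonlocal pos
        pos += 1  # skip the opening '{' or '['
        out = {} if keyed else []
        while True:
            skip_ws()
            if src[pos] == close:
                pos += 1
                return out
            if keyed:
                if src[pos] in "'\"":
                    key = parse_string()
                else:
                    m = IDENT.match(src, pos)
                    if m is None:
                        raise ValueError(f"expected a key at offset {pos}")
                    key, pos = m.group(), m.end()
                skip_ws()
                if src[pos] != ":":
                    raise ValueError(f"expected ':' at offset {pos}")
                pos += 1
                out[key] = parse_value()
            else:
                out.append(parse_value())
            skip_ws()
            if src[pos] == ",":
                pos += 1  # a trailing comma before the close is allowed
            elif src[pos] != close:
                raise ValueError(f"expected ',' or {close!r} at offset {pos}")

    def parse_value():
        nonlocal pos
        skip_ws()
        ch = src[pos]
        if ch == "{":
            return parse_items("}", keyed=True)
        if ch == "[":
            return parse_items("]", keyed=False)
        if ch in "'\"":
            return parse_string()
        m = NUMBER.match(src, pos)
        if m:
            pos = m.end()
            text = m.group()
            return float(text) if any(c in text for c in ".eE") else int(text)
        m = IDENT.match(src, pos)
        if m and m.group() in LITERALS:
            pos = m.end()
            return LITERALS[m.group()]
        raise ValueError(f"unexpected input at offset {pos}: {src[pos:pos + 20]!r}")

    return parse_value()


def product_details(script_text):
    """Parse the object assigned to Config.product_details_page."""
    start = script_text.index(MARKER) + len(MARKER)
    return parse_js_literal(script_text[start:])
```

With this, `product_details(script_html)["tabs"][0]["html"]` should hold the decoded HTML string, which you can then pass to any HTML parser to pull out the text. The later `tabs.push(...)` entry is not part of the parsed object. Scanning for the marker instead of slicing by fixed offsets keeps the code independent of the exact whitespace around the assignment.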
Thanks!