How to access attributes after scraping a website with BeautifulSoup
import requests, json
from bs4 import BeautifulSoup
from flask import Flask
from flask import request, jsonify
import os
from selenium import webdriver

def checkPriceMyntra(URL):
    headers = {'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}
    a = requests.Session()
    res = a.get(URL, headers=headers, verify=False)
    soup = BeautifulSoup(res.text, features="html.parser")
    script = None
    #d = soup.find_all("script")
    for s in soup.find_all("script"):
        print(s)

checkPriceMyntra("https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy")
Here is part of the output of soup.find_all("script"):
<script type="application/ld+json">
{
"@context" : "https://schema.org",
"@type" : "Product",
"name" : "TAG 7 Women Pack Of 2 Solid Ankle-Length Straight-Fit Leggings",
"image" : "https://assets.myntassets.com/h_1440,q_100,w_1080/v1/assets/images/productimage/2020/8/22/be9d5664-5467-475b-b4ea-470a5d64a5481598047122543-1.jpg",
"sku" : "12335860",
"mpn" : "12335860",
"description" : "TAG 7 Women Pack Of 2 Solid Ankle-Length Straight-Fit Leggings",
"offers": {
"@type": "Offer",
"priceCurrency": "INR",
"availability": "InStock",
"price" : "899",
"url": "https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy"
},
"brand" : {
"@type" : "Thing",
"name" : "TAG 7"
}
}
</script>
I want to access the price of the product in this script. How do I do that? I have tried s.get("price"), s.price, and s["price"], but none of them work.
The value of the "price" key ("price": "899") is under the second script tag, so try using the CSS selector script:nth-of-type(2) to select the second script tag, then convert its contents to a dict using the json module.
import json
import requests
from bs4 import BeautifulSoup

def checkPriceMyntra(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
    }
    soup = BeautifulSoup(
        requests.get(url, headers=headers, verify=False).content, "html.parser"
    )
    # The JSON-LD payload is the text inside the tag, so parse it with json.loads.
    json_data = json.loads(soup.select_one("script:nth-of-type(2)").string)
    print(json_data["offers"]["price"])

checkPriceMyntra(
    "https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy"
)
Output:
899
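
Two notes on the code above. First, s.get("price") and s["price"] return nothing because a BeautifulSoup tag only exposes HTML attributes (such as type) that way; the price sits inside the tag's text as JSON. Second, selecting by position may break if the page adds or reorders script tags. As a hedged alternative, the sketch below picks the script by its type="application/ld+json" attribute instead. The SAMPLE_HTML string and the extract_product_price helper are hypothetical names made up for illustration, and the sample is a trimmed stand-in for the real page rather than a fetched response. It also assumes "offers" is a single object, as in the snippet shown in the question.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the page source (not fetched from the site).
SAMPLE_HTML = """
<html><head>
<script>var unrelated = 1;</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "TAG 7 Women Pack Of 2 Solid Ankle-Length Straight-Fit Leggings",
  "offers": {"@type": "Offer", "priceCurrency": "INR", "price": "899"}
}
</script>
</head><body></body></html>
"""


def extract_product_price(html):
    """Return the offer price from the first JSON-LD Product block, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("script", attrs={"type": "application/ld+json"}):
        # tag.get(...) only reads HTML attributes such as "type";
        # the JSON is the tag's text, so it has to be parsed separately.
        if not tag.string:
            continue
        try:
            data = json.loads(tag.string)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(data, dict) and data.get("@type") == "Product":
            # Assumes "offers" is a single object, as in the question's snippet.
            return data.get("offers", {}).get("price")
    return None


print(extract_product_price(SAMPLE_HTML))
```

To use this against the live page, pass res.text from the requests call in place of SAMPLE_HTML. Note that the price comes back as a string, so convert it with int() or float() before doing arithmetic on it.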