How to access the attributes after scraping a website with BeautifulSoup

import requests,json
from bs4 import BeautifulSoup
from flask import Flask
from flask import request, jsonify
import os
from selenium import webdriver

def checkPriceMyntra(URL):
    headers = {'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'}
    a = requests.Session()
    res = a.get(URL, headers=headers, verify=False)
    soup = BeautifulSoup(res.text,features="html.parser")
    script = None
    #d =  soup.find_all("script")
    for s in soup.find_all("script"):
        print(s)

checkPriceMyntra("https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy")

Here is part of the output of soup.find_all("script"):

    <script type="application/ld+json">
                {
                        "@context" : "https://schema.org",
                    "@type" : "Product",
                    "name" : "TAG 7 Women Pack Of 2 Solid Ankle-Length Straight-Fit Leggings",
                    "image" : "https://assets.myntassets.com/h_1440,q_100,w_1080/v1/assets/images/productimage/2020/8/22/be9d5664-5467-475b-b4ea-470a5d64a5481598047122543-1.jpg",
                                "sku" : "12335860",
                                "mpn" : "12335860",
                                "description" : "TAG 7 Women Pack Of 2 Solid Ankle-Length Straight-Fit Leggings",
                                "offers": {
                    "@type": "Offer",
                                        "priceCurrency": "INR",
                                        "availability": "InStock",
                                        "price" : "899",
                                        "url": "https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy"
                },
                    "brand" : {
                        "@type" : "Thing",
                        "name" : "TAG 7"
                                }


                }
            </script>

I want to access the price of the product in this script. How do I do that? I have tried s.get("price"), s.price and s["price"], but none of them work.
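Those three attempts return nothing because they all go through BeautifulSoup's tag interface rather than the text inside the tag: s.get("price") and s["price"] look up HTML attributes of the <script> tag (the tag above only has type), while s.price searches for a child <price> element. The price is part of the JSON text inside the tag. A minimal offline sketch, using a trimmed, hypothetical copy of that tag as sample HTML:

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of the JSON-LD <script> tag shown above
html = """<script type="application/ld+json">
{"@type": "Product", "offers": {"@type": "Offer", "price": "899"}}
</script>"""

s = BeautifulSoup(html, "html.parser").find("script")

# Tag.get() and tag[...] read HTML attributes, and this tag only has "type"
print(s.get("price"))  # None
print(s["type"])       # application/ld+json
try:
    s["price"]
except KeyError:
    print("KeyError: the tag has no 'price' attribute")

# Attribute-style access looks for a child tag named <price>, which does not exist
print(s.price)         # None

# The data is the tag's text content, which is a JSON string
data = json.loads(s.string)
print(data["offers"]["price"])  # 899
```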

The value of the price key ("price": "899") is inside the second script tag, so try using the CSS selector script:nth-of-type(2) to select that tag, then convert its text to a dict with the json module.

import json
import requests
from bs4 import BeautifulSoup


def checkPriceMyntra(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
    }

    # verify=False (kept from the question) disables TLS certificate verification
    soup = BeautifulSoup(
        requests.get(url, headers=headers, verify=False).content, "html.parser"
    )

    # At the time of writing, the JSON-LD product data was in the second <script> tag.
    # .string is the tag's raw text, which json.loads turns into a dict.
    json_data = json.loads(soup.select_one("script:nth-of-type(2)").string)
    print(json_data["offers"]["price"])


checkPriceMyntra(
    "https://www.myntra.com/leggings/tag-7/tag-7-women-pack-of-2-solid-ankle-length-straight-fit-leggings/12335860/buy"
)

Output:

899
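Selecting by position with script:nth-of-type(2) breaks as soon as the page adds or reorders script tags. A sketch of a more position-independent variant, assuming the product data is published as schema.org JSON-LD: it selects every script[type="application/ld+json"] tag and returns offers.price from the first block whose @type is Product. The function name find_product_price and the sample HTML are hypothetical, and blocks that wrap items in an @graph key are not handled here.

```python
import json
from bs4 import BeautifulSoup


def find_product_price(html):
    """Hypothetical helper: return offers.price of the first JSON-LD Product, else None."""
    soup = BeautifulSoup(html, "html.parser")
    # Select by the type attribute instead of by position on the page
    for tag in soup.select('script[type="application/ld+json"]'):
        if not tag.string:
            continue
        try:
            data = json.loads(tag.string)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        # A JSON-LD block may hold a single object or a list of objects
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product":
                offers = item.get("offers")
                # schema.org allows offers to be one Offer or a list of them
                if isinstance(offers, list) and offers:
                    offers = offers[0]
                if isinstance(offers, dict):
                    return offers.get("price")
    return None


# Hypothetical sample: script:nth-of-type(2) would pick the Organization block here
sample = """
<script>var x = 1;</script>
<script type="application/ld+json">{"@type": "Organization", "name": "Example"}</script>
<script type="application/ld+json">
{"@type": "Product", "name": "Leggings", "offers": {"@type": "Offer", "price": "899"}}
</script>
"""

print(find_product_price(sample))  # 899
```

With the real page, pass requests.get(url, headers=headers).content to find_product_price instead of the sample string.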
