How to extract JSON from a BeautifulSoup Object?

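I have downloaded the HTML of a web page using python-requests. I now need to extract a JSON object from this content. I located the JSON object using some BS4 methods, but I don't know how to extract it from the BS4 object. Here is my code: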

from bs4 import BeautifulSoup
import requests
import json

url = "https://matmatch.com/materials?materialPath=mitf1194-astm-b196-grade-c17200-tb00"

html_content = requests.get(url).text
soup = BeautifulSoup(html_content,features="html.parser")
body = soup.find('body')
the_contents_of_body_without_body_tags = body.findChildren(recursive=False)
#print(the_contents_of_body_without_body_tags)


element = soup.find_all("script",type="application/ld+json")
print(element[2])
#print(type(soup.find_all("script", {"type":"application/ld+json"})[2]))
js = json.loads(element[2])

This is the output of the code:

<script type="application/ld+json">{
      "@context": ["https://schema.org", {"csvw": "http://www.w3.org/ns/csvw#"}],
      "@type": "Dataset",
      "name":"ASTM B196 Grade C17200 TB00",
      "description": "Chemical composition and material properties of ASTM B196 Grade C17200 TB00. Also available for download in XLSX and PDF. Data provided by MakeItFrom.com,Matmatch,Materion Brush GmbH",
      "license": "https://matmatch.com/imprint",
      "publisher": {
        "@type": "Organization",
        "name": "Matmatch"
      },
      "mainEntity" : {
        "@type" : "csvw:Table",
        "csvw:tableSchema": {
          "csvw:columns": [
            {
              "csvw:name": "Property Name",
              "csvw:datatype": "string",
              "csvw:cells": [{"csvw:value":"Density","csvw:primaryKey":"Density"},{"csvw:value":"Outside diameter","csvw:primaryKey":"Outside diameter"},{"csvw:value":"Thickness","csvw:primaryKey":"Thickness"},{"csvw:value":"Width","csvw:primaryKey":"Width"},{"csvw:value":"Bendability 90°, bw","csvw:primaryKey":"Bendability 90°, bw"},{"csvw:value":"Bendability 90°, gw","csvw:primaryKey":"Bendability 90°, gw"},{"csvw:value":"Elastic modulus","csvw:primaryKey":"Elastic modulus"},{"csvw:value":"Elongation","csvw:primaryKey":"Elongation"},{"csvw:value":"Hardness, Rockwell C","csvw:primaryKey":"Hardness, Rockwell C"},{"csvw:value":"Hardness, Vickers","csvw:primaryKey":"Hardness, Vickers"},{"csvw:value":"Shear modulus","csvw:primaryKey":"Shear modulus"},{"csvw:value":"Tensile strength","csvw:primaryKey":"Tensile strength"},{"csvw:value":"Yield strength","csvw:primaryKey":"Yield strength"},{"csvw:value":"Yield strength Rp0.2","csvw:primaryKey":"Yield strength Rp0.2"},{"csvw:value":"Coefficient of thermal expansion","csvw:primaryKey":"Coefficient of thermal expansion"},{"csvw:value":"Melting point","csvw:primaryKey":"Melting point"},{"csvw:value":"Specific heat capacity","csvw:primaryKey":"Specific heat capacity"},{"csvw:value":"Thermal conductivity","csvw:primaryKey":"Thermal conductivity"},{"csvw:value":"Electrical resistivity","csvw:primaryKey":"Electrical resistivity"},{"csvw:value":"Specific Electrical conductivity","csvw:primaryKey":"Specific Electrical conductivity"},{"csvw:value":"Relative magnetic permeability","csvw:primaryKey":"Relative magnetic permeability"}]
            },
            {
              "csvw:name": "Value",
              "csvw:datatype": "string",
              "csvw:cells": [{"csvw:value":8.26,"csvw:primaryKey":"Density"},{"csvw:value":19.1,"csvw:primaryKey":"Outside diameter"},{"csvw:value":0.05,"csvw:primaryKey":"Thickness"},{"csvw:value":1.27,"csvw:primaryKey":"Width"},{"csvw:value":0,"csvw:primaryKey":"Bendability 90°, bw"},{"csvw:value":0,"csvw:primaryKey":"Bendability 90°, gw"},{"csvw:value":130,"csvw:primaryKey":"Elastic modulus"},{"csvw:value":1,"csvw:primaryKey":"Elongation"},{"csvw:value":36,"csvw:primaryKey":"Hardness, Rockwell C"},{"csvw:value":210,"csvw:primaryKey":"Hardness, Vickers"},{"csvw:value":50,"csvw:primaryKey":"Shear modulus"},{"csvw:value":410,"csvw:primaryKey":"Tensile strength"},{"csvw:value":220,"csvw:primaryKey":"Yield strength"},{"csvw:value":130,"csvw:primaryKey":"Yield strength Rp0.2"},{"csvw:value":0.0000175,"csvw:primaryKey":"Coefficient of thermal expansion"},{"csvw:value":870,"csvw:primaryKey":"Melting point"},{"csvw:value":360,"csvw:primaryKey":"Specific heat capacity"},{"csvw:value":84,"csvw:primaryKey":"Thermal conductivity"},{"csvw:value":6.2e-8,"csvw:primaryKey":"Electrical resistivity"},{"csvw:value":17,"csvw:primaryKey":"Specific Electrical conductivity"},{"csvw:value":1.0006,"csvw:primaryKey":"Relative magnetic permeability"}]
            },
            {
              "csvw:name": "Unit",
              "csvw:datatype": "string",
              "csvw:cells": [{"csvw:value":"g/cm³","csvw:primaryKey":"Density"},{"csvw:value":"mm","csvw:primaryKey":"Outside diameter"},{"csvw:value":"mm","csvw:primaryKey":"Thickness"},{"csvw:value":"mm","csvw:primaryKey":"Width"},{"csvw:value":"[-]","csvw:primaryKey":"Bendability 90°, bw"},{"csvw:value":"[-]","csvw:primaryKey":"Bendability 90°, gw"},{"csvw:value":"GPa","csvw:primaryKey":"Elastic modulus"},{"csvw:value":"%","csvw:primaryKey":"Elongation"},{"csvw:value":"[-]","csvw:primaryKey":"Hardness, Rockwell C"},{"csvw:value":"[-]","csvw:primaryKey":"Hardness, Vickers"},{"csvw:value":"GPa","csvw:primaryKey":"Shear modulus"},{"csvw:value":"MPa","csvw:primaryKey":"Tensile strength"},{"csvw:value":"MPa","csvw:primaryKey":"Yield strength"},{"csvw:value":"MPa","csvw:primaryKey":"Yield strength Rp0.2"},{"csvw:value":"1/K","csvw:primaryKey":"Coefficient of thermal expansion"},{"csvw:value":"°C","csvw:primaryKey":"Melting point"},{"csvw:value":"J/(kg·K)","csvw:primaryKey":"Specific heat capacity"},{"csvw:value":"W/(m·K)","csvw:primaryKey":"Thermal conductivity"},{"csvw:value":"Ω·m","csvw:primaryKey":"Electrical resistivity"},{"csvw:value":" % IACS","csvw:primaryKey":"Specific Electrical conductivity"},{"csvw:value":"[-]","csvw:primaryKey":"Relative magnetic permeability"}]
            }]
        }
      }
    }</script>

The last line of the code raises this error:

TypeError: the JSON object must be str, bytes or bytearray, not 'Tag'

I have tried using .text and .content on the BS4 object, but that also results in errors.

How can I extract the JSON object from this output?

Use the tag's .string attribute (it is an attribute, not a method, so no parentheses are needed):

If a tag has only one child, and that child is a NavigableString, the child is made available as .string
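
To see this behaviour without fetching the live page, here is a minimal sketch that parses a small inline HTML snippet. The snippet is made up for illustration and only borrows the "name" value from the question's output. It assumes bs4 is installed and uses the same html.parser backend as the question.

```python
import json
from bs4 import BeautifulSoup

# Illustrative HTML only; the real page contains several such <script> tags.
html = """
<html><body>
<script type="application/ld+json">{"@type": "Dataset", "name": "ASTM B196 Grade C17200 TB00"}</script>
<p>Hello <b>world</b></p>
</body></html>
"""

soup = BeautifulSoup(html, features="html.parser")
script_tag = soup.find("script", type="application/ld+json")

# The <script> tag has exactly one child, a string, so .string returns it.
raw_json = script_tag.string
data = json.loads(raw_json)
print(data["name"])

# The <p> tag has two children (a string and a <b> tag), so .string is None.
p_string = soup.find("p").string
print(p_string)

# Passing the Tag itself reproduces the error from the question.
try:
    json.loads(script_tag)
    tag_error = None
except TypeError as exc:
    tag_error = exc
print(tag_error)
```

Because .string is None whenever a tag has more than one child, it is worth checking for None before calling json.loads on pages you do not control.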


In your example:

from bs4 import BeautifulSoup
import requests
import json

url = "https://matmatch.com/materials?materialPath=mitf1194-astm-b196-grade-c17200-tb00"

html_content = requests.get(url).text
soup = BeautifulSoup(html_content,features="html.parser")
body = soup.find('body')
the_contents_of_body_without_body_tags = body.findChildren(recursive=False)

element = soup.find_all("script",type="application/ld+json")

js = json.loads(element[2].string) # <- Calling `.string` to get the JSON
print(js)

Sample output (truncated):

 {'@context': ['https://schema.org', {'csvw': 'http://www.w3.org/ns/csvw#'}], '@type': 'Dataset', 'name': 'ASTM B196 Grade C17200 TB00', ...., {'csvw:value': '[-]', 'csvw:primaryKey': 'Relative magnetic permeability'}]}]}}}
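
Once json.loads has run, the result is an ordinary Python dict, so the table can be read with plain key lookups. The sketch below is one possible way to merge the three csvw columns into one record per property. The helper name columns_to_rows is hypothetical, and the input is a shortened excerpt of the question's output (two of the properties) rather than the live page.

```python
import json

# Shortened excerpt of the JSON-LD from the question (two properties only).
raw = """
{
  "@type": "Dataset",
  "name": "ASTM B196 Grade C17200 TB00",
  "mainEntity": {
    "@type": "csvw:Table",
    "csvw:tableSchema": {
      "csvw:columns": [
        {"csvw:name": "Property Name",
         "csvw:cells": [{"csvw:value": "Density", "csvw:primaryKey": "Density"},
                        {"csvw:value": "Elastic modulus", "csvw:primaryKey": "Elastic modulus"}]},
        {"csvw:name": "Value",
         "csvw:cells": [{"csvw:value": 8.26, "csvw:primaryKey": "Density"},
                        {"csvw:value": 130, "csvw:primaryKey": "Elastic modulus"}]},
        {"csvw:name": "Unit",
         "csvw:cells": [{"csvw:value": "g/cm³", "csvw:primaryKey": "Density"},
                        {"csvw:value": "GPa", "csvw:primaryKey": "Elastic modulus"}]}
      ]
    }
  }
}
"""


def columns_to_rows(dataset):
    """Hypothetical helper: merge the csvw columns into one dict per property."""
    rows = {}
    columns = dataset["mainEntity"]["csvw:tableSchema"]["csvw:columns"]
    for column in columns:
        column_name = column["csvw:name"]
        for cell in column["csvw:cells"]:
            # Cells in different columns share the same primaryKey.
            rows.setdefault(cell["csvw:primaryKey"], {})[column_name] = cell["csvw:value"]
    return rows


js = json.loads(raw)
rows = columns_to_rows(js)
for key, row in rows.items():
    print(key, row["Value"], row["Unit"])
```

Keying the rows on csvw:primaryKey rather than on list position avoids relying on the three columns listing their cells in the same order.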

