
How to convert an HTML file to JSON using Python

I want to fetch an HTML file from some location and convert it to JSON format using Python.

With the code below, the output I get is just text.

from bs4 import BeautifulSoup
import json
html = '<p>Hello</p><p>world</p>'
soup = BeautifulSoup(html, 'html.parser')
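# Collect every text node; string=True is the current spelling of the older text=True.
things = soup.find_all(string=True)
print(things)  # prints ['Hello', 'world'], i.e. only the text, with no tag structure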
jsonD = json.dumps(htmlContent.text) converts the raw HTML content into a JSON
string representation. jsonL = json.loads(jsonD) then parses that JSON string back
into a regular Python string. The pair is a no-op: any escaping done by dumps() is
reverted by loads(), so jsonL contains the same data as htmlContent.text.

Use json.dumps to generate your final JSON instead of building the JSON by hand:

ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': htmlContent.text,
    'date': finalDate
})
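
If the goal is JSON that keeps the tag structure rather than one big string, one possible approach is to walk the HTML and build nested dictionaries before calling json.dumps. The sketch below is only an illustration, not the answer's method: it swaps in the standard-library html.parser in place of BeautifulSoup so that it runs without extra packages, and the names TreeBuilder, html_to_json, and the "tag"/"attrs"/"children" layout are assumptions chosen for this example.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become parents.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Hypothetical helper: collects tags, attributes and text into nested dicts."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "document", "attrs": {}, "children": []}
        self.stack = [self.root]  # the last item is the currently open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.stack[-1]["children"].append(text)


def html_to_json(html):
    """Return a JSON document describing the element tree of an HTML string."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return json.dumps(builder.root, indent=2)


html = '<p>Hello</p><p>world</p>'
print(html_to_json(html))
```

For an HTML file on disk, the same function could be given the result of reading the file as text. The dictionary layout is a design choice; a flatter shape may suit other uses better.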

