
JSON encoded as UTF-8 characters. How do I process it as JSON in Python Requests?

I am scraping a website that renders a JavaScript/JSON object that looks like this:

{ "company": "\r\n            \x3cdiv class=\"page-heading\"\x3e\x3ch1\x3eSEARCH
 RESULTS 1 - 40 OF 200\x3c/h1\x3e\x3c/div\x3e\r\n\r\n             
\x3cdiv class=\"right-content-list\"\x3e\r\n\r\n                
\x3cdiv class=\"top-buttons-adm-lft\"\x3e\r\n   

I am attempting to process this as a JSON object (which is what it looks like) using Python's Requests library.

I have used the following methods to encode/process the text:

unicodedata.normalize("NFKD", get_city_json.text).encode('utf-8', 'ignore')
unicodedata.normalize("NFKD", get_city_json.text).encode('ascii', 'ignore')
unicode(get_city_json.text)

However, even after repeated attempts, the text is still rendered with the UTF-8 encoding and its escaped characters. The Content-Type returned by the web app is "text/javascript; charset=utf-8".

I want to be able to process it as a regular JSON/JavaScript object for parsing and reading.

Help would be greatly appreciated!

That isn't UTF-8. It is HTML-encoded text.

You can decode it using the following:

Python 2

import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(json_value)
print unescaped

Python 3

# HTMLParser.unescape() was removed in Python 3.9;
# html.unescape() has been available since Python 3.4.
import html
unescaped = html.unescape(json_value)
print(unescaped)

If you unescape your string with these, you should get:

<div class="page-heading"><h1>SEARCH RESULTS 1 - 40 OF 200</h1></div>
<div class="right-content-list">
<div class="top-buttons-adm-lft">
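
One caveat: the sample in the question uses JavaScript-style hex escapes such as `\x3c`, not HTML entities such as `&lt;`. `html.unescape` only handles entities, and `json.loads` rejects `\x` escapes because JSON only allows `\uXXXX`. A minimal sketch of one workaround follows, assuming the rest of the response body is otherwise valid JSON. The helper name `js_hex_to_json` and the shortened sample string are hypothetical, not part of the original post.

```python
import json
import re

# Match one backslash escape at a time, so that an escaped backslash
# followed by "x3c" (i.e. \\x3c) is not mistaken for a hex escape.
_ESCAPE = re.compile(r'\\(x[0-9a-fA-F]{2}|.)', re.DOTALL)


def js_hex_to_json(text):
    """Rewrite JavaScript \\xNN escapes as the JSON-legal \\u00NN form."""
    def fix(match):
        body = match.group(1)
        if len(body) == 3 and body[0] == 'x':
            return '\\u00' + body[1:]
        # Any other escape (\", \\, \r, \n, ...) is left untouched.
        return match.group(0)
    return _ESCAPE.sub(fix, text)


# Shortened stand-in for the response text (e.g. get_city_json.text).
raw = r'{ "company": "\r\n \x3cdiv class=\"page-heading\"\x3e\x3ch1\x3eSEARCH RESULTS 1 - 40 OF 200\x3c/h1\x3e\x3c/div\x3e" }'

data = json.loads(js_hex_to_json(raw))
print(data["company"].strip())
```

After this step `data` is an ordinary dict, and the `company` value is a plain HTML string that can be passed to an HTML parser.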
