Reading JSON files with UTF-8 characters in Python

I have a large JSON file containing UTF-8 encoded characters. How can I read this file and convert these characters to a more readable form? I have something like this:

{
    "name": "Wroc\u00c5\u0082aw"
}

and I want to get this:

{
    "name": "Wrocław"
}

If your JSON data contains mojibake like this, you can convert it to proper Unicode by encoding the string as Latin-1, then decoding the resulting bytes as UTF-8. This reverses whichever process produced the mojibake. (The fact that the strings come from JSON is inconsequential; this works for any mojibake strings of this type.)

>>> s = "Wroc\u00c5\u0082aw"
>>> s.encode('latin-1').decode('utf-8')
'Wrocław'
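To apply the same repair to a whole parsed JSON document, one option is to walk the structure and fix every string. The following is only a minimal sketch: the helper name `fix_mojibake` is made up for illustration, and it assumes that any string which survives the Latin-1/UTF-8 round trip really was mojibake. Strings that fail either step are left untouched.

```python
import json


def fix_mojibake(value):
    """Recursively repair UTF-8 text that was mis-decoded as Latin-1.

    Hypothetical helper, not part of any library.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not this kind of mojibake (or already correct): keep as-is.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    # Numbers, booleans and None pass through unchanged.
    return value


# Inline sample standing in for the contents of the JSON file.
raw = '{"name": "Wroc\\u00c5\\u0082aw"}'
data = fix_mojibake(json.loads(raw))

# ensure_ascii=False writes "ł" itself rather than a \uXXXX escape.
print(json.dumps(data, ensure_ascii=False, indent=4))
```

For a real file, `json.load` on a file object opened with `encoding="utf-8"` would take the place of `json.loads(raw)`. Note that a correctly encoded string made up only of Latin-1 characters could, in rare cases, also happen to be valid UTF-8 and would then be altered, so it is worth spot-checking the output.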

In the general case, you have to reverse-engineer what produced the mojibake, but this particular case is easy to identify and fix, because the Latin-1 encoding is transparent: each code point from U+0000 to U+00FF is encoded as the single byte with the same value.
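As an illustration of why the fix works, the mojibake can be reproduced by decoding UTF-8 bytes with the wrong codec. This sketch assumes the damage came from exactly one such mistaken Latin-1 decode; the variable names are illustrative only.

```python
original = "Wrocław"

# "ł" (U+0142) takes two bytes in UTF-8: 0xC5 0x82.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 turns each byte into one character,
# giving U+00C5 and U+0082 in place of the single "ł".
mojibake = utf8_bytes.decode("latin-1")

# Encoding as Latin-1 recovers the original bytes exactly,
# so decoding them as UTF-8 restores the text.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(ascii(mojibake))
print(repaired)
```

If the mistaken decode used a different codec, such as Windows-1252, the byte-to-character mapping differs and the encode step would have to use that codec instead.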
