Reading a JSON file with UTF-8 characters in Python

Question

I have a large JSON file with UTF-8 encoded characters. How can I read this file and convert these characters to a more readable form? I have something like this:

{
    "name": "Wroc\u00c5\u0082aw"
}

and I want to get this:

{
    "name": "Wrocław"
}

Answer 1

If your JSON data contains mojibake like this, you can convert it to proper Unicode by encoding the string as Latin-1 and then decoding the resulting bytes as UTF-8. This reverses the process that produced the mojibake. (The fact that the strings come from JSON is inconsequential; this works for any mojibake string of this type.)

>>> s = "Wroc\u00c5\u0082aw"
>>> s.encode('latin-1').decode('utf-8')
'Wrocław'

In the general case, you have to reverse-engineer what produced the mojibake, but this particular case is easy to identify and troubleshoot, because Latin-1 is a transparent encoding: each code point from U+0000 to U+00FF is encoded as the single byte with the same value.
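
For a whole JSON document rather than a single string, the same round trip can be applied to every string after parsing. The following is only a minimal sketch: the helper name fix_mojibake is hypothetical, and it assumes every damaged string is UTF-8 that was mis-decoded as Latin-1. Strings that cannot make the round trip are left unchanged.

```python
import json


def fix_mojibake(value):
    """Recursively repair strings that are UTF-8 mis-decoded as Latin-1.

    Hypothetical helper for illustration; other kinds of mojibake
    (for example Windows-1252) are not handled here.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not this kind of mojibake, so keep the string as it is.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    # Numbers, booleans and None pass through untouched.
    return value


# The sample from the question, as it would appear in the file.
raw = '{"name": "Wroc\\u00c5\\u0082aw"}'
data = fix_mojibake(json.loads(raw))

# ensure_ascii=False writes "ł" itself instead of a \uXXXX escape.
print(json.dumps(data, ensure_ascii=False, indent=4))
```

To work on a file instead of a string, open it with open(path, encoding="utf-8") and pass the file object to json.load. One caveat: a correctly decoded string whose characters happen to form valid UTF-8 when encoded as Latin-1 would be altered too, so it is worth spot-checking the output on real data.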

Answer 1: 2, accepted, 2021-04-26 10:22:44
