将Unicode字符串转换为UTF-8，然后转换为JSON

Question

I want to encode a string in UTF-8 and view the corresponding UTF-8 bytes individually. 我想用UTF-8编码一个字符串，并分别查看相应的UTF-8字节。 In the Python REPL the following seems to work fine: 在Python REPL中，以下似乎工作正常：

>>> unicode('©', 'utf-8').encode('utf-8')
'\xc2\xa9'

Note that I'm using U+00A9 COPYRIGHT SIGN as an example here. 请注意，我在这里使用U + 00A9 COPYRIGHT SIGN作为示例。 The '\\xC2\\xA9' looks close to what I want — a string consisting of two separate code points: U+00C2 and U+00A9. '\\xC2\\xA9'看起来接近我想要的 - 一个由两个独立代码点组成的字符串：U + 00C2和U + 00A9。 (When UTF-8-decoded, it gives back the original string, '\\xA9' .) （当UTF-8解码时，它会返回原始字符串'\\xA9' 。）

Then, I want the UTF-8-encoded string to be converted to a JSON-compatible string. 然后，我希望将UTF-8编码的字符串转换为JSON兼容的字符串。 However, the following doesn't seem to do what I want: 但是，以下似乎没有做我想要的：

>>> import json; json.dumps('\xc2\xa9')
'"\\u00a9"'

Note that it generates a string containing U+00A9 (the original symbol). 请注意，它会生成一个包含U + 00A9（原始符号）的字符串。 Instead, I need the UTF-8-encoded string, which would look like "\Â\©" in valid JSON. 相反，我需要UTF-8编码的字符串，它在有效的JSON中看起来像"\Â\©" 。

TL;DR How can I turn '©' into "\Â\©" in Python? TL; DR如何在Python "\Â\©" '©'变成"\Â\©" ？ I feel like I'm missing something obvious — is there no built-in way to do this? 我觉得我错过了一些明显的东西 - 有没有内置的方法来做到这一点？

Answer 1

If you really want "\Â\©" as the output, give json a Unicode string as input. 如果你真的想要"\Â\©"作为输出，请给json一个Unicode字符串作为输入。

>>> print json.dumps(u'\xc2\xa9')
"\u00c2\u00a9"

You can generate this Unicode string from the raw bytes: 您可以从原始字节生成此Unicode字符串：

s = unicode('©', 'utf-8').encode('utf-8')
s2 = u''.join(unichr(ord(c)) for c in s)

I think what you really want is "\\xc2\\xa9" as the output, but I'm not sure how to generate that yet. 我认为你真正想要的是"\\xc2\\xa9"作为输出，但我不知道如何生成它。

将Unicode字符串转换为UTF-8，然后转换为JSON

问题描述

1 个解决方案

解决方案1
2 已采纳 2013-06-19 19:43:01

将Unicode字符串转换为UTF-8，然后转换为JSON

问题描述

1 个解决方案

解决方案1 2 已采纳 2013-06-19 19:43:01

解决方案1
2 已采纳 2013-06-19 19:43:01