I want to encode a string in UTF-8 and view the corresponding UTF-8 bytes individually. In the Python REPL the following seems to work fine:
>>> unicode('©', 'utf-8').encode('utf-8')
'\xc2\xa9'
Note that I'm using U+00A9 COPYRIGHT SIGN as an example here. The '\\xC2\\xA9'
looks close to what I want — a string consisting of two separate code points: U+00C2 and U+00A9. (When UTF-8-decoded, it gives back the original string, '\\xA9'
.)
Then, I want the UTF-8-encoded string to be converted to a JSON-compatible string. However, the following doesn't seem to do what I want:
>>> import json; json.dumps('\xc2\xa9')
'"\\u00a9"'
Note that it generates a string containing U+00A9 (the original symbol). Instead, I need the UTF-8-encoded string, which would look like "\Â\©"
in valid JSON.
TL;DR How can I turn '©'
into "\Â\©"
in Python? I feel like I'm missing something obvious — is there no built-in way to do this?
If you really want "\Â\©"
as the output, give json
a Unicode string as input.
>>> print json.dumps(u'\xc2\xa9')
"\u00c2\u00a9"
You can generate this Unicode string from the raw bytes:
s = unicode('©', 'utf-8').encode('utf-8')
s2 = u''.join(unichr(ord(c)) for c in s)
I think what you really want is "\\xc2\\xa9"
as the output, but I'm not sure how to generate that yet.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.