Is converting Python unicode by casting to str reversible?

Original post: 2015-02-20 13:43:53 · Tags: python, string, unicode

The proper way to convert a unicode string u to a byte string in Python is to call u.encode(someencoding).

Unfortunately, I didn't know that before and had used str(u) for the conversion. In particular, I called str(u) to coerce u to a string so that I could use it as a valid shelve key (which must be a str).

Since I didn't encounter any UnicodeEncodeError, I wonder whether this process is reversible/lossless. That is, can I do u = str(converted_unicode) (or u = bytes(converted_unicode) in Python 3) to get the original u back?

1 answer

In Python 2, if the conversion with str() succeeded, you can reverse the result. Calling str() on a unicode value is equivalent to calling unicode_value.encode('ascii'), and the reverse is simply str_value.decode('ascii'). Calling unicode(str_value) uses the same implicit ASCII codec to decode.
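The round trip described above can be sketched roughly as follows. This is written in Python 3 syntax with the ASCII codec named explicitly, to imitate what Python 2's str() and unicode() do implicitly when the default encoding is left at ASCII. The helper names py2_style_str and py2_style_unicode and the sample key are hypothetical, chosen only for illustration.

```python
def py2_style_str(u):
    """Imitate Python 2's str(u): encode text with the ASCII codec."""
    # Raises UnicodeEncodeError if u contains any non-ASCII character.
    return u.encode('ascii')


def py2_style_unicode(s):
    """Imitate Python 2's unicode(s): decode bytes with the ASCII codec."""
    return s.decode('ascii')


# An ASCII-only value survives the round trip unchanged.
original = 'shelve-key-42'           # hypothetical sample key
converted = py2_style_str(original)  # byte string
restored = py2_style_unicode(converted)
assert restored == original

# A non-ASCII value never gets converted at all: the encode step raises,
# so a successful conversion implies the data was pure ASCII.
try:
    py2_style_str('caf\u00e9')
    non_ascii_failed = False
except UnicodeEncodeError:
    non_ascii_failed = True
```

Because the ASCII codec either maps every character one-to-one or raises, there is no case in which the conversion silently loses information.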

In Python 3, calling str() on a Unicode value simply gives you the same object back, since in Python 3 str is the Unicode type. Calling bytes() on a Unicode value without an encoding fails; you always have to use an explicit codec in Python 3 to convert between str and bytes.
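A minimal Python 3 sketch of the behaviour described above. UTF-8 is used here only as an example codec (an assumption; any codec that can represent the text would do), and the variable names are illustrative.

```python
text = 'caf\u00e9'  # a str containing one non-ASCII character

# str() on a str performs no conversion; the value is unchanged.
unchanged = str(text)
assert unchanged == text

# bytes() without an encoding refuses to guess a codec.
try:
    bytes(text)
    bytes_without_encoding_failed = False
except TypeError:
    bytes_without_encoding_failed = True

# With an explicit codec, the conversion works and is reversible.
encoded = text.encode('utf-8')      # same result as bytes(text, 'utf-8')
decoded = encoded.decode('utf-8')
assert decoded == text
```

The same codec has to be used in both directions; decoding with a different codec than the one used for encoding may raise an error or produce different text.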