
Distinction between str and unicode: why does Redis return binary data when passed unicode?

After two questions regarding the distinction between the datatypes str and unicode, I'm still puzzled by the following.

In Block 1 we see that the type of the city is unicode, as expected.

Yet in Block 2, after a round-trip through disk (Redis), the type of the city is str (and the representation is different).

The dogma of storing UTF-8 on disk, reading into unicode, and writing back in UTF-8 is failing somewhere.

Why is the second instance of type(city) str rather than unicode?

Just as importantly, does it matter? Do you care whether your variables are unicode or str, or are you oblivious to the difference so long as the code "does the right thing"?

# -*- coding: utf-8 -*-

# Block 1
city = u'Düsseldorf'
print city, type(city), repr(city)
# Düsseldorf <type 'unicode'> u'D\xfcsseldorf'

# Block 2
import redis
r_server = redis.Redis('localhost')
r_server.set('city', city)
city = r_server.get('city')
print city, type(city), repr(city)
# Düsseldorf <type 'str'> 'D\xc3\xbcsseldorf'

Dogma?

Using character sets and encodings isn't dogma; it's a necessity. Hopefully you will have read enough to understand why so many character sets are in use. Unicode is obviously the way forward (it maps all characters), but how do you transfer a Unicode character from one machine to another, or save it to disk?

We could use the Unicode code point value directly, but code points range up to U+10FFFF, so a fixed-width encoding needs 32 bits for every character (this is UTF-32). The letter a would be encoded as 0x00000061, which wastes a lot of bits on one character. UTF-16 is less wasteful, but for mostly-ASCII text UTF-8 is the best compromise, since each ASCII character takes a single byte.
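To make the size difference concrete, here is a small sketch that encodes the city name from the question three ways and counts the bytes. The variable names are illustrative only, and the "-le" codec variants are an assumption chosen so that no byte order mark inflates the counts.

```python
# -*- coding: utf-8 -*-
# Compare how many bytes the same text needs in each encoding.
# The "-le" variants are used so that no byte order mark is prepended.
text = u'D\xfcsseldorf'  # 10 characters, one of them non-ASCII

codecs_to_try = ('utf-8', 'utf-16-le', 'utf-32-le')

sizes = {}
for codec in codecs_to_try:
    sizes[codec] = len(text.encode(codec))

for codec in codecs_to_try:
    print('%-9s %2d bytes' % (codec, sizes[codec]))
```

For this string UTF-8 needs 11 bytes (one per ASCII letter, two for ü), UTF-16 needs 20, and UTF-32 needs 40.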

Using decoded Unicode within code obviously frees developers from having to consider the intricacies of encoding, such as how many bytes equal a character.

Solutions

Redis Client

As suggested by @JFSebastian, the redis-py driver includes a decode_responses option on the Redis and Connection classes. When it is set to True, the client decodes responses using the encoding option, which defaults to utf-8.

For example:

r_server = redis.Redis('localhost', decode_responses=True)
city = r_server.get('city')
# type(city) is now <type 'unicode'>

Wrapper Class

No longer required since the discovery of decode_responses.

It would appear that the Redis driver is rather simplistic: if you send it a unicode, it converts it to the default encoding (UTF-8 in most cases). On response, Redis doesn't know the encoding, so the driver returns a str for you to decode as appropriate.

Therefore, it would be safer to encode your strings to UTF-8 before sending them to Redis and decode them as UTF-8 on response. Other DB drivers are more advanced, so they receive and return unicode objects.
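The explicit encode and decode steps can be tried without a Redis server at all. The following is a minimal sketch in which the variable names are illustrative; it reproduces the byte string shown in Block 2 of the question and shows that decoding it restores the original text.

```python
# -*- coding: utf-8 -*-
city = u'D\xfcsseldorf'

# Encode explicitly before handing the value to a byte-oriented store.
stored = city.encode('utf-8')

# Decode explicitly when the bytes come back.
restored = stored.decode('utf-8')

print(repr(stored))
print(restored == city)
```

The encoded value is the 11-byte sequence D\xc3\xbcsseldorf, which is exactly what Redis handed back in the question.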

But of course, you shouldn't be peppering your code with .encode() and .decode(). The common approach is to form "Unicode sandwiches": external data is decoded to Unicode on input and encoded again on output. So how does that work for you? Wrap the Redis driver so that it returns what you want, thereby pushing the decoding out to the periphery of your code.

For example, it should be as simple as:

 
 
 
  
import redis

class UnicodeRedis(redis.Redis):

    def __init__(self, *args, **kwargs):
        # Remember the requested encoding, falling back to UTF-8.
        if "encoding" in kwargs:
            self.encoding = kwargs["encoding"]
        else:
            self.encoding = "utf-8"
        super(UnicodeRedis, self).__init__(*args, **kwargs)

    def get(self, *args, **kwargs):
        result = super(UnicodeRedis, self).get(*args, **kwargs)
        # Decode byte strings; pass anything else (e.g. None) through.
        if isinstance(result, str):
            return result.decode(self.encoding)
        else:
            return result
 
  

You can then interact with it as normal, except that you can pass an encoding argument that changes how strings are decoded. If you don't set encoding, this code will assume utf-8.

For example:

 
 
 
  
r_server = UnicodeRedis('localhost')
city = r_server.get('city')
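The same wrapping idea can be exercised without a running Redis server. In the sketch below, FakeBytesStore and DecodingStore are hypothetical stand-ins invented for illustration (they are not part of redis-py): the first mimics a driver that only stores and returns bytes, and the second decodes on the way out, so calling code only ever sees text.

```python
# -*- coding: utf-8 -*-

class FakeBytesStore(object):
    """Hypothetical stand-in for a driver that stores and returns bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Mimic a driver that encodes text to UTF-8 before storing it.
        if not isinstance(value, bytes):
            value = value.encode('utf-8')
        self._data[key] = value

    def get(self, key):
        # Returns raw bytes, or None for a missing key.
        return self._data.get(key)


class DecodingStore(FakeBytesStore):
    """Hypothetical wrapper that decodes stored bytes back to text."""

    def __init__(self, encoding='utf-8'):
        super(DecodingStore, self).__init__()
        self.encoding = encoding

    def get(self, key):
        result = super(DecodingStore, self).get(key)
        if isinstance(result, bytes):
            return result.decode(self.encoding)
        return result


store = DecodingStore()
store.set('city', u'D\xfcsseldorf')
city = store.get('city')        # text again, not bytes
missing = store.get('no-such-key')  # None passes through untouched
```

Keeping the decode in one overridden method means the rest of the program never has to know which encoding the store uses.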
 
  

As J.F. Sebastian said, the redis-py API supports decoding responses to unicode by passing decode_responses=True to the __init__ method of the redis.Redis class.


 