
Query by non-ASCII characters

I am using Python on the Google App Engine platform. Let's say I have the following model in my Datastore:

class names(db.Model):
    name = db.StringProperty(multiline=True)

and it contains names like:

name1 = Beyoncé
name2 = El Súper Clásico

with non-ASCII characters.

When I make a query like:

q_1 = names.all().filter('name =', name1)

it doesn't work: the comparison is wrong.

Do you have any idea how I can solve this problem? I tried encoding the "name" to UTF-8, but that didn't work either.

Exact matches should work without problems, provided that you correctly decode input strings (such as those you get from web request parameters) into Unicode and that the strings you save in the GAE datastore are Unicode as well.

I've tried this snippet in the GAE SDK Interactive Console and it works:

from google.appengine.ext import db

class names(db.Model):
    name = db.StringProperty(multiline=True)

some_name = 'Beyonc\xc3\xa9'.decode('utf-8')
    # same as: some_name = u'Beyoncé'
    # same as: some_name = u'Beyonc\u00e9'

n = names(name=some_name)
n.put()

q = names.all().filter('name =', some_name)
print q.get().name.encode('utf-8')
    # prints Beyoncé

You should debug the raw values of the strings you are comparing, i.e., the string saved in the datastore and the string passed to the query.
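One way to inspect the raw values is to print their escaped form and length. The sketch below uses Python 3 syntax and assumes, purely for illustration, that one string holds é as a single code point while the other holds e followed by a combining accent; that is only one possible cause of a failed match, and the helper name describe is hypothetical.

```python
import unicodedata

# Two strings that render identically but differ in their raw code points.
composed = "Beyonc\u00e9"      # é as a single code point
decomposed = "Beyonce\u0301"   # e followed by a combining acute accent


def describe(s):
    """Return the ASCII-escaped form and the length of s."""
    return ascii(s), len(s)


# A plain equality test fails although both print as "Beyoncé".
same_raw = composed == decomposed

# After normalizing both sides to the same form (NFC here) they match.
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", decomposed))

print(describe(composed))
print(describe(decomposed))
print(same_raw, same_nfc)
```

If the escaped forms differ, normalizing both the stored value and the query value to the same form before saving and querying should make exact matches behave consistently.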

I recommend reading this article about Unicode by Joel Spolsky and the Python Unicode HOWTO if you're not familiar with handling Unicode strings.

In addition to this, if you're running search queries that should match Unicode characters like u'é' when the input is 'e', consider comparing normalized strings:

import unicodedata

some_name = u'El S\u00faper Cl\u00e1sico' # El Súper Clásico
normalized_name = unicodedata.normalize('NFKD', some_name).encode('ascii', 'ignore') # El Super Clasico
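As a rough sketch of how this normalization could be used for accent-insensitive matching, the following Python 3 example folds both the stored names and the search input before comparing them. The helper fold_accents and the in-memory list stored_names are hypothetical stand-ins; in a real application the folded value would more likely be saved in a separate property and queried there.

```python
import unicodedata


def fold_accents(text):
    """Decompose text (NFKD) and drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    # In Python 3, encode() returns bytes, so decode back to str.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical stand-in for the names stored in the datastore.
stored_names = ["Beyonc\u00e9", "El S\u00faper Cl\u00e1sico"]

# Map each folded form to the original name.
index = {fold_accents(name): name for name in stored_names}

# Fold the user input the same way before looking it up.
match = index.get(fold_accents("El Super Clasico"))
print(match)
```

Note that this folding discards information: characters with no ASCII decomposition are dropped entirely, so it suits loose search better than exact identification.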
