AppEngine datastore 'ends with' query

In this answer: https://stackoverflow.com/a/1554837/1135424 I found that a 'starts with' query can be done using something like:

MyModel.all().filter('prop >=', prefix).filter('prop <', prefix + u'\ufffd')
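To see what the two filters select, here is a minimal sketch in which a sorted Python list stands in for the datastore index on prop (an assumption for illustration; the real range scan happens inside the datastore, not via bisect). The sample values are made up.

```python
import bisect

HIGH = u'\ufffd'  # appended to the prefix to form the upper bound

def starts_with(sorted_values, prefix):
    """Return every v with prefix <= v < prefix + HIGH (one contiguous slice)."""
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, prefix + HIGH)
    return sorted_values[lo:hi]

# Made-up sample data, kept sorted like an index would be.
values = sorted(['apple', 'apricot', 'banana', 'ap', 'b'])
print(starts_with(values, 'ap'))  # ['ap', 'apple', 'apricot']
```

Because the values are sorted, the two inequality bounds always pick out one contiguous run, which is why the trick maps onto a single index scan.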

It mentions that an 'ends with' query would require storing the reverse of the string and then applying the same tactic as above.

So, for example, if my data are domain strings, something like:

domains | reverse_domain
------- | --------------
.com.io | oi.moc.
.com.eu | ue.moc.
.com.mx | xm.moc.
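A minimal in-memory sketch of "store the reverse, then apply the same tactic", using the three domains from the table. The sorted list of (reverse_domain, domain) pairs is a hypothetical stand-in for the datastore index; note that under this tactic the suffix being searched for is reversed as well, since it has to be compared with the reversed values.

```python
import bisect

HIGH = u'\ufffd'

# Hypothetical stand-in for the datastore: rows sorted on reverse_domain.
domains = ['.com.io', '.com.eu', '.com.mx']
index = sorted((d[::-1], d) for d in domains)
keys = [rev for rev, _ in index]

def ends_with(suffix):
    """Domains ending with suffix: a prefix range scan on reverse_domain."""
    rev_suffix = suffix[::-1]  # reversed, to match the reversed stored values
    lo = bisect.bisect_left(keys, rev_suffix)
    hi = bisect.bisect_left(keys, rev_suffix + HIGH)
    return [d for _, d in index[lo:hi]]

print(ends_with('.io'))   # ['.com.io']
print(ends_with('.com'))  # []
```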

If I want to query for domains ending with '.io', I should do:

suffix = '.io'
MyModel.all().filter(
    'reverse_domain >=', suffix).filter(
    'reverse_domain <', suffix + u'\ufffd')

But when I test the comparison on a Python command line, I get this:

>>> '.com.io'[::-1] >= '.io'
True
>>> '.com.io'[::-1] < '.io' +  u'\ufffd'
False
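The same comparisons in plain Python, as a sketch: the stored side is '.com.io'[::-1] == 'oi.moc.', while the bounds above still use the unreversed '.io'. Since 'o' sorts after '.', 'oi.moc.' is never below '.io' + u'\ufffd'. Reversing the suffix as well makes both bounds hold. The helper in_suffix_range is hypothetical.

```python
HIGH = u'\ufffd'

def in_suffix_range(stored, suffix):
    """Both bounds, with the stored value AND the suffix reversed."""
    rev, rev_suffix = stored[::-1], suffix[::-1]
    return rev >= rev_suffix and rev < rev_suffix + HIGH

print('.com.io'[::-1] < '.io' + HIGH)     # False: '.io' was not reversed
print('.com.io'[::-1] < 'oi.' + HIGH)     # True
print(in_suffix_range('.com.io', '.io'))  # True
print(in_suffix_range('.com.eu', '.io'))  # False
```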

Changing the order, putting u'\ufffd' first and the suffix second:

>>> '.com.io'[::-1] < u'\ufffd' + '.io'
True

So I'm wondering whether, when doing an 'ends with' query, besides reversing the order of the stored data, the u'\ufffd' should go first, something like this:

MyModel.all().filter(
    'reverse_prop >=', suffix).filter(
    'reverse_prop <', u'\ufffd' + suffix)
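A quick plain-Python check of these proposed bounds (with suffix = '.io', unreversed, as in the earlier query) suggests they are too wide: U+FFFD sorts above every ASCII character, so almost any reversed domain falls between '.io' and u'\ufffd' + '.io'. The helper matches is hypothetical.

```python
HIGH = u'\ufffd'
suffix = '.io'

def matches(reverse_prop):
    """The two proposed filters: >= suffix and < HIGH + suffix."""
    return reverse_prop >= suffix and reverse_prop < HIGH + suffix

print(matches('.com.io'[::-1]))  # True
print(matches('.com.eu'[::-1]))  # True, although '.com.eu' does not end with '.io'
```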

Does the datastore filter follow the same lexicographical ordering that Python uses when comparing strings?
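A sketch of the Python side of that question, assuming (not confirmed here) that the datastore orders indexed text by its UTF-8 bytes. In Python 3, str comparison is by Unicode code point, and UTF-8 is designed so that byte-wise order matches code point order, so under that assumption the two orderings would coincide. The check also shows that U+FFFD is a high code point but not the highest, so a value with, for example, an emoji right after the prefix would fall outside a prefix + u'\ufffd' bound.

```python
# Python 3: str values compare by code point.
samples = [u'.io', u'oi.', u'oi.moc.', u'oi.\ufffd', u'ue.moc.',
           u'\ufffd.io', u'\U0001F600']

by_code_point = sorted(samples)
by_utf8_bytes = sorted(samples, key=lambda s: s.encode('utf-8'))
print(by_code_point == by_utf8_bytes)  # True

# U+1F600 (an emoji) sorts above U+FFFD.
print(u'\U0001F600' > u'\ufffd')       # True
```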

Basically, how do I do something like:

SELECT domain FROM domains WHERE <domain name> LIKE CONCAT('%', domain)

For example, if I search for google.com.io, I should get the domain .com.io. So how can I get a list of the existing domains/strings that a given string ends with?
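One alternative, swapped in here and not taken from the question or the answer: instead of a range query, enumerate every suffix of the search string and look each one up exactly. A string has only len(name) non-empty suffixes, so this mirrors LIKE CONCAT('%', domain) with a bounded number of equality lookups. The set below is a made-up stand-in for the stored domains; in the datastore each candidate would become an equality lookup on the stored domain.

```python
def candidate_suffixes(name):
    """Every non-empty s such that name.endswith(s), longest first."""
    return [name[i:] for i in range(len(name))]

# Hypothetical stored domains (the values from the table above).
STORED = {'.com.io', '.com.eu', '.com.mx'}

def stored_suffixes_of(name, stored=STORED):
    """Stored domains that name ends with, via exact lookups."""
    return [s for s in candidate_suffixes(name) if s in stored]

print(stored_suffixes_of('google.com.io'))  # ['.com.io']
```

Restricting the candidates to suffixes that start at a '.' would cut the number of lookups further, at the cost of no longer matching LIKE exactly.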

Update:

While testing, it seems that I only need to change the operator >= to <=; that gives me the LIKE '%string':

suffix = '.io'[::-1]
MyModel.all().filter(
    'reverse_domain <=', suffix).filter(
    'reverse_domain <', suffix + u'\ufffd')

If I want to check whether a string ends with some record that I already have:

>>> assert('.com.io'[::-1] <= '.com.io'[::-1] and '.com.io'[::-1] < '.com.io'[::-1] + u'\ufffd')

>>> assert('.com.io'[::-1] <= 'google.com.io'[::-1] and '.com.io'[::-1] < 'google.com.io'[::-1] + u'\ufffd') 

>>> assert('.com.io'[::-1] <= 'gle.com.io'[::-1] and '.com.io'[::-1] < 'gle.com.io'[::-1] + u'\ufffd')
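For comparison, a sketch of the same test written as the prefix range from the starts-with trick, with the reversed stored value as the prefix and both bounds applied to the reversed search string. With the bounds arranged this way, a hypothetical stored value such as 'a' is rejected. The function name ends_with_stored is made up for illustration.

```python
HIGH = u'\ufffd'

def ends_with_stored(search, stored):
    """True if search ends with stored: reversed(search) must lie in
    [reversed(stored), reversed(stored) + HIGH)."""
    s_rev, p_rev = search[::-1], stored[::-1]
    return p_rev <= s_rev < p_rev + HIGH

print(ends_with_stored('google.com.io', '.com.io'))  # True
print(ends_with_stored('gle.com.io', '.com.io'))     # True
print(ends_with_stored('google.com.io', 'a'))        # False
```

In memory this is equivalent to search.endswith(stored). As a datastore query it is harder: the stored values that pass both bounds are exactly the prefixes of the reversed search string, and those are not one contiguous range in sort order (for example, 'oh' sorts between the prefixes 'o' and 'oi'), so a single pair of inequality filters on reverse_domain cannot select just them.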

If your use case involves searching for top-level domains, I would recommend splitting the URL into two separate properties. That will make it easy to find all records with a given TLD, and allow more flexibility for other searches.
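A minimal sketch of that layout, with hypothetical property names name and tld and made-up sample domains. The list comprehension stands in for what would be a single equality filter in the datastore, such as filter('tld =', 'io') on the hypothetical tld property.

```python
def split_domain(domain):
    """Split at the last dot: 'google.com.io' -> name 'google.com', tld 'io'."""
    name, _, tld = domain.rpartition('.')
    return {'name': name, 'tld': tld}

# Made-up sample records.
records = [split_domain(d) for d in ['google.com.io', 'example.com.eu', 'foo.io']]

# All records with a given TLD: one equality comparison per record.
io_records = [r for r in records if r['tld'] == 'io']
print(io_records)
```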

If you have millions of records, you may also consider representing each TLD as an integer. That will reduce the size of the data.

This approach may also be a little faster as you use a single equality filter instead of two inequality filters.
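A sketch of the integer idea, assuming a hypothetical lookup table that assigns each TLD a small code the first time it is seen (a real application would need to store this mapping durably). Finding all records for a TLD then comes down to one equality comparison on the code.

```python
# Hypothetical TLD -> integer mapping, filled in on first sight.
TLD_CODES = {}

def tld_code(tld):
    """Return the code for tld, assigning the next integer if it is new."""
    return TLD_CODES.setdefault(tld, len(TLD_CODES) + 1)

# Made-up sample domains stored as (domain, tld_code) pairs.
rows = [(d, tld_code(d.rpartition('.')[2]))
        for d in ['google.com.io', 'example.com.eu', 'foo.io']]

io = tld_code('io')
print([d for d, code in rows if code == io])  # ['google.com.io', 'foo.io']
```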
