AppEngine datastore 'ends with' query

In this answer: https://stackoverflow.com/a/1554837/1135424 I found that a 'starts with' query can be done using something like:

MyModel.all().filter('prop >=', prefix).filter('prop <', prefix + u'\ufffd')

It mentions that an 'ends with' query would require storing the reverse of the string and then applying the same tactic as above.

So, for example, if my current data are domain strings, something like:

domains | reverse_domain
------- | --------------
.com.io | oi.moc.
.com.eu | ue.moc.
.com.mx | xm.moc.

If I want to query for domains ending with '.io', I should do:

suffix = '.io'
MyModel.all().filter(
    'reverse_domain >=', suffix).filter(
    'reverse_domain <', suffix + u'\ufffd')

But when testing with a string comparison on a Python command line, I get this:

>>> '.com.io'[::-1] >= '.io'
True
>>> '.com.io'[::-1] < '.io' +  u'\ufffd'
False

Changing the order, with u'\ufffd' first and the suffix after it:

>>> '.com.io'[::-1] < u'\ufffd' + '.io'
True
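A note on the comparisons above: the stored value is reversed ('oi.moc.') but the suffix '.io' is not, so the two strings are not being compared in the same orientation. Below is a minimal sketch in plain Python 3 of the same range check with both sides reversed. A list stands in for the datastore, no App Engine API is called, and `reverse` and `ends_with` are hypothetical helper names.

```python
# Plain-Python stand-in for the datastore range filter; no App Engine APIs are used.
HIGH = u'\ufffd'  # sentinel that sorts after the characters used in these examples

def reverse(s):
    """Return the string reversed, as it would be stored in reverse_domain."""
    return s[::-1]

def ends_with(reversed_values, suffix):
    """Emulate filter('reverse_domain >=', p).filter('reverse_domain <', p + HIGH)."""
    prefix = reverse(suffix)  # '.io' becomes 'oi.'
    return [reverse(v) for v in reversed_values if prefix <= v < prefix + HIGH]

stored = [reverse(d) for d in ['.com.io', '.com.eu', '.com.mx']]
print(ends_with(stored, '.io'))
```

With the suffix reversed, the sentinel stays at the end of the upper bound, as in the 'starts with' case.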

So I am wondering whether, when doing an 'ends with' query, besides reversing the stored data, the u'\ufffd' should go first, something like this:

MyModel.all().filter(
    'reverse_prop >=', suffix).filter(
    'reverse_prop <', u'\ufffd' + suffix)

Does the datastore filter follow the same lexicographical ordering that Python uses when comparing strings?

Basically, how do I do the equivalent of:

SELECT domain FROM domains WHERE <domain name> LIKE CONCAT('%', domain)

For example, if I search for google.com.io, I should get the domain '.com.io'. So, how do I get a list of existing domains/strings that a given string ends with?
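One possible way to approach this "which stored values are a suffix of my input" lookup without a range scan is to enumerate the candidate suffixes of the input and test each one for equality. The sketch below is plain Python 3 with a set standing in for the datastore. `candidate_suffixes` and `matching_domains` are hypothetical helper names, and splitting on dots is an assumption about how the stored domains are shaped.

```python
def candidate_suffixes(name):
    """Yield every dot-prefixed suffix of name, longest first.

    'google.com.io' yields '.com.io' and then '.io'.
    """
    labels = name.split('.')
    for i in range(1, len(labels)):
        yield '.' + '.'.join(labels[i:])

def matching_domains(name, stored):
    """Return the stored domains that name ends with (equality lookups only)."""
    return [s for s in candidate_suffixes(name) if s in stored]

stored = {'.com.io', '.com.eu', '.com.mx'}
print(matching_domains('google.com.io', stored))
```

In a datastore setting, each membership test would presumably become an equality filter on the domain property.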

Update:

While testing, it seems that I only need to change the operator >= to <=, which gives me the LIKE '%string' behavior:

suffix = '.io'[::-1]
MyModel.all().filter(
    'reverse_domain <=', suffix).filter(
    'reverse_domain <', suffix + u'\ufffd')

If I want to check whether a string ends with some record that I already have:

>>> assert('.com.io'[::-1] <= '.com.io'[::-1] and '.com.io'[::-1] < '.com.io'[::-1] + u'\ufffd')

>>> assert('.com.io'[::-1] <= 'google.com.io'[::-1] and '.com.io'[::-1] < 'google.com.io'[::-1] + u'\ufffd') 

>>> assert('.com.io'[::-1] <= 'gle.com.io'[::-1] and '.com.io'[::-1] < 'gle.com.io'[::-1] + u'\ufffd')
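The assertions above only cover values that really are suffixes. A quick check in plain Python 3 (a sketch, not a datastore query) suggests that the same pair of inequalities also accepts stored values that merely sort before the reversed search string, so the results would likely need a post-filter such as `str.startswith`. The helper names are hypothetical.

```python
HIGH = u'\ufffd'

def passes_inequalities(stored_reversed, query_reversed):
    """The two conditions used in the update above."""
    return (stored_reversed <= query_reversed
            and stored_reversed < query_reversed + HIGH)

def is_real_suffix(stored_reversed, query_reversed):
    """True only when the stored domain is an actual suffix of the query."""
    return query_reversed.startswith(stored_reversed)

query = 'google.com.mx'[::-1]  # 'xm.moc.elgoog'
for domain in ['.com.io', '.com.eu', '.com.mx']:
    stored = domain[::-1]
    print(domain, passes_inequalities(stored, query), is_real_suffix(stored, query))
```

Here all three stored values pass the inequalities, but only '.com.mx' is a real suffix of 'google.com.mx'.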

If your use case involves searching for top-level domains, I would recommend splitting the URL into two separate properties. That will make it easy to find all records with a given TLD, and allow more flexibility for other searches.

You may also consider using an integer to represent each TLD if you have millions of records. It will reduce the size of the data.

This approach may also be a little faster, as you use a single equality filter instead of two inequality filters.
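As a rough illustration of the suggestion above, the sketch below splits a host name into a name part and a TLD part and maps each TLD to a small integer. It is plain Python 3 with a list standing in for the datastore. `split_tld`, `make_record`, `TLD_CODES`, and the specific integer codes are hypothetical, and taking the last dot-separated label as the TLD is a simplifying assumption.

```python
TLD_CODES = {'io': 1, 'eu': 2, 'mx': 3}  # hypothetical integer codes per TLD

def split_tld(host):
    """Split 'google.com.io' into ('google.com', 'io')."""
    name, _, tld = host.rpartition('.')
    return name, tld

def make_record(host):
    """Build the two-property record that would be stored."""
    name, tld = split_tld(host)
    return {'name': name, 'tld': TLD_CODES[tld]}

records = [make_record(h) for h in ['google.com.io', 'example.com.eu', 'shop.com.io']]

# A single equality test replaces the pair of inequality filters.
io_names = [r['name'] for r in records if r['tld'] == TLD_CODES['io']]
print(io_names)
```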
