Most efficient way to query Data Store in App Engine

I have a data store with about 150,000 entities in it. When I query the store using filters, my queries are REALLY slow. My structure is completely flat, i.e. every entity is a sibling of every other.

1: Is it better to use GQL instead of filters?

2: Is this not the best use case for the Datastore, and should I use a SQL database instead?

Here's an example of my code:

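import com.google.appengine.api.datastore.*;
import com.google.appengine.api.datastore.Query.*;
import java.util.List;

// dt, stockKey, availableSlots and datastore (a DatastoreService) come from the surrounding code
// Look for a buy opportunity: entities for date dt with -1.0 <= score <= 10.0, lowest score first
Filter dateFilter = new FilterPredicate("date", FilterOperator.EQUAL, dt);
Filter scoreFilter = new FilterPredicate("score", FilterOperator.LESS_THAN_OR_EQUAL, 10.0);
Filter safetyFilter = new FilterPredicate("score", FilterOperator.GREATER_THAN_OR_EQUAL, -1.0);
// Both inequality filters use "score", which is also the first sort order, as the Datastore requires
Filter mainFilter = CompositeFilterOperator.and(dateFilter, scoreFilter, safetyFilter);
// Ancestor query: only entities that have stockKey as an ancestor are considered
Query q = new Query("StockEntity", stockKey).setFilter(mainFilter);
q.addSort("score", SortDirection.ASCENDING);

// Fetch at most availableSlots results
List<Entity> stocks = datastore.prepare(q).asList(FetchOptions.Builder.withLimit(availableSlots));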

Some more details:

  1. About 150,000 records, divided among 500 stocks, so about 300 records per stock, one for each day in a date range.

  2. A query like the one above, where a specific date is passed in and the 500 stocks are filtered on 'score', returning between 10 and 20 records, takes upwards of 30 seconds to complete on my development machine.

I haven't tried pushing to production yet, but I guess I will try that next; I figured there wouldn't be a huge difference. My dev machine is quite a high-spec iMac.

https://developers.google.com/appengine/docs/java/datastore/queries#Java_Restrictions_on_queries

Inequality filters are limited to at most one property

To avoid having to scan the entire index table, the query mechanism relies on all of a query's potential results being adjacent to one another in the index. To satisfy this constraint, a single query may not use inequality comparisons (LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL, NOT_EQUAL) on more than one property across all of its filters. For example, a query whose two inequality filters both apply to the same property is valid.
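To see why the adjacency rule exists, here is a rough in-memory model; it is not the App Engine API. A composite index on (date, score) is modeled as rows sorted by date, then score. A query with an equality filter on date and a range on score seeks to the first candidate row and scans forward, because every match sits in one contiguous run. The class name, the tie-breaking on entity name and the sample symbols/dates are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** In-memory model of a composite index on (date, score); not the App Engine API. */
public class IndexAdjacencyDemo {

    /** One index row: the indexed property values plus the entity it points to. */
    static final class IndexRow {
        final String date;
        final double score;
        final String entity;

        IndexRow(String date, double score, String entity) {
            this.date = date;
            this.score = score;
            this.entity = entity;
        }
    }

    /** Index order: date, then score, then entity name as a tie-breaker. */
    static final Comparator<IndexRow> INDEX_ORDER =
        Comparator.<IndexRow, String>comparing(r -> r.date)
            .thenComparingDouble(r -> r.score)
            .thenComparing(r -> r.entity);

    /**
     * date == d AND lo <= score <= hi, ascending by score, at most limit rows.
     * Seek to the first candidate row, then scan forward until a row leaves the range.
     */
    static List<String> query(NavigableSet<IndexRow> index, String d, double lo, double hi, int limit) {
        List<String> out = new ArrayList<>();
        // "" is the smallest possible entity name, so this bound sorts before every (d, lo, *) row
        for (IndexRow r : index.tailSet(new IndexRow(d, lo, ""), true)) {
            if (!r.date.equals(d) || r.score > hi || out.size() == limit) {
                break; // left the contiguous run of matches, or hit the limit
            }
            out.add(r.entity);
        }
        return out;
    }

    /** Hypothetical sample rows for two trading days. */
    static NavigableSet<IndexRow> sampleIndex() {
        NavigableSet<IndexRow> index = new TreeSet<>(INDEX_ORDER);
        index.addAll(Arrays.asList(
            new IndexRow("2013-05-01", 12.0, "AAPL"),
            new IndexRow("2013-05-01", 3.5, "GOOG"),
            new IndexRow("2013-05-01", -0.5, "MSFT"),
            new IndexRow("2013-05-01", -2.0, "IBM"),
            new IndexRow("2013-05-02", 1.0, "GOOG")));
        return index;
    }

    public static void main(String[] args) {
        // Same shape as the question's query: fixed date, score between -1.0 and 10.0
        System.out.println(query(sampleIndex(), "2013-05-01", -1.0, 10.0, 20)); // [MSFT, GOOG]
    }
}
```

A second inequality on a different property (say, a volume range) would scatter the matches across that run, so a single forward scan could no longer find them; that is the situation the restriction rules out.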

The short answer is that you really can't quite do what you want with the Datastore.

First up, that query will run faster on the actual Datastore than on the local development server.

  1. Using GQL or filters is basically the same: both describe the same kind of query and are served from the same indexes, so neither is faster.

  2. When using the Datastore, you should first define the functionality you need. For example, you want to show a list of stocks with a specific order and filters. Now look at any other views of the same data that your app needs. Then decide how the data should be structured.

This is very different from an RDBMS, where the database can often accommodate most functionality without changing the data model, and the data is modeled in a more 'generic' way (normalization).

In general, the Datastore's read performance is best when you know the KEY of whatever you want to read, and worst when you run queries, since a query always requires an index 'scan'.
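One way to act on this is to give entities key names you can compute from data you already have, so a read becomes a get by key (or a batched get) instead of a query. The sketch below is an in-memory model, not the App Engine API: a HashMap plays the role of the Datastore, and the "SYMBOL@DATE" key-name scheme is a hypothetical choice. In the real low-level Java API the corresponding calls would be `KeyFactory.createKey(kind, name)` to build a key and `DatastoreService.get(Key)` or `DatastoreService.get(Iterable<Key>)` to read one or many entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of reads by computed key; not the App Engine API. */
public class KeyLookupDemo {

    /** Hypothetical key-name scheme: one entity per stock per day, e.g. "GOOG@2013-05-01". */
    static String keyName(String symbol, String date) {
        return symbol + "@" + date;
    }

    /** Builds the keys for a set of symbols on one date, without any query. */
    static List<String> keysFor(List<String> symbols, String date) {
        List<String> keys = new ArrayList<>();
        for (String s : symbols) {
            keys.add(keyName(s, date));
        }
        return keys;
    }

    /** Batched read: look up every computed key; keys with no entity are simply absent. */
    static Map<String, Double> batchGet(Map<String, Double> store, List<String> keys) {
        Map<String, Double> found = new LinkedHashMap<>();
        for (String k : keys) {
            Double score = store.get(k);   // direct lookup by key, no index scan
            if (score != null) {
                found.put(k, score);
            }
        }
        return found;
    }

    /** Hypothetical stored scores, keyed by key name. */
    static Map<String, Double> sampleStore() {
        Map<String, Double> store = new HashMap<>();
        store.put(keyName("GOOG", "2013-05-01"), 3.5);
        store.put(keyName("MSFT", "2013-05-01"), -0.5);
        store.put(keyName("GOOG", "2013-05-02"), 1.0);
        return store;
    }

    public static void main(String[] args) {
        List<String> keys = keysFor(Arrays.asList("GOOG", "MSFT", "IBM"), "2013-05-01");
        System.out.println(batchGet(sampleStore(), keys)); // {GOOG@2013-05-01=3.5, MSFT@2013-05-01=-0.5}
    }
}
```

Missing keys (IBM here) are left out of the result rather than causing an error, which mirrors how a batched get returns only the entities that exist.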

Knowing this, I tend to use the Ancestor relationship often. Requesting the 'children' of an Ancestor seems to perform better and is strongly consistent. For example, I use a query like:

SELECT * WHERE ANCESTOR IS {key}

Where {key} is the key of the ancestor (or 'parent'). This query returns the ancestor entity and all entities that have this ancestor key in their paths. On rare occasions I use the value of one of the filter properties as the parent key to group objects, but be careful: a key cannot be changed once the entity is written (you can only 'change' it by writing a copy of the entity under a new key).
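In the low-level Java API, an ancestor query like the one above can be written as `new Query(ancestorKey)` (kindless) or `new Query(kind, ancestorKey)`, which is what the question's code already does with `stockKey`. To illustrate one reason such queries can be efficient, here is an in-memory model, not the API itself: each key is written as a slash-separated path such as `Stock:GOOG/StockEntity:2013-05-01` (a hypothetical encoding), and because keys are kept sorted, all descendants of an ancestor form one contiguous run. Strong consistency, which comes from entity groups in the real Datastore, is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** In-memory model of an ancestor query over sorted key paths; not the App Engine API. */
public class AncestorQueryDemo {

    /** Returns the ancestor itself (if stored) followed by all of its descendants. */
    static List<String> ancestorQuery(NavigableSet<String> keys, String ancestor) {
        List<String> out = new ArrayList<>();
        if (keys.contains(ancestor)) {
            out.add(ancestor);
        }
        // Every descendant path starts with "ancestor/", and strings sharing a prefix
        // are contiguous in sorted order, so one forward scan from the prefix finds them all.
        String prefix = ancestor + "/";
        for (String k : keys.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) {
                break;
            }
            out.add(k);
        }
        return out;
    }

    /** Hypothetical keys: Stock entities as parents of their daily StockEntity children. */
    static NavigableSet<String> sampleKeys() {
        return new TreeSet<>(Arrays.asList(
            "Stock:GOOG",
            "Stock:GOOG/StockEntity:2013-05-01",
            "Stock:GOOG/StockEntity:2013-05-02",
            "Stock:MSFT",
            "Stock:MSFT/StockEntity:2013-05-01"));
    }

    public static void main(String[] args) {
        System.out.println(ancestorQuery(sampleKeys(), "Stock:GOOG"));
    }
}
```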

Also consider the average size of a 'set', for example the Orderlines that belong to an Order. If you know it, you could choose to keep track of each Orderline key somewhere. Fetching the entities for the first 20 keys in one batched read is a fast operation. This is basically the same as indexing, except that the ordering and filtering can be done at 'write time', so your list only contains keys that match your filters.
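The write-time approach can be sketched in memory as follows. This is a model, not the App Engine API: a HashMap stands in for the stored entities, and a sorted set per date stands in for a key list you would persist yourself (for example as a property on a separate entity). The filter bounds (-1.0 to 10.0, taken from the question's query) and the "SYMBOL@DATE" key names are assumptions. Each write updates the list, so a read only takes the first N keys, which you would then fetch with one batched get.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** In-memory model of filtering and ordering at write time; not the App Engine API. */
public class WriteTimeIndexDemo {
    static final double MIN_SCORE = -1.0;   // assumed filter bounds, fixed when data is written
    static final double MAX_SCORE = 10.0;

    /** A precomputed list entry: the entity key plus the score it is ordered by. */
    static final class Candidate {
        final double score;
        final String key;

        Candidate(double score, String key) {
            this.score = score;
            this.key = key;
        }
    }

    static final Comparator<Candidate> BY_SCORE =
        Comparator.<Candidate>comparingDouble(c -> c.score).thenComparing(c -> c.key);

    final Map<String, Double> scoreByKey = new HashMap<>();                   // the "entities"
    final Map<String, TreeSet<Candidate>> candidatesByDate = new HashMap<>(); // per-date key lists

    /** Write path: store the score and keep that date's candidate list filtered and sorted. */
    void put(String symbol, String date, double score) {
        String key = symbol + "@" + date;   // hypothetical key name
        TreeSet<Candidate> list = candidatesByDate.computeIfAbsent(date, d -> new TreeSet<>(BY_SCORE));
        Double old = scoreByKey.put(key, score);
        if (old != null) {
            list.remove(new Candidate(old, key));   // drop the entry at its old position, if any
        }
        if (score >= MIN_SCORE && score <= MAX_SCORE) {
            list.add(new Candidate(score, key));
        }
    }

    /** Read path: no query at all, just the first n keys of the precomputed list. */
    List<String> firstKeys(String date, int n) {
        List<String> out = new ArrayList<>();
        TreeSet<Candidate> list = candidatesByDate.get(date);
        if (list == null) {
            return out;
        }
        for (Candidate c : list) {
            if (out.size() == n) {
                break;
            }
            out.add(c.key);
        }
        return out;
    }

    public static void main(String[] args) {
        WriteTimeIndexDemo db = new WriteTimeIndexDemo();
        db.put("GOOG", "2013-05-01", 3.5);
        db.put("AAPL", "2013-05-01", 12.0);   // outside the bounds, never listed
        db.put("MSFT", "2013-05-01", -0.5);
        db.put("GOOG", "2013-05-01", 11.0);   // rewrite moves GOOG out of the list
        System.out.println(db.firstKeys("2013-05-01", 20)); // [MSFT@2013-05-01]
    }
}
```

The trade-off is that the filter bounds are baked in at write time: a different range would need its own list.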

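Avoid creating views that allow users to 'dynamically' select filters: each combination of filters and sort orders may need its own composite index, and those indexes have to be defined in advance.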

How to optimize further:

  1. Use denormalization to minimize the number of lookups or queries.

  2. Cache (Memcache) where you can.
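Caching is usually applied as a cache-aside read: check the cache, and only on a miss run the expensive query and store its result. The sketch below models that pattern in plain Java; a HashMap stands in for Memcache and the loader function stands in for a Datastore query, so neither is the App Engine API, and expiry and eviction are left out. With App Engine you would obtain the cache from `MemcacheServiceFactory.getMemcacheService()` and use its `get`/`put` methods instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Cache-aside model: a HashMap stands in for Memcache; not the App Engine API. */
public class CacheAsideDemo<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;   // stands in for the slow Datastore query
    int loads = 0;                         // how many times the loader actually ran

    CacheAsideDemo(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Serve from the cache; on a miss, run the loader once and remember its result. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    /** Call after a write that changes the underlying data so readers do not get stale results. */
    void invalidate(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        // Hypothetical loader: pretend this runs the buy-opportunity query for one date
        CacheAsideDemo<String, String> buyList =
            new CacheAsideDemo<>(date -> "candidates for " + date);
        buyList.get("2013-05-01");
        buyList.get("2013-05-01");
        System.out.println(buyList.loads); // 1: the second read came from the cache
    }
}
```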

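Statement: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM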