
Most efficient way to query Data Store in App Engine

I have a Datastore with about 150,000 entities in it. When I query it using filters, my queries are REALLY slow. My structure is completely flat, i.e. every entity is a sibling of every other.

1: Is it better to use GQL instead of filters?

2: Is this not the best use-case for Data Store, and should I use a SQL database instead?

Here's an example of my code:

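import java.util.List;
import com.google.appengine.api.datastore.*;
import com.google.appengine.api.datastore.Query.*;

// Look for a buy opportunity: entities for date dt with -1.0 <= score <= 10.0
Filter dateFilter = new FilterPredicate("date", FilterOperator.EQUAL, dt);
Filter scoreFilter = new FilterPredicate("score", FilterOperator.LESS_THAN_OR_EQUAL, 10.0);
Filter safetyFilter = new FilterPredicate("score", FilterOperator.GREATER_THAN_OR_EQUAL, -1.0);
Filter mainFilter = CompositeFilterOperator.and(dateFilter, scoreFilter, safetyFilter);

// Ancestor query under stockKey, lowest scores first
Query q = new Query("StockEntity", stockKey).setFilter(mainFilter);
q.addSort("score", Query.SortDirection.ASCENDING);

// Fetch at most availableSlots results
List<Entity> stocks = datastore.prepare(q).asList(FetchOptions.Builder.withLimit(availableSlots));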

Some more details:

  1. 150,000ish records, divided amongst 500 stocks, so about 300 records per stock, one for each day in a date range.

  2. A query like the one above, where a specific date is passed in and the 500 stocks are effectively filtered on a 'score', returning between 10 and 20 records, takes upwards of 30 seconds to complete on my development machine.

I haven't tried pushing to production yet, but I guess I will try that next. I figured that there wouldn't be a huge difference. My dev machine is quite a high-spec iMac.

https://developers.google.com/appengine/docs/java/datastore/queries#Java_Restrictions_on_queries

Inequality filters are limited to at most one property

To avoid having to scan the entire index table, the query mechanism relies on all of a query's potential results being adjacent to one another in the index. To satisfy this constraint, a single query may not use inequality comparisons (LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL, NOT_EQUAL) on more than one property across all of its filters. For example, the following query is valid, because both inequality filters apply to the same property:
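The example from the documentation is not included in the quote above. As a rough illustration of the adjacency constraint, here is a plain-Java sketch in which a TreeSet stands in for an index sorted by (date, score). This is not the Datastore API; the class name, the method names and the string dates are all made up for the illustration. An equality filter on date plus a range on score selects one contiguous slice of such an index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java stand-in for an index sorted by (date, score). Not the Datastore API.
final class IndexRangeSketch {

    // One index row: the indexed values plus the stock they point to.
    static final class Row implements Comparable<Row> {
        final String date;
        final double score;
        final String stock;

        Row(String date, double score, String stock) {
            this.date = date;
            this.score = score;
            this.stock = stock;
        }

        @Override
        public int compareTo(Row other) {
            int c = date.compareTo(other.date);
            if (c == 0) c = Double.compare(score, other.score);
            if (c == 0) c = stock.compareTo(other.stock);
            return c;
        }
    }

    private final TreeSet<Row> index = new TreeSet<>();

    void put(String date, double score, String stock) {
        index.add(new Row(date, score, stock));
    }

    // date == given date AND minScore <= score <= maxScore, ascending by score.
    // All matching rows sit next to each other, so one scan from the lower bound is enough.
    List<String> query(String date, double minScore, double maxScore, int limit) {
        List<String> matches = new ArrayList<>();
        Row start = new Row(date, minScore, ""); // "" sorts before any stock name
        for (Row row : index.tailSet(start, true)) {
            boolean pastRange = !row.date.equals(date) || row.score > maxScore;
            if (pastRange || matches.size() >= limit) break;
            matches.add(row.stock);
        }
        return matches;
    }
}
```

If a second inequality were added on a different property (say a hypothetical 'volume'), the matching rows would be scattered inside this slice rather than adjacent, which is the situation the restriction rules out.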

The short answer is that you can't quite do what you want with the Datastore.

First up, that query will run faster on the production Datastore than on the local development server.

  1. Using GQL or Filters is basically the same.

  2. When using the Datastore you should first define the functionality you need. For example: You want to show a list of stocks with a specific order and filters. Now look at any other views of the same data that your app needs. Then decide how the data should be structured.

This is very different from an RDBMS where the database can often accommodate most functionality without changing the data model and the data is modeled in a more 'generic' way (normalization).

In general, the Datastore's read performance will be optimal if you know the KEY of whatever it is you want to read, and it will perform at its worst when doing queries, since that always requires an index 'scan'.

Knowing this, I tend to use the Ancestor relationship often. Requesting the 'children' of an ancestor seems to perform better and is strongly consistent. For example, I use a query like:

SELECT * WHERE ANCESTOR IS {key}

Where {key} is the key of the ancestor (or 'parent'). This query returns the ancestor entity and all entities that have this ancestor key in their paths. On rare occasions I use one of the filter values as a parent 'value' to group objects. Be careful, though: a key cannot be changed once the entity is written (you can change the key, but it will result in a copy).
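One way to picture why an ancestor query can be cheap is to treat it as a scan over a key-path prefix. The following is only a conceptual model in plain Java, not the Datastore API and not a claim about how the Datastore is implemented; the class name and the "Kind:name/Kind:name" path strings are invented for the illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only: entities kept sorted by key path. Not the Datastore API.
final class AncestorScanSketch {
    private final TreeMap<String, String> entitiesByPath = new TreeMap<>();

    void put(String keyPath, String value) {
        entitiesByPath.put(keyPath, value);
    }

    // Returns the ancestor itself plus every entity whose key path lies under it.
    List<String> ancestorQuery(String ancestorPath) {
        List<String> results = new ArrayList<>();
        String childPrefix = ancestorPath + "/";
        // Paths sharing a prefix are contiguous in sorted order, so start at the
        // ancestor and stop at the first path that no longer shares the prefix.
        for (Map.Entry<String, String> e : entitiesByPath.tailMap(ancestorPath, true).entrySet()) {
            String path = e.getKey();
            if (path.equals(ancestorPath) || path.startsWith(childPrefix)) {
                results.add(e.getValue());
            } else if (!path.startsWith(ancestorPath)) {
                break;
            }
            // Otherwise it is a sibling such as "Stock:GOOGX": skip it and keep scanning.
        }
        return results;
    }
}
```

In this model the cost depends on the size of the ancestor's group, not on the total number of entities stored.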

Also, if you know the average size of a 'set' (for example, the Orderlines that belong to an Order), you could choose to keep track of each Orderline key somewhere. Requesting the first 20 keys in a batched read is a fast operation. This is basically the same as indexing; however, the ordering and filtering can be done at 'write time', so your list only contains keys that match your filters.
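Applied to the stock data from the question, the write-time key list could look roughly like the sketch below. It uses plain Java maps as a stand-in for the Datastore, and every name in it (the class, the methods, the "SYMBOL/date" key format) is hypothetical. It also assumes each key is written only once:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a write-time key list. Not the Datastore API.
final class WriteTimeKeyListSketch {
    static final double MIN_SCORE = -1.0;
    static final double MAX_SCORE = 10.0;

    // Stand-in for the stored entities: entity key -> score.
    private final Map<String, Double> entities = new HashMap<>();
    // Per date, the keys that pass the filter, kept in ascending score order.
    private final Map<String, List<String>> candidateKeysByDate = new HashMap<>();

    // Assumes each key is written once; updates would need to remove the old list entry.
    void write(String key, String date, double score) {
        entities.put(key, score);
        if (score < MIN_SCORE || score > MAX_SCORE) return; // filtering happens at write time
        List<String> keys = candidateKeysByDate.computeIfAbsent(date, d -> new ArrayList<>());
        int i = 0;
        while (i < keys.size() && entities.get(keys.get(i)) <= score) i++;
        keys.add(i, key); // ordering happens at write time
    }

    // Read path, step 1: take the first N precomputed keys for the date.
    List<String> firstKeys(String date, int limit) {
        List<String> keys = candidateKeysByDate.getOrDefault(date, Collections.<String>emptyList());
        return new ArrayList<>(keys.subList(0, Math.min(limit, keys.size())));
    }

    // Read path, step 2: batched lookup by key, no query involved.
    Map<String, Double> batchGet(List<String> keys) {
        Map<String, Double> found = new LinkedHashMap<>();
        for (String key : keys) {
            Double score = entities.get(key);
            if (score != null) found.put(key, score);
        }
        return found;
    }
}
```

The trade-off is extra work and bookkeeping on every write in exchange for reads that only need key lookups.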

Avoid creating views that allow users to 'dynamically' select filters.

How to optimize further:

  1. Use denormalization to minimize the number of lookups or queries.

  2. Cache (Memcache) where you can.


 