How do I build a flexible counter with 1000+ rows but few reads in Google App Engine?

Posted 2010-06-29 00:24:15. Tags: python, google-app-engine, count, google-cloud-datastore, counter

I have a list of users that only administrators can see (= few reads). This list also displays a count of the number of users in the datastore. Because the list could grow larger than 1000, my first thought was to avoid a normal count() and instead use a sharded counter.

However, the problem is that the admins also have access to various search filters (in the GUI), such as viewing only male or only female users. It's important that the count reflects these filters, so that they can get the number of female users, male users and a myriad of other combinations.

Because of this, neither sharded counters nor high-concurrency counters without sharding seem like a good idea, because I would need to create a counter for every combination of search filters.

Should I simply create a loop of count() calls, as described here, or is this very bad practice? How would I do it otherwise?

Note that this counter is for an admin interface and would have a very limited number of reads. This is really a case where I would like to sacrifice some read performance for flexibility and accuracy. Although the list should be able to grow beyond 1000, it's not expected to grow larger than 10,000.

2 Answers

"Loop of counts" is slow, but these days you can make it a bit better with cursors. Normally I would recommend denormalizing into all the "filtered" counters you need, but that slows down user addition and deletion (and probably demographic changes as well). So, given your particular use case with a very low volume of reads, you can probably get away with the "loop of counts" approach (plus cursors ;-).
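To make the "loop of counts plus cursors" idea concrete, here is a minimal sketch. It is not App Engine code: `count_in_batches`, `make_fake_query` and the `fetch_page(cursor, limit)` callable are hypothetical names, and the in-memory list stands in for a filtered keys-only query. On App Engine, `fetch_page` would presumably wrap a query's `fetch(limit)` together with its cursor methods; that mapping is an assumption, not something the answer spells out.

```python
def count_in_batches(fetch_page, batch_size=1000):
    """Count query results by walking them in cursor-delimited batches."""
    total = 0
    cursor = None  # None means "start from the beginning"
    while True:
        keys, cursor = fetch_page(cursor, batch_size)
        total += len(keys)
        if len(keys) < batch_size:
            # A short (or empty) batch means the results are exhausted.
            return total


def make_fake_query(rows, predicate):
    """Hypothetical in-memory stand-in for a filtered, keys-only query."""
    matching = [i for i, row in enumerate(rows) if predicate(row)]

    def fetch_page(cursor, limit):
        start = cursor or 0
        page = matching[start:start + limit]
        # The "cursor" here is just the offset to resume from.
        return page, start + len(page)

    return fetch_page


# Made-up sample data: 2500 users, every third one female.
users = [{"gender": "female" if i % 3 == 0 else "male"} for i in range(2500)]

female_count = count_in_batches(
    make_fake_query(users, lambda u: u["gender"] == "female"))
male_count = count_in_batches(
    make_fake_query(users, lambda u: u["gender"] == "male"))
```

Each filter combination the admin selects becomes one such query, so the cost is paid only when an admin actually looks at the list, and nothing extra happens when users are added or deleted.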

I've tried two approaches:

1) Write my own task that queries the datastore (ordered by key, descending) with a fixed limit of entities (say 50). Each task then enqueues the next one, passing it two parameters: where it left off (like a cursor) and a running total of the number of entities seen so far. The next task resumes the query from that point.

2) This approach is much easier - and that is to use the mapreduce library provided by Google for App Engine. It runs entirely in user space, so you just have to download and build the library and include it in your project. Basically, it handles iterating through all the entities you specify and lets you write a handler for what to do with each one (like incrementing a counter). See the details here: mapreduce.appspot.com - they even have a sample app that does just what you are asking for. The only problem with this is that the results will appear in your browser and are not necessarily stored in the datastore unless you do that yourself.
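Approach 1 can be sketched roughly as follows. This is a local simulation under stated assumptions, not task queue code: a `deque` stands in for the App Engine task queue, a plain list stands in for the key-ordered query, and `count_task`, `run_chain` and the `results` dict are hypothetical names. It only illustrates how the two parameters (resume position and running total) travel from one task to the next.

```python
from collections import deque

BATCH_SIZE = 50  # the fixed per-task entity limit ("say 50")


def count_task(entities, queue, results, cursor=0, running_total=0):
    """One task: read a batch, then either enqueue the next task or finish."""
    batch = entities[cursor:cursor + BATCH_SIZE]
    running_total += len(batch)
    if len(batch) == BATCH_SIZE:
        # A full batch means more entities may remain: hand the resume
        # position and the running total to the next task.
        queue.append({"cursor": cursor + len(batch),
                      "running_total": running_total})
    else:
        # Short batch: the query is exhausted, so record the final count.
        # A real task would store this somewhere durable, e.g. the datastore.
        results["user_count"] = running_total


def run_chain(entities):
    """Drive the chain locally; returns (final count, number of tasks run)."""
    queue = deque([{"cursor": 0, "running_total": 0}])
    results = {}
    tasks_run = 0
    while queue:
        params = queue.popleft()
        count_task(entities, queue, results, **params)
        tasks_run += 1
    return results["user_count"], tasks_run


total, tasks_run = run_chain(list(range(1234)))
```

Because each task does a small, bounded amount of work, no single request has to scan the whole user list; the trade-off is that the count only becomes available once the last task in the chain has run.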