
Google Cloud DataStore. How to serve data?

Original post: 2016-04-29 00:12:56. Tags: google-app-engine, google-cloud-datastore, nosql-aggregation, nosql

Like many, I'm new to the NoSQL world. I did a lot of research, but there is still one point I can't find a proper answer to.

Short description of the system:

I'm building a system that collects visitor data on different websites. Each visit is an entity in the Datastore, with properties like device type, IP, time of visit, etc.

There will be millions of visits in the Datastore.

My question is how to serve this data to clients. My data is sitting in the Datastore as "Visit" entities.

Now, when a customer logs in, I don't want to show them millions of records. I want, for example, to show them general stats, like the number of visits on mobile devices, the number of visits from a specific country in some time range, and so on.

Since I'm new to NoSQL databases, I'm not sure how I should go about showing these stats in the clients' dashboard.

As far as I know, Datastore has no support for aggregates, or for getting a count of query results, for example.

I looked at BigQuery, but BigQuery works on Datastore "backups". I need to serve data in real time, without needing to do backups manually.

I also read about counters and sharded counters. Is this the proper approach: have a counter for each client, for each property, for each tracking group, and show the totals this way? That sounds like too much for a simple purpose.

Any input or explanation that can point me in the right direction would be highly appreciated.

Best regards

2 Answers

"As far as I know, Datastore has no support for aggregates, or for getting a count of query results, for example."

This is not true. You can get the number of entities returned by a query with one line of code. The query itself can be keys-only, which is very fast and basically free.
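As a rough sketch of what such a count could look like with the google-cloud-datastore Python client: the helper name `count_entities` and the `device_type` property are made up for illustration, and the function only assumes a client whose `query(kind=...)` returns an object supporting `add_filter`, `keys_only` and `fetch`, as `google.cloud.datastore.Client` does.

```python
def count_entities(client, kind, filters=()):
    """Count entities of `kind` matching `filters` using a keys-only query.

    `client` is assumed to behave like google.cloud.datastore.Client.
    `filters` is an iterable of (property, operator, value) tuples.
    """
    query = client.query(kind=kind)
    for prop, op, value in filters:
        query.add_filter(prop, op, value)
    query.keys_only()  # ask for keys only, not full entities
    return sum(1 for _ in query.fetch())


# Hypothetical usage (needs the library and valid credentials):
#   from google.cloud import datastore
#   client = datastore.Client()
#   mobile = count_entities(client, "Visit", [("device_type", "=", "mobile")])
```

Counting this way still walks every matching key, so the work grows with the size of the result set.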

Yes, counters are a good approach to your problem in terms of performance. They do have some downsides though, such as storage size and the fact that each time you want to introduce a new type of statistic, you need to create a counter for it.

In addition to your current "Visit" entities, you could opt for storing the aggregated data in Sharded Counters in the Datastore. These counters can be updated in real time, or via a Task in one of your task queues. It would be fairly straightforward to create a Task that creates the various counters for the current Visit entities.
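To make the backfill idea concrete, here is a minimal in-memory sketch of what such a Task might compute. The `client_id`, `device_type` and `country` property names and the `tally_visits` helper are assumptions for illustration, not part of the original question; a real Task would read Visit entities from the Datastore and write each total to a counter entity.

```python
from collections import Counter


def tally_visits(visits, properties=("device_type", "country")):
    """Return a Counter keyed by (client_id, property, value)."""
    totals = Counter()
    for visit in visits:
        for prop in properties:
            if prop in visit:
                totals[(visit["client_id"], prop, visit[prop])] += 1
    return totals


# Hypothetical sample data standing in for Visit entities.
visits = [
    {"client_id": "acme", "device_type": "mobile", "country": "DE"},
    {"client_id": "acme", "device_type": "mobile", "country": "US"},
    {"client_id": "acme", "device_type": "desktop", "country": "DE"},
]
totals = tally_visits(visits)
print(totals[("acme", "device_type", "mobile")])  # 2
```

Each key in `totals` corresponds to one counter, which shows how the number of counters grows with every property and value you want to report on.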

Sharding is a way of creating multiple "underlying" entities that, when combined, represent some meaningful data. Sharding is done to ensure that there are no performance issues due to concurrent updates.

From the Google documentation:

If you had a single entity that was the counter and the update rate was too fast, then you would have contention as the serialized writes would stack up and start to timeout. The way to solve this problem is a little counter-intuitive if you are coming from a relational database; the solution relies on the fact that reads from the App Engine datastore are extremely fast and cheap. The way to reduce the contention is to build a sharded counter – break the counter up into N different counters. When you want to increment the counter, you pick one of the shards at random and increment it. When you want to know the total count, you read all of the counter shards and sum up their individual counts. The more shards you have, the higher the throughput you will have for increments on your counter. This technique works for a lot more than just counters and an important skill to learn is spotting the entities in your application with a lot of writes and then finding good ways to shard them.
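The increment-a-random-shard, sum-all-shards pattern described above can be sketched in a few lines. This is an in-memory stand-in only: the `ShardedCounter` class is hypothetical, each list slot plays the role of one shard entity, and a real implementation would update and read shard entities in the Datastore inside transactions.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a sharded counter.

    Each list slot plays the role of one shard entity in the datastore.
    """

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick one shard at random so concurrent writers rarely collide.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # Read every shard and sum the individual counts.
        return sum(self.shards)


counter = ShardedCounter(num_shards=5)
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

The total is the same regardless of how increments are spread across shards; the number of shards only affects how much write contention each one sees.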